ARTICLE IN PRESS
Journal of Econometrics 132 (2006) 461–489 www.elsevier.com/locate/jeconom
Nonresponse in dynamic panel data models
Cheti Nicoletti
ISER, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
Available online 24 March 2005
Abstract
To verify whether data are missing at random (MAR) we need to observe the missing data. There are only two exceptions: when the relationship between the probability of responding and the missing variables is either imposed by introducing untestable assumptions or recovered using additional data sources. In this paper, we briefly review the estimation and test procedures for selectivity in panel data. Furthermore, by extending the MAR definition from a static setting to the case of dynamic panel data models, we prove that some tests for selectivity are not verifying the MAR condition.
© 2005 Elsevier B.V. All rights reserved.
JEL classification: C23; C24; C33; C34
Keywords: Dynamic panel model; Attrition; Nonresponse; Missing at random; Missing completely at random; Statistical model reduction
1. Introduction

Statistical inference which ignores missing data may be affected by selection bias when response probabilities depend on observed and/or unobserved variables. If the response probability is independent of the unobserved variables given a set of observed ones, then we say that the data are missing at random (MAR).1

Tel.: +44 1206 873 536; fax: +44 1206 873 151. E-mail address: [email protected].
1 The MAR condition is equivalent to the weak unconfoundedness, ignorability or conditional independence assumptions (CIA) for treatment assignment (or program participation).
0304-4076/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2005.02.008
C. Nicoletti / Journal of Econometrics 132 (2006) 461–489
In that case there is a possible bias due to sample selection on observables (Heckman and Hotz, 1989). Selection on observables can be eliminated by adequately controlling for the observed variables. If, however, the data are not MAR, consistent statistical inference must take account of selection on unobservables too. It is therefore important to know whether the MAR condition holds.

The main claim of this paper is that the MAR condition can be verified only when the missing data are observed. Observing missing data is obviously a contradiction. Nevertheless, there are two exceptions to this claim: (i) when additional information is available to recover the distribution of variables affected by nonresponse; (ii) when some untestable assumptions are imposed on the relationship between the missing variables and the probability of responding.

Even if the above claim seems obvious, there are some tests which claim to verify the MAR condition in multivariate data and which do not fall under either of the above two exceptions. These are the tests presented in Little (1988) and Diggle (1989). Furthermore, some other tests have been proposed to check for selection bias in panel data, without explicitly saying which type of selection they are checking for. These are the tests of Park and Davis (1993) and the variable addition tests and the quasi-Hausman tests described in Verbeek and Nijman (1992b). Since the last two types of tests are easy to implement, they are extensively used in panel data applications to investigate the sample selection problem, in the belief that they check, to some extent, the MAR condition. Following the above claim, these tests do not verify the MAR condition; they instead check for possible bias due to sample selection on observables.
To prove this intuitive conclusion formally and to define explicitly the null hypothesis of the above tests, we extend the MAR concept from a static setting, as introduced by Rubin (1976), to the case of dynamic panel data models.2 This extension requires the introduction of some concepts used in the theory of model reduction and defined in Florens et al. (1980), Engle et al. (1983) and Florens and Mouchart (1985).3

Like most applied econometric work on panel data we focus on dynamic models and attrition.4 Attrition occurs when individuals5 participate in a panel survey for one or more consecutive waves but then drop out of the panel without re-entering it, although they remain eligible panel members. Attrition is usually the most common type of (non)response pattern in household panel surveys. This is in part due to the behavior of people, who, when they decide not to cooperate in a wave, stick to their decision for the following waves, and in part due to following rules adopted by panel surveys.6 By focusing on attrition it is possible to explain the response probability at the individual level in a specific wave using individual variables observed in previous waves. Moreover, focusing on dynamic models that explain current variables using lagged variables allows the nonresponse problem to be narrowed to one of missing dependent variables.

More generally, we consider datasets for which a set of variables x is always observable and a variable y is affected by nonresponse. Furthermore, we assume that interest lies in a marginal or conditional model for y given x. In other words, we do not consider missing explanatory variables and we focus attention on missing dependent variables.

The rest of the paper is organized in three parts. Section 2 briefly reviews different estimation and test methods for sample selection, emphasizing the underlying untestable assumptions. Section 3 gives a formal definition of MAR for cross-sectional and panel data. In Section 4 we emphasize the limits of some tests for selectivity in panel data. Finally, some conclusions are drawn in Section 5.

2 Dynamic models are models where the dependent variable is explained by its past and/or the present and past values of other variables.
3 Two more recent textbooks that systematically present the theory of reduction are Florens et al. (1990), who focus especially on the Bayesian approach, and Hendry (1995), who focuses on the classical statistical approach as we do in this paper.
4 See for example the papers in the special issue on "Attrition in Longitudinal Surveys" published by The Journal of Human Resources in 1998.
5 We refer to the sample units as either individuals or units.
2. Brief review of estimation and test methods for sample selection

Different estimation approaches have been proposed to take account of nonresponse in model estimation. We can classify them into five broad categories:
1. propensity score methods, whose theoretical foundations were introduced by the statisticians Rosenbaum and Rubin (1983) for the evaluation of treatment effects;
2. imputation methods, which are used by survey statisticians to solve the nonresponse problem in sample surveys;
3. econometric sample selection correction methods, adopted by econometricians since the seminal paper of Heckman (1979);
4. methods using external data sources such as population registers and refreshment samples;
5. partial identification methods introduced by Manski (1989).
The first three types of estimation method are based on assumptions whose validity can only be verified when the missing data are observed. Propensity score and imputation methods are based on the assumption that the data are MAR. On the other hand, econometric sample selection correction methods relax the MAR condition by allowing the sample selection rule to depend on both observed and unobserved data.

6 For instance, until 1990 the Panel Study of Income Dynamics (a household panel survey carried out in the USA) did not recontact, in subsequent years, households that had refused to participate in a given year. In the more recent equivalent survey for European Union countries, the European Community Household Panel, people are not followed after a final refusal or two consecutive wave refusals.
When the set of observed variables is small and inadequate to describe the selection mechanism,7 it seems reasonable to reject the MAR condition and to prefer the econometric sample selection correction methods to other types of estimations. Nevertheless, sample selection correction methods relaxing the MAR condition impose some other untestable assumptions. In particular, parametric sample selection correction methods impose a specific joint distribution for the errors in the model of interest and in the selection process, whereas semiparametric sample selection correction methods impose the separability condition. The separability condition allows controlling for the selection bias in a regression model by including an additive correction term, which is an unknown function of the propensity score. The assumptions on the joint distribution and the separability condition are not in general nested within the assumptions imposed by the propensity score and the imputation methods. It is therefore impossible to indicate an order of preference among these estimators. Moreover econometric selection correction methods usually require exclusion restrictions, that is some variables, called instrumental variables, must be relevant for the selection process but irrelevant for the model of interest. In the case of attrition we would suggest, as instrumental variables, characteristics of the data collection process, such as the interviewer workload, the interview mode in the previous wave, the use of the same interviewer across waves and the number of callbacks in the previous wave. These variables affect the response probability but should be irrelevant for models of interest such as models describing socio-economic individual behaviors. Notice that these variables are not usually considered in imputation procedures adopted by panel surveys, which implies an implicit exclusion assumption when estimating regression models using imputed values. 
7 Henceforth, for brevity, we will call the 'missing data generating mechanism' the 'missing data process' or 'selection process' (or 'mechanism'), whereas we call the probability of observing a variable of interest for a specific individual either the response probability or the propensity score as in Rosenbaum and Rubin (1983).

Finally, we would like to emphasize that, under MAR and in the absence of instrumental variables, we can always control for sample selection by conditioning on the common set of explanatory variables affecting both the selection process and the model of interest. In other words, in the absence of instrumental variables and under MAR, propensity score and imputation methods are no longer necessary to correct for sample selection.

To our knowledge there are only two estimation procedures which are not based on untestable assumptions or at least relax some untestable assumptions without replacing them with new untestable assumptions. The first one consists in using additional information from external data sources, such as population registers, to recover the unknown distribution of missing data; the second one consists in the partial identification approach introduced by Manski (1989).

Additional data can sometimes be matched at the individual level with data coming from the main source. In that case it is possible to redefine what is observable and unobservable by considering jointly the main data source and the additional one, so that the missing data problem no longer exists. At other times, however, additional data cannot be matched at the individual level, but can be used to recover some distributions of the population of interest. Even in this second case, additional data can help in relaxing some untestable assumptions. Hirano et al. (2001), for example, relax some untestable assumptions in panel data model estimation by using refreshment samples. However, discussion of how to use external sources is beyond the scope of this paper.

The partial identification approach consists instead of computing bounds (henceforth Manski bounds) rather than point estimates for the specific statistics of interest, generally a conditional mean or quantile. Recently the partial identification theory has been extended in: (1) Manski (1995) to causal inference; (2) Horowitz and Manski (1998) to consider missing dependent variables and/or explanatory variables; (3) Manski and Pepper (2000) to narrow the Manski bounds in the evaluation of a treatment effect by considering instrumental variables, monotone instrumental variables or monotone treatment; (4) Vazquez Alvarez et al. (1999, 2001) to estimate income quantiles in the presence of item nonresponse, narrowing the Manski bounds by using unfolding bracket information; and (5) Nicoletti (2003) to compare different estimation procedures for poverty analysis in the presence of item and unit nonresponse, using partial information on income to narrow the Manski bounds. For a recent and thorough presentation of partial identification theory we refer to Manski (2003).

In the following we briefly review estimation methods which take account of nonresponse but impose untestable assumptions. We especially focus attention on panel data model estimation methods taking account of attrition. Section 2.1 describes estimation methods which impose the MAR condition and take account of selection only on observables.
In particular we describe propensity score methods, which have been recently applied to take account of nonresponse in model estimation. Furthermore, we present the idea behind imputation methods. We then briefly review econometric sample selection correction methods in Section 2.2. Finally, we discuss when it is possible to verify selectivity, that is when it is possible to verify whether selection bias, in the estimation of a model, is irrelevant.
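To make the partial identification approach mentioned above more concrete, the following is a minimal sketch of worst-case bounds on a population mean when the outcome is known to lie in a bounded interval. The function name `worst_case_mean_bounds` and its arguments are hypothetical, and the bounded support `[y_min, y_max]` is an assumption supplied by the analyst, not something estimated from the data.

```python
import numpy as np

def worst_case_mean_bounds(y_observed, n_total, y_min, y_max):
    """Worst-case bounds on E(y) when y is missing for some units.

    Assumption (labelled, not from the paper): y is known to lie in
    [y_min, y_max]. No assumption is made on the missing data process.
    """
    y_obs = np.asarray(y_observed, dtype=float)
    p_resp = y_obs.size / n_total              # share of responding units
    observed_part = p_resp * y_obs.mean()      # contribution of respondents
    # Nonrespondents are assigned the smallest / largest admissible value.
    lower = observed_part + (1.0 - p_resp) * y_min
    upper = observed_part + (1.0 - p_resp) * y_max
    return lower, upper
```

The width of the interval is the nonresponse rate times `(y_max - y_min)`, so the bounds are informative only when nonresponse is moderate or the support is narrow; the extensions cited above narrow them by adding further information.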
2.1. Estimation methods imposing the MAR condition Propensity score methods have recently been applied by, among others, Heckman et al. (1997), Dehejia and Wahba (1999, 2002) and Lechner (1999) and have been extended to the multi-valued treatment case by Imbens (1999). Assuming MAR, the problem of selection on unobservables disappears, while the problem of selection on observables remains. Propensity score methods solve it by controlling for the propensity score. There are three types of propensity score estimation methods—matching, stratification and weighting propensity score methods—for the estimation of the marginal mean of a response variable. For the estimation of the conditional models, only the weighting estimation methods have been used.
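The weighting idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data generating process, the logistic response probability and the names `simulate` and `ipw_mean` are all hypothetical, the true propensity score is treated as known (in practice it would be estimated), and the ratio-normalised form of the weights is one common variant rather than the estimator of any particular paper cited here.

```python
import numpy as np

def simulate(n=200_000, seed=0):
    """Hypothetical data: response depends only on an always-observed w (MAR given w)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                      # always-observed variable
    y = 1.0 + 2.0 * w + rng.normal(size=n)      # outcome with population mean 1
    p = 1.0 / (1.0 + np.exp(-(0.5 + w)))        # propensity score p(w), assumed known
    r = rng.random(n) < p                       # r = True when y is observed
    return y, r, p

def ipw_mean(y, responded, propensity):
    """Mean of y over respondents, weighted by the inverse propensity score."""
    y = np.asarray(y, dtype=float)
    r = np.asarray(responded, dtype=bool)
    p = np.asarray(propensity, dtype=float)
    weights = 1.0 / p[r]                        # respondents with low p(w) count more
    return float(np.sum(weights * y[r]) / np.sum(weights))
```

Calling `y, r, p = simulate()` and comparing `y[r].mean()` with `ipw_mean(y, r, p)` shows the unweighted respondent mean drifting away from the population mean while the weighted one stays close to it. This works here only because response depends on `w` alone, which is exactly the MAR assumption.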
The propensity score weighting estimation method has its roots in the inverse probability weighted estimator proposed by Horvitz and Thompson (1952). Horvitz and Thompson’s estimator allows computation of population means taking account that sample units have different probabilities of belonging to the sample and responding. This idea has recently been reconsidered by Robins and Rotnitzky (1995), Robins et al. (1995), Abowd et al. (2001) and Inkmann (2001) for the estimation of conditional means in the presence of missing data for multivariate or panel data and by Imbens (1999) and Hirano et al. (2003) for the evaluation of treatment effects. They consider a parametric conditional model and generalized method of moment (GMM) estimation to compute its parameters. This GMM estimator is assumed to be consistent in the absence of missing data. Then, it is easy to prove that, when data are MAR, the GMM estimator with weights given by the inverse propensity score is still consistent on the subsample of responding units.8 The basic idea behind imputation methods is to replace the missing values of a variable y, for which we want to estimate the mean, with values computed using a set of variables, say w, observed for all units in the sample. Imputation methods are based on the MAR assumption. In this specific case, MAR is equivalent to the independence of the response probability of y conditioning on w. Rosenbaum and Rubin (1983) show that, when the MAR condition is valid, then the probability of responding is independent of the unobserved variables given the propensity score. A reasonable imputation procedure then consists in matching each nonrespondent to a respondent with a close propensity score. This type of imputation reduces to the propensity score matching estimation method. In the case of panel data, imputation can use information on lagged variables for individuals responding in last wave but not responding in the current wave. 
This type of imputation is called longitudinal (see Rubin (1989) and Rubin (1996) for further details on imputation). Under the above definition of MAR, imputation methods which control for w or for the propensity score, say $p(w)$, to estimate the marginal or conditional mean of y, are consistent. The same result also holds for propensity score estimation methods. Nevertheless, if we want to estimate the mean of y conditioning on a set of variables not included in w, then consistency does not in general hold unless these additional variables are irrelevant for the missing data process.

8 For a proof of consistency see Imbens (1999); for a detailed presentation of the inverse probability weighting theory see Wooldridge (2002).

2.2. Econometric sample selection correction methods

As already noted, econometric sample selection correction methods relax the MAR assumption, but impose new untestable assumptions. Econometric sample selection correction methods can be divided into parametric and semiparametric methods. The basic idea behind parametric correction methods is to specify jointly the model of interest together with the selection mechanism, allowing the errors in the two models to be correlated. In other words, parametric correction methods do not impose the MAR assumption, but specify a joint model for the dependent variable and the dummy indicating selection given a set of explanatory variables. This approach has been criticized for the restrictive assumptions on the joint distribution of the errors, which are untestable just as MAR is. In other words, the parametric sample selection correction approach relaxes one untestable assumption by replacing it with another untestable assumption. The choice between accepting the MAR condition and imposing a joint distributional assumption is not easy. Any decision is to some extent arbitrary and cannot be submitted to a test procedure.

Semiparametric selection correction methods relax the joint distributional assumption by replacing it with the additive separability condition. This last condition is still an untestable assumption, but it is in general weaker than the underlying assumptions of the parametric sample selection correction approach.9 The additive separability condition allows correction for missing data by adding to the right-hand side of the regression model of interest an additive correction term, which is an unknown function of the propensity score. To avoid assumptions on this unknown function, different semiparametric methods have been proposed. These methods consist of two steps: (i) estimating nonparametrically or semiparametrically the propensity score, say $p(w)$; (ii) estimating the regression model of interest for the subsample of responding individuals by controlling for the additive bias correction term, which is an unknown function of the propensity score. The semiparametric sample selection correction methods, such as the ones described in Robinson (1988), Ahn and Powell (1993) and Cosslett (1991), are similar to the two-step Heckman (1979) estimation, except that they do not impose a functional form on the correction term.
For a review of the two-step estimation procedures taking account of sample selection we refer to Vella (1998). Unfortunately, semiparametric estimation does not allow identification of the intercept of the equation of interest.10 Moreover, semiparametric methods require instrumental variable exclusion restrictions, which are not theoretically required by the parametric methods. Parameter identification in parametric methods is ensured through functional form assumptions, but in empirical work it is advisable to have some exclusion restrictions. Extensions of the Heckman procedure to panel models with individual effects have been proposed by Hausman and Wise (1979), Ridder (1990), Verbeek and Nijman (1992b), Wooldridge (1995), Kyriazidou (1997) and Vella and Verbeek (1999). Verbeek and Nijman (1992a) compare different methods of testing for sample selection due to attrition in panel data models by a Monte Carlo simulation exercise, while Jensen et al. (2002) compare different estimation methods.

9 For a proof see Angrist (1997).
10 For the identification of the intercept see Heckman (1990) and Andrews and Schafgans (1998).
2.3. Is it possible to verify selectivity? Tests to verify selectivity are supposed to check whether the sample selection bias in the estimation of a model is irrelevant. We can distinguish between two types of sample selection, selection on observables and selection on unobservables, consequently two types of bias and two types of test. Imputation and propensity score methods take account only of selection on observables and they impose the MAR condition. They then allow verification of whether the selection process is independent of observed variables (selection on observables) under the MAR condition. On the other hand, the econometric sample selection correction methods take account of both selection on observables and unobservables and allow verification of the MAR condition (selection on unobservable given the observables). However they impose some other sort of untestable assumptions, in particular assumptions on the relationship between the dependent variable of interest and the probability of responding. Consequently, testing selectivity is always conditional on the validity of some untestable assumptions. Nevertheless, there are some tests proposed to verify the MAR condition in multivariate data. A ‘‘[t]est of missing completely at random for multivariate incomplete data’’ is proposed by Little (1988) and a procedure for ‘‘[t]esting for random dropouts in repeated measurement data’’ is proposed by Diggle (1989). Those two tests do not impose any assumption on the relationship between the dependent variable of interest and the probability of responding and do not use any additional information to recover the distribution of the unobserved variables. Therefore a reasonable conclusion is that those tests are not verifying the MAR condition. They are instead verifying selection on observables. 
To prove this intuitive conclusion formally and to define explicitly the null hypothesis of the above tests, we extend in the next section the MAR concept from a static setting, as introduced by Rubin (1976), to the case of panel data models. The extension of the MAR concept also helps to define properly the null hypotheses verified by some other tests for selectivity. In particular, we emphasize that the tests of Park and Davis (1993), the variable addition tests and the quasi-Hausman tests described in Verbeek and Nijman (1992b) are not checking MAR. They are instead checking a possible bias due to selection on observables or correlation between the random effects in the model of interest and in the selection process.
3. Definitions of MAR and MCAR In this part of the paper, after some preliminary definitions and general notation given in Section 3.1, we define the conditions of MAR and missing completely at random (MCAR). These conditions must be properly redefined for different types of models. For this reason, separate sections cover the definitions of MAR and MCAR for different types of model: Section 3.2 for marginal models; Section 3.3, for
conditional models; Section 3.4 for dynamic panel models with general response patterns; Section 3.5 for dynamic panel models with attrition; and Section 3.6 for dynamic panel models with explanatory variables. Furthermore, for the multivariate data case, we examine the definitions given by Robins and various co-authors in Section 3.7. Finally, in Section 3.8, we conclude by describing some further possible extensions of the MAR and MCAR concepts.

3.1. General statement and notation

We begin by considering the cross-sectional data case and focus our attention on a model for the variable y, $\{\mathcal{Y}, f(y;\theta), \theta \in \Theta\}$, where $\mathcal{Y}$ is the sample space, $f(y;\theta)$ is a family of probability distributions indexed by $\theta$, a vector of parameters of interest, and $\Theta$ is the parameter space. We consider a random sample of N units. Let r be a dummy variable observed for all individuals belonging to the sample and let r be equal to 1 if y is observed and 0 otherwise. y is therefore a latent random variable. We assume that y has a common support for responding and nonresponding units. Sometimes, for convenience, we indicate with $y^o$ the latent variable for units which are responding and with $y^m$ the latent variable for nonresponding units. The distribution of the latent variable is assumed to be $f(y;\theta) = f(y^o;\theta) = f(y^m;\theta)$, whether it is observed or not. We assume that $\{\mathcal{Y} \times \mathcal{R}, f(y,r;\phi), \phi \in \Phi\}$ is the joint model for $(y,r)$. Finally, let $\{\mathcal{R}, f(r \mid y;\gamma), \gamma \in \Gamma\}$ be the selection mechanism or the missing data process, where $f(r \mid y;\gamma)$ is the probability function for r, conditional on the variable y, and $\gamma$ is a vector of nuisance parameters belonging to the space $\Gamma$. Without loss of generality, since we assume a random sample of N individuals, let the first n units be those with complete observations, and let the last $(N-n)$ units be those with missing y. We can define two types of likelihood function that we could use to make inference on a model in the presence of missing data.
The first likelihood, say the truncated likelihood, is

$$L_T(\theta) = \prod_{i=1}^{n} f(y_i;\theta), \tag{1}$$
where i is the index for the ith unit in the sample. It ignores the missing data and considers only the truncated sample of the first n units for which y is observed. The second likelihood function, say the likelihood with informative missing data, considers jointly the model of interest and the selection mechanism for all units and "integrates out" the missing variables in the following way:

$$L_I(\theta,\gamma) = \prod_{i=1}^{n} f(y_i;\theta)\, f(r_i \mid y_i;\gamma) \prod_{i=n+1}^{N} \int f(y_i;\theta)\, f(r_i \mid y_i;\gamma)\, dy_i. \tag{2}$$

If instead all variables were observed for all N units, then the joint likelihood for $(y,r)$ would be given by $L(\theta,\gamma) = \prod_{i=1}^{N} f(y_i;\theta)\, f(r_i \mid y_i;\gamma)$. If $(y_i, r_i)$ were not identically and
independently distributed (i.i.d.) then the joint likelihood would not be equal to the product of N independent factors, one for each unit. In the following, we assume that the data are i.i.d. if not otherwise specified, and we say that the selection mechanism is weakly ignorable if we can make correct and efficient inference based on the likelihood (1), which ignores the selection process. Whereas we say that the selection mechanism is strongly ignorable if any type of inference can be made correctly and efficiently without considering the selection process.11

3.2. Definitions of MAR and MCAR for a marginal model

Following Rubin (1976), Little and Rubin (1987) and Heitjan and Rubin (1991), the inference on $\theta$ can be made ignoring the data mechanism if
1. $f(y_i, r_i;\phi)$ factorizes in $f(y_i;\theta)\, f(r_i \mid y_i;\gamma)$, where $\theta$ and $\gamma$ are variation free,12
2. $y_i$ is independent of $r_i$, $y_i \perp\!\!\!\perp r_i$, that is $f(r_i \mid y_i;\gamma) = f(r_i;\gamma)$.
In accordance with the theory of model reduction, we call (1) the statistical cut assumption. Obviously, it is implicitly assumed that the joint model, $f(y,r;\phi)$, is a reduced model resulting from an admissible reduction of the data generating process (see Engle et al., 1983 for other details). We call instead the condition (2) the MAR condition. Since we are assuming a random sample so that $(y_i, r_i)$ are i.i.d., the MCAR and the MAR conditions are equivalent. Weak and strong ignorability are also equivalent in this specific case and both require conditions (1) and (2).

If $(y_i, r_i)$ are not i.i.d. then the probability that the ith unit responds may depend on variables of other units. In this case the order of the elements of the vector $(y_1, \ldots, y_N)$ is relevant and we cannot assume without loss of generality that the observed variables are in the first n positions of the vector. For this reason we use the superscripts o and m to distinguish between observed and missing variables, when necessary. $(y^o_1, \ldots, y^o_n)$ and $(y^m_1, \ldots, y^m_{N-n})$ are then the subvectors of $(y_1, \ldots, y_N)$ consisting of the n variables for the observed individuals and of the $N-n$ variables for the nonresponding individuals. In that case the joint likelihood is given by $L = f(y_1, \ldots, y_N, r_1, \ldots, r_N;\phi)$, and MAR and MCAR are two different conditions. The likelihood-based inference on $\theta$ for the joint model $f(y_1, \ldots, y_N;\theta)$ then requires the following conditions:

a1: $f(y_1, \ldots, y_N, r_1, \ldots, r_N;\phi) = f(y_1, \ldots, y_N;\theta)\, f(r_1, \ldots, r_N \mid y_1, \ldots, y_N;\gamma)$, where $\theta$ and $\gamma$ are variation free;
a2: the MAR condition $f(r_1, \ldots, r_N \mid y_1, \ldots, y_N;\gamma) = f(r_1, \ldots, r_N \mid y^o_1, \ldots, y^o_n;\gamma)$, that is the independence condition $(r_1, \ldots, r_N) \perp\!\!\!\perp (y^m_1, \ldots, y^m_{N-n}) \mid (y^o_1, \ldots, y^o_n)$.

11 The definition of strong ignorability used in this paper coincides with the definition of Verbeek and Nijman (1992b), whilst the definition of (weak) ignorability is not equivalent to their definition.
12 The variation free condition is satisfied if the parameter space for $(\theta,\gamma)$ is given by the product $\Theta \times \Gamma$. In a Bayesian approach the variation free condition is replaced by the a priori independence of the parameters $\theta$ and $\gamma$.
When conditions (a1) and (a2) are satisfied, we say that the missing data mechanism is weakly ignorable or, more briefly, ignorable. Whereas we say the missing data process is strongly ignorable if, in addition to (a1) and (a2), the following condition is satisfied:

a3: $f(r_1, \ldots, r_N \mid y^o_1, \ldots, y^o_n;\gamma) = f(r_1, \ldots, r_N;\gamma)$, which we call the observed at random (OAR) condition.

Conditions (a2) and (a3) together constitute the MCAR condition:

$$f(r_1, \ldots, r_N \mid y_1, \ldots, y_N;\gamma) = f(r_1, \ldots, r_N;\gamma). \tag{3}$$
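For the i.i.d. case treated at the start of this section, it may help to spell out why conditions (1) and (2) allow inference on $\theta$ to be based on the truncated likelihood alone. The short derivation below is a sketch that uses only the two likelihoods defined in Section 3.1 and the fact that a density integrates to one.

```latex
% Assumes the i.i.d. marginal case: the statistical cut (condition 1) and
% f(r_i | y_i; \gamma) = f(r_i; \gamma) (condition 2).
\begin{align*}
L_I(\theta,\gamma)
  &= \prod_{i=1}^{n} f(y_i;\theta)\, f(r_i;\gamma)
     \prod_{i=n+1}^{N} f(r_i;\gamma) \int f(y_i;\theta)\, dy_i \\
  &= \Bigl[\prod_{i=1}^{n} f(y_i;\theta)\Bigr]
     \Bigl[\prod_{i=1}^{N} f(r_i;\gamma)\Bigr]
   = L_T(\theta) \prod_{i=1}^{N} f(r_i;\gamma).
\end{align*}
```

The second factor does not involve $\theta$, and since $\theta$ and $\gamma$ are variation free, maximizing $L_I(\theta,\gamma)$ with respect to $\theta$ gives the same result as maximizing $L_T(\theta)$.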
3.3. MAR and MCAR for a conditional model As remarked by Shih (1992), some authors do not explicitly mention the variation-free condition (see condition 1 in Section 3.2). This condition is often implicitly assumed to be valid in the econometric literature; in particular, econometricians usually implicitly assume that the conditional or marginal model of interest is the result of an admissible reduction of the data generating process. In this section, to avoid any misunderstanding, we explicitly state all the conditions necessary to ignore the selection mechanism when the model of interest is a conditional one. Let us assume that we are interested in the conditional model for the variable y, given a set of variables x belonging to the space X; fY; f ðy j x; yÞ; y 2 Yg; where Y is the sample space, f ðy j x; yÞ is a family of conditional probability distributions indexed by the parameter y; and Y is the parameter space. Furthermore, let us assume that the true data generating process is given by the joint model fY X R; f ðy; x; r; fÞ; f 2 Fg: We assume again a random sample. Then, to make consistent likelihood-based inference on the conditional model of interest neglecting the selection process, fR; f ðr j y; x; gÞ; g 2 Gg; the following conditions must be satisfied:
b1: the statistical cuts $f(y_i, x_i, r_i; \phi) = f(y_i, r_i \mid x_i; \psi)\, f(x_i; \varphi)$ and $f(y_i, r_i \mid x_i; \psi) = f(y_i \mid x_i; \theta)\, f(r_i \mid y_i, x_i; \gamma)$;
b2: the MAR condition, that is, the independence of $r_i$ from $y_i$, given $x_i$;
b3: the OAR condition, that is, the independence of $r_i$ from $x_i$.
Again, we say that the selection mechanism is weakly ignorable if condition (b1) and the MAR condition are satisfied, while we say that the selection mechanism is strongly ignorable if condition (b1) and the MCAR condition are satisfied. The MCAR condition is satisfied if and only if both the MAR and OAR conditions hold.
3.4. MAR and MCAR for a dynamic panel data model

In this and the following sections we consider the case of panel data. In particular we assume that a random sample of units is drawn on a single occasion ($t = 1$), and observed repeatedly over time, say in $T$ consecutive waves ($t = 1, \ldots, T$). Let us begin by considering a panel model for $y$ without explanatory variables, $f(y_{i,1}, \ldots, y_{i,T}; \theta)$, where $y_{i,t}$ is the variable $y$ observed for the $i$th unit at the $t$th wave. We use bold letters to indicate vectors of variables for $T$ consecutive waves, $\mathbf{y}_i = (y_{i,1}, \ldots, y_{i,T})$. Subvectors are indicated by $\mathbf{y}_{i,t} = (y_{i,1}, \ldots, y_{i,t})$ or by $\mathbf{y}^T_{i,t} = (y_{i,t}, \ldots, y_{i,T})$. Let $\mathbf{r}_i$ be the vector associated with the response pattern of the $i$th unit, that is, the vector of the dummies $r_{i,t}$, taking value 1 when the variable $y_{i,t}$ is observed, and 0 otherwise. Finally, let $\mathbf{y}^o_i$ and $\mathbf{y}^m_i$ be the subvectors of $\mathbf{y}_i$ corresponding to the observed and missing variables. By reordering the vector $(\mathbf{y}^o_i, \mathbf{y}^m_i)$ it is always possible to return to $\mathbf{y}_i$. The use of the superscripts $o$ and $m$ is a notational simplification to distinguish between observed and unobserved variables $y_{i,t}$.

The definition of weak and strong ignorability can be easily extended to the case of a panel, considering a joint model for $y$, $f(\mathbf{y}_i; \theta)$. Condition (1) in Section 3.2 is replaced by a condition of initial cut:
c1: $f(\mathbf{y}_i, \mathbf{r}_i; \phi) = f(\mathbf{y}_i; \theta)\, f(\mathbf{r}_i \mid \mathbf{y}_i; \gamma)$.
Conditions (2) and (3) are replaced by the following equivalent assumptions:
c2: the MAR condition $\mathbf{r}_i \perp\!\!\!\perp \mathbf{y}^m_i \mid \mathbf{y}^o_i$, i.e. $f(\mathbf{r}_i \mid \mathbf{y}_i; \gamma) = f(\mathbf{r}_i \mid \mathbf{y}^o_i; \gamma)$;
c3: the OAR condition $\mathbf{r}_i \perp\!\!\!\perp \mathbf{y}^o_i$, i.e. $f(\mathbf{r}_i \mid \mathbf{y}^o_i; \gamma) = f(\mathbf{r}_i; \gamma)$.
The variable observed for a unit in wave $t$ is likely to be dependent on its past; that is, the factorisation
$$ f(\mathbf{y}_i, \mathbf{r}_i; \phi) = \prod_{t=1}^{T} f(y_{i,t}, r_{i,t}; \phi) \tag{4} $$
is not valid and we have to use the following sequential factorisation:
$$ f(\mathbf{y}_i, \mathbf{r}_i; \phi) = \prod_{t=1}^{T} f(y_{i,t}, r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}; \phi). \tag{5} $$
To simplify notation in the sequential models, we implicitly condition on the set of initial conditions. We assume that $(y_{i,t}, r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1})$ are i.i.d. across units and time. In this case, a more appropriate model of interest is a dynamic one, in which $y_{i,t}$ is explained by its past. Then it is useful to restate the conditions (c1), (c2) and (c3) in terms of sequential models. Condition (c1) requires that:
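d11: the following sequential cut must be applicable
$$ \prod_{t=1}^{T} f(y_{i,t}, r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}; \phi) = \prod_{t=1}^{T} f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}; \theta) \prod_{t=1}^{T} f(r_{i,t} \mid \mathbf{y}_{i,t}, \mathbf{r}_{i,t-1}; \gamma); $$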
d12: r does not Granger cause y,13 that is, $y_{i,t} \perp\!\!\!\perp \mathbf{r}_{i,t-1} \mid \mathbf{y}_{i,t-1}$.
Conditions (c2) and (c3) require instead that
d2: $r_{i,t} \perp\!\!\!\perp \mathbf{y}_{i,t} \mid \mathbf{r}_{i,t-1}$, which we call the sequential MCAR condition.
The above condition (d2) can be broken down into two parts:
d21: $r_{i,t} \perp\!\!\!\perp \mathbf{y}^m_{i,t} \mid \mathbf{r}_{i,t-1}, \mathbf{y}^o_{i,t}$, say the sequential MAR condition;
d22: $r_{i,t} \perp\!\!\!\perp \mathbf{y}^o_{i,t} \mid \mathbf{r}_{i,t-1}$, say the sequential OAR condition.
$\mathbf{y}^o_{i,t}$ and $\mathbf{y}^m_{i,t}$ are the subvectors selecting the observed and missing variables from the vector $\mathbf{y}_{i,t}$. Notice that $\mathbf{y}^o_{i,t}$ and $\mathbf{y}^m_{i,t}$ do not necessarily consist of a sequence of variables for consecutive waves, and only after proper reordering of the vector $(\mathbf{y}^o_{i,t}, \mathbf{y}^m_{i,t})$ can we return to the time sequence of variables $\mathbf{y}_{i,t}$. Conditions (d11), (d12) and (d21) ensure that the missing data mechanism is weakly ignorable for maximum likelihood estimation of a dynamic model of interest, $f(y_{i,t} \mid \mathbf{y}_{i,t-1}; \theta)$, while conditions (d11), (d12), (d21) and (d22) ensure strong ignorability in any inference.

3.5. MAR and MCAR conditions in a dynamic panel model with attrition

In this section, we present a proposition which gives a set of necessary and sufficient conditions for weak ignorability of the selection mechanism, that is, for conditions (c1) and (c2), in the case of attrition.

Proposition 1. Let $(y_{i,t}, r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1})$ be i.i.d. across units and time, and let
$$ f(\mathbf{y}_i, \mathbf{r}_i; \phi) = \prod_{t=1}^{T} f(y_{i,t}, r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}; \phi) \tag{6} $$
be the associated data generating process. Let $y_{i,t}$ be observed when $r_{i,t}$ takes value 1, and missing when $r_{i,t} = 0$. Let $r_{i,1} = 1$ and, whenever $r_{i,t} = 0$, let $r_{i,s} = 0$ for any $s > t$. Then, if condition (d12) (r does not Granger cause y) is true, a set of necessary and sufficient conditions for weak ignorability of the selection mechanism is given by:
13 Granger noncausality was introduced by Granger (1969).
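d11: the sequential cut
$$ \prod_{t=1}^{T} f(y_{i,t}, r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}; \phi) = \prod_{t=1}^{T} f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}; \theta) \prod_{t=1}^{T} f(r_{i,t} \mid \mathbf{y}_{i,t}, \mathbf{r}_{i,t-1}; \gamma); $$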
s1: the sequential MAR condition $r_{i,t} \perp\!\!\!\perp y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}$.14

Proposition 1 states that, in the case of dynamic panel data with attrition, the condition that y does not Granger cause r, $r_{i,t} \perp\!\!\!\perp \mathbf{y}_{i,t-1} \mid \mathbf{r}_{i,t-1}$, is neither a necessary nor a sufficient condition for the MAR assumption. This Granger noncausality is instead a necessary but not sufficient condition for MCAR. The proposition also proves that the sequential MAR condition is given by (s1), $r_{i,t} \perp\!\!\!\perp y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}$, in the case of attrition. In other words, in the case of attrition, conditions (d11), (d12) and (s1) ensure correct likelihood-based inference on the dynamic model of interest, i.e. weak ignorability. It is easy to prove that strong ignorability for a dynamic panel model with attrition requires the sequential MCAR condition, $r_{i,t} \perp\!\!\!\perp \mathbf{y}_{i,t} \mid \mathbf{r}_{i,t-1}$, instead of the sequential MAR one.

3.6. MAR and MCAR conditions in a dynamic panel model with explanatory variables

The definitions of MAR and MCAR can easily be modified to cover conditional models of the form $f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}; \theta)$, where explanatory variables x are added to the dynamic panel model. We implicitly assume that the covariates $x_{i,t}$ are observed for all t and i. In the case of attrition it is sufficient to assume that only lagged x affect $y_{i,t}$ and $r_{i,t}$. Let $f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}; \theta)$ be the model of interest, let
$$ f(\mathbf{y}_i, \mathbf{r}_i, \mathbf{x}_i; \phi) = \prod_{t=1}^{T} f(y_{i,t}, r_{i,t}, x_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t-1}; \phi) \tag{7} $$
be the associated data generating process and let the missing data problem again be narrowed down to the attrition problem; then, it is easy to prove that weak ignorability requires the following conditions:
f1: weak exogeneity of x, that is, the following sequential cut
$$ \prod_{t=1}^{T} f(y_{i,t}, r_{i,t}, x_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t-1}; \phi) = \prod_{t=1}^{T} f(y_{i,t}, r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}; \phi_1)\, f(x_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t-1}; \phi_2), $$
14 The proof is given in Appendix A.
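f2: the sequential cut
$$ \prod_{t=1}^{T} f(y_{i,t}, r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}; \phi_1) = \prod_{t=1}^{T} f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}; \theta) \prod_{t=1}^{T} f(r_{i,t} \mid \mathbf{y}_{i,t}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}; \gamma), $$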
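f3: Granger noncausality $y_{i,t} \perp\!\!\!\perp \mathbf{r}_{i,t-1} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}$,
f4: the sequential MAR condition $r_{i,t} \perp\!\!\!\perp y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}$.
In the case of a conditional dynamic panel model with general response patterns, weak ignorability is more stringent: (f4) must be replaced by the sequential MAR condition $r_{i,t} \perp\!\!\!\perp \mathbf{y}^m_{i,t} \mid \mathbf{y}^o_{i,t}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}$ and the following additional condition is required:
f5: $x_{i,t} \perp\!\!\!\perp \mathbf{y}^m_{i,t-1} \mid \mathbf{y}^o_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t-1}$.
Strong ignorability for a conditional dynamic panel model with attrition requires instead the conditions (f1) to (f3), and
f6: sequential MCAR $r_{i,t} \perp\!\!\!\perp (\mathbf{y}_{i,t}, \mathbf{x}_{i,t}) \mid \mathbf{r}_{i,t-1}$.
We emphasize that weak and strong ignorability for the joint model, $f(\mathbf{y}_i \mid \mathbf{x}_i; \theta)$, is not equivalent to weak and strong ignorability for the sequential model, $f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}; \theta)$. In the former case ignorability requires the following conditions:
g1: two initial cuts $f(\mathbf{y}_i, \mathbf{r}_i, \mathbf{x}_i; \phi) = f(\mathbf{y}_i, \mathbf{r}_i \mid \mathbf{x}_i; \phi_1)\, f(\mathbf{x}_i; \phi_2)$ and $f(\mathbf{y}_i, \mathbf{r}_i \mid \mathbf{x}_i; \phi_1) = f(\mathbf{y}_i \mid \mathbf{x}_i; \theta)\, f(\mathbf{r}_i \mid \mathbf{y}_i, \mathbf{x}_i; \gamma)$,
g2: the MAR condition $\mathbf{r}_i \perp\!\!\!\perp \mathbf{y}^m_i \mid \mathbf{y}^o_i, \mathbf{x}_i$ to ensure weak ignorability, or
g3: the MCAR condition $\mathbf{r}_i \perp\!\!\!\perp (\mathbf{y}^m_i, \mathbf{y}^o_i, \mathbf{x}_i)$ to ensure strong ignorability.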
The equivalence of ignorability defined for the joint model and that for the sequential model is true only if x is strongly exogenous for the parameters of the dynamic model of interest. We use the definition of strong exogeneity introduced by Engle et al. (1983); that is, $(y, r)$ does not Granger cause x, and x is weakly exogenous for the parameter of interest. Therefore, strong exogeneity of x includes conditions (f1) and (f5). We note that if the model, $f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}; \theta)$, is used to predict y given the value of x, then we need the strong exogeneity of x. For example, this is the case with causal inference, when a counterfactual response $y_{i,t}$ is predicted conditioning on $(\mathbf{y}_{i,t-1}, \mathbf{x}_{i,t})$ to assess the average effect of a treatment. In this case, $r_{i,t}$ is equal to 1 if a person is treated in time period t, and 0 otherwise. In causal inference, we should be aware that any conditioning variable, x, should be strongly exogenous. In other
words, the Granger noncausality condition,
$$ f(x_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t-1}; \phi_2) = f(x_{i,t} \mid \mathbf{x}_{i,t-1}; \phi_2) \tag{8} $$
must be satisfied.

3.7. The MAR condition according to Robins et al. (1995)

Robins and several different co-authors (Robins et al., 1995; Gill and Robins, 1997; Gill et al., 1997; Robins and Gill, 1997) have given definitions of MAR and MCAR for multivariate data. The definitions of MAR for attrition in Robins and Gill (1997) and Robins et al. (1995) are both equivalent to the sequential MAR definition given in Section 3.5 for the attrition case. The k-sequential coarsening at random (denoted briefly by "k-sequential CAR") definition, given by Gill and Robins (1997) and adapted for the attrition case, is again equivalent to sequential MAR. In Appendix B, we prove this claim and we present the definitions of a k-sequential coarsening and of k-sequential CAR given by Gill and Robins (1997). We note that these definitions are not sufficient to ensure correct likelihood-based inference on the parameters of the conditional model, $f(y_{i,t} \mid \mathbf{y}_{i,t-1}; \theta)$. Two additional conditions are necessary: the sequential cut (d11) and Granger noncausality (d12). Moreover, we emphasize that the above MAR conditions defined for sequential models, which we call sequential MAR conditions, and the MAR condition for joint models, which consider the T observations jointly, are not equivalent. As a matter of fact, Robins and Gill (1997) find examples in which the sequential MAR condition does not ensure the MAR condition for the joint model. Borrowing from model reduction theory, it is possible to define conditions such that the sequential MAR condition is equivalent to the MAR condition for the joint model defined for T consecutive periods. What is missing in the work of Robins et al. (1995) is the recognition that the MAR condition for the joint model is not enough to ensure weak ignorability; indeed, the initial cut in (c1) must also be satisfied.
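As a purely illustrative sketch, not part of the original analysis, the following Python simulation contrasts attrition that depends only on the observed past, in the spirit of the sequential MAR condition (s1), with attrition that depends on the current outcome. The two-wave design, the autoregressive coefficient 0.5, the dropout rules and the function names are all assumptions chosen for the example.

```python
import random

def ols_slope(xs, ys):
    """Slope of a least-squares regression of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n=40000, rho=0.5, seed=0):
    """Two-wave panel y2 = rho*y1 + e under two hypothetical attrition rules."""
    rng = random.Random(seed)
    mar_x, mar_y, nmar_x, nmar_y = [], [], [], []
    for _ in range(n):
        y1 = rng.gauss(0, 1)
        y2 = rho * y1 + rng.gauss(0, 1)
        u = rng.gauss(0, 1)
        # Rule A (sequential MAR): response at wave 2 depends on y1 only.
        if y1 + u > 0:
            mar_x.append(y1)
            mar_y.append(y2)
        # Rule B (not MAR): response at wave 2 depends on y2 itself.
        if y2 + u > 0:
            nmar_x.append(y1)
            nmar_y.append(y2)
    return ols_slope(mar_x, mar_y), ols_slope(nmar_x, nmar_y)

slope_mar, slope_nmar = simulate()
print(round(slope_mar, 2), round(slope_nmar, 2))
```

Under rule A the slope estimated from respondents should stay close to the assumed value of 0.5, whereas under rule B it is attenuated, because selection then operates on the unobserved part of the current outcome.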
In terms of conditions on the sequential models, the initial cut is satisfied if and only if the sequential cut (d11) and the Granger noncausality (d12) are satisfied (see Engle et al., 1983). Model reduction theory allows us to prove that when the initial cut (c1) is satisfied (or, equivalently, the sequential cut in (d11) and Granger noncausality in (d12) are satisfied), then sequential MAR and MAR for the joint model are equivalent. In the presence of explanatory variables x the equivalence requires that x be strictly exogenous. When the response pattern is not monotone,15 Robins et al. (1995) show that the sequential MAR for attrition is not sufficient. In addition to $r_{i,t} \perp\!\!\!\perp y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}$ it is necessary to have the following additional condition:16
$$ \Pr(r_{i,t} = 0 \mid \mathbf{y}^o_{i,t-1}, \mathbf{y}^m_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{y}^T_{i,t+1}) = \Pr(r_{i,t} = 0 \mid \mathbf{y}^o_{i,t-1}, \mathbf{r}_{i,t-1}). \tag{9} $$
15 Robins and Gill (1997) define monotone missing data as we define attrition. Nonmonotone response patterns occur instead when $r_{i,t} = 0$ does not exclude that $r_{i,s} = 1$ for $s > t$.
16 Pr means probability.
We emphasize that the above additional condition can be rewritten as the following two conditions:
1. $r_{i,t} \perp\!\!\!\perp \mathbf{y}^T_{i,t+1} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}$, or equivalently, $y_{i,t} \perp\!\!\!\perp \mathbf{r}_{i,t-1} \mid \mathbf{y}_{i,t-1}$;
2. $r_{i,t} \perp\!\!\!\perp \mathbf{y}^m_{i,t-1} \mid \mathbf{y}^o_{i,t-1}, \mathbf{r}_{i,t-1}$.
Condition (1) is the Granger noncausality condition (d12) in Section 3.4, which is a necessary condition to ensure weak ignorability, even in the case of monotone response. Condition (2), together with $r_{i,t} \perp\!\!\!\perp y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}$, is equivalent to the sequential MAR condition given in Section 3.4 for a nonmonotone response pattern. In conclusion, the sequential MAR given by Robins et al. (1995) for nonmonotone response patterns is equivalent to our definition.

3.8. Further extensions of the MAR and MCAR conditions

The concepts of Granger causality, sequential cut, and strong and weak exogeneity are meaningful in time series analysis. In the previous sections, we have shown that these concepts are very useful for panel data too, which can be viewed as a set of time series. In particular, we have shown their usefulness in extending the definitions of MAR and MCAR from cross-sectional data to panel data. By analogy, the same extension applies to the definition of coarsening at random, given in Heitjan and Rubin (1991). A similar extension is also useful for causal inference when treatments or risk exposures, the effects of which are to be evaluated, are time varying. Moreover, this last extension helps in disentangling some of the misunderstandings between Holland and Granger (Holland, 1986). Holland’s (1986) attempt to apply the definition of Granger causality to causal inference is misleading because he considers effects of a treatment lasting for a single period. Granger causality is only meaningful when there are repeated observations across time and when attention is focused on a sequential model, conditioning on past information (Granger, 1986).
As noted in Holland’s (1986) reply to Granger, Rubin’s model (see Rubin, 1974) may be extended from cross-sectional data to situations in which there are time series data for each unit, that is, so-called panel or longitudinal data. As Holland (1986) remarks, in the 1980s there were no applications of causal inference to longitudinal data, but now there are numerous examples of such studies (see, for example, Robins et al., 1999). In these applications, the Granger causality concept helps clarify which conditions are necessary for correct causal inference, as well as the difference between the causal concepts developed by Granger and Rubin.
4. Limits of some tests for MAR and MCAR in longitudinal data

Both the MAR and MCAR conditions require the selection mechanism to be independent of unobserved variables given observed variables. Clearly it is hard to verify dependence on unobserved variables, whose values are unknown. We have
already emphasized that this is possible only by imposing untestable assumptions or by using additional information to recover the distribution of the missing variables. In this section, we outline the limitations of the variable addition tests and the quasi-Hausman tests often used in panel data analysis to check selectivity. We also emphasize the limits of some other test procedures proposed by Diggle (1989), Little (1988) and Park and Davis (1993). These procedures are only partially able to check the MCAR condition, and they cannot check MAR at all. They verify the dependence of the selection mechanism on some observed variables, given the MAR condition.

4.1. Limits of the variable addition and quasi-Hausman tests

A type of test that has been suggested to verify the relevance of the selection mechanism is the variable addition test. This is a simple test that verifies the significance of variables associated with the (non)response patterns $\mathbf{r}_i$ in the regression model of interest. These variables are added to the regression model of interest as explanatory variables. If these added variables are not significant, then the selection mechanism is considered ignorable. Notice that one should be careful in choosing these additional variables. In particular, when considering a panel affected only by attrition, it is useless to add $r_{i,t-1}$ to a regression equation expressed at time t including a constant, since $r_{i,t-1}$ always takes value 1. Furthermore, if there are time effects, it is also inappropriate to use $\sum_{t=1}^{T} r_{i,t}$. In the specific case of attrition, it is impossible to use a variable addition test to verify the sequential MAR condition $y_{i,t} \perp\!\!\!\perp r_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}$. This is because we have information on $y_{i,t}$ only when $r_{i,t} = 1$. We are only able to verify if $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{x}_{i,t}, \mathbf{r}_{i,t}, \mathbf{y}_{i,t-1}$; that is, if $r_{i,t} \perp\!\!\!\perp \mathbf{y}_{i,t-1} \mid \mathbf{x}_{i,t}, \mathbf{r}_{i,t-1}$, which is not sufficient to ensure the MCAR and MAR conditions.
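To see concretely why $r_{i,t-1}$ carries no information in an attrition-only panel, the sketch below generates hypothetical monotone response patterns and inspects two candidate added regressors among the units for which $y_{i,t}$ is observed. The per-wave dropout probability of 0.2, the panel length and the function name are assumptions made only for illustration.

```python
import random

def attrition_pattern(T, p_drop, rng):
    """Monotone pattern: r_1 = 1, and once a unit drops out it never returns."""
    r, alive = [], True
    for t in range(T):
        if t > 0 and alive and rng.random() < p_drop:
            alive = False
        r.append(1 if alive else 0)
    return r

rng = random.Random(1)
T = 4
patterns = [attrition_pattern(T, 0.2, rng) for _ in range(1000)]

t = 2  # third wave (0-based index)
observed = [r for r in patterns if r[t] == 1]  # units with y_{i,t} observed

lagged = {r[t - 1] for r in observed}  # values taken by r_{i,t-1}
totals = {sum(r) for r in observed}    # values taken by sum_t r_{i,t}

print(lagged)  # one value only: collinear with the constant term
print(totals)  # more than one value: this regressor still varies
```

In the estimation sample the lagged indicator is constant, so it cannot be distinguished from the intercept, while functions of the future response pattern such as the total number of responses still vary across units.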
On the other hand, when the panel is characterized by both monotone and nonmonotone response patterns, then we cannot verify the sequential MAR condition $r_{i,t} \perp\!\!\!\perp \mathbf{y}^m_{i,t} \mid \mathbf{y}^o_{i,t}, \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}$. This is because we have information on $y_{i,t}$ only when $r_{i,t} = 1$. Verbeek and Nijman (1992a) presented the results of a Monte Carlo analysis of the properties of variable addition tests and found that in some cases, variable addition tests have no power. In particular, they found that each of the following variables, $\sum_{t=1}^{T} r_{i,t}$, $\prod_{t=1}^{T} r_{i,t}$ and $r_{i,t-1}$, added to the equation of interest was not significant when they used the following model of interest and missing data mechanism for the simulation experiment:
$$ y_{i,t} = \beta_0 + x_{i,t} \beta_1 + \alpha_i + \varepsilon_{i,t}, \tag{10} $$
$$ \Pr(r_{i,t} = 1) = \Pr(r^*_{i,t} > 0) = \Pr(\gamma_0 + \gamma_1 x_{i,t} + \mu_i + u_{i,t} > 0), \tag{11} $$
where $\beta_0$ and $\beta_1$ are the parameters of the model of interest; $\gamma_0$ and $\gamma_1$ are the parameters of the selection process; $r^*_{i,t}$ is a latent variable measuring the propensity to respond; $\varepsilon_{i,t}$ and $u_{i,t}$ are error terms i.i.d. Gaussian with zero means, variances $\sigma^2_\varepsilon$ and
$\sigma^2_u$ and covariance $\sigma_{\varepsilon,u}$; $\alpha_i$ and $\mu_i$ are random effects i.i.d. with zero means, variances $\sigma^2_\alpha$ and $\sigma^2_\mu$ and covariance $\sigma_{\alpha,\mu}$; and $\sigma^2_\mu + \sigma^2_u = 1$. In the following we prove that the variable addition tests proposed by Verbeek and Nijman (1992a) do not test selection on unobservables (correlation between the error terms), but only selection on observables. Notice that Verbeek and Nijman (1992a) do not allow for a severe selection bias caused by the correlation between random effects. In the reference experiment in Verbeek and Nijman (1992a), the correlation between $\alpha$ and $\mu$ is 0.5, but the importance of the random effects in both equations is low: the ratios $\sigma^2_\alpha / (\sigma^2_\alpha + \sigma^2_\varepsilon)$ and $\sigma^2_\mu / (\sigma^2_\mu + \sigma^2_u)$ are 0.1, so that the resulting selection bias due to the correlation between random effects is not severe. If either the correlation between random effects is 0 or the random effects are irrelevant, then the following independence conditions hold:
$$ y_{i,t} \perp\!\!\!\perp \mathbf{r}_{i,t-1} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t} \quad \text{and} \quad r_{i,t} \perp\!\!\!\perp \mathbf{y}_{i,t-1} \mid \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t} \tag{12} $$
(that is, $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}$). Consequently, Eq. (10) is not affected by $\mathbf{r}_{i,t-1}$ and $\mathbf{r}^T_{i,t+1}$, but only by $r_{i,t}$. Obviously, the dependence between $y_{i,t}$ and $r_{i,t}$ cannot be verified because we observe $y_{i,t}$ only when $r_{i,t} = 1$. The authors carried out a similar simulation exercise for a quasi-Hausman test of whether the model coefficients for the balanced and unbalanced panels17 are equal. They found that its power is higher but still unsatisfactory. This is again because, ignoring the random effects because of their little importance, $y_{i,t} \perp\!\!\!\perp \mathbf{r}_{i,t-1} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}$ and $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}$, so that
$$ f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}, \mathbf{r}_i = \mathbf{1}; \theta) = f(y_{i,t} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}, r_{i,t} = 1; \theta) \tag{13} $$
and the balanced and unbalanced panels give the same results. However, when the authors simulated the following model for the missing data mechanism:
$$ \Pr(r_{i,t} = 1) = \Pr(r^*_{i,t} > 0) = \Pr(\gamma_0 + \gamma_1 \bar{x}_i + \mu_i + u_{i,t} > 0), \tag{14} $$
where $\bar{x}_i$ is the average over the T waves of the variable $x_{i,t}$, the power of the variable addition tests and of the quasi-Hausman tests increased. In this new case $y_{i,t} \perp\!\!\!\perp \mathbf{r}_{i,t-1} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}$ and $r_{i,t} \perp\!\!\!\perp \mathbf{y}_{i,t-1} \mid \mathbf{r}_{i,t-1}, \bar{x}_i$ are valid but $r_{i,t} \perp\!\!\!\perp \mathbf{y}_{i,t-1} \mid \mathbf{r}_{i,t-1}, \mathbf{x}_{i,t}$ is not, so that $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{y}_{i,t-1}, \mathbf{x}_{i,t}$ is not satisfied. This means that variables that are linked to the future response pattern $\mathbf{r}^T_{i,t+1}$ affect model (10). The results of Verbeek and Nijman (1992a) support this claim; in fact, the power of the tests obtained by adding the variables $\prod_{t=1}^{T} r_{i,t}$ and $\sum_{t=1}^{T} r_{i,t}$ is high, while the power is very low when the variable $r_{i,t-1}$ is added. The same type of reasoning implies that a quasi-Hausman test is more powerful when model (14) is used for simulation instead of model (11), and the results again support our conclusion.
17 Suppose we have a panel of individuals observed for T consecutive waves. The balanced panel then consists of the subsample of individuals observed for all T waves, whereas the unbalanced panel consists of the sample of individuals observed in at least one wave.
Finally, Verbeek and Nijman (1992a) also computed the power of a Lagrange multiplier test of whether the correlation between the errors in the main equation and in the selection model is zero. They found that the power of the test is high in both simulations. This is because the Lagrange multiplier test correctly takes account of the joint specification of the model of interest and selection mechanism. The joint specification is correct because it is imposed by the simulation exercises, but it is unknown and untestable in general in the presence of missing data. On the other hand, quasi-Hausman and variable addition tests, which do not require a joint specification of the model of interest and selection mechanism, under-reject the null hypothesis of ignorability. Their usefulness in the detection of the selection problem is questionable. They can test for selectivity either due to correlation between the random effects or due to selection on observables, but they cannot test for MAR. Even when those tests are to some extent useful, we suggest adopting a more formal test for selectivity. In particular, when the selection problem reduces to a problem of correlation between individual random effects, then within (fixed effects) estimation controls correctly for selectivity (Verbeek and Nijman, 1992b). When selection reduces to selection on observables, then it is possible to correct for selection bias either by conditioning on the observed variables affecting the selection process in the model of interest, or by weighting estimation of the model of interest by the inverse of the propensity score.
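The two corrections for selection on observables mentioned above can be sketched in a small simulation. The example rests on assumptions that are not taken from the paper: a binary x, response probabilities that depend on x only and are treated as known, and the population mean of y as the quantity of interest.

```python
import random

def simulate(n=40000, seed=0):
    rng = random.Random(seed)
    prob = {0: 0.9, 1: 0.3}            # assumed response probability given x
    sum_y = sum_r = 0.0                # complete-case sums
    sum_wy = sum_w = 0.0               # inverse-probability-weighted sums
    cell = {0: [0.0, 0], 1: [0.0, 0]}  # per-x sum of y and count, respondents
    n_x1 = 0
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        n_x1 += x
        y = 1.0 + 2.0 * x + rng.gauss(0, 1)  # E[y] = 2 in the full sample
        if rng.random() < prob[x]:           # selection on observables only
            sum_y += y
            sum_r += 1
            w = 1.0 / prob[x]
            sum_wy += w * y
            sum_w += w
            cell[x][0] += y
            cell[x][1] += 1
    complete_case = sum_y / sum_r
    ipw = sum_wy / sum_w
    # Conditioning on x: average the within-cell means of respondents over
    # the full-sample distribution of x (x is observed for every unit).
    share1 = n_x1 / n
    conditioning = ((1 - share1) * cell[0][0] / cell[0][1]
                    + share1 * cell[1][0] / cell[1][1])
    return complete_case, ipw, conditioning

cc, ipw, cond = simulate()
print(round(cc, 2), round(ipw, 2), round(cond, 2))
```

In this design the unadjusted respondent mean is pulled towards the over-represented group, while both the weighted and the conditioning estimators should recover the full-sample mean. Neither correction would help if response also depended on y itself.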
4.2. Limits of the Little and Park–Davis tests

The Little (1988) and Park and Davis (1993) tests are based on a common idea: to divide units into groups according to the (non)response pattern, $\mathbf{r}_i$;18 to estimate the model of interest for each group separately; then to test MCAR by determining whether the estimated parameters of the models, associated with each response pattern, are equal. Little considers the normal probability distribution for a variable, y, affected by nonresponse, and tests the MCAR condition by a likelihood ratio test. Park and Davis consider a discrete distribution for a variable, y, conditional on a set of explanatory variables, say x, and use a Wald test, instead of a likelihood ratio test, to verify MCAR. Both tests verify a condition that is only necessary but not sufficient to ensure MCAR. Suppose that T different repeated values are observed for the unit, i, for the variable, y; then the Little test verifies whether $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{r}_{i,t}, \mathbf{y}^o_{i,t-1}$, while the Park and Davis test verifies whether $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{r}_{i,t}, \mathbf{y}^o_{i,t-1}, \mathbf{x}_{i,t}$, where $\mathbf{x}_{i,t}$ are variables that are always observed. The null hypothesis used in both tests does not imply the MAR condition. The reason for this is more evident when the missing data problem is limited to attrition.

18 For example for a panel of T waves there are $2^T - 1$ possible response patterns and consequently $2^T$ corresponding groups to which a unit may belong.

Let y be a variable observed for N units repeatedly in consecutive waves up to attrition from the panel or up to T, the last wave of the panel. Little (1988) assumes that, under MCAR, $\mathbf{y}_i$ is distributed as a multivariate Gaussian with mean $\mu$ and
variance $\Sigma$ whatever the response pattern, $\mathbf{r}_i$. Then, Little (1988) tests MCAR by verifying whether the sub-vector of the observed variables is distributed as a multivariate normal with mean and variance equal to the corresponding sub-vector of $\mu$ and sub-matrix of $\Sigma$ of the multivariate normal distribution for $\mathbf{y}_i$. In the case of attrition, the sub-vector of observed variables for a generic unit dropping out after t periods is $\mathbf{y}_{i,t}$ and we denote by $\mu^{(t)}$ and $\Sigma^{(t)}$ the mean vector and the variance matrix corresponding to the sub-vector of the first t elements of $\mu$ and the $t \times t$ principal sub-matrix of $\Sigma$. Let $m_t$ be the number of units that drop out of the panel at period $(t + 1)$; let $\bar{\mathbf{y}}^{(t)} = (1/m_t) \sum_{j=1}^{m_t} \mathbf{y}_{j,t}$ be the average vector over these units, and let $\hat{\mu}^{(t)}$ be equal to the sub-vector of the first t elements of the maximum likelihood estimator of $\mu$. Then the Little test is
$$ T_L = \sum_{t=1}^{T} m_t \left(\bar{\mathbf{y}}^{(t)} - \hat{\mu}^{(t)}\right)' \left(\Sigma^{(t)}\right)^{-1} \left(\bar{\mathbf{y}}^{(t)} - \hat{\mu}^{(t)}\right). $$
Little claims that under the MCAR assumption, $T_L$ is distributed as a $\chi^2$ with $T(T - 1)/2$ degrees of freedom. This claim is true; however, the same distribution remains valid under the weaker assumption that $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{r}_{i,t}, \mathbf{y}_{i,t-1}$. Little’s test cannot verify whether the variable $y_{i,t}$, given its past values, is independent of $r_{i,t}$; in fact, if $y_{i,t}$ is observable, $\mathbf{r}_{i,t}$ is always equal to a vector of ones. In other words, Little’s test cannot verify the condition $y_{i,t} \perp\!\!\!\perp r_{i,t} \mid \mathbf{y}_{i,t-1}$, but can only check the condition $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{r}_{i,t}, \mathbf{y}_{i,t-1}$. It is possible to prove that the last condition is equivalent to the hypothesis that y does not Granger cause r, $r_{i,t} \perp\!\!\!\perp \mathbf{y}_{i,t-1} \mid \mathbf{r}_{i,t-1}$.19 In conclusion, the Little test verifies a condition that is necessary but not sufficient for MCAR, and that is neither necessary nor sufficient for the MAR condition. The same comments apply to the Park and Davis test, when a set of explanatory variables, x, is added to the conditioning variables and the variable y is discretely distributed.
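The quadratic form behind the statistic $T_L$ can be sketched in a few lines of Python for the attrition case. This is only an illustration under stated assumptions: the mean vector and variance matrix are passed in as given numbers rather than estimated by maximum likelihood, the data are made up, and the helper names `solve` and `little_statistic` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= f * m[col][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(m[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x

def little_statistic(groups, mu, sigma):
    """groups[t] lists the observed vectors (length t) of units seen for t waves."""
    stat = 0.0
    for t, rows in groups.items():
        m_t = len(rows)
        ybar = [sum(row[k] for row in rows) / m_t for k in range(t)]
        diff = [ybar[k] - mu[k] for k in range(t)]
        sub = [sigma[k][:t] for k in range(t)]  # t x t principal sub-matrix
        stat += m_t * sum(d * z for d, z in zip(diff, solve(sub, diff)))
    return stat

# Made-up example with T = 2: two units seen for one wave, two for both waves.
groups = {1: [[1.0], [1.0]], 2: [[0.0, 2.0], [2.0, 0.0]]}
mu = [0.0, 0.0]                   # stands in for the ML estimate of the mean
sigma = [[2.0, 0.0], [0.0, 2.0]]  # stands in for the variance matrix
T = 2
statistic = little_statistic(groups, mu, sigma)
dof = T * (T - 1) // 2
print(statistic, dof)
```

Only the observed sub-vectors enter the computation, which is why the statistic can compare groups with different dropout times but says nothing about the outcome at the wave in which a unit is missing.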
An equivalent reasoning is valid when the missing data problem is more general than attrition. Then the true null hypothesis of the Little test would be $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{r}_{i,t}, \mathbf{y}^o_{i,t-1}$, while the null hypothesis of the Park and Davis test would be $y_{i,t} \perp\!\!\!\perp \mathbf{r}^T_{i,t+1} \mid \mathbf{r}_{i,t}, \mathbf{y}^o_{i,t-1}, \mathbf{x}_{i,t}$, which is sufficient neither for MAR nor for MCAR.

19 For a proof see Florens and Mouchart (1982).

4.3. Limits of the Diggle test

Diggle (1989) has proposed a class of tests for random attrition. Given a panel with T waves, we can observe a generic unit for a number of consecutive periods ranging from 1 to T. The tests proposed by Diggle verify whether units that drop out at the $(t + 1)$th wave represent a random sample of units that drop out at the $(t + 1)$th wave or later. He introduces a score function of the observed past variables, $h(\cdot)$, which should be linked to the dropout probability, and tests whether the score functions for the units dropping out at the $(t + 1)$th wave are a random sample of the set of scores for units that drop out at the $(t + 1)$th wave or later. A Kolmogorov–Smirnov test is one of the possible tests. In other words, Diggle (1989) verifies whether the distribution of $(h(\mathbf{y}_{i,t}) \mid r_{i,t} = 1, r_{i,t+1} = 1)$ is equal to the distribution of $(h(\mathbf{y}_{i,t}) \mid r_{i,t} = 1, r_{i,t+1} = 0)$; that is, whether
the condition
$$ (h(\mathbf{y}_{i,t}) \perp\!\!\!\perp r_{i,t+1} \mid r_{i,t} = 1) \tag{15} $$
holds. Let us assume that the function h is such that $(\mathbf{y}_{i,t} \perp\!\!\!\perp r_{i,t+1} \mid r_{i,t} = 1, h(\mathbf{y}_{i,t}))$; that is, given past information on r, h is a balancing score, as defined by Rosenbaum and Rubin (1983). Under that assumption, testing $(h(\mathbf{y}_{i,t}) \perp\!\!\!\perp r_{i,t+1} \mid r_{i,t} = 1)$ is then equivalent to testing $(\mathbf{y}_{i,t} \perp\!\!\!\perp r_{i,t+1} \mid r_{i,t} = 1)$, which is equivalent to the condition that y does not Granger cause r and is not at all equivalent to the MAR condition. Diggle suggests choosing a function h that reflects the probability that $r_{i,t+1} = 1$ as a function of $\mathbf{y}_{i,t}$; in other words he implicitly suggests using the propensity score, $\Pr(r_{i,t+1} = 1 \mid \mathbf{y}_{i,t}, r_{i,t} = 1)$. As proven by Rosenbaum and Rubin (1983), the propensity score is the coarsest balancing score, so that the propensity score is a function of any other balancing score. In conclusion, Diggle’s testing procedure verifies the Granger noncausality condition
$$ (r_{i,t+1} \perp\!\!\!\perp \mathbf{y}_{i,t} \mid r_{i,t}). \tag{16} $$
However, it is not able to verify if $(y_{i,t+1} \perp\!\!\!\perp r_{i,t+1} \mid r_{i,t} = 1, \mathbf{y}_{i,t})$, so it is not a test for MAR or, as defined by Diggle, for random dropouts.
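The limitation can be illustrated with a hypothetical simulation in which dropout at wave $t + 1$ depends only on the outcome that would have been observed at that wave. The sketch compares stayers and dropouts with a two-sample Kolmogorov–Smirnov distance, using the identity as score function; the logistic dropout rule, the independence between $y_{i,t}$ and $y_{i,t+1}$ and all names are assumptions chosen to make the point as stark as possible.

```python
import math
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

rng = random.Random(0)
stay_past, drop_past, stay_next, drop_next = [], [], [], []
for _ in range(5000):
    y_t = rng.gauss(0, 1)     # observed for every unit still in the panel
    y_next = rng.gauss(0, 1)  # assumed independent of y_t for illustration
    # Hypothetical non-MAR dropout: depends on the outcome at wave t+1 itself.
    p_stay = 1.0 / (1.0 + math.exp(-2.0 * y_next))
    if rng.random() < p_stay:
        stay_past.append(y_t)
        stay_next.append(y_next)
    else:
        drop_past.append(y_t)
        drop_next.append(y_next)

d_observable = ks_statistic(stay_past, drop_past)  # what a Diggle-type test sees
d_infeasible = ks_statistic(stay_next, drop_next)  # needs the missing y_{i,t+1}
print(round(d_observable, 3), round(d_infeasible, 3))
```

The comparison based on observed past outcomes shows essentially no difference between the two groups, although dropout is strongly non-random; the informative comparison requires the outcomes that are missing for dropouts. With serially correlated outcomes the observable comparison would pick up part of this dependence, but only indirectly.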
5. Conclusions

In this paper, we have defined weak ignorability of the selection mechanism as the set of conditions necessary and sufficient to make correct and efficient inference based on the likelihood function. Rubin (1976) has proved that in order to make correct likelihood-based inference, we need two conditions: the MAR condition and the variation-free condition for the parameters of the model of interest and of the selection mechanism. Using the terminology of model reduction theory, we have shown that weak ignorability is satisfied if the model of interest and selection mechanism operate a statistical cut, and if the MAR condition is true. In borrowing from model reduction theory, in particular from Florens et al. (1980), Engle et al. (1983) and Florens and Mouchart (1985), we have extended the definitions of weak ignorability to panel data models. Two definitions of weak ignorability may be given: one in terms of a joint model, defined for T consecutive waves, and another in terms of a sequential model, corresponding to a dynamic model of interest and defined for a single time period. We have proved that weak ignorability for a joint model requires a MAR condition and an initial cut, whereas weak ignorability for a dynamic model requires a sequential cut, a Granger noncausality condition and a sequential MAR condition. Moreover, we have shown that, if the model of interest is conditional on a set of explanatory variables, then some additional conditions are necessary. Substituting MAR with MCAR in the definition of weak ignorability, we have obtained the strong ignorability definition, which is the condition for correct inference whether likelihood or not likelihood
based. Ignorability defined for a sequential model is equivalent to the one defined for a joint model in the absence of explanatory variables or when the explanatory variables are strictly exogenous. The extension of weak and strong ignorability to the case of dynamic panel models has allowed us to emphasize the failure of some tests proposed in the literature to verify the MAR and/or MCAR conditions. Indeed, we have proved that the null hypothesis of some tests is given by an assumption that is not necessary for MAR and which is necessary but not sufficient for MCAR. Furthermore, the formal definition of weak and strong ignorability has helped us to disentangle some of the misunderstandings that occurred between Holland and Granger concerning the concept of causality. In conclusion, the main claim of the paper is that the MAR condition cannot be verified without observing the missing data or without imposing some untestable assumptions on the relationship between the selection process and the missing variables.
Acknowledgements

Part of this paper is based on work carried out during a visit to the European Centre for Analysis in the Social Sciences (ECASS) at the Institute for Social and Economic Research, University of Essex, supported by the Access to Research Infrastructure action under the EU Improving Human Potential Programme. Special thanks are given to Mark Bryan, Pierre Hoonhout and Franco Peracchi, and to participants at the ISER seminar at the University of Essex, Colchester, the EC2 Conference, 2001, in Louvain-la-Neuve and the 10th International Conference on Panel Data, 2002, in Berlin for useful comments on this paper.
Appendix A. Proof of the proposition

First, we prove that (d11) and (s1) are sufficient conditions to ensure (c1) and (c2), that is, weak ignorability. Applying the condition of Granger noncausality (d12) to the factorization (d11), we obtain:

$$ f(y_i, r_i; \phi) = \prod_{t=1}^{T} f(y_{i,t} \mid y_{i,t-1}, r_{i,t-1}; \theta) \prod_{t=1}^{T} f(r_{i,t} \mid y_{i,t}, r_{i,t-1}; \gamma) = f(y_i; \theta)\, f(r_i \mid y_i; \gamma), \tag{17} $$

so that (d11) and (d12) ensure the initial cut (c1). Let us assume that a generic ith unit drops out at the dth wave, and let us rewrite the joint model as the product of three factors:

$$ f(y_i, r_i; \phi) = L_1 \, L_2 \, L_3, \tag{18} $$
where

$$ L_1 = \prod_{t=1}^{d-1} f(y_{i,t} \mid y_{i,t-1}; \theta) \prod_{t=2}^{d-1} f(r_{i,t} \mid r_{i,t-1}, y_{i,t}; \gamma), $$

$$ L_2 = f(y^{m}_{i,d} \mid y_{i,d-1}; \theta)\, f(r_{i,d} \mid r_{i,d-1}, y_{i,d}; \gamma), $$

$$ L_3 = \prod_{t=d+1}^{T} f(y_{i,t} \mid y_{i,t-1}; \theta)\, f(r_{i,t} \mid r_{i,t-1}, y_{i,t}; \gamma). $$

To make likelihood-based inference on $\theta$ we must eliminate the unobserved variables by integration in the following way:

$$ \int f(y_i, r_i; \phi)\, dy^{T}_{i,d} = \int L_1 \, L_2 \, L_3 \, dy^{T}_{i,d}. \tag{19} $$

The factor $L_1$ does not depend on unobserved variables, so it can be taken out of the integral sign. The condition (s1) ensures that $f(r_{i,d} \mid r_{i,d-1}, y_{i,d}; \gamma) = f(r_{i,d} \mid r_{i,d-1}, y_{i,d-1}; \gamma)$ in $L_2$, which again does not depend on unobserved variables and can be taken out of the integral sign. For any $t > d$, $(r_{i,t} \mid r_{i,d} = 0)$ is independent of any variable because $\Pr(r_{i,t} = 0 \mid r_{i,d} = 0) = 1$ when $r_{i,d} = 0$. Consequently $r_{i,t}$ becomes degenerate and the selection mechanism in $L_3$ cancels out. Then the integrated likelihood becomes:

$$ \int f(y_i, r_i; \phi)\, dy^{T}_{i,d} = L_1 \, f(r_{i,d} \mid r_{i,d-1}, y_{i,d-1}; \gamma) \int \prod_{t=d}^{T} f(y_{i,t} \mid y_{i,t-1}; \theta)\, dy^{T}_{i,d}. \tag{20} $$

Since $\int \prod_{t=d}^{T} f(y_{i,t} \mid y_{i,t-1}; \theta)\, dy^{T}_{i,d} = 1$, we can rewrite the integrated likelihood as

$$ \prod_{t=1}^{d-1} f(y_{i,t} \mid y_{i,t-1}; \theta) \prod_{t=2}^{d} f(r_{i,t} \mid r_{i,t-1}, y_{i,t-1}; \gamma). \tag{21} $$
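The passage from the integrated likelihood to its closed form can be checked numerically in a small discrete case. The following is a minimal sketch under assumptions introduced only for this illustration: a binary outcome following a first-order Markov chain with hypothetical transition probabilities `trans`, a given initial condition `y0`, a first wave that is always observed, and hypothetical response probabilities `resp` that depend on the previous outcome only, so that (s1) holds. Integration is replaced by summation over the unobserved outcomes.

```python
from fractions import Fraction as F
from itertools import product

T = 4     # number of waves (hypothetical)
y0 = 0    # initial condition, treated as given

# Hypothetical f(y_t | y_{t-1}; theta) on the outcome space {0, 1}.
trans = {0: {0: F(7, 10), 1: F(3, 10)}, 1: {0: F(2, 5), 1: F(3, 5)}}
# Hypothetical Pr(r_t = 1 | r_{t-1} = 1, y_{t-1}; gamma); no dependence on y_t.
resp = {0: F(9, 10), 1: F(3, 4)}


def f_r(r, r_prev, y_prev):
    """Selection factor under attrition: a unit that has left never returns."""
    if r_prev == 0:
        return F(1) if r == 0 else F(0)
    return resp[y_prev] if r == 1 else 1 - resp[y_prev]


def joint(y, r):
    """Joint f(y_i, r_i) for y = (y_1, ..., y_T) and r = (r_1, ..., r_T)."""
    value, y_prev = F(1), y0
    for t in range(T):
        value *= trans[y_prev][y[t]]
        if t >= 1:  # selection factors start at the second wave
            value *= f_r(r[t], r[t - 1], y[t - 1])
        y_prev = y[t]
    return value


def integrated(y_obs, d):
    """Sum the joint over the unobserved y_d, ..., y_T for dropout at wave d."""
    r = tuple(1 if t < d else 0 for t in range(1, T + 1))
    n_missing = T - d + 1
    return sum(joint(tuple(y_obs) + tail, r)
               for tail in product((0, 1), repeat=n_missing))


def closed_form(y_obs, d):
    """Truncated outcome likelihood times the selection factors up to wave d."""
    value, y_prev = F(1), y0
    for t in range(1, d):          # outcome factors for t = 1, ..., d - 1
        value *= trans[y_prev][y_obs[t - 1]]
        y_prev = y_obs[t - 1]
    for t in range(2, d + 1):      # selection factors for t = 2, ..., d
        r_t = 1 if t < d else 0
        value *= f_r(r_t, 1, y_obs[t - 2])
    return value


all_match = all(
    integrated(y_obs, d) == closed_form(y_obs, d)
    for d in range(2, T + 1)
    for y_obs in product((0, 1), repeat=d - 1)
)
print(all_match)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than with a numerical tolerance. The equality relies on `resp` not depending on the current outcome; if it did, the sum over the unobserved outcome at the dropout wave would no longer separate into a factor in the outcome parameters and a factor in the selection parameters.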
Since $\theta$ and $\gamma$ are variation free, we can make inference on the parameter $\theta$ ignoring the selection mechanism, that is, considering the truncated likelihood:

$$ \prod_{t=1}^{d-1} f(y_{i,t} \mid y_{i,t-1}; \theta). \tag{22} $$
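As a worked illustration of the truncated likelihood, suppose (an assumption made only for this example, not a specification taken from the paper) that the model of interest is a Gaussian first-order autoregression with $\theta = (\alpha, \rho, \sigma^2)$ and with $y_{i,0}$ treated as given. Writing $d_i$ for the dropout wave of unit $i$, with the convention introduced here that $d_i = T + 1$ for units that never drop out, the truncated log-likelihood summed over units is:

```latex
\[
  f(y_{i,t} \mid y_{i,t-1}; \theta)
    = \frac{1}{\sqrt{2\pi\sigma^{2}}}
      \exp\!\left\{ -\frac{(y_{i,t} - \alpha - \rho\, y_{i,t-1})^{2}}{2\sigma^{2}} \right\},
\]
\[
  \ell(\theta)
    = \sum_{i=1}^{N} \sum_{t=1}^{d_i - 1}
      \left[ -\tfrac{1}{2} \log\!\left(2\pi\sigma^{2}\right)
             - \frac{(y_{i,t} - \alpha - \rho\, y_{i,t-1})^{2}}{2\sigma^{2}} \right].
\]
```

Under this assumed specification the maximizers of $\ell(\theta)$ with respect to $\alpha$ and $\rho$ are the least squares coefficients from regressing $y_{i,t}$ on a constant and $y_{i,t-1}$ over the pairs observed before dropout, and the selection parameters $\gamma$ do not enter the calculation.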
Under conditions (s1) and (d12) and because the variable $r_{i,t}$ is degenerate for $t > d$, we can write

$$ f(r_i \mid y_i; \gamma) = \prod_{t=1}^{T} f(r_{i,t} \mid y_{i,t}, r_{i,t-1}; \gamma) = \prod_{t=1}^{T} f(r_{i,t} \mid y_{i,t-1}, r_{i,t-1}; \gamma) = f(r_i \mid y^{o}_i; \gamma), \tag{23} $$

so that the condition (c2) holds.

In the following, we prove that (d11) and (s1) are necessary conditions to ensure (c1) and (c2). We begin by proving that when the initial cut (c1) operates and
condition (d12) holds, then (d11) is true. Using condition (c1), we can write:

$$ f(y_i, r_i; \phi) = f(y_i; \theta)\, f(r_i \mid y_i; \gamma) = \prod_{t} f(y_{i,t} \mid y_{i,t-1}; \theta) \prod_{t} f(r_{i,t} \mid y_i, r_{i,t-1}; \gamma). \tag{24} $$

Since condition (d12) may be restated as $r_{i,t} \perp\!\!\!\perp y^{T}_{i,t+1} \mid r_{i,t-1}, y_{i,t}$ (see footnote 20), we can rewrite the joint likelihood as

$$ f(y_i, r_i; \phi) = \prod_{t} f(y_{i,t} \mid y_{i,t-1}; \theta) \prod_{t} f(r_{i,t} \mid y_{i,t}, r_{i,t-1}; \gamma), \tag{25} $$

so that the sequential cut (d11) operates. When conditions (c1) and (c2) hold we have

$$ f(r_i \mid y_i; \gamma) = \prod_{t} f(r_{i,t} \mid y_{i,t}, r_{i,t-1}; \gamma) = f(r_i \mid y^{o}_i; \gamma) = \prod_{t} f(r_{i,t} \mid y^{o}_{i,t}, r_{i,t-1}; \gamma). \tag{26} $$

For $t = d$ the sequential selection process in (26) is independent of the potentially unobserved $y_{i,t}$ if and only if (s1) is satisfied; hence the condition (s1) is necessary for the condition (c2).
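The practical content of the proposition can be illustrated by simulation. The sketch below is an added example under assumptions that are not in the paper: a Gaussian autoregression $y_t = \rho y_{t-1} + e_t$ with $\rho = 0.5$, an initial wave that is always observed, and a hypothetical dropout rule that removes a unit when the relevant outcome exceeds 1. The estimator is the least squares slope with an intercept, which maximizes a Gaussian truncated likelihood. The functions `simulate` and `ols_slope` are names introduced here.

```python
import math
import random


def simulate(n_units, n_waves, rho, leaves, seed):
    """Simulate y_t = rho * y_{t-1} + e_t with monotone attrition.

    leaves(y_prev, y_curr) says whether the unit drops out at the current
    wave. Returns the (y_{t-1}, y_t) pairs that remain observed.
    """
    rng = random.Random(seed)
    sd0 = 1.0 / math.sqrt(1.0 - rho ** 2)  # stationary standard deviation
    pairs = []
    for _ in range(n_units):
        y_prev = rng.gauss(0.0, sd0)       # wave 0 is always observed
        for _ in range(n_waves):
            y_curr = rho * y_prev + rng.gauss(0.0, 1.0)
            if leaves(y_prev, y_curr):
                break                      # lost from this wave onwards
            pairs.append((y_prev, y_curr))
            y_prev = y_curr
    return pairs


def ols_slope(pairs):
    """Least squares slope of y_t on y_{t-1}, with an intercept."""
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    return sxy / sxx


RHO = 0.5
THRESHOLD = 1.0  # hypothetical dropout threshold

# Sequential MAR: dropout depends only on the last observed outcome.
mar_pairs = simulate(20000, 4, RHO, lambda yp, yc: yp > THRESHOLD, seed=1)
# Not MAR: dropout depends on the current outcome, which then goes unobserved.
nmar_pairs = simulate(20000, 4, RHO, lambda yp, yc: yc > THRESHOLD, seed=1)

est_mar = ols_slope(mar_pairs)
est_nmar = ols_slope(nmar_pairs)
print(round(est_mar, 3), round(est_nmar, 3))
```

In the first design the estimate should lie close to the true value of 0.5, because selection acts only through the conditioning variable. In the second design the observed outcomes are truncated from above, which attenuates the slope, so ignoring the selection mechanism is no longer harmless.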
Appendix B. The sequential CAR condition in Gill and Robins (1997) and our sequential MAR condition

A variable $X$ is said to be coarsened if we cannot observe its exact value, but we know the subset of the sample space to which it belongs. In other words, instead of $X$ we observe a random set $\chi$, which defines the subset to which $X$ belongs. Following Gill and Robins (1997) we assume that "... $\chi$ is a coarsening of an underlying random variable $X$. We suppose that $X$ takes values in a finite space $E$. Its power set (the set of all subsets of $E$) is denoted by $\mathcal{E}$. So $\chi$ takes values in $\mathcal{E} \setminus \emptyset$ and $X \in \chi$ with probability one."

Definition of a k-sequential coarsening (Gill and Robins, 1997). We say that the random sets $\chi_1, \chi_2, \ldots, \chi_k, \chi$, with each $\chi_m$ and $\chi \in \mathcal{E} \setminus \emptyset$, form a k-sequential coarsening of a random variable $X$ if for $m = 0, \ldots, k+1$, $\chi_m \subseteq \chi_{m+1}$ with probability 1, where $\chi_0 \equiv \{X\}$ and $\chi_{k+1} \equiv \chi$.

Definition of a k-sequential CAR (Gill and Robins, 1997). A k-sequential coarsening is a k-sequential CAR if, for $m = 1, \ldots, k$, the conditional distribution of $\chi_m$ given $\chi_{m-1}$ is independent of the particular realization of $\chi_{m-1}$ except through the fact that it is compatible with $\chi_m$. In the discrete case, this means that $\Pr(\chi_m = A \mid \chi_{m-1} = B)$ is the same for all $B$ in the support of $\chi_{m-1}$ such that $B \subseteq A$.

When the coarsening is due to the attrition problem, we prove that the k-sequential CAR definition of Gill and Robins (1997) is equivalent to the sequential MAR definition given in this work.

20 For a proof of the equivalence used in Appendix A to restate condition (d12), see Florens and Mouchart (1982).
Let us consider a random sample of $N$ units, for which we observe the variable $y$ for $T$ consecutive waves. Let the sample space of $y$ be $\mathcal{Y}$. Let $y_i$ be the vector of $T$ variables observed for the ith unit, and let $y_{i,t}$ be the variable observed for the ith unit at the tth wave. If $y_{i,t}$ is missing, then the successive variables, $y_{i,t+1}, \ldots, y_{i,T}$, are also missing. Each missing variable, $y$, takes values in $\mathcal{Y}$, so that the coarsening random set, $\tilde{y}$, which defines the sub-space to which $y$ belongs, is equal to the whole sample space $\mathcal{Y}$. Let $X = y_i$ be coarsened because the last $k$ variables, $[y_{i,T-k+1}, \ldots, y_{i,T}]$, are missing; then the coarsening of the multivariate variable $X$ is given by the random set

$$ \chi = [y_{i,1}, \ldots, y_{i,T-k}, \tilde{y}_{i,T-k+1}, \ldots, \tilde{y}_{i,T}] = [y_{i,1}, \ldots, y_{i,T-k}, \mathcal{Y}, \ldots, \mathcal{Y}]. $$

If we define $\chi_0, \chi_1, \chi_2, \ldots, \chi_k, \chi$ in the following way:

$$ \chi_0 = [y_{i,1}, \ldots, y_{i,T-k}, y_{i,T-k+1}, \ldots, y_{i,T}] = X, \tag{27} $$

$$ \chi_1 = [y_{i,1}, \ldots, y_{i,T-k}, y_{i,T-k+1}, \ldots, \mathcal{Y}], \; \ldots, \; \chi_k = [y_{i,1}, \ldots, y_{i,T-k}, \mathcal{Y}, \ldots, \mathcal{Y}] = \chi, \tag{28} $$

then $\chi_{m-1} \subseteq \chi_m$ for any $m = 0, \ldots, k$ and $\chi_1, \chi_2, \ldots, \chi_k$ form a k-sequential coarsening. To prove that $\chi_1, \chi_2, \ldots, \chi_k$ form a k-sequential CAR, we have to show that $\Pr(\chi_m = A \mid \chi_{m-1} = B) = c$, where $c$ is a constant, for all $B$ in the support of $\chi_{m-1}$ such that $B \subseteq A$ (see the above definition of k-sequential CAR). If the first $(T-1)$ elements of $\chi_1$ are not equal to the corresponding observed elements of $\chi_0$, then $\Pr(\chi_1 = A \mid \chi_0 = B) = 0$, so that verifying

$$ \Pr(\chi_1 = A \mid \chi_0 = B) = c \tag{29} $$

for any $B \subseteq A$ is equivalent to verifying that

$$ \Pr(\tilde{y}_{i,T} = \mathcal{Y} \mid y_{i,1}, \ldots, y_{i,T-k}, y_{i,T-k+1}, \ldots, y_{i,T}, r_{i,T-1} = 1) = c \tag{30} $$

for $y_{i,T} \in \mathcal{Y}$. Using the fact that $\Pr(\tilde{y}_{i,T} = \mathcal{Y}) = \Pr(r_{i,T} = 0)$, the above equality is equivalent to

$$ \Pr(r_{i,T} = 0 \mid y_{i,1}, \ldots, y_{i,T}, r_{i,T-1} = 1) = \Pr(r_{i,T} = 0 \mid y_{i,1}, \ldots, y_{i,T-1}, r_{i,T-1} = 1), \tag{31} $$

where $r$ is the dummy indicating response. By analogy, $\Pr(\chi_m = A \mid \chi_{m-1} = B) = c$ for all $B$ in the support of $\chi_{m-1}$ such that $B \subseteq A$ is true if and only if

$$ \Pr(r_{i,t} = 0 \mid y_{i,1}, \ldots, y_{i,t}, r_{i,t-1} = 1, r^{T}_{i,t+1} = 0) = \Pr(r_{i,t} = 0 \mid y_{i,1}, \ldots, y_{i,t-1}, r_{i,t-1} = 1, r^{T}_{i,t+1} = 0), \tag{32} $$

where $t = T - m + 1$ and $0$ is a vector of zeros. Since $\Pr(r_{i,t} = 0 \mid y_{i,1}, \ldots, y_{i,t}, r_{i,t-1} = 1, r^{T}_{i,t+1} \neq 0) = 0$ in the case of attrition, the last equality is the sequential MAR condition given in Section 3.5, $r_{i,t} \perp\!\!\!\perp y_{i,t} \mid r_{i,t-1}, y_{i,t-1}$.
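The construction of the coarsening sets and the one-step CAR check can be made concrete in a finite case. The sketch below is an added illustration under assumptions introduced here: a binary sample space `Y` for each wave, and two hypothetical dropout hazards at the last wave, one depending on the previous outcome and one depending on the current outcome. The helper names `coarsen`, `as_set`, `is_sequential_coarsening` and `car_holds` are not from the paper or from Gill and Robins (1997).

```python
from itertools import product

Y = frozenset({0, 1})  # hypothetical sample space of a single wave


def coarsen(y, m):
    """chi_m: keep the first T - m outcomes exact, widen the last m to Y."""
    T = len(y)
    return tuple(frozenset({v}) if t < T - m else Y for t, v in enumerate(y))


def as_set(chi):
    """Expand a coarsened vector into the set of complete vectors it allows."""
    return set(product(*chi))


def is_sequential_coarsening(y, k):
    """Check chi_0 = {X} and chi_{m-1} subset of chi_m for m = 1, ..., k."""
    sets = [as_set(coarsen(y, m)) for m in range(k + 1)]
    nested = all(sets[m - 1] <= sets[m] for m in range(1, k + 1))
    return sets[0] == {tuple(y)} and nested


def car_holds(hazard, T):
    """One-step CAR check for the last wave.

    hazard(b) is a hypothetical Pr(r_T = 0 | y = b, r_{T-1} = 1). CAR requires
    it to be the same for every complete vector b compatible with chi_1 = A.
    """
    for y in product(sorted(Y), repeat=T):
        A = as_set(coarsen(y, 1))
        if len({hazard(b) for b in A}) > 1:
            return False
    return True


def mar_hazard(b):
    """Dropout probability that depends on the last observed outcome only."""
    return 0.2 + 0.3 * b[-2]


def nonmar_hazard(b):
    """Dropout probability that depends on the outcome that goes missing."""
    return 0.2 + 0.3 * b[-1]


print(is_sequential_coarsening((0, 1, 1, 0), 2))
print(car_holds(mar_hazard, 3), car_holds(nonmar_hazard, 3))
```

The vectors compatible with a given `A` differ only in the last outcome, so the check succeeds exactly when the hazard ignores that outcome. This is the finite counterpart of the equivalence between the one-step CAR requirement and the sequential MAR condition at the last wave.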
References

Abowd, J., Crépon, B., Kramarz, F., 2001. Moment estimation with attrition. Journal of the American Statistical Association 96 (456), 1223–1231.
Ahn, H., Powell, J.L., 1993. Semiparametric estimation of censored selection models with nonparametric selection mechanism. Journal of Econometrics 58, 3–29.
Andrews, D., Schafgans, M., 1998. Semiparametric estimation of the intercept of a sample selection model. Review of Economic Studies 65, 497–517.
Angrist, J.D., 1997. Conditional independence in sample selection models. Economics Letters 54, 103–112.
Cosslett, S.R., 1991. Semiparametric estimation of a regression model with sample selectivity. In: Barnett, W.A., Powell, J., Tauchen, G.E. (Eds.), Nonparametric and Semiparametric Methods in Econometrics and Statistics. Cambridge University Press, New York.
Dehejia, R.H., Wahba, S., 1999. Causal effects in nonexperimental studies: reevaluation of the evaluation of training programs. Journal of the American Statistical Association 94, 1053–1063.
Dehejia, R.H., Wahba, S., 2002. Propensity score matching methods for nonexperimental causal studies. Review of Economics and Statistics 84 (1), 151–161.
Diggle, P., 1989. Testing for random dropouts in repeated measurement data. Biometrics 45, 1255–1258.
Engle, R.F., Hendry, D.F., Richard, J.-F., 1983. Exogeneity. Econometrica 51 (2), 277–304.
Florens, J.P., Mouchart, M., 1982. A note on noncausality. Econometrica 50 (3), 583–591.
Florens, J.P., Mouchart, M., 1985. Conditioning in dynamic models. Journal of Time Series Analysis 52 (1), 157–175.
Florens, J.P., Mouchart, M., Rolin, J.-M., 1980. Réductions dans les expériences bayésiennes séquentielles. Cahiers du Centre d'Etudes de Recherche Opérationnelle 22, 3–4, 353–362, Université Catholique de Louvain, Louvain-la-Neuve.
Florens, J.P., Mouchart, M., Rolin, J.-M., 1990. Elements of Bayesian Statistics. Marcel Dekker, New York.
Gill, R.D., Robins, J.M., 1997. Sequential models for coarsening and missingness. In: Lin, D.Y., Fleming, T.R. (Eds.), Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis. Springer, New York, pp. 295–305.
Gill, R.D., van der Laan, M.J., Robins, J.M., 1997. Coarsening at random: characterizations, conjectures, counter-examples. In: Lin, D.Y., Fleming, T.R. (Eds.), Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis. Springer, New York, pp. 255–294.
Granger, C.W.J., 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438.
Granger, C.W.J., 1986. Statistics and causal inference: Comment. Journal of the American Statistical Association 81 (396), 967–968.
Hausman, J.A., Wise, D.A., 1979. Attrition bias in experimental and panel data: the Gary income maintenance experiment. Econometrica 47, 455–473.
Heckman, J., 1979. Sample selection as a specification error. Econometrica 47 (1), 153–161.
Heckman, J., 1990. Varieties of selection bias. The American Economic Review 80 (2), 313–318.
Heckman, J., Hotz, V.J., 1989. Choosing among alternative nonexperimental methods for estimating the impact of social programs: the case of manpower training. Journal of the American Statistical Association 84 (408), 862–874.
Heckman, J., Ichimura, H., Todd, P., 1997. Matching as an econometric evaluation estimator: evidence from evaluating a job training program. Review of Economic Studies 64 (4), 261–294.
Heitjan, D.F., Rubin, D., 1991. Ignorability and coarse data. Annals of Statistics 19 (4), 2244–2253.
Hendry, D.F., 1995. Dynamic Econometrics. Oxford University Press, Oxford.
Hirano, K., Imbens, G.W., Ridder, G., Rubin, D.B., 2001. Combining panels with attrition and refreshment samples. Econometrica 69 (6), 1645–1659.
Hirano, K., Imbens, G.W., Ridder, G., 2003. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71 (4), 1161–1189.
Holland, P.W., 1986. Statistics and causal inference. Journal of the American Statistical Association 81 (396), 945–970.
Horowitz, J.L., Manski, C.F., 1998. Censoring of outcomes and regressors due to survey nonresponse: identification and estimation using weights and imputation. Journal of Econometrics 84, 37–58.
Horvitz, D., Thompson, D., 1952. A generalization of sampling without replacement from a finite population. Journal of the American Statistical Association 47, 663–685.
Imbens, G.W., 1999. The role of the propensity score in estimating dose-response functions. Biometrika 87 (3), 706–710.
Inkmann, J., 2001. Accounting for nonresponse heterogeneity in panel data. Center of Finance and Econometrics, Discussion Paper Series, 01/03, University of Konstanz.
Jensen, P., Rosholm, M., Verner, M., 2002. A comparison of different estimators for panel data sample selection models. Department of Economics, University of Aarhus, Working Papers, 2002-1.
Kyriazidou, E., 1997. Estimation of a panel data sample selection model. Econometrica 65, 1335–1364.
Lechner, M., 1999. Earnings and employment effects of continuous off-the job training in East Germany after unification. Journal of Business & Economic Statistics 17 (1), 74–90.
Little, R.J.A., 1988. A test of missing completely at random for multivariate incomplete data. Journal of the American Statistical Association 77, 237–250.
Little, R.J.A., Rubin, D.B., 1987. Statistical Analysis with Missing Data. Wiley, New York.
Manski, C.F., 1989. Anatomy of the selection bias. Journal of Human Resources 24, 343–360.
Manski, C.F., 1995. Identification Problems in the Social Sciences. Harvard University Press, Cambridge, MA.
Manski, C.F., 2003. Partial Identification of Probability Distributions. Springer, New York.
Manski, C.F., Pepper, J.V., 2000. Monotone instrumental variables: with an application to return to schooling. Econometrica 68, 997–1010.
Nicoletti, C., 2003. Correcting for sample selection bias: alternative estimators compared. Working Papers of the Institute for Social and Economic Research, 2003-20, University of Essex, Colchester.
Park, T., Davis, C.S., 1993. A test of the missing data mechanism for repeated categorical data. Biometrics 49, 631–638.
Ridder, G., 1990. Attrition in multi-wave panel data. In: Hartog, J., Ridder, G., Theeuwes, J. (Eds.), Panel Data and Labor Market Studies. Elsevier Sciences Publishers, Amsterdam.
Robins, J.M., Gill, R.D., 1997. Non-response models for the analysis of non-monotone ignorable missing data. Statistics in Medicine 16, 39–56.
Robins, J., Rotnitzky, A., 1995. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association 90, 122–129.
Robins, J.M., Greenland, S., Hu, F.-C., 1999. Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome. Journal of the American Statistical Association 94 (447), 687–712.
Robins, J.M., Rotnitzky, A., Zhao, L.P., 1995. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 90 (429), 106–121.
Robinson, P., 1988. Root-N-consistent semiparametric regression. Econometrica 56, 931–954.
Rosenbaum, P., Rubin, D.B., 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1), 41–55.
Rubin, D.B., 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688–701.
Rubin, D.B., 1976. Inference and missing data. Biometrika 63, 581–592.
Rubin, D.B., 1989. Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
Rubin, D.B., 1996. Multiple imputation after 18+ years. Journal of the American Statistical Association 91, 473–520.
Shih, W.J., 1992. On informative and random dropouts in longitudinal studies (Letter to the Editor). Biometrics 48, 971–972.
Vazquez Alvarez, R., Melenberg, B., van Soest, A., 1999. Bounds on quantiles in the presence of full and item nonresponse. CentER Discussion Paper, 1999-38, Tilburg University.
Vazquez Alvarez, R., Melenberg, B., van Soest, A., 2001. Nonparametric bounds in the presence of item nonresponse, unfolding brackets, and anchoring. CentER Discussion Paper, 2001-67, Tilburg University.
Vella, F., 1998. Estimating models with sample selection bias: a survey. The Journal of Human Resources 3, 127–169.
Vella, F., Verbeek, M., 1999. Two-step estimation of panel data models with censored endogenous variables and selection bias. Journal of Econometrics 90, 239–263.
Verbeek, M., Nijman, T., 1992a. Testing for selectivity bias in panel data models. International Economic Review 33 (3), 681–704.
Verbeek, M., Nijman, T., 1992b. Incomplete panel and selection bias. In: Mátyás, L., Sevestre, P. (Eds.), The Econometrics of Panel Data. Kluwer Academic Publishers, New York.
Wooldridge, J.M., 1995. Selection correction for panel data models under conditional mean independence assumptions. Journal of Econometrics 68, 115–132.
Wooldridge, J.M., 2002. Inverse probability weighted M-estimators for sample selection, attrition and stratification. Cemmap Working Paper, 11, Institute for Fiscal Studies, London.