Computational Statistics and Data Analysis 53 (2009) 4095–4105
Contents lists available at ScienceDirect
Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda
Robust testing in the logistic regression model Ana M. Bianco a,b,∗ , Elena Martínez a a
Instituto de Cálculo, F. C. E. y N., Universidad de Buenos Aires, Argentina
b
CONICET, Argentina
article
info
Article history: Received 19 August 2008 Received in revised form 23 April 2009 Accepted 24 April 2009 Available online 5 May 2009
abstract We are interested in testing hypotheses that concern the parameter of a logistic regression model. A robust Wald-type test based on a weighted Bianco and Yohai [ Bianco, A.M., Yohai, V.J., 1996. Robust estimation in the logistic regression model. In: H. Rieder (Ed) Robust Statistics, Data Analysis, and Computer Intensive Methods In: Lecture Notes in Statistics, vol. 109, Springer Verlag, New York, pp. 17–34] estimator, as implemented by Croux and Haesbroeck [Croux, C., Haesbroeck, G., 2003. Implementing the Bianco and Yohai estimator for logistic regression. Computational Statististics and Data Analysis 44, 273–295], is proposed. The asymptotic distribution of the test statistic is derived. We carry out an empirical study to get a further insight into the stability of the p-value. Finally, a Monte Carlo study is performed to investigate the stability of both the level and the power of the test, for different choices of the weight function. © 2009 Elsevier B.V. All rights reserved.
1. Introduction In the binomial regression model we assume that the response variable Y has a Bernoulli distribution such that P (Y = 1|X = x) = F (x0 β),
(1)
where F is a strictly increasing cumulative distribution function, X ∈ Rp is the vector of explanatory variables and β ∈ Rp is the unknown regression parameter. When F (t ) =
exp(t ) 1 + exp(t )
(2)
we have the logistic regression model, which is the model we will consider from now on. However, our results can be extended to other link functions. It is well known that the maximum likelihood estimator (MLE) of β can be severely affected by outlying observations. Croux et al. (2002) discuss the breakdown behavior of the MLE in the logistic regression model and show that the MLE breaks down to zero when severe outliers are added to a data set. In the last few decades, a lot of work has been done in order to obtain robust estimates of the parameter in this model and also in the more general framework of generalized linear models. Among others, we can mention the proposals given by Pregibon (1982), Stefanski et al. (1986), Künsch et al. (1989), Morgenthaler (1992), Carroll and Pederson (1993), Christmann (1994) and Bianco and Yohai (1996) and more recently Croux and Haesbroeck (2003) and Bondell (2005, 2008).
∗ Corresponding address: Instituto de Cálculo, Ciudad Universitaria, Pabellón 2, Buenos Aires, C1428EHA, Argentina. Tel.: +54 11 45763375; fax: +54 11 45763375. E-mail address:
[email protected] (A.M. Bianco). 0167-9473/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2009.04.015
4096
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105
We are interested in testing parametric hypotheses about the regression parameter of the logistic regression model. Robust testing in this setting has received much less attention than robust estimation. Testing procedures based on classical estimates inherit the sensitivity of these estimators to atypical data, in the sense that a small amount of outlying observations can affect the level or the power of the tests. Testing procedures that, under contamination, retain a stable level and also a good power under specified alternatives, are desirable. The works of Heritier and Ronchetti (1994) and Cantoni and Ronchetti (2001) go in this direction. Heritier and Ronchetti (1994) introduce robust tests for a general parametric model, which includes logistic regression. Cantoni and Ronchetti (2001) define robust deviances based on generalizations of quasi–likelihood functions and propose a family of test statistics for model selection in generalized linear models. They also investigate the stability of the asymptotic level under contamination. In this paper we propose a Wald–type statistic based on a weighted version of the Bianco and Yohai (1996) estimator introduced by Croux and Haesbroeck (2003). Our proposal is a natural robustification of the classical Wald–type test, in the sense that the statistic of the test is a quadratic form based on robust estimators of the regression parameter and its asymptotic covariance matrix. We show that the asymptotic behavior of the proposed test is the same as that of its classical counterpart, that is, central χ 2 under the null hypothesis and noncentral χ 2 under contiguous alternatives. This paper is organized as follows. In Section 2 we briefly review some estimators related to the weighted estimator introduced by Croux and Haesbroeck (2003) and in Section 3 we state its asymptotic properties. In Section 4 we define the test statistic and we state its asymptotic distribution. In Section 5 we analyze the behavior of the p-values of the classical and proposed statistics when an outlying observation with increasing leverage is added to a data set. By means of a simulation study, we illustrate in Section 6 the performance of the proposed test in terms of both level and power. Finally, in Section 7 we provide some concluding remarks. 2. Preliminaries Consider the sample Zn = {Z1 , . . . , Zn }, with Zi = (X0i , Yi )0 and P (Yi = 1|Xi = x) = F (x0 β),
(3)
where F is given by (2). The maximum likelihood estimator of β is defined as
b βML = arg max log L(b, Zn ) = arg min b
b
n X
D(Yi , X0i b),
i=1
where L(b, Zn ) is the likelihood function and D(Yi , X0i b) = −Yi ln(F (X0i b))−(1−Yi ) ln(1−F (X0i b)) is the deviance component of the i-th observation. Pregibon (1982) proposes to modify the goal function using a monotone loss function ρ in order to give less weight to those observations which are poorly predicted by the model. Since Yi are indicator variables, the proposed estimators can be defined as arg min b
n X
Yi ρ(− ln(F (X0i b))) + (1 − Yi )ρ(− ln(1 − F (X0i b))).
i =1
He suggests to use a loss function ρ(t ) in the Huber type family, given by
ρ(t ) =
if t ≤ c
t 1
2(tc ) 2 − c
if t > c ,
where c is a positive constant. However, the resulting estimators are not consistent. Later on, Bianco and Yohai (1996) introduce a class of M-estimators which are a more robust version of them, and consistent, by including a bias correction term G and using a bounded loss function. These estimators are defined by the minimization of n X
φ(Yi , X0i b),
(4)
i=1
where
φ(y, t ) = yρ(− ln(F (t ))) + (1 − y)ρ(− ln(1 − F (t ))) + G(F (t )) + G(1 − F (t )) − G(1) with ρ a bounded, differentiable and nondecreasing function. The function G is defined as G(t ) =
t
Z
ψ(− ln u)du, 0
with ψ(t ) = ρ 0 (t ).
(5)
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105
4097
Croux and Haesbroeck (2003) give a criterion for the existence of the estimators defined by the minimization of (4) and provide a fast and stable algorithm to compute these estimators. They also suggest to use the loss function ρ in the family
( ρ (t ) =
te−
√
c
√ − t
−2e
1+
√
t +e
√ − c
2 1+
√
c +c
if t ≤ c
if t > c ,
(6)
where c is a positive constant. Besides, they introduce a weighted version of the above estimator that can be defined as n X
b β = arg min b
φ(Yi , X0i b)w(Xi ),
(7)
i=1
where w(·) is a weight function that depends only on a robust Mahalanobis distance of the explanatory variables. Downweighting outlying points in the space of the explanatory variables prevents high leverage points, and this modification is of special interest in robust testing since the presence of unweighted leverage points may inflate the estimator of the covariance matrix, and hence spoil the test statistic. For this reason, we will concentrate our proposal in the family of estimators defined by (7). Since the weights are based only on the x-values, the resulting estimator remains consistent without any further distributional assumptions, and it is also more robust than the weighted maximum likelihood estimator to vertical outliers. 3. The robust estimator and its asymptotic properties Throughout this section we will assume that Z1 = (X01 , Y1 )0 , . . . , Zn = (X0n , Yn )0 is a random sample following the logistic regression model (3). Assume that b β is the estimator defined by (7), that is, b β solves the differentiated equations system n X
Ψ (Yi , X0i b)w(Xi )Xi = 0,
(8)
i =1
where
∂φ(y, s) . Ψ (y, t ) = ∂ s s=t
(9)
Croux and Haesbroeck (2003) consider the hard rejection weight function w(·) applied to robust Mahalanobis distances derived from robust estimators of location and scatter, such as the Minimum Covariance Determinant (MCD) estimator (Rousseeuw, 1985). They prove that the influence function of this weighted version of the estimator is bounded. Besides hard rejection weights that completely delete observations beyond a fixed threshold – usually a quantile χp2,α – we can consider another type of weight that decreases to zero, but never vanishes completely, even beyond the threshold. In this paper we will consider a general class of weights applied to a norm of the covariates that includes hard rejection type weights and smoother ones. The asymptotic distribution of b β involves the matrices A(β) = Eβ B(β) = Eβ
∂ 2 φ(Y , X0 β) 0 w X X X ( ) ∂t2 ! 2 ∂φ(Y , X0 β) w2 (X) XX0 , ∂t
where (X0 , Y ) follows the logistic regression model given in (3). We derive the asymptotic behavior of b β under the following assumptions: A1. ρ is a bounded, strictly increasing and twice differentiable function with bounded derivatives ψ and ψ 0 and such that ρ(0) = 0. ψ 0 exists except for at most a finite number of points ci , 1 ≤ i ≤ k and it is continuous in [0, c1 ), (ci , ci+1 ), 1 ≤ i ≤ k − 1 and (ck , ∞). A2. w is a non–negative bounded function with support C ∈ R, such that 0 < P (X ∈ C ). A3. P (X0 α = 0) = 0 for all α ∈ Rp . A4. E (w 2 (X)kXk4 ) < ∞. A5. ψ(− ln(F (t ))) is Lipschitz in t of constant Kψ . Remark 3.1. Assumption A1 is a weak condition that includes the case in which the score function ψ is not differentiable. A2 will ensure the Fisher–consistency of the estimator and, together with A1, will imply the positive definiteness of matrix B. A4 is a weaker assumption on the distribution of the covariates than the existence of high order moments and, in the case of random carriers, it is equivalent to condition C1 (a) in Markatou and He (1994). Assumption A5 is weaker than the condition required in Bianco and Yohai (1996) on the score function, where it is assumed that ψ(− ln(F (t ))) is a Lipschitz function in F (t ). In this way, the class of ρ functions has been extended in order to include functions such as (6).
4098
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105
The next theorem states the Fisher consistency of the estimates defined in (7). Theorem 3.1. Assume that assumptions A1 and A2 hold and that for all α ∈ Rp , α 6= 0, P (X0 α = 0) < 1. Then for all b ∈ Rp , b 6= β we have Eβ φ(Y , X0 β)w(X) < Eβ φ(Y , X0 b)w(X) ,
where φ is defined in (5). The proof of this theorem follows as that of Theorem 2.2 in Bianco and Yohai (1996) taking into account that ψ is strictly positive and A2 holds. The following theorem states the almost sure consistency of the weighted estimator under general conditions. Analogous arguments to those used in Bianco and Yohai (1996) allow us to prove it. Theorem 3.2. Assume that A1 to A3 hold, then the estimate defined in (7) converges almost surely to the true parameter β. Finally, in the next theorem, the asymptotic normality of b β is stated.
√
Theorem 3.3. Assume that A1 to A5 hold. Let b β be the estimate defined in (7), then n(b β − β) converges in distribution to a p –dimensional normal distribution with 0 mean and covariance matrix 6(β) given by
6(β) = A−1 (β)B(β)A−1 (β). Its proof can be found in Bianco and Martínez (2009). 4. The test statistic and its asymptotic behavior We focus on the problem of testing, under model (3), the hypotheses H0 : β = βo vs. H0 : β 6= βo . It seems natural to test H0 through the Wald–type statistic which measures the discrepancy between the estimator obtained from (7) and the value of the parameter under the null hypothesis, using the following quadratic form −1 D (βo , b 6) = n(b β − βo )0 b 6 (b β − βo ),
(10)
where b 6 is a consistent estimator of the covariance matrix of b β, which can be obtained through consistent estimators of the matrices A and B. As in linear regression, one of the most frequent hypothesis testing problems involves only a subset of q regression 0
0
parameters. Let β = (β0(1) , β0(2) )0 , b β = (b β(1) , b β(2) )0 and X = (X0(1) , X0(2) )0 , where β(1) ∈ Rq . In order to test H0 : β(1) = β(1),o , β(2) unspecified, one may use the statistic −1 D1 (β(1),o , b 611 ) = n(b β(1) − β(1),o )0 b 611 (b β(1) − β(1),o ),
(11)
where b 611 denotes the q × q submatrix of b 6, corresponding to the coordinates of β(1) . Matrix A(β) can be estimated by
ˆ (β) = A
n 1 X ∂ 2 φ(Yi , X0i β)
n
∂t2
i =1
w (Xi ) Xi X0i ,
while matrix B(β) can be estimated through
ˆ (β) = B
n 1 X
n
F (X0i β) − Yi
2
M 2 (X0i β) w 2 (Xi ) Xi X0i ,
(12)
i =1
where M (t ) = (1 − F (t ))ψ (− ln(F (t ))) + F (t ) ψ (− ln(1 − F (t ))) .
(13)
ˆ (b Hence, we estimate 6 by A β)−1 Bˆ (b β)Aˆ (b β)−1 . To derive the consistency of this estimator, we will require the following condition, which is equivalent to condition C1 (b) in Markatou and He (1994) for the case of fixed carriers. A6. E (w(X)kXk3 ) < ∞. The following lemma states that the proposed estimates of matrices A and B can be estimated consistently.
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105
4099
p ˜ −→ Lemma 4.1. Let (X01 , Y1 ), . . . , (X0n , Yn ) be a random sample following the logistic regression model (3). Assume that β β and that A1, A2, A4 and A5 hold. Then, we have that p
˜ −→ A(β), if, in addition, ψ 0 (− log(F (t ))) is Lipschitz in t and A6 holds. ˆ (β) (a) A p
˜ −→ B(β). ˆ (β) (b) B In the next theorem we state the asymptotic distribution of the test statistic under the null hypothesis and under a fixed value in the alternative hypothesis. Theorem 4.1. Let (X0i , Yi ), 1 ≤ i ≤ n, be n independent random vectors that verify model (3). Let b β be the estimator defined in (7) and b 6 be a consistent sequence of estimators of 6, the asymptotic covariance matrix of b β. Then, if A1 to A5 hold, we have that D
(a) under H0 : β = βo , D (βo , b 6) −→ χp2 p
(b) under H1 : β 6= βo , D (βo , b 6) −→ ∞ for any fixed β. Analogously, we obtain the next theorem that states the asymptotic behavior of the test statistic when we are interested in finding out the impact of some components of the explanatory variables X on the response variable Y . Theorem 4.2. Let (X0i , Yi ), 1 ≤ i ≤ n, be n independent random vectors that verify model (3). Let b β be the estimator defined in (7) and b 6 be a consistent sequence of estimators of 6, the asymptotic covariance matrix of b β. Then, if A1 to A5 hold, we have that D
(a) under H0 : β(1) = β(1),o , D1 (β(1),o , b 611 ) −→ χq2 p
(b) under H1 : β(1) 6= β(1),o , D1 (β(1),o , b 611 ) −→ ∞ for any fixed β(1) . The proofs of the previous results can be found in Bianco and Martínez (2009). There, these authors also study the b asymptotic behavior of the test statistic under contiguous alternatives. They prove that, under suitable conditions, if √6 is a consistent sequence of estimators of 6, under contiguous alternatives of the form H1,n : β = βn = βo + c/ n, D
D (βo , b 6) −→ χp2 (θ ), where the non–centrality parameter is θ = c0 6−1 c for all c ∈ Rp .
From Theorems 4.1 and 4.2 we conclude that to test the null hypotheses at a given asymptotic significance level α , we have the following consistent tests:
• reject H0 : β = βo
if
and
• reject H0 : β(1) = β(1),o
D (βo , b 6) > χp2,α , if
D1 (β(1),o , b 611 ) > χq2,α .
5. Stability of the p-value In order to assess the effect on the p-value of outliers with increasing leverage, we carry out an empirical study based on the well–known vasoconstriction data set (Finney, 1947). Using these data Pregibon (1981) shows the effect of possible influential observations on the maximum likelihood estimator in the logistic regression setting. The binary responses indicate the presence or absence of vasoconstriction of the skin of the digits after air inspiration, while the two explanatory variables are the rate and the volume of inspired air (both in logarithms). Many authors agree in that observations 4 and 18 are influential points, but in deleting them the data are very close to the situation in which there is no overlap between successes and failures. Indeed, after deleting these two points, the overlap relies only on one observation. This makes this data set very difficult to handle. We compute the maximum likelihood estimator of β based on the whole sample, b βML , the maximum likelihood estimator evaluated without the two influential data points, b βML−{4,18}
and the robust estimator, b βB , and the corresponding test statistics: DML , DML−{4,18} and DB . The robust estimator b βB , defined by (7), is obtained using ρ in the family given in (6) with tuning constant c = 0.5 and weights based on the bisquare function with constant c = 4.685 applied to a robust Mahalanobis distance based on the MCD estimator as described in Croux and Haesbroeck (2003). Table 1 shows the three estimators of b β and the p-values of the three tests. DML detects the explicative variables and the intercept as significant, at the usual 5% level. On the other hand, the test based on DML−{4,18} and the proposed test based on DB find all the covariates and the intercept as non-significant, at 5% level. As in Croux et al. (2008), in order to asses the effect of contamination on the decision, one observation is added to the data at position (x1 , x2 , y) = (s, s, 0) for 0 < s ≤ 10 by steps of one. Between 0 and 2 the grid was refined. The line in Fig. 1 corresponds to the direction the added point moves along. The leverage of the added point increases with s, becoming a more serious outlier each time. We compute the p-values of the classical and the robust tests as s increases. In Fig. 2 the solid lines correspond to the p-values of the classical test, DML , while the lines with filled circles correspond to the robust test, DB . We also indicate the 5% significance level with a dashed line. The p-values corresponding to the robust test are more stable and, at a 5% significance level, we never change our decision. Instead, the p-values corresponding to the classical test increase rapidly as s moves towards 10, and the decision changes from rejection for small values of s to no rejection in the three cases.
4100
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105
Table 1 Vasoconstriction Data. Estimated parameters and p-values. Estimated values
Int log(Vol) log(Rate)
p-value
b βML
b βML−{4,18}
b βB
DML
DML−{4,18}
DB
−2.887
−24.590
−10.138
5.194 4.578
39.539 31.928
15.628 13.479
0.029 0.005 0.013
0.078 0.087 0.071
0.255 0.262 0.233
1 1
1.0
1 1
1 1 1
0
0 0
0.5
0
1
1
1
1
1
1
1
1
0
0
0
1
0
0.0
log(Rate)
1
1 1 0
1 0.0
0.5
1.0
log(Volume)
Fig. 1. Vasoconstriction data. The line corresponds to the direction of the contamination.
0
2
4
6 S
8
10
0.8 0.0
0.2
0.4
p-value
0.6
0.8 0.6 0.4 0.2 0.0
0.0
0.2
0.4
p-value
p-value
0.6
0.8
1.0
log(Rate)
1.0
log(Volume)
1.0
Intercept
0
2
4
6 S
8
10
0
2
4
6
8
10
S
Fig. 2. Sensitivity of the p-value. Solid lines correspond to the p-values of the classical test, and the lines with filled circles to those of the robust test. The dashed line indicates the 5% significance level.
6. Monte Carlo study We carry out a Monte Carlo study in order to assess the behavior of the proposed test in contaminated and non–contaminated samples. We also compare its performance with that of the classical Wald test.
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105
4101
Table 2 Observed frequencies of rejection under the null hypothesis in the non-contaminated case C0 . n
DML
D1
DH1
DH2
DB
DHR
Nominal level
100 50
0.041 0.031
0.066 0.089
0.064 0.087
0.060 0.080
0.059 0.075
0.065 0.085
0.05 0.05
We consider two different data schemes. In the first one, we follow the configurations introduced by Croux and Haesbroeck (2003) to study the stability of the level of the tests. In the second one, we follow a scheme close to that considered in Bianco and Yohai (1996) and we focus on power. 6.1. Level We generate 5000 samples of size n = 100 and 50 with p = 3, i.e. an intercept and two covariates. The explanatory variables U0i = (Ui1 , Ui2 ) are distributed according to a standard normal distribution N2 (0, I2 ) and the response variables Yi follow the model P (Yi = 1|Xi = xi ) =
exp(x0i βo ) 1 + exp(x0i βo )
,
where X0i = (1, U0i ) and β0o = (0, 2, 2). This case will be identified as the non-contaminated situation C0 for n = 100 and 50. For each sample size, we consider two contamination schemes in order to show that the level of the classical test breaks down, while the proposed tests are less sensitive to outliers. For n = 100 we consider the contaminations given by C1,100 : 3 misclassified observations are added on a hyperplane parallel to the true discriminating hyperplane X0 β with a √ shift equal to m × 2 and with the first covariate x1 around 3, C2,100 : analogous to C1,100 but with 5 outlying observations, while for n = 50 we consider the following contamination schemes C1,50 : similar to C1,100 but with 1 misclassified observation, C2,50 : similar to C1,100 but with 2 outlying observations. We compute the proposed test based on the loss function ρ in the family (6) with tuning constant c = 0.5 and choosing different weights:
• • • • •
w≡1 w based on the Huber ψ function with tuning constant c = χ22,0.975 w based on the Huber ψ function with tuning constant c = 1.345 w based on the Bisquare ψ function with tuning constant c = 4.685 w based on Hard Rejection with tuning constant c = χ22,0.975 .
These weights are applied to a robust Mahalanobis distance based on the MCD estimator, as described in Croux and Haesbroeck (2003). The corresponding test statistics are denoted D1 , DH1 , DH2 , DB and DHR , respectively, while the classical Wald statistic is denoted by DML . Table 2 shows the observed frequencies of rejection under Ho in the non–contaminated case Co for the two considered sample sizes. We observe that the asymptotic nominal level 5% is approximately retained by all the proposed tests and that DB has the closest value. It can be noticed that when n = 50 the frequencies of rejection are greater than those obtained for n = 100, but robust tests seem to be approximately reliable for this smaller sample size too. In Tables 3 and 4 we display the frequencies of rejection obtained for n = 100 with the considered tests for different values of m under contamination C1,100 and C2,100 , respectively. We observe that, under these contaminations, the classical test breaks in level since the frequencies of rejection under the null hypothesis are 1 or near to 1. Besides, the observed frequencies obtained for D1 are very far from the nominal level 0.05, particularly under C2,100 , showing that a weight on the explanatory variables is needed. On the other hand, the level of the proposed tests is more stable. In particular, the levels of the tests based on DB and DHR remain very stable, even under contamination C2,100 , where the percentage of outliers is higher. However, it is worth noting that the observed frequencies at which DB stabilizes are smaller than those at which DHR does. Similar results can be drawn from Tables 5 and 6 corresponding to n = 50. Even when the observed frequency of rejection of the classical test starts at a moderate value for m = 2, it grows rapidly to 1 (Table 5), showing the high sensitivity of this test to the presence of just one outlier. 6.2. Power We generate 5000 samples of size n = 100 and p = 2, i.e. an intercept and one covariate. The response variables Yi follow the model P (Yi = 1|Xi = xi ) =
exp(x0i β) 1 + exp(x0i β)
,
4102
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105 Non–contaminated Case
Contaminated Case
a
b
c
Fig. 3. Frequency of rejection in the non-contaminated and contaminated cases when P (Y = 1) = 0.65 for Ho : β = βo in (a), Ho : β1 = β1,o in (b) and Ho : β2 = β2,o in (c). Dashed lines correspond to the nominal level, thin lines to the classical Wald test and lines with filled circles to the proposed test. Table 3 Observed frequencies of rejection under the null hypothesis in the contamination case C1,100 . m
2
3
4
5
6
7
8
9
10
D1 DH1 DH2 DB DHR DML
0.206 0.153 0.108 0.071 0.066 0.928
0.193 0.144 0.099 0.067 0.066 0.999
0.177 0.120 0.087 0.060 0.066 1
0.168 0.101 0.078 0.060 0.066 1
0.149 0.083 0.070 0.060 0.066 1
0.119 0.073 0.065 0.060 0.066 1
0.108 0.067 0.062 0.060 0.066 1
0.110 0.061 0.058 0.060 0.066 1
0.125 0.060 0.056 0.060 0.066 1
Table 4 Observed frequencies of rejection under the null hypothesis in the contamination case C2,100 . m
2
3
4
5
6
7
8
9
10
D1 DH1 DH2 DB DHR DML
0.498 0.283 0.163 0.086 0.066 1
0.319 0.229 0.146 0.073 0.065 1
0.276 0.172 0.116 0.059 0.065 1
0.247 0.139 0.095 0.058 0.065 1
0.230 0.108 0.082 0.058 0.065 1
0.192 0.086 0.071 0.058 0.065 1
0.190 0.078 0.066 0.058 0.065 1
0.290 0.074 0.061 0.058 0.065 1
0.531 0.076 0.058 0.058 0.065 1
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105 Non–contaminated Case
a
4103
Contaminated Case
b
c
Fig. 4. Frequency of rejection in the non-contaminated and contaminated cases when P (Y = 1) = 0.5 for Ho : β = βo in (a), Ho : β1 = β1,o in (b) and Ho : β2 = β2,o in (c). Dashed lines correspond to the nominal level, thin lines to the classical Wald test and lines with filled circles to the proposed test. Table 5 Observed frequencies of rejection under the null hypothesis in the contamination case C1,50 . m
2
3
4
5
6
7
8
9
10
D1 DH1 DH2 DB DHR Dml
0.136 0.115 0.101 0.082 0.087 0.285
0.135 0.117 0.102 0.077 0.086 0.726
0.135 0.111 0.101 0.074 0.086 0.726
0.137 0.106 0.095 0.074 0.086 0.943
0.144 0.106 0.093 0.074 0.086 0.997
0.150 0.105 0.091 0.074 0.086 1
0.157 0.105 0.092 0.074 0.086 1
0.167 0.107 0.091 0.074 0.086 1
0.185 0.110 0.091 0.074 0.086 1
Table 6 Observed frequencies of rejection under the null hypothesis in the contamination case C2,50 . m
2
3
4
5
6
7
8
9
10
D1 DH1 DH2 DB DHR Dml
0.288 0.179 0.133 0.090 0.086 0.888
0.201 0.160 0.124 0.087 0.084 0.999
0.185 0.144 0.120 0.077 0.084 1
0.187 0.139 0.114 0.075 0.084 1
0.197 0.137 0.111 0.075 0.084 1
0.214 0.139 0.112 0.075 0.084 1
0.234 0.140 0.111 0.075 0.084 1
0.269 0.145 0.111 0.075 0.084 1
0.326 0.150 0.113 0.075 0.084 1
4104
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105 Non–contaminated Case
a
Contaminated Case
b
c
Fig. 5. Frequency of rejection in the non-contaminated and contaminated cases when P (Y = 1) = 0.2 for Ho : β = βo in (a), Ho : β1 = β1,o in (b) and Ho : β2 = β2,o in (c). Dashed lines correspond to the nominal level, thin lines to the classical Wald test and lines with filled circles to the proposed test.
where X0i = (1, Ui ), Ui have standard normal distribution and β = βo + ∆c. We consider three different values of βo , chosen so that the probability of success Pr(Y = 1) = 0.65, 0.5 and 0.2. The corresponding parameters are given in Table 7. In model 1 we choose the direction c = (c , −c /2)0 , while in models 2 and 3 we take c = (c , c )0 . In model 1 the chosen direction c is orthogonal to that corresponding to βo , while in model 3 this condition is approximately satisfied. It is worth noting that similar results were obtained for other directions. In model 3, values of ∆ smaller than -1 are not taken into account, since the arising models are very unstable. This is due to the fact that the corresponding probabilities of success are very small and so data sets close to the non-overlapping situation are frequent. For each model, we also consider contaminated samples where 5 outliers were added at points with y = 0 and u = 5. Three null hypotheses are considered for each model: H0 : β = βo , H0 : β1 = β1,o and H0 : β2 = β2,o . It is known that the classical Wald test for the logistic model may suffer from the Hauck–Donner effect, i.e. the power of the test may decrease to the significance level for alternatives very far from the null value (see Hauck and Donner, 1977). Some numerical results suggest that this effect may also affect robust Wald tests when dealing with very extreme designs, that is when few 00 s or 10 s are present in the sample, and with alternative hypotheses very far from the null one. For the sake of simplicity, according to the results obtained for the level of the different tests in the previous section, we decide to compare the classical Wald test and the proposed test based on DB , that is when weights based on the bisquare function are used. In Figs. 3–5 we plot the observed frequencies of rejection obtained for the three considered models in the non-contaminated and the contaminated cases. In all situations (ie. for the three models and the three null hypotheses considered) the classical and robust tests performances are, in the non-contaminated samples, very similar. However, under contamination, a completely different behavior is observed.
A.M. Bianco, E. Martínez / Computational Statistics and Data Analysis 53 (2009) 4095–4105
4105
Table 7 Parameter values of the three considered models. Model
Pr(Y = 1)
β0o = (β1,o , β2,o )
1 2 3
0.65 0.5 0.2
(1, 2) (0, 4.36) (−2.80, 2.82)
Under contamination, for H0 : β = βo and H0 : β2 = β2,o , we see that in the three models the classical test is noninformative, since the power function equals 1 for any value ∆, even under the null hypotheses, i.e., ∆ = 0. For H0 : β1 = β1,o we see that the effect of the contamination on the classical test is different in the three considered models. In model 1, the contamination introduces a shift in the power that makes the classical test almost useless. On the other hand, in models 2 and 3 the observed power function of the classical test follows a completely different pattern than that obtained under non-contamination, resulting again in a non-informative test. On the contrary, the proposed test is very stable in the three models. In fact, the observed frequencies of rejection obtained are close to those achieved in the non-contaminated samples. 7. Conclusions In this paper, a robust version of the Wald-type test statistic is introduced. This proposal is based on a weighted Bianco and Yohai (1996) estimator, as defined by Croux and Haesbroeck (2003). We investigate the asymptotic distributions for the proposed testing procedure and we show that the asymptotic behavior of the proposed test is the same as that for its classical counterpart. The simulation study shows that the proposed tests keep their asymptotic level for moderate sample sizes and that they are still reliable, even in the presence of outliers, especially when the weights are based on the bisquare or the hard rejection functions, suggesting that these two families may be a good choice for the practitioner. Acknowledgments The authors would like to thank two anonymous referees for their valuable comments and suggestions that led to improve the presentation of the paper. This research was partially supported by Grants pid 5505 from conicet, pict 21407 from anpcyt and X-018 from the Universidad de Buenos Aires at Buenos Aires, Argentina. References Bianco, A.M., Martínez, E.J., 2009. Robust testing in the logistic regression model. Available at http://www.ic.fcen.uba.ar/preprints/. Bianco, A.M., Yohai, V.J., 1996. Robust estimation in the logistic regression model. In: Rieder, H. (Ed.), Robust Statistics, Data Analysis, and Computer Intensive Methods. In: Lecture Notes in Statistics, vol. 109. Springer Verlag, New York, pp. 17–34. Bondell, H.D., 2005. Minimum distance estimation for the logistic regression model. Biometrika 92, 724–731. Bondell, H.D., 2008. A characteristic function approach to the biased sampling model, with application to robust logistic regression. Journal of Statistical Planning and Inference 138, 742–755. Cantoni, E., Ronchetti, E., 2001. Robust inference for generalized linear models. Journal of the American Statistical Association 96, 1022–1030. Carroll, R.J., Pederson, S., 1993. On robust estimation in the logistic regression model. Journal of the Royal Statistical Society B 55, 693–706. Christmann, A., 1994. Least median of weighted squares in logistic regression with large strata. Biometrika 81, 413–417. Croux, C., Haesbroeck, G., Joossens, K., 2008. Logistic discrimination using robust estimators: An influence function approach. The Canadian Journal of Statistics 36 (2), 157–174. Croux, C., Flandre, C., Haesbroeck, G., 2002. The breakdown behavior of the maximum likelihood estimator in the logistic regression model. Statistics and Probability Letters 60, 377–386. Croux, C., Haesbroeck, G., 2003. Implementing the Bianco and Yohai estimator for logistic regression. Computational Statististics and Data Analysis 44, 273–295. Finney, D.J., 1947. The estimation from individual records of the relationship between dose and quantal response. Biometrika 34, 320–334. Hauck, W.W., Donner, A., 1977. Wald’s test as applied to hypotheses in logit analysis. Journal of the American Statistical Association 72, 851–853. Heritier, S., Ronchetti, E., 1994. Robust bounded–influence tests in general parametric models. Journal of the American Statistical Association 89, 897–904. Künsch, H., Stefanski, L., Carroll, R., 1989. Conditionally unbiased bounded influence estimation in general regression models, with applications to generalized linear models. Journal of the American Statistical Association 84, 460–466. Markatou, M., He, X., 1994. Bounded influence, high breakdown point testing procedures in linear models. Journal of the American Statistical Association 89, 543–549. Morgenthaler, S., 1992. Least-absolute-deviations fits for generalized linear model. Biometrika 79, 747–754. Pregibon, D., 1981. Logistic regression diagnostics. Annals of Statistics 9, 705–724. Pregibon, D., 1982. Resistant fits for some commonly used logistic models with medical applications. Biometrics 38, 485–498. Rousseeuw, P.J., 1985. Multivariate estimation with high breakdown point. In: Grossmann, W., Pflug, G., Vincze, I., Wertz, W. (Eds.), Mathematical Statistics and Applications, vol. B. Reidel, Dordrecht, pp. 283–297. Stefanski, L., Carroll, R., Ruppert, D., 1986. Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika 73, 413–424.