G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 15 © 1997 Elsevier Science B.V. All rights reserved
Practical Applications of Bounded-Influence Tests
S. Heritier and M-P. Victoria-Feser
1. Introduction
In the last decade robust methods have been developed considerably, but most of the research effort has focused on robust estimation. In recent years more attention has been paid to robust testing, since we cannot robustly estimate the parameters of a model and leave unchanged the usual procedures to test hypotheses about these parameters. Robust versions of the classical likelihood ratio, Wald and score tests are now available in a general setting. They are more reliable than their classical counterparts, i.e. they are not influenced by small deviations from the underlying model, and they can also be used as diagnostic tools to identify influential or outlying data points. The purpose of the paper is to illustrate their performance and to show that they can be easily implemented in different practical situations. In particular, it will be shown that they can be used to robustly choose a model when the hypotheses are non-nested, that is, when the model under the null hypothesis cannot be obtained as a particular or limiting case of the model under the alternative hypothesis. The approach we follow here is based on the influence function. It is mainly concerned with local robustness properties of tests. We assume a parametric model and study the effects of departures from the model on the testing procedures. The reader is referred to Huber (1981) and Hampel, Ronchetti, Rousseeuw, and Stahel (1986) for the general theory. Two general references in robust testing which will be the main concern of this paper are Heritier and Ronchetti (1994) and Victoria-Feser (1997). The robust tests we propose in this paper are thus constructed to withstand small amounts of contamination. A second step in the robustness analysis would be the study of their resistance to a higher level of contamination. This refers to the approach based on the breakdown point. This quantity was formally introduced for tests by He et al. (1990) and He (1991).
Despite its interest, we will not follow this approach since high breakdown point testing procedures would require high breakdown estimators which are not yet available for general parametric models; see Markatou and He (1994).
The paper is organized as follows. In Section 2 we give an intuitive idea of the formal definitions of the tests we propose and briefly summarize the basic concepts and results in robust bounded-influence testing. Section 3 focuses on robust testing in generalized linear models (or GLIM) and emphasizes the use of robust tests in logistic regression. As typical examples we use the Food-Stamp data analysed first by Stefanski, Carroll, and Ruppert (1986) and the data introduced by Cormier, Magnan, and Morard (1993) in auditing. In both situations, we compare the classical and robust analyses and discuss the advantages of the latter. We perform a sensitivity analysis on the auditing data which shows the potential dangers of classical testing procedures and the good performance of the robust alternatives. To give a feeling for the resistance of the robust tests to higher levels of contamination we also present an empirical breakdown analysis on the Food-Stamp data. In Section 4 we introduce a robust competitor to the model choice test introduced by Cox (1961, 1962). The key idea is to write the Cox statistic as a score test and to apply the previous methodology. We first show analytically that the Cox test for non-nested hypotheses is sensitive to small model contaminations and then propose a robust version of the test by applying the theory developed in Heritier and Ronchetti (1994). We illustrate the better performance of the robust version we propose by a simulation study. As we will see, the advantage of this new robust test is not only its resistance to data contamination but also its better convergence to the asymptotic distribution of the test statistic. Indeed, it is well known (see e.g. Atkinson (1970)) that for small to moderate sample sizes the approximation of the exact distribution of the test statistic by its asymptotic distribution lacks accuracy.
For robust model choice tests, simulation results show that this problem is overcome at least for moderate sample size. Section 5 concludes. In the Appendix, we discuss some computational aspects and the practical implementation of these robust testing procedures.
2. Robustness concepts and results in testing
In this section we briefly summarize some results on robust testing in general parametric models which will be used in the next sections. The purpose in robust testing is twofold. First, the level of a test should be stable under small, arbitrary departures from the null hypothesis (robustness of validity). Secondly, the test should still have good power under small arbitrary departures from specified alternatives (robustness of efficiency). In this paper we present robust versions of the Wald and score tests based on generalizations of the maximum likelihood estimator (MLE), namely M-estimators for general parametric models (see definition below). They are the natural counterparts of the corresponding classical tests and they enjoy robustness of validity and robustness of efficiency. Instead of giving a formal definition of these tests, let us first recall the ideas underlying the classical tests and explain their generalization in an intuitive way. Consider a general parametric model $\{F_\theta\}$ where $\theta$ is a $p \times 1$ vector and a sample
$z_1, z_2, \ldots, z_n$ of $n$ i.i.d. random vectors. Let $f(\cdot; \theta)$ be the density of $F_\theta$ and $s(z, \theta) = \partial \log f(z; \theta)/\partial\theta$ the score function. Assume for the moment that $p = \dim(\theta) = 2$, $\theta = (\theta_1, \theta_2)^t$, and $H_0: \theta_2 = 0$ is the null hypothesis to be tested. Figure 1 plots the log-likelihood against $\theta$ for a particular sample $z_1, z_2, \ldots, z_n$. The likelihood ratio test is based upon the vertical distance AB, which measures the difference between the overall maximum and the maximum under the constraint $\theta_2 = 0$ or, in other words, the change in the log-likelihood when the estimate is computed in the full and reduced models. The Wald test is based upon the horizontal distance OC corresponding to (the absolute value of) the estimate of $\theta_2$ in the full model. The score (or Rao) test is based upon the slope of the log-likelihood at E, more exactly on the distance from zero of the total score computed at the estimate of $\theta$ in the reduced model. Any of these tests rejects $H_0$ when the corresponding distance measured in a proper metric is sufficiently large. The classical tests have poor robustness properties since they are closely related to the MLE, which is generally nonrobust. A natural way to robustify the Wald, score and likelihood ratio tests is to rely upon M-estimators instead. An M-estimator $T_n$ of $\theta$ is defined as the solution of

$$\sum_{i=1}^{n} \psi(z_i, T_n) = 0,$$
Fig. 1. Intuitive generalization of the classical tests (marked points: MLE in the full model; MLE in the reduced model)
where $\psi(z, \theta)$ is a score-like function. The choice $\psi(z, \theta) = s(z, \theta)$ gives the MLE. A generalization of the classical tests can be achieved by simply replacing $\log f(z; \theta)$ by a function $\rho(z, \theta)$ and the score function by $\psi(z, \theta)$, which may be the derivative of $\rho$ with respect to $\theta$. The test statistics are then defined in exactly the same way as in the classical case. More generally, we are interested in testing the null hypothesis that $q$ ($q < p$) components of $\theta$ are zero. Denote by $a^t = (a_{(1)}^t, a_{(2)}^t)$ the partition of a vector $a$ into $p - q$ and $q$ components; the test problem is then $H_0: \theta = \theta_0$ where $\theta_{0(2)} = 0$, $\theta_{0(1)}$ unspecified, against some specified alternative. The intuitive generalization of the classical tests carries over to testing such composite hypotheses. The formal definition of the new tests we propose can be found in Markatou and Ronchetti (1997), this volume (see definitions (3.2), (3.3) and (3.6)). From now on we will use the same notation, e.g. we denote by $W_n^2$ the generalized version of the Wald test based on $T_n$, and by $R_n^2$ the score-type test based on an M-estimator in the reduced model. Heritier and Ronchetti (1994) showed under general conditions that the Wald- and score-type tests have the same asymptotic distribution as their classical counterparts, i.e. a central $\chi^2_q$ under the null hypothesis and a noncentral $\chi^2_q$ under a sequence of contiguous alternatives. Under an additional condition the asymptotic distribution of likelihood ratio-type tests is a linear combination of $q$ independent $\chi^2_1$ variables. Since their asymptotic distribution is more complicated and derived under more stringent conditions, we confine the present discussion to Wald- and score-type tests. The local stability of these tests can be investigated by means of a (level) influence function. Roughly speaking, the idea is to study the behavior of the level of the test as a function of an additional observation at any point $z$. The same technique can be applied to the power.
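To make the M-estimating equation concrete, the following minimal sketch solves $\sum_i \psi(z_i - T_n) = 0$ for a one-dimensional location parameter. The Huber $\psi$-function, the tuning value `k = 1.345`, the unit scale and the function names are assumptions chosen for illustration only; they are not the estimators used in this paper.

```python
import numpy as np

def huber_psi(r, k=1.345):
    """Bounded score-like function: identity on [-k, k], constant outside."""
    return np.clip(r, -k, k)

def m_estimate_location(z, k=1.345, tol=1e-10, max_iter=200):
    """Solve sum_i psi(z_i - t) = 0 by iteratively reweighted averaging."""
    z = np.asarray(z, dtype=float)
    t = np.median(z)                      # robust starting value
    for _ in range(max_iter):
        r = z - t
        w = np.ones_like(r)               # weight psi(r)/r, set to 1 at r = 0
        nonzero = r != 0
        w[nonzero] = huber_psi(r[nonzero], k) / r[nonzero]
        t_new = np.sum(w * z) / np.sum(w)
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t

# Hypothetical sample: five regular points and one gross outlier.
sample = np.array([0.0, 0.1, -0.1, 0.2, -0.2, 50.0])
t_hat = m_estimate_location(sample)
```

With an unbounded $\psi$ (the identity) the solution would be the sample mean, which the single outlier drags far from the bulk of the data; the bounded $\psi$ caps that observation's contribution to the estimating equation.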
Robustness of validity and robustness of efficiency require that an arbitrary observation $z$ have only a bounded influence on the level and on the power of the test; see Rousseeuw and Ronchetti (1979, 1981), Hampel et al. (1986), Chapter 3, Heritier and Ronchetti (1994) and Markatou and Ronchetti (1997), this volume, for a review. This in turn implies that the function $\psi$ defining $W_n^2$ and $R_n^2$ must be bounded. Note for instance that this is not the case for the classical tests in the logistic regression model (see Section 3), since the score function $s(x, y; \theta) = \frac{\partial}{\partial\theta} \log f(y|x; \theta) = [y - \exp(x^t\theta)/(1 + \exp(x^t\theta))]\,x$ is unbounded in $y$ and $x$. Therefore a single atypical observation may influence the p-values of the classical test and determine its decision. The sensitivity analyses in Section 3 and Section 4 illustrate this point. Within each class an optimally bounded-influence test can be obtained by maximizing the limit of the power of the test at the sequence of contiguous alternatives $H_{1,n}: \theta_{(2)} = \Delta/\sqrt{n}$; see Heritier (1993), pp. 88-93. This optimal robust test statistic is generally difficult to compute. However, it can be easily obtained for specific problems, e.g. model choice testing (see Section 4). For most applications, a simpler test is recommended; see Heritier and Ronchetti (1994), p. 900. For a bound $c$ ($c > \sqrt{q}$), it is based on the following $\psi$-function
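$$\psi_c(z, \theta) = A(\theta)\,[s(z, \theta) - a(\theta)]\,w_c(z, \theta), \qquad (1)$$

where

$$w_c(z, \theta) = \min\left\{1, \frac{c}{\left\|\left(A(\theta)[s(z, \theta) - a(\theta)]\right)_{(2)}\right\|}\right\} \qquad (2)$$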
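and the $p \times 1$ vector $a(\theta)$ and the lower triangular $p \times p$ matrix $A(\theta)$ are determined implicitly by the equations

$$a(\theta) = \frac{\int s(z, \theta)\,w_c(z, \theta)\,dF_\theta(z)}{\int w_c(z, \theta)\,dF_\theta(z)}, \qquad (3)$$

$$A^t(\theta)A(\theta) = \left[\int [s(z, \theta) - a(\theta)]\,[s(z, \theta) - a(\theta)]^t\,w_c^2(z, \theta)\,dF_\theta(z)\right]^{-1}. \qquad (4)$$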
Formula (1) shows that the robust test is based on a weighted score function. Each observation $z_i$ receives a weight $w_c(z_i, \theta)$ which is determined automatically by the data and the method through (3) and (4). Since influential observations will typically receive small weights, these can be used as powerful diagnostic tools to identify influential points. The weights also depend on a tuning constant $c$ which bounds the influence of an observation on the asymptotic level of the test. The choice $c = \infty$ gives the classical Wald or score test, whereas other choices lead to robust counterparts (see also the discussion at the end of Section 3).
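As an illustration of how the weights act, the sketch below evaluates the weight $\min\{1, c/\|(A[s - a])_{(2)}\|\}$ and the resulting weighted score for a batch of score vectors. It assumes that $a(\theta)$ and $A(\theta)$ have already been obtained (for instance by iterating the two implicit equations) and that the last $q$ coordinates are the ones under test; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def bounded_influence_weights(scores, a, A, c, q):
    """Return the weights w_c and the weighted scores psi_c, row by row.

    scores : (n, p) array of score vectors s(z_i, theta)
    a      : (p,) centring vector, assumed already computed
    A      : (p, p) standardizing matrix, assumed already computed
    c      : tuning constant bounding the influence
    q      : number of components under test (taken as the last q coordinates)
    """
    scores = np.asarray(scores, dtype=float)
    A = np.asarray(A, dtype=float)
    standardized = (scores - a) @ A.T            # rows are A (s_i - a)
    norms = np.linalg.norm(standardized[:, -q:], axis=1)
    # Guard against division by zero: a zero norm gets full weight 1.
    weights = np.minimum(1.0, c / np.maximum(norms, 1e-300))
    psi = standardized * weights[:, None]        # A (s_i - a) w_c
    return weights, psi

# Toy inputs (assumed): identity standardization, no centring, q = 1.
toy_scores = np.array([[0.0, 3.0], [0.0, 0.5], [4.0, 0.0]])
weights, psi = bounded_influence_weights(toy_scores, np.zeros(2), np.eye(2), c=1.5, q=1)
```

In this toy case only the first observation, whose tested component exceeds $c$, is downweighted; letting $c$ grow without bound sets every weight to one, which corresponds to the classical case.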
3. Robust testing in logistic regression
In this section we focus on robust testing in the logistic regression model. This model is a special case of the generalized linear models (GLIM) popularized by McCullagh and Nelder (1989). Different robust M-estimators in generalized linear models have already been proposed with application to logistic regression; see Stefanski et al. (1986), Kuensch et al. (1989) and Carroll and Pederson (1993). This provides further motivation to develop robust tests in this setting. As a typical model we consider a generalized linear model where the conditional density of $Y|X$ has the form

$$f(y|x; \theta) = \exp[(y - h(x^t\theta))\,q(x^t\theta) + c(y)]$$

where $h(\cdot)$, $q(\cdot)$ and $c(\cdot)$ are known functions and $\theta$ is a vector of regression parameters. We will also assume that the couple $(X, Y)$ has a density of the form

$$g(x, y; \theta) = f(y|x; \theta)\,u(x) \qquad (5)$$
where u(x) is the marginal density of the p-vector X. This model is sufficiently general to include many generalized linear models such as probit and logistic
models, Poisson regression or certain models for lifetime data. The score function $s(x, y; \theta) = \frac{\partial}{\partial\theta} \log f(y|x; \theta)$ is generally unbounded, which means that classical estimation and testing procedures based on maximum likelihood are not robust. Although what follows can be applied to any GLIM, we will only consider the logistic regression model, a special case of (5) where $Y$ is an indicator variable with

$$P(Y = 1 | X = x) = \frac{\exp(x^t\theta)}{1 + \exp(x^t\theta)}$$
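The unboundedness of the logistic score can be checked numerically. The sketch below evaluates $s(x, y; \theta) = [y - P(Y = 1 | X = x)]\,x$ and shows that its norm grows with the covariate when the response disagrees with the fitted probability; the function name and the numerical values are illustrative assumptions.

```python
import numpy as np

def logistic_score(x, y, theta):
    """Score of one observation in logistic regression: (y - p) * x."""
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(x @ theta)))   # P(Y = 1 | X = x)
    return (y - p) * x

theta = np.array([0.0, -1.0])
# Ordinary point versus a leverage point with an "unexpected" response y = 1.
score_ordinary = logistic_score([1.0, 1.0], 1, theta)
score_leverage = logistic_score([1.0, 100.0], 1, theta)
norm_ordinary = np.linalg.norm(score_ordinary)
norm_leverage = np.linalg.norm(score_leverage)
```

Here `norm_leverage` is roughly one hundred times `norm_ordinary`, and it keeps growing as the second covariate increases, which is why a single such observation can dominate the total score.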
3.1. Food-Stamp data
As a benchmark example in logistic regression, we first consider the Food-Stamp data previously analysed by Stefanski et al. (1986) and Kuensch et al. (1989) in the framework of robust estimation. These data contain information on 150 randomly selected elderly citizens, 24 of whom participated in the federal Food-Stamp program. The response $Y$ indicates participation in the program and the explanatory variables selected for study are: tenancy ($X_1$), indicating home ownership, supplemental income ($X_2$) and a logarithmic transformation of monthly income ($X_3$) [$\log_{10}$(monthly income + 1)]. Previous analyses report that these data contain at least two outliers (cases 5 and 66) which ruin the MLE. This suggests that classical testing procedures could be greatly influenced by these two observations. To compare the performance of classical and robust tests we tested the composite hypothesis $H_0: \theta_2 = \theta_3 = 0$, i.e. the hypothesis that income has no influence on the probability of participation. We also considered the single parametric hypotheses $H_0: \theta_2 = 0$ and $H_0: \theta_3 = 0$. Table 1 shows the p-values for different robust Wald- and score-type tests, which were computed by using the algorithm in the Appendix. The results clearly show that the classical Wald and score tests ($c = \infty$) fail to reject the null hypothesis $H_0: \theta_2 = \theta_3 = 0$ because of the presence of the influential observations 5 and 66. In contrast, the robust score- and Wald-type tests ($c = 3.5$) have a similar performance and both reject the composite hypothesis at the usual 5% significance level (p-values of at most .03). A quick inspection of the weights shows that case 5 is severely downweighted whereas case 66 receives only a moderate weight. This means that case 5 is really an influential observation for the classical testing procedures. The performance of the classical tests is even worse for the single hypothesis $H_0: \theta_3 = 0$. The tests are not significant (p-values of .19 and above) and do not point out the importance of variable $X_3$ in the model. In contrast, their robust alternatives ($c = 2$) clearly highlight the importance of income in the participation (p-value = .01). When testing $H_0: \theta_2 = 0$, classical and robust tests give conflicting results. This is not very surprising since most of the significance of $X_2$ detected by the classical tests is due to these atypical observations. P-values obtained with the classical score and Wald tests are greater than .16 once observations 5 and 66 are removed.

Table 1
p-values of different Wald- and score-type tests for Food-Stamp data

Hypothesis                      Class. Wald   Rob. Wald   Class. score   Rob. score
$\theta_2 = \theta_3 = 0$       .07           .01         .09            .03
$\theta_2 = 0$                  .07           .13         .07            .14
$\theta_3 = 0$                  .22           .01         .19            .01
3.2. Empirical breakdown analysis
The robust bounded-influence tests used in this paper are constructed to resist small amounts of contamination. To investigate their resistance to higher levels of contamination, we present an empirical breakdown analysis. The breakdown point of a test describes its global reliability and gives the maximum amount of contamination which can be tolerated by the test; see He et al. (1990) and He (1991). The exact computation of the breakdown point of the tests we propose is beyond the scope of this paper. To perform the breakdown analysis on the Food-Stamp data we proceeded as follows. Since case 5 is influential, we added more observations as exact copies of case 5 and computed the p-values of the tests. As an illustration, Figure 2 shows the p-values of the robust score-type test for $H_0: \theta_3 = 0$ and for different values of $c$ as a function of the amount of contamination. The proportion of contamination is defined as the percentage of observations exactly identical to case 5 in the contaminated data. The plot shows that, for small values of $c$, the p-value of the robust score-type test is resistant up to 8-9% of corrupted data in the sample. For higher percentages of contamination the p-value increases dramatically and the test breaks down. This implies that one should be careful in using these tests when the expected amount of contamination is larger than 8-9%. A similar analysis performed for the Wald-type test gives slightly worse results since the test breaks down earlier. This may be due to the fact that the Wald-type test is a quadratic form with a sensitive matrix which can break down even when the parameter estimate does not.
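The contamination scheme described above can be sketched as follows. The helper names are hypothetical, and `pvalue_fn` stands for whatever routine returns the p-value of the test under study (for example the robust score-type test, which is not implemented here). The contamination percentage is computed as the share of rows identical to the chosen case in the contaminated data, which is one literal reading of the definition given in the text.

```python
import numpy as np

def add_copies(X, y, case, k):
    """Append k exact copies of observation `case` to the data."""
    X = np.asarray(X)
    y = np.asarray(y)
    X_new = np.vstack([X, np.repeat(X[[case]], k, axis=0)])
    y_new = np.concatenate([y, np.repeat(y[case], k)])
    return X_new, y_new

def contamination_percentage(X, y, case):
    """Percentage of observations exactly identical to observation `case`."""
    same = np.all(X == X[case], axis=1) & (y == y[case])
    return 100.0 * same.mean()

def breakdown_curve(X, y, case, max_copies, pvalue_fn):
    """List of (contamination %, p-value) pairs for 0..max_copies added copies."""
    curve = []
    for k in range(max_copies + 1):
        Xk, yk = add_copies(X, y, case, k)
        curve.append((contamination_percentage(Xk, yk, case), pvalue_fn(Xk, yk)))
    return curve

# Tiny made-up data set; the "p-value" here is a placeholder returning the sample size.
X_demo = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y_demo = np.array([0, 1, 0, 1])
demo_curve = breakdown_curve(X_demo, y_demo, case=0, max_copies=2,
                             pvalue_fn=lambda X, y: len(y))
```

Plotting the second element of each pair against the first, for several values of the tuning constant, gives a curve of the kind shown in Figure 2.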
3.3. Auditing data
As an interesting real example in logistic regression analysis, we consider the data analysed by Cormier et al. (1993) in auditing. The aim of this study was to discriminate between companies which could eventually face financial difficulties and healthy firms, while providing auditors with some guidance in planning their analytical review strategy. A sample of 250 companies drawn from all non-financial Canadian corporations listed on the Montreal Exchange was used as original data. It is divided into two groups. One group is composed of 112 healthy companies ($Y = 1$), where a healthy firm is defined by a positive market-adjusted annual return. The other group consists of 138 companies which potentially face
Fig. 2. Breakdown analysis for score-type tests (horizontal axis: % contamination)
financial difficulties ($Y = 0$), where financial difficulties are defined by a market-adjusted annual return lower than -50%. The cut-off used to build the sample ($R_i - R_m > 0$; $R_i - R_m < -50\%$) ensures that both groups are distinct and will exhibit different trends. For each company eight financial variables and seven qualitative variables were recorded as inherent risk indicators. These risk indicators serve as explanatory variables in the logistic regression model to predict the financial health of the companies. The eight financial ratios used in this study are:
$x_1$: Variation in adjusted returns,
$x_2$: Variation in accounts receivable less sales variation,
$x_3$: Variation in inventory less sales variation,
$x_4$: Variation in operating expenses less sales variation,
$x_5$: Variation in interest payments less sales variation,
$x_6$: Change in level of capital expenditures,
$x_7$: Variation in debt maturity,
$x_8$: Variation in payments to stockholders.
To complete the information required to analyse a company's inherent risk, seven dichotomous indicators of qualitative risk were added to the data:
$x_9$: Change in related parties transactions,
$x_{10}$: New industry investment,
$x_{11}$: Change in the number of locations,
$x_{12}$: Implementation of a bonus scheme,
$x_{13}$: Implementation of a share plan,
$x_{14}$: Change in the control of the company,
$x_{15}$: Change in the method of depreciation used.
Finally, as some market-based studies indicated that size affects the financial health of a firm (see Boritz (1991)), a proxy measuring the relative size of a firm was incorporated in the model as a control variable. The proxy used is
$x_{16}$: Logarithm of sales deflated by the sample mean.
The data cover the 1982-1988 period and were collected from the companies' annual reports over a three-year period. For companies in potential financial difficulties, this three-year period preceded the first signal of failure, namely a negative market-adjusted stock return. Explanatory variables $x_1$-$x_{16}$ were computed as the average of two annual variations for quantitative variables and the summation of the two annual variations for qualitative variables. Further details about the collection of the data and the computation of the different risk indicators can be found in Cormier et al. (1993), Appendix 1 and Figure 2. Since the two dummy variables $x_{10}$ and $x_{15}$ are zero for most of the companies, we dropped them from the model to avoid conditioning problems in the computation. We also used only the 240 companies for which no change occurred ($x_{15} = 0$) to neutralize the potential effect of the change of depreciation method. Furthermore, we fitted a model without intercept. This seems a reasonable modification because a financial success (or failure) probability of .50 is then obtained if no variation occurs for the quantitative management indicators, the qualitative variables are set to zero and the proxy is null, i.e. a value close to the average of the standardized log-sales.
After a preliminary classical fit, we kept $x_1$, $x_2$, $x_3$, $x_6$, $x_7$, $x_8$, $x_9$, $x_{11}$, $x_{16}$ in the model (we excluded variables that were not clearly significant, i.e. p > .15). Table 2 presents the results of a classical analysis in this model. The signs of the coefficients correspond to the predicted signs and the variables $x_2$, $x_7$, $x_8$, $x_9$, $x_{11}$ and $x_{16}$ seem to be significant at the 10% level, which matches the analysis by Cormier et al. (1993). Variables $x_3$ and $x_6$ are close to significance, but surprisingly the variation of adjusted returns ($x_1$) does not appear as really meaningful since the p-values of the Wald and score tests are respectively 10% and 19%. This goes in the opposite direction of the previous analysis, where $x_1$ was one of the most significant variables (p-value = .001). At that stage, a validation of the fit is required; this can be done by a careful examination of the different diagnostic tools provided by usual statistical packages. For instance, the inspection of the output of the SAS Logistic procedure gives helpful information. First, observations 18, 77, 215 have large hat matrix diagonal elements, which means that these observations are extreme points in the design space and might be (bad) leverage points. Secondly, confidence interval displacement diagnostics point out that
Table 2
Classical estimation and testing for auditing data

Variable    MLE      Pred. sign   Cl. Wald ($c = \infty$)   p-value   Cl. score ($c = \infty$)   p-value
$x_1$       4.96     +            2.65                      .10       1.72                       .19
$x_2$       .80      +            2.74                      .10       2.78                       .10
$x_3$       -.53     -            2.32                      .13       2.35                       .13
$x_6$       .31      +            2.58                      .11       2.61                       .11
$x_7$       -1.01    -            16.28                     .00       17.62                      .00
$x_8$       -1.02    -            15.86                     .00       20.12                      .00
$x_9$       -.66     -            3.05                      .08       3.09                       .08
$x_{11}$    -.50     -            5.23                      .02       5.37                       .02
$x_{16}$    1.13     +            22.49                     .00       24.83                      .00
cases 77 and 215 are influential on the MLE. Moreover, the sensitivity diagnostic on $\theta_1$, the coefficient of $x_1$, points out the important influence of observation 215. Finally, some deviance residuals seem to be rather high. This brief diagnostic analysis shows that some observations are clearly outlying points and therefore that the current analysis needs to be validated. For this purpose, we performed a robust analysis on the same model; the results are presented in Table 3. The robust estimator used here is based on a $\psi$-function with bound 7.5, i.e. 2.5 times the admissible lower bound on $c$. The robust estimates have the expected signs and they are very similar to the classical ones, with the exception of $\theta_1$ which is somewhat higher. A simple inspection of the weights indicates that observations 77 and 215 are downweighted ($w_{77} = .38$ and $w_{215} = .54$), which means that these observations are detected as moderately influential on the fit. We found a more important difference when looking at the statistical significance of the different variables. The p-values of the Wald- and score-type tests reported in Table 3 clearly show that the variation in adjusted returns ($x_1$) is significant (p-value = .04). A careful look at
Table 3
Robust estimation and testing for auditing data

Variable    Rob. Est. ($c = 7.5$)   Pred. sign   Rob. Wald ($c = 3$)   p-value   Rob. score ($c = 3$)   p-value
$x_1$       5.56                    +            4.02                  .04       4.29                   .04
$x_2$       .83                     +            1.04                  .30       1.30                   .25
$x_3$       -.55                    -            2.18                  .14       2.03                   .15
$x_6$       .29                     +            3.03                  .08       3.00                   .08
$x_7$       -.91                    -            12.55                 .00       14.32                  .00
$x_8$       -1.14                   -            15.62                 .00       21.90                  .00
$x_9$       -.57                    -            2.73                  .10       2.50                   .11
$x_{11}$    -.47                    -            5.18                  .02       5.32                   .02
$x_{16}$    1.07                    +            21.94                 .00       24.05                  .00
the weights indicates that the bounded-influence Wald-type test statistic gives a medium weight to a few observations including cases 12, 35, 77 and downweights observation 215 ($w_{215} = .19$) more heavily. A similar downweighting scheme occurs for the robust score-type test statistic, except that case 215 is moderately weighted ($w_{215} = .54$) and that case 190 now receives a low weight ($w_{190} = .18$). This indicates that observations 215, 190 (score-type only), and to a lesser extent some other cases, are clearly influencing the classical test statistics and make the resulting p-values unreliable. Another difference with the classical testing procedures also appears. There is no longer much evidence that the variation in accounts receivable ($x_2$) is a meaningful variable, since it is no longer significant (p-values of .25 and above). Furthermore, some minor differences in the critical probabilities corresponding to the change in level of capital expenditures ($x_6$) and to the variation in related parties transactions ($x_9$) also exist, but they do not seem to be of such a nature as to modify the previous explanation.
3.4. Sensitivity analysis
To further illustrate the performance of robust methods and the potential danger of their classical analogues, we performed a sensitivity analysis on the previous data. Since the main difference between the two approaches essentially concerned the variation in adjusted returns ($x_1$), we added two leverage points by modifying the value of $x_1$ for two observations. More precisely, we fixed $x_1 = .85$ for cases 8 and 35. This has the consequence of increasing the leverage of $x_1$. The value of .85 is relatively high but still possible, as indicated by the box plots in Figure 3. We then fitted the model by both classical and robust methods; see Tables 4 and 5. The results presented in Table 4 show that the 2 additional outliers have a strong influence on the classical procedures. Now the classical estimate of $\theta_1$ is negative, its sign differs from the predicted sign, and the p-values of the classical tests are greater than 80%. An interpretation based on these results would be misleading, since it would suggest that an increase in $x_1$ tends to indicate financial problems. In contrast, the robust estimate of $\theta_1$ still has the predicted positive sign and the p-values of the robust bounded-influence score- and Wald-type tests still indicate the significance of the variation of adjusted returns; see Table 5. An examination of the weights resulting from the robust fit shows that observations 8 and 35 are heavily downweighted, $w_8 = w_{35} = .10$, and that a few other cases also receive a low weight, e.g. $w_{77} = .23$ and $w_{215} = .37$. This is not surprising since cases 77 and 215 were detected as influential in the previous analysis. Similarly, a quick inspection of the weights for the robust bounded-influence tests reveals that the 2 additional outliers are severely downweighted, $w_8 = w_{35} = .06$ for the Wald-type test and $w_8 = .08$, $w_{35} = .10$ for the score-type test, together with some other observations including cases 77 and 215.
This explains why the classical test statistics are completely ruined by a small percentage of contamination (< 2%). The effects seem to be worse on the classical Wald or score tests than on the MLE.
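The mechanics of such a sensitivity check on the classical procedures can be sketched on synthetic data. The code below is not a reanalysis of the auditing data: the sample, the true coefficient, the leverage value of 10 and the decision to set the response of the two modified cases to 0 are all assumptions of this toy example. It fits the classical logistic MLE without intercept by Newton-Raphson and computes single-parameter Wald statistics with their $\chi^2_1$ p-values.

```python
import math
import numpy as np

def fit_logistic_mle(X, y, max_iter=50, tol=1e-10):
    """Classical logistic MLE (no intercept added) by Newton-Raphson."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        score = X.T @ (y - p)                          # total score
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return theta, np.linalg.inv(info)

def wald_single_tests(theta, cov):
    """Wald statistic theta_j^2 / var_j and its chi-square(1) p-value."""
    stats = theta ** 2 / np.diag(cov)
    pvals = np.array([math.erfc(math.sqrt(w / 2.0)) for w in stats])
    return stats, pvals

# Synthetic sample (assumed): one covariate, true coefficient 1.5.
rng = np.random.default_rng(0)
n = 200
X_clean = rng.normal(size=(n, 1))
y_clean = (rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * X_clean[:, 0]))).astype(float)

# Turn two cases into bad leverage points: large covariate, response 0.
X_cont, y_cont = X_clean.copy(), y_clean.copy()
X_cont[[7, 34], 0] = 10.0
y_cont[[7, 34]] = 0.0

theta_clean, cov_clean = fit_logistic_mle(X_clean, y_clean)
theta_cont, cov_cont = fit_logistic_mle(X_cont, y_cont)
stats_clean, p_clean = wald_single_tests(theta_clean, cov_clean)
stats_cont, p_cont = wald_single_tests(theta_cont, cov_cont)
```

Comparing `theta_clean` with `theta_cont`, and the corresponding Wald statistics, shows how two modified cases out of 200 (1%) can pull the classical estimate towards zero; a bounded-influence fit would instead downweight those two cases.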
Fig. 3. Box plots of variable $x_1$ for the auditing data (panels: Group 1 and Group 0; labelled cases: #190, #215, #74)
To summarize the comparison between the two approaches, we can make the following concluding remarks. Firstly, it appears that a few observations, essentially cases 77 and 215, have an important influence on the classical testing procedures and a moderate influence on the maximum likelihood fit, e.g. on $\theta_1$ which is underestimated. Thus an analysis based on the use of classical methods without further checking is misleading. This analysis could even be completely wrong in the presence of high leverage points, as the sensitivity analysis showed.
Table 4
Classical estimation and testing with 2 additional outliers in $x_1$ for the auditing data

Variable    MLE      Pred. sign   Cl. Wald ($c = \infty$)   p-value   Cl. score ($c = \infty$)   p-value
$x_1$       -.26     +            0.06                      .81       0.06                       .81
$x_2$       .85      +            3.12                      .08       3.17                       .08
$x_3$       -.60     -            2.98                      .08       3.02                       .08
$x_6$       .30      +            2.52                      .11       2.55                       .11
$x_7$       -1.03    -            17.04                     .00       18.49                      .00
$x_8$       -.92     -            14.13                     .00       17.89                      .00
$x_9$       -.73     -            3.66                      .06       3.73                       .05
$x_{11}$    -.50     -            5.19                      .02       5.33                       .02
$x_{16}$    1.18     +            24.25                     .00       26.99                      .00
Table 5
Robust estimation and testing with 2 additional outliers in $x_1$ for the auditing data

Variable    Rob. Est. ($c = 5$)   Pred. sign   Rob. Wald ($c = 3$)   p-value   Rob. score ($c = 3$)   p-value
$x_1$       4.64                  +            3.60                  .06       2.96                   .08
$x_2$       .62                   +            1.21                  .27       1.54                   .21
$x_3$       -.55                  -            2.88                  .09       2.95                   .09
$x_6$       .32                   +            3.01                  .08       2.96                   .08
$x_7$       -.90                  -            13.82                 .00       15.16                  .00
$x_8$       -1.09                 -            14.08                 .00       19.62                  .00
$x_9$       -.61                  -            3.61                  .06       3.11                   .08
$x_{11}$    -.50                  -            5.06                  .02       5.72                   .02
$x_{16}$    1.14                  +            23.78                 .00       27.01                  .00
Secondly, the approach based on the robust methods is safer and also provides helpful diagnostic tools to detect influential or outlying observations by means of a simple inspection of the weights that result from a robust fit or test. In our opinion, the decision whether an observation is influential on the estimation or testing procedure is a personal one, based on the value of the weight function and common sense. However, a clear cut-off usually appears in practice between ordinary observations and the discordant ones. Thirdly, an asymptotic theory exists for robust estimation and tests in a neighborhood of the assumed distribution. This means that reliable asymptotic p-values (or asymptotic confidence intervals) can be correctly computed in the presence of slight deviations from the assumed model. In contrast, the approach based on diagnostics, deletion of "bad" observations, and refitting via standard methods lacks a theory for inference and testing; the effects of case deletion upon the distribution are not well understood, even asymptotically, as pointed out by Stefanski et al. (1986). Finally, to use bounded-influence techniques, one has to fix the tuning constant $c$ which controls the "degree of robustness". For the robust bounded-influence estimator we implemented, $c$ is an upper bound on its (self-standardized) influence function; see Hampel et al. (1986), p. 244. The tuning constant therefore controls the worst asymptotic bias caused to the estimate by a small amount of contamination. The lower $c$ is, the more robust but the less efficient the estimator is under the model. There is no clear procedure to choose $c$ in an optimal way. The only attempt in this direction is due to Samarov (1985) but is restricted to the regression model. A possible strategy is to choose $c$ to achieve a certain degree of efficiency under the model, typically 95%.
Another possibility is to decrease $c$ stepwise down to a reasonable value, say twice the lower bound ($\sqrt{p}$). For the robust score or Wald tests, a similar strategy may be adopted. Theoretically, the approximation of the level around the null hypothesis can also provide a way to fix $c$ (see equation 3.8 in Markatou and Ronchetti (1997)). If one suspects a maximum amount of contamination, $\epsilon_0$, and is ready to tolerate a relative error $\delta$ on the level of the test $\alpha_0$, then an upper bound on $c$ is given by

$$c \le \left(\frac{\delta\,\alpha_0}{\mu\,\epsilon_0^2}\right)^{1/2} \qquad (6)$$
Simulations may otherwise help to find a reasonable value for the tuning constant but should be carried out for each particular problem.
4. Robust model choice tests
In this section we present a robust version of Cox-type test statistics for the choice between two non-nested hypotheses. We first show that the influence of small amounts of contamination in the data on the test decision can be very large. Secondly, we build a robust test statistic by using the results on robust parametric tests and show that its level is stable. We illustrate the good robustness properties of the new test numerically.
In general, it is assumed that under the null hypothesis $H_0$ the model is $F^0_\alpha$ (with density $f^0(\cdot;\alpha)$) and that under the alternative $H_1$ the model is $F^1_\beta$ (with density $f^1(\cdot;\beta)$), where $\alpha$ and $\beta$ are parameter vectors. The hypotheses are non-nested in that $F^1_\beta$ ($F^0_\alpha$) cannot be obtained as a special or limiting case of $F^0_\alpha$ ($F^1_\beta$). Let $L_0(z;\hat\alpha) = \log f^0(z;\hat\alpha)$ and $L_1(z;\hat\beta) = \log f^1(z;\hat\beta)$ be the (maximum values of the) log-likelihood functions, where $\hat\alpha$ and $\hat\beta$ are the corresponding MLEs, and define $L(z;\hat\alpha,\hat\beta) = L_0(z;\hat\alpha) - L_1(z;\hat\beta)$. Cox (1961, 1962) proposed the following test statistic
$$U_{Cox} = n^{-1}\sum L(z_i;\hat\alpha,\hat\beta) - \int L(z;\hat\alpha,\beta_{\hat\alpha})\, f^0(z;\hat\alpha)\,dz \tag{7}$$
where $\sum$ stands for $\sum_{i=1}^{n}$ and $\beta_\alpha$ is the pseudo MLE defined as the solution in $\beta$ of $\int \frac{\partial}{\partial\beta}\log f^1(x;\beta)\, f^0(x;\alpha)\,dx = 0$. Two straightforward modifications of $U_{Cox}$ have been proposed by Atkinson (1970) ($\hat\beta$ is replaced by $\beta_{\hat\alpha}$) and by White (1982) ($\beta_{\hat\alpha}$ is replaced by $\hat\beta$). In these three cases, the asymptotic distribution of $\sqrt{n}\,U_{Cox}$ is the normal distribution with mean 0 and variance
$$V(F^0_\alpha) = E[L^2] - E[L]^2 - E[(s^0)^t L]\,\{E[s^0 (s^0)^t]\}^{-1} E[s^0 L],$$
where $L = L(z;\alpha,\beta_\alpha)$, $s^0 = s^0(z;\alpha) = \frac{\partial}{\partial\alpha}\log f^0(z;\alpha)$ and $E[\cdot]$ is the expectation with respect to $F^0_\alpha$, the argument of $V$. In practice one needs a consistent estimator of $V(F^0_\alpha)$, obtained e.g. by replacing $\alpha$ with $\hat\alpha$.
In the last decade, several other Cox-type statistics have been developed, mainly in order to simplify the procedure when dealing with particular models like normal regression models (see among others Davidson and MacKinnon (1981), Fisher and McAleer (1981), MacKinnon, White, and Davidson (1983), Gouriéroux, Monfort, and Trognon (1983)). These 'Cox-type' statistics are actually parametric tests based on an artificial compound model in which the models under the null hypothesis and under the alternative hypothesis are represented. The Cox statistic can as well be seen as a Lagrange Multiplier or score test based on a compound model (see Atkinson (1970), Breusch and Pagan (1980) and Dastoor (1985)). If we construct the comprehensive model
$$f^c(z;\theta) = \frac{\{f^0(z;\alpha)\}^{\lambda}\,\{f^1(z;\beta)\}^{1-\lambda}}{\int \{f^0(y;\alpha)\}^{\lambda}\,\{f^1(y;\beta)\}^{1-\lambda}\,dy} \tag{8}$$
where $\theta = (\alpha, \lambda)^t$, then the score test statistic corresponding to the hypothesis $H_0: \lambda = 1$ against the alternative $H_1: \lambda \neq 1$ leads to the Cox, Atkinson or White statistic, depending on the choice of the estimator of $\beta$. Note that we could reparametrize the problem by defining $\gamma = \lambda - 1$, $H_0: \gamma = 0$, $H_1: \gamma \neq 0$. This would lead to the same results.
Although it is widely accepted that these statistics are very useful, they have often been criticized for several reasons. The most studied one is the lack of accuracy of the approximation of the exact (sample based) distribution of the statistic by its asymptotic distribution (see e.g. Atkinson (1970), Williams (1970), Godfrey and Pesaran (1983) and Loh (1985)). Another reason, less studied but at least as important, is the lack of robustness of Cox-type statistics. Aguirre-Torres and Gallant (1983) propose a generalization of the Cox statistic based on M-estimators for the parameters. The same idea can be found in Hampel et al. (1986), Chapter 7. However, they leave open the question of the choice of the $\rho$-function defining the M-estimators.
Our aim is to propose a robust procedure based on the optimal bounded-influence parametric tests developed recently by Heritier and Ronchetti (1994). We use the level influence function (LIF) to show that the Cox-type tests are not robust. This evidence is also checked numerically through simulations. With the robust version of the test, we will see that the new procedure is not only robust to small model deviations or contaminations but also that, at least for the particular cases chosen, the asymptotic distribution of the robust test statistic is a better approximation of its exact distribution than in the classical case.
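The link between the comprehensive model and the Cox statistic can be made explicit with a short derivation. The sketch below assumes that the comprehensive model is the exponential combination of the two densities used by Atkinson (1970), with $\beta$ held fixed; the integration variable $y$ is introduced here only for illustration.

```latex
\begin{align*}
\log f^c(z;\theta) &= \lambda \log f^0(z;\alpha) + (1-\lambda)\log f^1(z;\beta)
   - \log \int \{f^0(y;\alpha)\}^{\lambda}\{f^1(y;\beta)\}^{1-\lambda}\,dy, \\
\frac{\partial}{\partial\lambda}\log f^c(z;\theta)\Big|_{\lambda=1}
  &= \log f^0(z;\alpha) - \log f^1(z;\beta)
   - \int \{\log f^0(y;\alpha) - \log f^1(y;\beta)\}\, f^0(y;\alpha)\,dy \\
  &= L(z;\alpha,\beta) - \int L(y;\alpha,\beta)\, f^0(y;\alpha)\,dy .
\end{align*}
```

Under this assumption, averaging the last expression over the sample at $\alpha = \hat\alpha$ gives a quantity with the same structure as (7) once an estimator is substituted for $\beta$.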
4.1. Robustness properties of Cox-type statistics
We first illustrate numerically, through one simulated example, the non-robustness of Cox-type statistics. We consider here the quantal responses problem treated among others by Cox (1962), Atkinson (1970) and Loh (1985). At $k$ levels of a variable $x_i$, called the dose level, $n_i$ experiments are performed. The number $y_i$ of successes is distributed binomially with index $n_i$ and probability $\pi_i^0$ under $H_0$ and $\pi_i^1$ under $H_1$. The purpose of the experiment is to determine the relationship between the dose level and the parameter of the binomial distribution. The two common models are the one- and two-hit models, defined respectively by $\pi_i^0 = 1 - e^{-\alpha x_i}$ and $\pi_i^1 = 1 - e^{-\beta x_i} - \beta x_i e^{-\beta x_i}$. We chose five dose levels $x_1 = 0.5$, $x_2 = 1$, $x_3 = 2$, $x_4 = 4$ and $x_5 = 8$ (see Cox (1962)), and for each of them we simulated 30 binary data with probability of success $\pi_i^0$, with parameter $\alpha = 2$. We computed the MLE by means of a Newton-Raphson iteration (see Thomas (1972)). We computed the (standardized) White statistic and found it was equal to 0.8677 with corresponding p-value of 19.3% (with $\hat\alpha = 2.18$ and $\hat\beta = 3.99$), leading to the acceptance of $H_0$.
But what happens if some data are changed (a Bernoulli trial is changed to 0 when it is equal to 1, or to 1 when it is equal to 0)? Intuitively, by looking at (7) one can see that unfortunately the classical tests are not robust, because the MLE of $\alpha$ is not robust and $\sum L(z_i;\alpha,\beta)$ can be determined by only one extreme observation. To show this, we changed the value of two binary data points (both corresponding to the level $x_4 = 4$) and again computed the (standardized) White statistic. As the amount of contamination is very small$^{1}$, one would expect the decision not to be influenced by it. However, this time we found a value of 4.6311 with corresponding p-value of less than $5\cdot 10^{-4}$% (with $\hat\alpha = 1.63$ and $\hat\beta = 2.98$), leading to the rejection of $H_0$. This result is not surprising when one computes the LIF of the Cox test evaluated at the contamination point $z$. It is given by (see Victoria-Feser (1997))
-2 f~yd~(Y)" (jL(x;',ft.)dF°(x) L(z; o¢,ft.) + j L(x; ~¢,ft.)s°(x; .)dF°(x) • IF(z, &,F~) }
(9)
where $IF(z;\hat\alpha,F^0_\alpha)$ is the influence function of the estimator $\hat\alpha$. When $\hat\alpha$ is the MLE, the influence function is proportional to the score function $s^0(z;\alpha)$. By looking at the LIF we can see that a single observation $z$ such that $L(z;\alpha,\beta_\alpha) = \log f^0(z;\alpha) - \log f^1(z;\beta_\alpha)$ or $s^0(z;\alpha)$ is large can make the bias on the asymptotic level very large. Indeed, the non-robustness of the test, i.e. the bias on the asymptotic level, is due simultaneously to
• the non-robustness of the parameter estimator,
• the non-robustness of the test statistic.
While $s^0(z;\alpha)$ equals, up to a multiplicative constant, the influence function of the MLE of the parameter under the null hypothesis, $L(z;\alpha,\beta_\alpha)$ is directly related to the influence on the test statistic. Therefore, it is not sufficient to base a test on robust estimators for the parameters only. Indeed, a robust estimator for $\alpha$ guarantees a bounded value for $IF(z;\hat\alpha,F^0_\alpha)$ but not for $L(z;\alpha,\beta_\alpha)$. For example, if we want to test the Gamma ($F_{\alpha_1,\alpha_2}$, where $\alpha_1$ is the shape parameter and $\alpha_2$ is the scale parameter) against the Lognormal ($F_{\beta_1,\beta_2}$, with $\beta_1 = \mu$ and $\beta_2 = \sigma^2$), the difference between the log-likelihood functions evaluated at any point $z$ is given, up to a constant, by
(.,
LP2
which can be large when $z$ is large.
$^{1}$ It can be argued that although only two data points are contaminated, the chosen ones correspond to a level where the probability of success is 0.99967, so that we expect that a change of value from 1 to 0 will have a large influence on the estimates and therefore on the test statistic.
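The sensitivity of the MLE of $\alpha$ to two flipped Bernoulli trials can be illustrated with a small sketch. The success counts below are hypothetical (the simulated data of the text are not reproduced here), and a golden-section search replaces the Newton-Raphson iteration mentioned in the text purely for brevity; the function names are ours.

```python
import math

DOSES = [0.5, 1.0, 2.0, 4.0, 8.0]   # dose levels x_1..x_5 from the text
N_TRIALS = 30                        # binary trials per dose level

def one_hit_loglik(alpha, successes):
    """Binomial log-likelihood (up to a constant) for pi_i = 1 - exp(-alpha * x_i)."""
    total = 0.0
    for x, y in zip(DOSES, successes):
        # log(pi_i) = log(1 - exp(-alpha x_i)) and log(1 - pi_i) = -alpha x_i
        total += y * math.log1p(-math.exp(-alpha * x)) - (N_TRIALS - y) * alpha * x
    return total

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function on [lo, hi] (swapped in for Newton-Raphson)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d          # the maximum lies in [a, d]
        else:
            a = c          # the maximum lies in [c, b]
    return 0.5 * (a + b)

def one_hit_mle(successes):
    return golden_section_max(lambda a: one_hit_loglik(a, successes), 0.01, 10.0)

# Hypothetical counts, close to their expectations under alpha = 2.
clean = [19, 26, 29, 30, 30]
# Same data with two successes at x_4 = 4 changed to failures.
contaminated = [19, 26, 29, 28, 30]

alpha_clean = one_hit_mle(clean)
alpha_contaminated = one_hit_mle(contaminated)
print(f"MLE without contamination: {alpha_clean:.3f}")
print(f"MLE with two flipped trials: {alpha_contaminated:.3f}")
```

In this sketch only 2 of the 150 trials are changed, yet the estimate of $\alpha$ drops noticeably, which is the qualitative effect described above for the simulated sample.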
4.2. Robust Cox-type statistics
In this subsection we apply the results of Heritier and Ronchetti (1994) to Cox-type statistics when they are interpreted as a score test. If we consider the compound model (8), under $H_0$ the score function is given by
$$s^c(z;\theta) = \frac{\partial}{\partial\theta}\log f^c(z;\theta)\Big|_{\lambda=1} = \begin{bmatrix} s^c(z;\theta)_{(1)} \\ s^c(z;\theta)_{(2)} \end{bmatrix}$$
where
$$s^c(z;\theta)_{(1)} = \frac{\partial}{\partial\alpha}\log f^c(z;\theta)\Big|_{\lambda=1} = \frac{\partial}{\partial\alpha}\log f^0(z;\alpha) = s^0(z;\alpha)$$
and
$$s^c(z;\theta)_{(2)} = s_{Cox}(z;\alpha,\beta).$$
The optimal robust score test statistic is given in Heritier (1993). Applying this general result to our case, we get the following optimal $\psi$ function
$$\psi_c^{opt}(z;\alpha,\beta) = \begin{bmatrix} A_{(11)}\, s^0(z;\alpha) \\ \left[A_{(21)}\, s^0(z;\alpha) + A_{(22)}\,[s_{Cox}(z;\alpha,\beta) - a_{(2)}]\right] w_c^{opt}(z;\alpha,\beta) \end{bmatrix}$$
where
$$w_c^{opt}(z;\alpha,\beta) = \min\left\{1;\ c\cdot\left\|A_{(21)}\, s^0(z;\alpha) + A_{(22)}\,[s_{Cox}(z;\alpha,\beta) - a_{(2)}]\right\|^{-1}\right\}$$
and $\theta = (\alpha,\lambda)^t$ is dropped in $a(\theta)$ and $A(\theta)$ for simplicity. The robust Cox-type statistic is finally given by
$$U = \frac{1}{n}\sum_{i=1}^{n}\left[A_{(21)}\, s^0(z_i;\hat\alpha) + A_{(22)}\,[s_{Cox}(z_i;\hat\alpha,\beta) - a_{(2)}]\right] w_c^{opt}(z_i;\hat\alpha,\beta) \tag{10}$$
where $\hat\alpha$ is the MLE of $\alpha$, and the vector $A_{(21)}$ ($1 \times \dim(\alpha)$) and the scalars $a_{(2)}$ and $A_{(22)}$ are determined implicitly by (11), (12) and (13) (see the Appendix). For consistent estimators $\hat\beta$ of $\beta$, i.e. $\hat\beta(F^0_\alpha) = \beta_\alpha$, the asymptotic distribution of the robust Cox-type test statistic $U_{RC} = \sqrt{n}\,U$ is the standard normal; see Victoria-Feser (1997). An algorithm to find the test statistic is given in the Appendix.
Knowing that the bias on the asymptotic level (see (9)) is proportional to $s^0(z;\alpha)$ and to $L(z;\alpha,\beta)$, we see that by using the robust version of the score test with the comprehensive model (8), we bound exactly the right quantity. Therefore, the use of (10) prevents the decision from being influenced by a small number of outliers.
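The bounding effect of the weights in (10) can be seen in a minimal sketch for a scalar $\alpha$. All numerical values below (the scores and the constants `a21`, `a22`, `a2`) are hypothetical placeholders; in practice the constants would be obtained from the implicit equations (11), (12) and (13), which this sketch does not solve.

```python
def bounded_weight(r, c):
    """Weight min{1, c / |r|}; set to 1 when r = 0 to avoid a division by zero."""
    return 1.0 if r == 0.0 else min(1.0, c / abs(r))

def weighted_terms(s0, s_cox, a21, a22, a2, c):
    """Summands of (10): [a21*s0_i + a22*(s_cox_i - a2)] * w_i, for scalar alpha."""
    terms = []
    for s, sc in zip(s0, s_cox):
        r = a21 * s + a22 * (sc - a2)
        terms.append(r * bounded_weight(r, c))
    return terms

# Hypothetical score values; the last observation plays the role of an outlier.
s0 = [0.1, -0.3, 0.2, 0.0, -15.0]
s_cox = [0.2, -0.1, 0.4, -0.2, 25.0]
a21, a22, a2, c = 0.5, 1.0, 0.0, 2.0   # placeholder constants, not solved for

terms = weighted_terms(s0, s_cox, a21, a22, a2, c)
u_robust = sum(terms) / len(terms)
u_unweighted = sum(a21 * s + a22 * (sc - a2) for s, sc in zip(s0, s_cox)) / len(s0)
print("weighted terms:", terms)
print("robust average:", u_robust, "unweighted average:", u_unweighted)
```

Every summand is bounded by $c$ in absolute value, so a single extreme observation can move the average by at most $c/n$, whereas the unweighted average is dominated by it.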
4.3. Simulation study
In order to study the robustness properties of $U_{RC}$, we compared it to the classical Cox-type statistics for contaminated and non-contaminated samples. We chose to simulate Pareto samples and test the Pareto distribution against the exponential distribution by means of the Atkinson statistic. The Pareto density is given by $f^0(z;\alpha) = \alpha z^{-(\alpha+1)} z_0^{\alpha}$ with $0 < z_0 < z < \infty$, so that as an alternative we considered the truncated exponential distribution given by $f^1(z;\beta) = \beta e^{-\beta(z - z_0)}$. These distributions are often used in describing the distribution of personal income (see Victoria-Feser and Ronchetti (1994)). We simulated 1000 samples of 200 observations from a Pareto distribution with parameter $\alpha = 3.0$ ($z_0 = 0.5$) and contaminated the samples by means of $(1 - \varepsilon\cdot 200^{-1/2})F_{\alpha,z_0} + \varepsilon\cdot 200^{-1/2}F_{\alpha,10\cdot z_0}$. For amounts of contamination from $\varepsilon = 0\%$ to $\varepsilon = 20\%$, Table 6 gives the actual levels of the classical and robust ($c = 2.0$) Atkinson statistic when testing the Pareto against the Exponential distribution. The actual levels are the probabilities (estimated by the frequency) that the test statistic computed from the simulated samples exceeds the critical value at the fixed nominal level.
We can observe that the classical statistic behaves very strangely, since under no contamination the null hypothesis is underrejected and even with small amounts of contamination the null hypothesis is overrejected. The first phenomenon is probably due to the fact that the approximation of the actual distribution of the Cox-type statistics by means of their asymptotic distribution is not accurate (see e.g. Williams (1970), Atkinson (1970) and Loh (1985)). The second phenomenon is the lack of robustness. On the other hand, we find that with the robustified Atkinson statistic, not only is the asymptotic distribution a good approximation of its exact distribution, but also small departures from the model under the null hypothesis do not influence the level of the test, at least for amounts of contamination up to about $\varepsilon = 10\%$. With more contamination (15% and 20%), the null hypothesis tends to be slightly overrejected at the 5% and 10% levels, but this is not too drastic compared to the classical case.
In other words, the robust test is very stable. The fact that the level of the robust test is not influenced very much by contamination is due to the structure of the test itself (see (10)). However, that the asymptotic distribution of the robust test statistic is a good approximation of its sample distribution (as compared to the classical test) can seem at first rather surprising. This can be understood intuitively by remembering the probable causes of the problems in the classical test: Atkinson (1970) remarked that some rather small (legitimate) observations have a large influence on the value of the test statistic because one often takes their logarithm. With robust techniques the influence of such 'extreme but legitimate' observations is bounded, so that the null hypothesis is not under- or overrejected.

Table 6
Actual levels (in %) of the classical and robust Atkinson statistic (c = 2.0) with contamination (Pareto against Exponential)

Amount of        Classical statistic            Robust statistic
contamination    Nominal levels (in %)          Nominal levels (in %)
                 1%     3%     5%     10%       1%     3%     5%     10%
0%               2.1    3.1    3.5    5.2       1.3    3.5    5.5    10.2
3%               6.3    8.7    10.3   14.7      1.2    3.3    5.1    10.3
6%               13.1   18.5   22.5   27.6      1.4    3.6    5.4    10.7
10%              24.4   31.3   35.2   43.9      1.3    3.0    5.6    11.4
15%              35.6   44.6   49.9   58.1      1.4    4.1    7.9    14.5
20%              46.3   54.2   58.6   67.1      0.9    4.1    7.6    14.5

As a second example, we computed the robust White statistic for the quantal responses model presented above. With the non-contaminated sample, the robust test statistic ($c = 2.0$) has a value of 0.24, corresponding to a p-value of 40.5% and leading to the acceptance of $H_0$. With the contaminated sample, the robust test statistic ($c = 2.0$) has a value of 0.81, corresponding to a p-value of 20.9% and leading again, contrary to the classical White statistic, to the acceptance of $H_0$. Moreover, with the robust test statistic we can look at the weights given to the observations, so that we can immediately point out the extreme observations.
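The contamination scheme of the simulation study and its effect on the classical statistic can be sketched as follows. This is only an illustration under stated assumptions: $z_0$ is treated as known, the mixing proportion is passed directly as `frac` (in the text it is $\varepsilon\cdot 200^{-1/2}$), a single large sample replaces the 1000 samples of size 200, and the statistic is left unstandardized because the variance $V$ is not estimated. The closed forms $E[\log z] = \log z_0 + 1/\alpha$, $E[z] = \alpha z_0/(\alpha-1)$ (for $\alpha > 1$) and the resulting pseudo MLE $\beta_\alpha = (\alpha-1)/z_0$ are standard Pareto results derived here, not taken from the text; all function names are ours.

```python
import math
import random

def sample_pareto(alpha, z0, rng):
    """Draw from the Pareto density alpha * z**-(alpha+1) * z0**alpha by inverse CDF."""
    return z0 * (1.0 - rng.random()) ** (-1.0 / alpha)

def contaminated_sample(n, alpha, z0, frac, rng):
    """Mixture: with probability frac the lower bound is 10*z0 instead of z0."""
    return [sample_pareto(alpha, 10.0 * z0 if rng.random() < frac else z0, rng)
            for _ in range(n)]

def atkinson_u(sample, z0):
    """Unstandardized Atkinson-type statistic for Pareto (H0) against exponential."""
    n = len(sample)
    alpha_hat = n / sum(math.log(z / z0) for z in sample)   # Pareto MLE, z0 known
    beta_pseudo = (alpha_hat - 1.0) / z0                     # needs alpha_hat > 1

    def log_ratio(log_z, z):
        # L = log f0 - log f1, written in terms of log z and z
        return (math.log(alpha_hat) - (alpha_hat + 1.0) * log_z
                + alpha_hat * math.log(z0)
                - math.log(beta_pseudo) + beta_pseudo * (z - z0))

    mean_l = sum(log_ratio(math.log(z), z) for z in sample) / n
    # Expectation of L under the fitted Pareto model, using its closed-form moments
    expected_l = log_ratio(math.log(z0) + 1.0 / alpha_hat,
                           alpha_hat * z0 / (alpha_hat - 1.0))
    return mean_l - expected_l

rng = random.Random(12345)
clean = contaminated_sample(20000, 3.0, 0.5, 0.0, rng)
dirty = contaminated_sample(20000, 3.0, 0.5, 0.1, rng)
u_clean = atkinson_u(clean, 0.5)
u_dirty = atkinson_u(dirty, 0.5)
print(f"U without contamination: {u_clean:.3f}")
print(f"U with contamination:    {u_dirty:.3f}")
```

Without contamination the statistic should stay close to zero, while the contaminated sample shifts it away from zero, which is the mechanism behind the overrejection of the classical statistic in Table 6.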
5. Conclusion
In this paper we presented bounded-influence Wald- and score-type tests in general parametric models and illustrated their performance on significant examples. We showed that they are safer than their classical analogues in the presence of small deviations from the assumed model and give comparable results when the model is correctly specified. Moreover, they provide a helpful diagnostic tool to detect influential or outlying observations by means of a simple inspection of the resulting weights.
Special attention was paid to robust testing in generalized linear models and to model choice tests for non-nested hypotheses. We confined our discussion to testing problems in logistic regression, but the same methodology can be applied to other models. We then focused on model choice procedures for separate models. We showed that the classical Cox-type statistics not only suffer from a lack of robustness, but also that their asymptotic distribution is not always an accurate approximation of the exact distribution. We therefore proposed an optimal robust version of Cox-type statistics based on robust parametric tests for general parametric models. In particular, we showed that small amounts of contamination in the observations have a limited influence on the new test. We illustrated this result by means of a simulation study and found that the asymptotic distribution of the robust test statistic is a more accurate approximation of its exact distribution than in the classical case.
Finally, we would like to mention some limitations of these robust testing procedures. First, bounded-influence testing has been developed in the i.i.d. setting. Extensions to more general situations still need to be developed. Secondly, the general bounded-influence M-estimators upon which we based the robust tests do not have a high breakdown point, especially when the dimension of the parameter is large. This implies that the robust procedures we proposed will break down when a cluster of outliers is present in the data. Further research to develop high breakdown tests is desirable, even if we believe that small deviations from the assumed models are more relevant for inference, as pointed out by He et al. (1990), p. 447. However, these limitations should not mask the advantages of such methods in many practical situations.
Appendix
In this appendix, we present two algorithms: one for finding the robust test statistic presented in Section 2, and the other for computing the optimal robust test statistic for model choice presented in Section 4. Robust bounded-influence tests are closely connected to M-estimators, which are usually obtained iteratively via a Newton-Raphson-type algorithm. For a given sample $(z_1,\ldots,z_n)$, an M-estimator is defined as the solution for $\theta$ of the equation
$$\frac{1}{n}\sum_{i=1}^{n}\psi(z_i,\theta) = 0 .$$
The usual Newton-Raphson increment at the current value $\theta_k$ is
$$\Delta\theta_k = -\left[\frac{1}{n}\sum_{i=1}^{n} D(\psi(z_i,\theta_k))\right]^{-1}\frac{1}{n}\sum_{i=1}^{n}\psi(z_i,\theta_k)$$
where $D(\psi)$ is the Jacobian matrix of $\psi$. The computation of $\Delta\theta_k$ requires the average over the sample of the derivative matrix $\{\partial\psi(z_i,\theta_k)/\partial\theta\}$. If we approximate this average by the integral over the tentatively estimated distribution, we get
$$\Delta\theta_k = -\left[\int \frac{\partial\psi(z,\theta_k)}{\partial\theta}\,dF_{\theta_k}(z)\right]^{-1}\frac{1}{n}\sum_{i=1}^{n}\psi(z_i,\theta_k)$$
for Fisher-consistent M-estimators. This step can also be viewed as a direct generalization of the scoring method, obtained by replacing the score function by $\psi$. In the case of the robust tests (or estimators) we propose, $\psi(z,\theta)$ is a weighted score function given by (1) and (2). The weight function $w_c(z,\theta)$ depends on a centering vector $a$ needed for consistency and on a standardization matrix $A$, which in turn depend on $\theta$ implicitly. An inner loop is then necessary to compute $a$ and $A$. The complete algorithm for robust bounded-influence Wald-type test statistics is as follows.
Algorithm for the computation of robust bounded-influence Wald-type test statistics
Step 1: Fix a precision threshold $\eta > 0$, an initial starting point for the parameter $\theta$, and initial values $a = 0$ and $A = [J^{1/2}(\theta)]^{-t}$, where $J(\theta) = \int s(x,\theta)\,s(x,\theta)^t\,dF_\theta(x)$ is the Fisher information matrix. The matrix $A$ is chosen to be lower triangular.
Step 2: Solve equations (3) and (4) with respect to $a$ and $A$ in an iterative way, using as starting values the current values of $\theta$, $a$ and $A$.
Step 3: Compute $\Delta\theta = M^{-1}\frac{1}{n}\sum_{i=1}^{n}[s(z_i,\theta) - a]\,w_c(z_i,\theta)$, where $M = \int [s(x,\theta) - a][s(x,\theta) - a]^t\,w_c(x,\theta)\,dF_\theta(x)$ and $w_c$ is given by (2).
Step 4: If $\|\Delta\theta\| > \eta$, then $\theta \leftarrow \theta + \Delta\theta$ and return to Step 2, else go to Step 5.
Step 5: Compute $W_n^2$ given by (3.2) in Markatou and Ronchetti (1997) (this volume), where $T_n$ is the value of $\theta$ obtained in Step 4.
The computation of robust bounded-influence score-type test statistics follows the same algorithm with $\theta_{(2)} = 0$, $M^{-1}$ replaced by [...] in Step 3 and $R_n^2$ given by formula (3.3) in Markatou and Ronchetti (1997) instead of $W_n^2$ in Step 5. Robust bounded-influence estimators can be obtained similarly along Steps 1-4, with the only modification that the subscript (2) must be dropped in formula (2) defining the weight. Notice that different weights are therefore obtained if we compute a robust test and a robust estimator. This requires two separate uses of this algorithm.
In our particular application to logistic regression we chose the empirical distribution as the distribution of $x$. Since the response $y$ is dichotomous, the integrals in (3) and (4) are replaced by sums, which simplifies the overall computation. The algorithm, implemented in GAUSS, release 3.0, converged quickly for reasonable starting points and a decent choice of the tuning constant $c$.
In the case of robust model choice tests, we propose to use optimal tests. The algorithm is very much simplified because of the structure of the problem. Indeed, for a given $\hat\beta$ it is given by the following steps.
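Algorithm for the computation of optimal robust bounded-influence model choice test statistics
Step 1: Compute the MLE $\hat\alpha$ for $\alpha$ and let $\theta = (\hat\alpha, 1)^t$.
Step 2: Solve for $A_{(21)}$, $A_{(22)}$ and $a_{(2)}$ the following implicit equations
$$A_{(21)}E\left[s^0(x;\hat\alpha)\,w_c^{opt}(x;\hat\alpha,\hat\beta)\right] + A_{(22)}E\left[s_{Cox}(x;\hat\alpha,\hat\beta)\,w_c^{opt}(x;\hat\alpha,\hat\beta)\right] - A_{(22)}a_{(2)}E\left[w_c^{opt}(x;\hat\alpha,\hat\beta)\right] = 0 \tag{11}$$
$$A_{(21)}E\left[s^0(x;\hat\alpha)\,s^0(x;\hat\alpha)^T w_c^{opt}(x;\hat\alpha,\hat\beta)\right] + A_{(22)}E\left[s^0(x;\hat\alpha)^T s_{Cox}(x;\hat\alpha,\hat\beta)\,w_c^{opt}(x;\hat\alpha,\hat\beta)\right] - A_{(22)}a_{(2)}E\left[s^0(x;\hat\alpha)^T w_c^{opt}(x;\hat\alpha,\hat\beta)\right] = 0 \tag{12}$$
$$\begin{aligned}
&A_{(21)}E\left[s^0(x;\hat\alpha)\,s^0(x;\hat\alpha)^T w_c^{opt}(x;\hat\alpha,\hat\beta)^2\right]A_{(21)}^T + 2A_{(21)}A_{(22)}E\left[s^0(x;\hat\alpha)\,s_{Cox}(x;\hat\alpha,\hat\beta)\,w_c^{opt}(x;\hat\alpha,\hat\beta)^2\right] \\
&\quad - 2A_{(21)}A_{(22)}a_{(2)}E\left[s^0(x;\hat\alpha)\,w_c^{opt}(x;\hat\alpha,\hat\beta)^2\right] + A_{(22)}^2E\left[s_{Cox}^2(x;\hat\alpha,\hat\beta)\,w_c^{opt}(x;\hat\alpha,\hat\beta)^2\right] \\
&\quad - 2A_{(22)}^2a_{(2)}E\left[s_{Cox}(x;\hat\alpha,\hat\beta)\,w_c^{opt}(x;\hat\alpha,\hat\beta)^2\right] + A_{(22)}^2a_{(2)}^2E\left[w_c^{opt}(x;\hat\alpha,\hat\beta)^2\right] = 1
\end{aligned} \tag{13}$$
where $A_{(12)} = a_{(1)} = 0$.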
Step 3: Compute $U$ given in (10), with the values of $A_{(21)}$, $A_{(22)}$ and $a_{(2)}$ computed in Step 2.
The expectations in Step 2 are taken at $F^0_{\hat\alpha}$. One must stress that the second step is not straightforward, since one has to solve a complicated nonlinear system in $a_{(2)}$, $A_{(21)}$ and $A_{(22)}$. We propose to use an iterative process combined with a classical routine to find the roots of a system of nonlinear equations. The typical iteration is: given values for $a_{(2)}$, $A_{(21)}$ and $A_{(22)}$, compute the weights and the expectations and then, given these expectations, solve equations (11), (12) and (13) for $a_{(2)}$, $A_{(21)}$ and $A_{(22)}$. As starting values, one could choose $a = 0$ and $A_{(21)}$, $A_{(22)}$ such that $A^{-1}A^{-T} = \int s^c(x;\theta)\,s^c(x;\theta)^T f^0(x;\hat\alpha)\,dx$, since they are the solutions when $c = \infty$.
For routine applications, user-friendly software still has to be developed. A first step in this direction has been taken, since Marazzi, Joss, and Randriamiharisoa (1993) made available in S-PLUS some algorithms, routines and functions previously developed in ROBETH. This software is a systematized collection of numerical algorithms that allow the computation of different ROBust estimators and tests. Most of these methods originated at the ETH Zurich, hence the acronym. These computational procedures include M-estimates for discrete generalized linear models, especially the proposal of Kuensch et al. (1989) in the logistic regression model (see Chapter 10 in Marazzi et al. (1993)). They do not include the robust tests we proposed, with the exception of the likelihood ratio-type tests or $\tau$-tests developed by Ronchetti (1982) for the linear regression model. Robust score- or Wald-type tests based on the robust estimates provided by ROBETH routines in S-PLUS can be easily implemented and constitute simpler alternatives.
References
Aguirre-Torres, V. and A. R. Gallant (1983). The null and non-null asymptotic distribution of the Cox test for multivariate nonlinear regression: Alternatives and a new distribution-free Cox test. J. Econometrics 21, 5-33.
Atkinson, A. C. (1970). A method for discriminating between models. J. Roy. Statist. Soc., Serie B 32, 323-353.
Boritz, J. E. (1991). The going concern assumption: Accounting and auditing implications. CICA research report, Institute of Chartered Accountants, Toronto.
Breusch, T. S. and A. R. Pagan (1980). The Lagrange multiplier test and its application to model specification in econometrics. Rev. Econom. Stud. 47, 239-253.
Carroll, R. J. and S. Pederson (1993). On robustness in the logistic regression model. J. Roy. Statist. Soc., Serie B 55, 693-706.
Cormier, D., M. Magnan and B. Morard (1993). An evaluation of the going concern assumption in an auditing context: Some empirical evidence. J. Account. Finance.
Cox, D. R. (1961). Tests of separate families of hypotheses. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1, Berkeley, pp. 105-123. University of California Press.
Cox, D. R. (1962). Further results on tests of separate families of hypotheses. J. Roy. Statist. Soc., Serie B 24, 406-424.
Dastoor, N. K. (1985). A classical approach to Cox's test for non-nested hypotheses. J. Econometrics 27, 363-370.
Davidson, R. and J. G. MacKinnon (1981). Several tests for model specification in presence of alternative hypotheses. Econometrica 49, 781-793.
Fisher, G. R. and M. McAleer (1981). Alternative procedures and associated tests of significance for non-nested hypotheses. J. Econometrics 16, 103-119.
Godfrey, L. G. and M. H. Pesaran (1983). Tests of non-nested regression models: Small sample adjustments and Monte Carlo evidence. J. Econometrics 21, 133-154.
Gouriéroux, C., A. Monfort and A. Trognon (1983). Testing nested or non-nested hypotheses. J. Econometrics 21, 83-115.
Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw and W. A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley.
He, X. (1991). A local breakdown property of robust tests in linear regression. J. Multivar. Anal. 38, 294-305.
He, X., D. G. Simpson and S. L. Portnoy (1990). Breakdown robustness of tests. J. Amer. Statist. Assoc. 85, 446-452.
Heritier, S. (1993). Contribution to Robustness in Nonlinear Models. Application to Economic Data. Ph.D. thesis, University of Geneva, Switzerland. Thesis no 387.
Heritier, S. and E. Ronchetti (1994). Robust bounded-influence tests in general parametric models. J. Amer. Statist. Assoc. 89, 897-904.
Huber, P. J. (1981). Robust Statistics. New York: John Wiley.
Kuensch, H. R., L. A. Stefanski and R. J. Carroll (1989). Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. Amer. Statist. Assoc. 84, 460-466.
Loh, W.-Y. (1985). A new method for testing separate families of hypotheses. J. Amer. Statist. Assoc. 80, 362-368.
MacKinnon, J. G., H. White and R. Davidson (1983). Tests for model specification in the presence of alternative hypotheses: Some further results. J. Econometrics 21, 53-70.
Marazzi, A., J. Joss and A. Randriamiharisoa (1993). Algorithms, Routines and S-Functions for Robust Statistics. Belmont, California: Wadsworth and Brooks/Cole.
Markatou, M. and X. He (1994). Bounded influence and high breakdown point testing procedures in linear models. J. Amer. Statist. Assoc. 89, 187-190.
Markatou, M. and E. Ronchetti (1997). Robust inference: The approach based on influence functions. In: G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 15: Robust Inference, 49-75.
McCullagh, P. and J. A. Nelder FRS (1989). Generalized Linear Models. London: Chapman and Hall. Second edition.
Ronchetti, E. (1982). Robust Testing in Linear Models: The Infinitesimal Approach. Ph.D. thesis, ETH, Zurich, Switzerland.
Rousseeuw, P. J. and E. Ronchetti (1979). The influence curve for tests. Research Report 21, ETH Zürich, Switzerland.
Rousseeuw, P. J. and E. Ronchetti (1981). Influence curves for general statistics. J. Comput. Appl. Math. 7, 161-166.
Samarov, A. M. (1985). Bounded influence regression via local minimax mean squared error. J. Amer. Statist. Assoc. 80, 1032-1040.
Stefanski, L. A., R. J. Carroll and D. Ruppert (1986). Optimally bounded score functions for generalized linear models with application to logistic regression. Biometrika 73, 413-424.
Thomas, D. G. (1972). Tests of fit for a one-hit vs. two-hit curve. Appl. Statist. 21, 103-112.
Victoria-Feser, M.-P. (1997). Robust model choice test for non-nested hypothesis. J. Roy. Statist. Soc., Serie B, to appear.
Victoria-Feser, M.-P. and E. Ronchetti (1994). Robust methods for personal income distribution models. Canad. J. Statist. 22, 247-258.
White, H. (1982). Regularity conditions for Cox's test of non-nested hypotheses. J. Econometrics 19, 301-318.
Williams, D. A. (1970). Discrimination between regression models to determine the pattern of enzyme synthesis in synchronous cell cultures. J. Biometrics 28, 23-32.