Author's Accepted Manuscript
A robust regression method based on exponential-type kernel functions
Francisco de A.T. De Carvalho, Eufrásio de A. Lima Neto, Marcelo R.P. Ferreira
www.elsevier.com/locate/neucom
PII: S0925-2312(16)31550-8
DOI: http://dx.doi.org/10.1016/j.neucom.2016.12.035
Reference: NEUCOM 17853
To appear in: Neurocomputing
Received date: 11 March 2016; Revised date: 26 August 2016; Accepted date: 13 December 2016
Cite this article as: Francisco de A.T. De Carvalho, Eufrásio de A. Lima Neto and Marcelo R.P. Ferreira, A robust regression method based on exponential-type kernel functions, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2016.12.035
A robust regression method based on exponential-type kernel functions
Francisco de A.T. De Carvalho1, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes s/n - Cidade Universitária, CEP 50740-560, Recife-PE, Brazil
Eufrásio de A. Lima Neto and Marcelo R. P. Ferreira2, Universidade Federal da Paraíba, Departamento de Estatística, Cidade Universitária, 58051-900, João Pessoa, PB, Brazil
Abstract
Robust regression methods are commonly needed in practical situations due to the presence of outliers. In this paper we propose a robust regression method that penalizes badly fitted observations (outliers) through the use of exponential-type kernel functions in the iterative parameter estimation process. Thus, the weights given to each observation are updated at each iteration in order to optimize a suitable objective function. The convergence of the parameter estimation algorithm is guaranteed, with a low computational cost. Its performance is sensitive to the choice of the initial values for the vector of parameters of the regression model as well as to the estimator of the width hyper-parameter of the kernel functions. A simulation study with synthetic data sets revealed that some width hyper-parameter estimators can improve the performance of the proposed approach and that the ordinary least squares (OLS) method is a suitable choice for the initial values of the vector of parameters of the proposed regression method. A comparative study between the proposed method and some classical robust approaches (WLS, M-Estimator, MM-Estimator and L1 regression), as well as the OLS method, is also considered.
1 Corresponding author: [email protected]
2 [email protected], [email protected]
Preprint submitted to Neurocomputing, December 14, 2016
The performance of these methods is evaluated based on the bias and mean squared error (MSE) of the parameter estimates, considering synthetic data sets with X-space outliers, Y-space outliers and leverage points, different sample sizes and percentages of outliers, in a Monte Carlo framework. The results suggest that the proposed approach presents a competitive (and sometimes superior) performance in outlier scenarios comparable to those found in real problems. The proposed method also exhibits a performance similar to the OLS method when no outliers are present, and requires about half the computational time of the MM-Estimator. Applications to real data sets corroborate the usefulness of the proposed method.
Keywords: Exponential-type kernel functions, Regression models, Robust methods, Width hyper-parameter estimators
1. Introduction

Robust regression attempts to cope with outliers and leverage points. Regression outliers are observations deviating from the linear pattern followed by the majority of the data, whereas leverage points are outlying in the space of the predictor variables. Observations which are both regression outliers and leverage points are called bad leverage points, and a non-robust analysis of these points typically leads to biased inferences.

The classical linear regression model can be presented in matrix notation as y = Xβ + ε = µ + ε, where y is an n × 1 vector of response variables, X is an n × p matrix of explanatory variables, β is a p × 1 vector of regression parameters, µ = Xβ is an n × 1 vector representing the mean of y, and ε is an n × 1 vector containing the error terms with zero mean and unknown variance σ². The ordinary least squares (OLS) estimator obtains the regression parameters β by making the sum of squared errors (SSE) as small as possible. The OLS approach provides the optimal estimates of (β, σ²) when the error term is normally distributed. However, it can behave badly when the error distribution is not normal, particularly when the errors are heavy-tailed, because the OLS method is sensitive to residuals of large magnitude. In order to overcome this issue, robust methods are a suitable alternative because they are less sensitive to outliers.

With this in mind, standard approaches to robust regression have been proposed over the last decades, such as M-estimation [1, 2], MM-estimation [3] and methods based on minimizing an objective function other than the sum of squared residuals, such as least absolute deviation (L1 regression), which gained prominence after the work of Ref. [4]. Even for these models, pronounced outliers can still have a considerable impact on the fit, motivating research into even more robust approaches. We recommend the papers [5] and [6], which include outlier detection, to readers interested in surveys of these standard techniques; for more recent work in the area, see [7], [8] and [9]. Note that most contributions dealing with robust methods rely on re-weighting the outliers according to some known function.

On the other hand, estimation and learning methods using positive definite kernels have been proposed and have become rather popular over the last years, particularly in the statistical learning community. In particular, the idea of using exponential-type kernel functions to measure the similarity between two objects has been successfully applied in a wide range of fields, for instance computer vision, signal processing, clustering, pattern classification and bioinformatics, and there is an extensive literature on the family of kernel-based algorithms; see, for instance, the papers [10], [11], [12], [13] and the references therein.

Inspired by the kernel framework, this paper proposes a regression method, hereafter named ETKRR (Exponential-Type Kernel function based Robust Regression), that re-weights outlier observations based on exponential-type kernel functions, in such a way that the weights assigned to outlier observations become as small as possible in an iterative process that minimizes a suitable objective function. The motivation behind the choice of a particular kernel can be very intuitive and straightforward, depending on what kind of information we expect to extract from the data. The ETKRR method allows the use of "exponential-type" kernel functions that penalize, in different ways, badly fitted observations. The weighting is done by computing the dissimilarities between the observed and predicted values of the response variable and is updated at each iteration in order to optimize the objective function. Different methods to compute the initial value of the vector of parameters in the parameter estimation algorithm, as well as different estimators for the width hyper-parameter of the "exponential-type" kernel functions, are considered. The convergence of the estimation algorithm is guaranteed with a low computational cost. In particular, OLS is a special case of the ETKRR approach when the points are perfectly aligned around a line; thus, in comparison with other robust methods, the ETKRR remains preferable even when outliers are not present in a given data set.

A comparative study between the ETKRR method and some classical regression approaches (Weighted Least Squares (WLS), M-Estimator, MM-Estimator, L1 regression and OLS) is considered. These methods are evaluated in terms of the bias and mean squared error (MSE) of the parameter estimates, taking into account X-space outliers, Y-space outliers, leverage points, different sample sizes and percentages of outliers, in a Monte Carlo simulation framework with 10,000 replications. Applications on real data sets also illustrate the usefulness of the ETKRR method.

The paper is organized as follows: the next section gives a brief introduction to some useful robust regression methods and the ordinary least squares method. Section 3 reviews some concepts about exponential-type kernel functions; the ETKRR method and the parameter estimation algorithm are discussed in Section 4. Section 5 presents the Monte Carlo experiments comparing the ETKRR method with some classical regression techniques and discusses the results obtained in the numerical analysis. Section 6 brings applications to real data sets and, finally, Section 7 gives the concluding remarks.
2. Related Methods

This section gives a brief description of the competing methods considered in this paper.

2.1. Ordinary Least Squares

Linear regression has long been one of the most important statistical data analysis tools. Given a set of independent and identically distributed (i.i.d.) observations (x_i, y_i), i = 1, ..., n, the linear regression model can be defined by

y_i = x_i^T β + e_i,    (1)

where x_i is a p × 1 vector of values of the explanatory variables, β is an unknown p × 1 vector, and the errors e_i are i.i.d. and independent of x_i with E(e_i | x_i) = 0. The most commonly used estimate for β is the ordinary least squares (OLS) estimate, which minimizes the sum of squared errors, i.e.,

β̂ = arg min_β ∑_{i=1}^{n} (y_i − x_i^T β)².    (2)

However, the ordinary least squares estimator for linear models is very sensitive to unusual values in the design space and to outliers among the y values. Even a single atypical value may have a large effect on the parameter estimates. Thus, robust regression analysis provides an alternative to least squares regression when the fundamental assumptions are unfulfilled by the nature of the data.
2.2. Weighted Least Squares

The weighted least squares (WLS) method, which can be viewed as an extension of the OLS method, estimates β by solving the minimization problem

β̂ = arg min_β ∑_{i=1}^{n} w_i (y_i − x_i^T β)²,    (3)

with w_i = 1/σ̂²_ii, where σ̂²_ii is the i-th element of the diagonal of σ̂²H = σ̂²X(XᵀX)⁻¹Xᵀ and σ̂² is the estimated variance of the random error.
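For illustration, a minimal R sketch of this weighting scheme is given below. It is a sketch under our assumptions (a design matrix X that already includes an intercept column; function and variable names are ours, not from the paper).

# Sketch of the WLS weights of Eq. (3): w_i = 1 / (sigma2_hat * h_ii),
# where h_ii is the i-th diagonal element of the hat matrix H.
wls_sketch <- function(X, y) {
  ols  <- lm.fit(X, y)                                    # ordinary least squares fit
  n <- nrow(X); p <- ncol(X)
  s2   <- sum(ols$residuals^2) / (n - p)                  # estimated error variance
  H    <- X %*% solve(crossprod(X)) %*% t(X)              # hat matrix X (X'X)^{-1} X'
  w    <- 1 / (s2 * diag(H))                              # weights w_i of Eq. (3)
  beta <- solve(crossprod(X, w * X), crossprod(X, w * y)) # weighted least squares solution
  list(weights = w, coefficients = drop(beta))
}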
2.3. L1 Regression

L1 regression estimates the median of y|x instead of the conditional mean. In L1 regression we estimate β by solving the minimization problem

β̂ = arg min_β (1/n) ∑_{i=1}^{n} |y_i − x_i^T β|,

i.e., L1 regression minimizes the sum of the absolute errors while OLS (also known as L2 regression) minimizes the sum of the squared errors. Consequently, L1 gives much less weight to large deviations.
2.4. M-Estimator

The M-estimation method [2] suggests a solution to the normal equations that uses an appropriate weight function in order to penalize unusual observations. By replacing the least squares criterion (2) with a robust criterion, the M-estimate [14] of β is

β̂ = arg min_β ∑_{i=1}^{n} ρ( (y_i − x_i^T β) / σ̂ ),    (4)

where ρ(·) is a robust loss function and σ̂ is an error scale estimate. The derivative of ρ, denoted by Ψ(·) = ρ'(·), is called the influence function. In particular, if ρ(t) = t²/2, then the solution is the OLS estimate.
2.5. MM-Estimator

MM-estimation is a special type of M-estimation developed by Yohai [3]. MM-estimates have become increasingly popular and are one of the most commonly employed robust regression techniques. The MM-estimates can be found by a three-stage procedure. In the first stage, compute an initial consistent estimate β̂_0 with a high breakdown point but possibly low normal efficiency. In the second stage, compute a robust M-estimate of the scale σ̂ of the residuals based on the initial estimate. In the third stage, find an M-estimate β̂ starting at β̂_0.

More details about these robust regression methods can be found in [15] and [16] and in the references therein.
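As a practical note, all of the competing estimators above are available through standard R tools. The sketch below is only one possible way to obtain them (it assumes a data frame dat with response y and covariate x, and it does not claim to reproduce the exact settings used later in the experiments).

# Fitting the competing regression methods of Section 2 in R.
library(MASS)       # rlm(): M- and MM-estimation
library(quantreg)   # rq(): L1 (median) regression

fit_ols <- lm(y ~ x, data = dat)                        # ordinary least squares
w       <- 1 / (summary(fit_ols)$sigma^2 *
                hatvalues(fit_ols))                     # WLS weights as in Eq. (3)
fit_wls <- lm(y ~ x, data = dat, weights = w)           # weighted least squares
fit_m   <- rlm(y ~ x, data = dat, method = "M")         # M-estimator
fit_mm  <- rlm(y ~ x, data = dat, method = "MM")        # MM-estimator
fit_l1  <- rq(y ~ x, data = dat, tau = 0.5)             # L1 (median) regression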
3. Exponential-type kernel functions

Hereafter we briefly recall the basic theory of kernel functions. Let X = {x_1, ..., x_n} be a non-empty set where x_i ∈ R^p. A function K : X × X → R is called a positive definite or Mercer kernel [17] if and only if K is symmetric (i.e. K(x_i, x_k) = K(x_k, x_i)) and the following inequality holds:

∑_{i=1}^{n} ∑_{k=1}^{n} c_i c_k K(x_i, x_k) ≥ 0,   ∀n ≥ 2,

where c_r ∈ R, ∀r = 1, ..., n.

Let Φ : X → F be a non-linear mapping from the input space X to a high-dimensional feature space F. By applying the mapping Φ, the dot product x_i^T x_k in the input space is mapped to Φ(x_i)^T Φ(x_k) in the feature space. The key idea supporting kernel algorithms is that the non-linear mapping Φ does not need to be explicitly specified, because each Mercer kernel can be expressed as K(x_i, x_k) = Φ(x_i)^T Φ(x_k) [18].

One of the most relevant aspects in applications is that it is possible to compute Euclidean distances in F without knowing Φ explicitly. This can be done using the so-called distance kernel trick [18]:

||Φ(x_i) − Φ(x_k)||² = (Φ(x_i) − Φ(x_k))^T (Φ(x_i) − Φ(x_k))
                     = Φ(x_i)^T Φ(x_i) − 2 Φ(x_i)^T Φ(x_k) + Φ(x_k)^T Φ(x_k)
                     = K(x_i, x_i) − 2 K(x_i, x_k) + K(x_k, x_k).    (5)

Table 1 gives some exponential-type kernel functions, which are often used in the literature.
The width hyper-parameter γ² plays an important role in the performance of these kernels and must be carefully adjusted to the problem at hand. Overestimated values of γ² make the exponential function behave almost linearly, so the projection into a higher-dimensional space loses its usefulness. On the other hand, for underestimated values of γ², the function will lack regularization and the decision boundary will be highly sensitive to noise in the data.
Table 1: Some "exponential-type" kernel functions; γ > 0.

  Gaussian:     K_G(x_i, x_k) = exp( −||x_i − x_k||² / (2γ²) )
  Exponential:  K_E(x_i, x_k) = exp( −||x_i − x_k|| / (2γ²) )
  Laplacian:    K_L(x_i, x_k) = exp( −||x_i − x_k|| / γ )
Moreover, note that the exponential kernel is closely related to the Gaussian kernel, with only the square of the norm left out. The Laplacian kernel is in turn completely equivalent to the exponential kernel, except for being less sensitive to changes in γ². Because of the mathematical convenience provided by the Gaussian kernel, we will consider it for the ETKRR model.
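A minimal R sketch of the kernels in Table 1 and of the feature-space distance (5) is given below (function names are ours; gamma2 denotes γ²).

# Exponential-type kernels of Table 1.
k_gauss   <- function(xi, xk, gamma2) exp(-sum((xi - xk)^2) / (2 * gamma2))
k_exp     <- function(xi, xk, gamma2) exp(-sqrt(sum((xi - xk)^2)) / (2 * gamma2))
k_laplace <- function(xi, xk, gamma)  exp(-sqrt(sum((xi - xk)^2)) / gamma)

# Squared distance in the feature space F via the kernel trick of Eq. (5):
# ||Phi(xi) - Phi(xk)||^2 = K(xi, xi) - 2 K(xi, xk) + K(xk, xk).
dist2_feature <- function(xi, xk, K, ...) {
  K(xi, xi, ...) - 2 * K(xi, xk, ...) + K(xk, xk, ...)
}

# Example: for the Gaussian kernel this reduces to 2 * (1 - K_G(xi, xk)).
dist2_feature(c(1, 2), c(2, 0), k_gauss, gamma2 = 1)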
4. The ETKRR method
The proposed method is inspired by the kernel framework. The estimation of the parameters of the regression model is guided by the minimization of an objective function defined as the sum of squared errors computed not in the original space, but in a high-dimensional space F, through a non-linear mapping Φ applied to the response variable y_i and its corresponding mean µ_i. Thus, the objective function is defined as follows:

S = ∑_{i=1}^{n} ||Φ(y_i) − Φ(µ_i)||²,    (6)

where µ_i = β_0 + β_1 x_i1 + ... + β_p x_ip represents the mean of the response variable y_i, which is structured in terms of a set of explanatory variables x_i1, ..., x_ip and their respective parameters β_0, β_1, ..., β_p to be estimated.
As the non-linear mapping Φ is not explicitly known, the estimation of the parameters of the regression model requires the use of the distance kernel trick (see equation (5)); thus the objective function becomes:

S = ∑_{i=1}^{n} { K(y_i, y_i) − 2 K(y_i, µ_i) + K(µ_i, µ_i) }.    (7)

Let K_G be the Gaussian kernel function listed in Table 1. Then the following property is verified: K_G(y_i, y_i) = K_G(µ̂_i, µ̂_i) = 1 (i = 1, ..., n), and the functional (7) can be rewritten as

S = ∑_{i=1}^{n} 2 [1 − K_G(y_i, µ_i)],   γ > 0.    (8)

Thus, the closer y_i and µ_i are (i = 1, ..., n), the closer to zero the objective function of the ETKRR model becomes.
Differentiating equation (8) with respect to the vector of parameters β = (β_0, ..., β_p), we obtain the normal equations for the ETKRR model, given by

∂S/∂β_r = 2 ∑_{i=1}^{n} K_G(y_i, µ̂_i) x_ir (y_i − µ̂_i) / γ² = 0,   r = 1, 2, ..., p.    (9)

The solution of the normal equation system (9) is obtained through an iteratively re-weighted least squares process, based on the regression model

y* = X* β + ε*,    (10)
where y* = K^{1/2} y is an n × 1 weighted vector of the response variable, X* = K^{1/2} X is an n × p weighted matrix of explanatory variables, β is the vector of regression parameters, ε* = K^{1/2} ε is an n × 1 weighted vector of independent errors, and K = diag(k_11, k_22, ..., k_nn) is an n × n diagonal kernel weight matrix with k_ii defined as k_ii = K_G(y_i, µ_i), ∀i = 1, 2, ..., n. Note that 0 ≤ K_G(y_i, µ_i) ≤ 1; thus observations with large residuals receive a small weight in the parameter estimation, as illustrated by Figure 1.
4.1. Estimation algorithm

The estimation of the vector of parameters β can be performed through the algorithm presented hereafter. The algorithm starts from an initial solution (for instance, the OLS solution) and alternates between the estimation of the regression model (coefficients and predicted responses) and the estimation of the observation weights, until convergence, when the objective function becomes stationary.

input : X, y, a tolerance limit e, a maximum number of iterations T, the kernel width hyper-parameter γ²
output: β̂, µ̂
Initialization:
    Set t = 0 and K^(0) = I_n;                          // I_n is an n × n identity matrix
    Compute β̂^(0);                                      // a start value for β̂
    Compute µ̂^(0) = X β̂^(0);
    Compute S^(0) as given in Eq. (8);
Fitting steps:
    repeat
        Set t = t + 1;
        Compute K^(t) = diag{k_11^(t), ..., k_nn^(t)}, where k_ii^(t) = K_G(y_i, µ̂_i^(t−1)) and µ̂_i^(t−1) = x_i^T β̂^(t−1);
        Compute β̂^(t) = (Xᵀ K^(t) X)⁻¹ Xᵀ K^(t) y;
        Compute µ̂^(t) = X β̂^(t);
        Compute S^(t) as given in Eq. (8);
    until |S^(t) − S^(t−1)| ≤ e or t ≥ T;

Figure 1: The Gaussian kernel function in the ETKRR method (weight versus residual).
As the kernel weight matrix needs to be computed only once at each it-
184
eration, the complexity of estimate the coefficients vector β is O(n( p + 1)),
185
where n is the number of observations and p is the number of variables. At
186
each iteration t the kernel weight matrix requires n kernel function evaluations
187
k ii = KG (yi , µˆ i
188
( t −1) βp xip ,
189
is O(n( p + 1)) for a single iteration.
190
4.2. Convergence properties
( t −1)
(t)
( t −1)
), i = 1, . . . , n, and, since µˆ i
( t −1)
= β0
( t −1)
+ β1
xi1 + · · · +
i = 1, . . . , n, the computational complexity of the ETKRR algorithm
191
The convergence of the proposed algorithm can be studied from two series:
192
wt = µˆ (t) and zt = S(wt ) = S(µˆ (t) ) (t = 0, 1, . . .). From an initial term w0 =
193
µˆ (0) = Xβˆ (0) , where βˆ (0) represents the start value for βˆ and K(0) = In , the
194
algorithm computes iteratively µˆ (t) until the convergence when the criterion S
195
achieves a stationary value.
196
Proposition 4.1. The series zt = S(wt ) decreases at each iteration and converges.
197
Proof. First of all, we will show that the following inequality S(t−1) = S(µˆ (t−1) ) ≥ S(t) = S(µˆ (t) )
11
(11)
198
hold, i.e., the series decreases at each iteration. The inequality (11) holds because S(t−1) = S(µˆ (t−1) ) =
n
∑2
i =1
( t −1)
where µˆ i
h
( t −1)
1 − KG (yi , µˆ i
i ) ,
γ > 0,
= xi> βˆ (t−1) , βˆ (t−1) = (X> K(t−1) X)−1 X> K(t−1) y, and S(t) = S(µˆ (t) ) =
n
h i (t) ˆ 2 1 − K ( y , µ ) , G i ∑ i
γ > 0,
i =1
(t)
where µˆ i
= xi> βˆ (t) , βˆ (t) = (X> K(t) X)−1 X> K(t) y and, at the t-th iteration, we
have that
n
βˆ (t) = arg min ∑ 2 [1 − KG (yi , µi )] , β
199
γ > 0.
i =1
Finally, because the series zt decreases and it is bounded (zt = S(wt ) ≥ 0),
200
it converges.
201
Proposition 4.2. The series wt = µˆ (t) converges.
202
Proof. Assume that the stationarity of the series zt is achieved at the iteration
203
t = T. Then, we have that z T = z T +1 , i.e., S(µˆ (T ) ) = S(µˆ (T +1) ). Moreover,
204
we have that µˆ (T ) = µˆ (T +1) , because µˆ = Xβˆ and βˆ = (X> KX)−1 X> Ky is
205
unique minimizing S. So, we can conclude that wT = wT +1 . This conclusion
206
holds for all t ≥ T and wt = wT ∀t ≥ T and it follows that the series wt = µˆ (t)
207
converges.
208
5. Monte Carlo experiments
209
This section performs a Monte Carlo simulation study to evaluate the be-
210
havior of the proposed parameter estimate algorithm of section 4.1 consider-
211
ing different alternatives for the initial value of the vector of parameters βˆ (0) as
212
well as for the width hyper-parameter estimators, taking into account a wide
213
range of scenarios. Moreover, a comparative study against some classical re-
214
gression methods (WLS, M-Estimator, MM-Estimator, L1 and OLS) is also con-
215
sidered. All experiments were implemented in the R language [19] and per-
216
formed on the same machine (OS: OSX El Captain, Version 10.11.1; 64-bits; 12
217
Memory: 8 GB 1600 MHz DDR3; Processor: Intel Core i5-4278U (Fourth Gen-
218
eration); CPU @ 2.6GHz). The code is available on GitHub through the link
219
https://github.com/marcelorpf/gkrreg.
220
5.1. Experimental settings
221
The parameter estimates for the ETKRR method were obtained according to
222
the algorithm presented in the Section 4.1 with a tolerance limit e = 1 × 10−10
223
and a maximum number of iterations T = 100.
224
The simulation settings reflect scenarios found in real problems. We fixed a
225
percentage of clean data and added the complementary percentage of outliers.
226
The clean data for the explanatory variable X came from an uniform distribu-
227
tion in the interval [10, 20], while the ith clean data for the response variable Y
228
is generated according to the linear model yi = 1.7 + 2xi + ei , where ei is the
229
random error term distributed as N (0; 1).
230
The outlier observations are generated according to three different scenar-
231
ios often found in real life problems (Y-space outliers, X-space outliers and
232
leverage points). Figure 2 illustrates these scenarios that take into account the
233
following bivariate Gaussian distribution: N2 (µ = (µ x , µy ), Σ = I2 ). Scenario
234
0 considers just clean observations. Scenario 1 considers Y-space outliers, since
235
their y values or response values are significantly different from the rest of the
236
data. Scenario 2 considers X-space outliers observations which present unusual
237
x values in relation to the rest of the data. Finally, scenario 3 considers a per-
238
centage of leverage observations, that is, the observations are shifted on both X
239
and Y axis. Table 2 describes the used parameters to generate the outliers from
240
the clean observations.
241
Concerning the width hyper-parameter γ2 , Ref. [20] proposed to estimate
242
it as the average between the percentiles 0.1 and 0.9 of the term (yi − µˆ OLS )2 i
243
computed on a random sample of the data with size n∗ = αn, where n is the
244
number of observations, α ∈ (0, 1) (we used α = 0.5) and µˆ OLS is the predicted i
245
value for yi considering the OLS method. In this paper, we considered the total of three estimators for the width 13
Figure 2: The outlier scenarios (four X versus Y scatter plots: clean observations, Y-space outliers, X-space outliers, and leverage points).
Table 2: Outlier parameters according to the bivariate Gaussian distribution N_2(µ, I_2).

  Scenario   Outlier scenario    µ_x          µ_y
  0          No outliers         -            -
  1          Y-space outliers    x̄            max(y) + 5
  2          X-space outliers    max(x) + 5   ȳ
  3          Leverage points     max(x) + 5   max(y) + 5
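As an illustration, a minimal R sketch that generates one synthetic sample following these settings (clean data plus the shifts of Table 2) could look as follows; function and argument names are ours, and it is only one simple reading of the design described above.

# One synthetic sample: clean data from y = 1.7 + 2x + e, e ~ N(0,1), x ~ U(10,20),
# plus a given percentage of outliers drawn from N2(mu, I2) with mu as in Table 2.
gen_scenario <- function(n, p_out, scenario = c("none", "y", "x", "leverage")) {
  scenario <- match.arg(scenario)
  x <- runif(n, 10, 20)
  y <- 1.7 + 2 * x + rnorm(n)
  n_out <- round(n * p_out)
  if (scenario != "none" && n_out > 0) {
    idx  <- sample(n, n_out)
    mu_x <- switch(scenario, y = mean(x), x = max(x) + 5, leverage = max(x) + 5)
    mu_y <- switch(scenario, y = max(y) + 5, x = mean(y), leverage = max(y) + 5)
    x[idx] <- rnorm(n_out, mu_x, 1)   # identity covariance: unit standard deviations
    y[idx] <- rnorm(n_out, mu_y, 1)
  }
  data.frame(x = x, y = y)
}

dat <- gen_scenario(n = 100, p_out = 0.10, scenario = "y")  # 10% Y-space outliers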
In this paper, we consider a total of three estimators for the width hyper-parameter γ² and evaluate how these estimators affect the performance of the algorithm presented in Section 4.1, based on the bias and MSE of the parameter estimates. The estimator S1 is Caputo's estimator [20]. The estimator S2 is the median of the non-zero values of r_ij = (y_i − µ̂_j^OLS)², i, j = 1, ..., n. Finally, S3 is defined as

(1/(n − p − 1)) ∑_{i=1}^{n} (y_i − µ̂_i^OLS)²,

where p is the number of explanatory variables.
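A minimal R sketch of these three width estimators, under our reading of the definitions above (the OLS fit is computed on the full data, S1 uses a random subsample of size αn with α = 0.5, and X is assumed to include an intercept column), is:

# Width hyper-parameter estimators S1, S2 and S3 (estimates of gamma^2).
width_estimators <- function(X, y, alpha = 0.5) {
  mu_ols <- drop(X %*% solve(crossprod(X), crossprod(X, y)))  # OLS fitted values
  n <- length(y); p <- ncol(X) - 1                            # p explanatory variables

  # S1: average of the 0.1 and 0.9 percentiles of (y_i - mu_i)^2
  # computed on a random subsample of size alpha * n.
  idx <- sample(n, round(alpha * n))
  r2  <- (y[idx] - mu_ols[idx])^2
  S1  <- mean(quantile(r2, probs = c(0.1, 0.9)))

  # S2: median of the non-zero pairwise terms r_ij = (y_i - mu_j)^2.
  rij <- outer(y, mu_ols, function(a, b) (a - b)^2)
  S2  <- median(rij[rij != 0])

  # S3: residual-variance-type estimator.
  S3  <- sum((y - mu_ols)^2) / (n - p - 1)

  c(S1 = S1, S2 = S2, S3 = S3)
}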
247
5.2. Width hyper-parameter estimators and computation of the initial value of the vec-
248
tor of parameters.
249
Initially, we aim to evaluate how different initial values for the vector of
250
parameters affect the final parameter estimates of the algorithm given in the
251
section 4.1. We consider three different methods to compute the initial values
252
of the vector of parameter estimates βˆ (0) : the ordinary least squares estimate
253
(OLS), the L1 regression [16] and the MM-Estimator [3]. For all these cases
254
the width hyper-parameter γ2 was fixed as the S1 estimator. The evaluation
255
was based on the computation of the bias and mean squared error (MSE) of
256
the parameter estimates for the method ETKRR according to the choice of the
257
initial values (OLS, L1 and MM) for the vector of parameters. A Monte Carlo
258
simulation 10,000 times repeated was considered for each configuration.
259
Tables 3 to 6 present the bias and MSE of the parameter estimates accord-
260
ing to the initial value used to compute βˆ (0) , taking into account the scenarios
261
described in Table 2. They were computed for the sample sizes n = 50 and
262
n = 100.
263
Note that all methods (OLS, L1 and MM) used as initial values produced
264
parameter estimates very close to the true parameters in scenario 0 for all sam-
265
ple sizes (Table 3) and in scenario 1 with 5% of outliers (Table 4). Moreover,
266
when the percentage of outliers is increased to 10% in scenario 1 the algorithm
267
presents better results with OLS for both sample sizes, n = 50 and n = 100.
15
268
It can be also observed that the OLS method still provided the best parame-
269
ter estimates when the percentage of outliers is increased up to 30%. The OLS
270
method is also recommended in scenario 2 for the percentage of outliers up
271
to 15% (Table 5). In this scenario, when we have 20% or 30% of outliers, the
272
methods presented a quite similar performance. Finally, in scenario 3 (Table 6),
273
although the OLS method presented slightly better parameter estimates, we
274
cannot say that the choice of the method to compute βˆ (0) does affect the re-
275
sults. Thus, we recommend to consider the OLS as the method to compute
276
βˆ (0) in scenarios 0, 1 and 2. Moreover, we also recommend the OLS method to
277
compute βˆ (0) in scenario 3 for the sake of simplicity. Table 3: Bias and MSE of the parameter estimates according to the methods (OLS, L1 and MM) for the scenario 0 (Clean observations). n
50
100
βˆ 0
Start Value
βˆ 1
Average
Bias
MSE
Average
Bias
MSE
OLS
1.708
0.609777
0.769106
2.000
0.039932
0.050297
L1
1.708
0.609822
0.769217
2.000
0.039938
0.050302
MM
1.708
0.609855
0.769238
2.000
0.039946
0.050305
OLS
1.706
0.428355
0.536767
2.000
0.028093
0.035180
L1
1.706
0.428450
0.536838
2.000
0.028099
0.035184
MM
1.706
0.428401
0.536819
2.000
0.028095
0.035182
Now, we aim to evaluate how the estimators of the width hyper-parameter
278
279
γ2
280
algorithm in terms of bias and MSE. Tables 7 to 10 show the results of the
281
simulation study. All results considered the OLS method as initial values for
282
the vector of parameters.
presented in section 5.1 affect the performance of the parameter estimation
283
It can be seen in Table 7 the results observed in scenario 0. We conclude
284
that the estimator S1 presented a slight advantage in relation to S2 and S3. The
285
estimator S3 presented the worst performance for the width hyper-parameter
286
γ². Table 8 presents the results for scenario 1 (Y-space outliers). We observed
287
that the estimators S2 and S3 showed a quite similar performance in terms of
16
Table 4: Bias and MSE of the parameter estimates according to the methods (OLS, L1 and MM) for the scenario 1 (Y-space outliers). n
Perc.out
5%
10%
50
15%
20%
30%
5%
10%
100
15%
20%
30%
βˆ 0
Start Value
βˆ 1
Average
Bias
MSE
Average
Bias
MSE
OLS
1.709
0.634914
0.797223
2.003
0.042032
0.052586
L1
1.712
0.636436
0.798247
2.003
0.042036
0.052593
MM
1.712
0.635412
0.797374
2.003
0.041983
0.052554
OLS
1.792
0.665063
0.829221
2.011
0.044798
0.055236
L1
1.816
0.673648
0.838127
2.011
0.044907
0.055459
MM
1.820
0.670898
0.831721
2.011
0.044800
0.055261
OLS
1.842
0.711230
0.882880
2.017
0.048791
0.059288
L1
1.894
0.733170
0.901370
2.017
0.049140
0.059799
MM
1.904
0.738003
0.906416
2.017
0.049287
0.060004
OLS
1.932
0.781798
0.954722
2.025
0.054010
0.063878
L1
2.059
0.853621
1.013820
2.024
0.055392
0.066023
MM
2.082
0.860618
1.018809
2.024
0.055417
0.066249
OLS
2.226
1.007429
1.154879
2.046
0.069767
0.076472
L1
2.770
1.400957
1.395382
2.041
0.074591
0.085595
MM
2.894
1.500895
1.456148
2.040
0.076278
0.088456
OLS
1.711
0.439884
0.550732
2.006
0.029511
0.036412
L1
1.715
0.440522
0.551257
2.006
0.029493
0.036423
MM
1.715
0.440725
0.551612
2.006
0.029482
0.036399
OLS
1.760
0.462924
0.577045
2.010
0.031645
0.038439
L1
1.775
0.465670
0.579857
2.010
0.031753
0.038625
MM
1.774
0.465432
0.579120
2.010
0.031763
0.038666
OLS
1.856
0.509206
0.620153
2.018
0.035901
0.041548
L1
1.908
0.533588
0.639165
2.018
0.036203
0.041951
MM
1.919
0.538405
0.640340
2.018
0.036258
0.042096
OLS
1.949
0.566382
0.663505
2.025
0.040312
0.044372
L1
2.074
0.637127
0.705277
2.025
0.041079
0.045505
MM
2.106
0.655613
0.716427
2.024
0.041268
0.045955
OLS
2.251
0.801614
0.840609
2.043
0.055116
0.053883
L1
2.763
1.196070
1.007438
2.040
0.057859
0.060187
MM
2.887
1.298831
1.035883
2.039
0.058465
0.061779
17
Table 5: Bias and MSE of the parameter estimates according to the methods (OLS, L1 and MM) for the scenario 2 (X-space outliers). n
Perc.out
5%
10%
50
15%
20%
30%
5%
10%
100
15%
20%
30%
βˆ 0
Start Value
βˆ 1
Average
Bias
MSE
Average
Bias
MSE
OLS
1.838
0.661443
0.818520
1.990
0.043678
0.053866
L1
2.142
0.807842
0.936373
1.969
0.054455
0.062657
MM
2.179
0.830120
0.963676
1.966
0.056080
0.064668
OLS
1.900
0.708920
1.005415
1.986
0.046984
0.068713
L1
6.166
4.584316
5.147559
1.684
0.323673
0.364805
MM
8.738
7.098527
5.970376
1.501
0.502349
0.423401
OLS
5.015
3.769306
7.490316
1.756
0.27277
0.550184
L1
110.437
8.843221
8.051954
1.380
0.627115
0.572677
MM
115.869
14.19444
6.646001
0.995
1.006148
0.471304
OLS
22.047
20.37075
5.122490
0.522
1.479040
0.357634
L1
21.268
19.59096
6.343609
0.580
1.421528
0.450756
MM
20.542
18.85129
4.759383
0.662
1.338261
0.336347
OLS
24.722
23.02172
2.866738
0.323
1.677110
0.110791
L1
24.820
23.11966
3.074839
0.318
1.682449
0.118428
MM
24.155
22.45514
1.533297
0.407
1.593347
0.094365
OLS
1.820
0.459912
0.565744
1.991
0.030415
0.037191
L1
2.459
0.876644
0.875098
1.946
0.060890
0.060080
MM
2.621
1.018729
0.971732
1.935
0.071117
0.067128
OLS
1.835
0.481912
0.592555
1.990
0.031976
0.039179
L1
4.508
2.849042
3.001795
1.801
0.201301
0.212278
MM
6.101
4.421047
3.884832
1.689
0.312696
0.274986
OLS
4.457
3.0705940
6.825146
1.798
0.222514
0.501557
L1
111.351
9.6816040
7.945892
1.315
0.686950
0.564497
MM
118.030
16.331596
4.785727
0.841
1.158690
0.339760
OLS
22.738
21.04012
2.414839
0.472
1.527747
0.166687
L1
21.869
20.17852
4.959773
0.534
1.466850
0.356829
MM
21.609
19.90915
2.170408
0.587
1.412763
0.152295
OLS
24.419
22.71850
1.694834
0.338
1.661689
0.066461
L1
24.432
22.73208
1.812080
0.336
1.664272
0.070509
MM
24.101
22.40098
0.874208
0.411
1.589396
0.050820
18
Table 6: Bias and MSE of the parameter estimates according to the methods (OLS, L1 and MM) for the scenario 3 (Leverage points). n
Perc.out
5%
10%
50
15%
20%
30%
5%
10%
100
15%
20%
30%
βˆ 0
Start Value
βˆ 1
Average
Bias
MSE
Average
Bias
MSE
OLS
3.565
1.890510
0.998692
1.869
0.132862
0.067174
L1
3.585
1.910962
1.009286
1.867
0.134296
0.067945
MM
3.589
1.914571
1.012603
1.867
0.134546
0.068170
OLS
5.406
3.705959
1.132789
1.740
0.260312
0.076817
L1
5.413
3.713813
1.136892
1.739
0.260852
0.077082
MM
5.418
3.718180
1.139170
1.739
0.261154
0.077245
OLS
5.917
4.217446
1.137684
1.704
0.295867
0.077259
L1
5.923
4.222865
1.139960
1.704
0.296231
0.077403
MM
5.925
4.225435
1.141311
1.704
0.296409
0.077495
OLS
6.372
4.671863
1.175514
1.673
0.327273
0.079162
L1
6.374
4.674489
1.176368
1.673
0.327446
0.079208
MM
6.374
4.674062
1.175688
1.673
0.327422
0.079173
OLS
7.075
5.374844
1.189473
1.625
0.374564
0.079409
L1
7.076
5.375981
1.189593
1.625
0.374634
0.079414
MM
7.076
5.375793
1.189656
1.625
0.374622
0.079415
OLS
4.030
2.330987
0.771953
1.836
0.163987
0.052493
L1
4.042
2.342148
0.777821
1.835
0.164764
0.052913
MM
4.044
2.344828
0.779386
1.835
0.164952
0.053036
OLS
4.940
3.240474
0.846693
1.772
0.227690
0.057890
L1
4.948
3.248258
0.848942
1.772
0.228227
0.058052
MM
4.951
3.250806
0.850860
1.772
0.228405
0.058184
OLS
5.691
3.990905
0.892398
1.720
0.279769
0.060982
L1
5.694
3.993894
0.893858
1.720
0.279971
0.061079
MM
5.695
3.994590
0.894313
1.720
0.280018
0.061102
OLS
6.078
4.377678
0.914409
1.694
0.306134
0.062323
L1
6.080
4.379860
0.914634
1.694
0.306279
0.062336
MM
6.080
4.379808
0.914768
1.694
0.306276
0.062350
OLS
6.702
5.001932
0.973949
1.652
0.348264
0.066097
L1
6.703
5.002987
0.974001
1.652
0.348330
0.066098
MM
6.703
5.002586
0.973995
1.652
0.348306
0.066100
19
288
bias and MSE up to 10% of outliers. Moreover, when the percentage of outliers
289
was 15% or higher the estimator S3 presented a slightly better performance.
290
Still in this scenario, the estimator S1 presented the worst performance. It can
291
be observed from Table 9 that the estimator S2 presented the best performance
292
in scenario 2 (X-space outliers) up to 15% of outliers. In this scenario, when
293
the percentage of outliers is greater than 15%, the results are inconclusive due
294
to the poor performance of all considered estimators. Finally, according to Ta-
295
ble 10, the estimator S3 is the best choice in scenario 3, whatever the percentage
296
of bad leverage points.
297
Based on the simulation results we recommend the choice of the follow-
298
ing width hiper-parameter γ2 for the ETKRR method, according to the outlier
299
scenario:
300
301
• for the scenario 0 (the datasets have just clean observations), S1 is the recommended estimator;
302
• for the scenario 1 (datasets with Y-space outliers), whatever the percent-
303
age of outliers, S3 is the recommended estimator (up to 10% of outliers,
304
estimator S2 can also be used);
305
306
307
308
309
• for the scenario 2 (datasets with X-space outliers), S2 is the recommended estimator; • finally, for the scenario 3 (datasets with leverage points), S3 is the recommended estimator.
5.3. Comparative study with robust methods
310
This section considers a comparative study between the ETKRR method
311
and some classical robust regression methods (WLS, M-Estimator, MM-Esti-
313
mator and L1), as well as the OLS method. The parameter estimates algorithm for the ETKRR method uses the OLS to estimate β (ˆ0) and the estimator S1 for
314
the width hyper-parameter γ2 in the scenario 0, the estimators S2 and S3 in the
312
20
Table 7: Bias and MSE of the parameter estimates according to the methods for estimate the gaussian kernel hyperparameter (S1, S2 and S3) for the scenario 0 (Clean observations). n
50
100
βˆ 0
γ2 estimator
βˆ 1
Average
Bias
MSE
Average
Bias
MSE
S1
1.706
0.606571
0.762750
2.000
0.039663
0.049894
S2
1.706
0.607502
0.763946
2.000
0.039720
0.049972
S3
1.721
0.950831
1.197118
1.998
0.062221
0.078308
S1
1.696
0.427941
0.535824
2.000
0.027951
0.035088
S2
1.696
0.428609
0.537025
2.000
0.028000
0.035169
S3
1.688
0.666771
0.840075
2.001
0.043898
0.055167
315
scenario 1, the estimator S2 in the scenario and 2 and finally the estimator S3 in
316
the scenario 3.
317
We considered the outlier scenarios of Table 2 as well as two different sam-
318
ple sizes n = {50, 100} and five different percentage of outliers p.out = {5%,
319
10%, 15%, 20%, 30%} for each outlier scenario, in a total of 32 different config-
320
urations. A Monte Carlo simulation based on 10, 000 replicates was considered
321
for each configuration. Bias and mean squared error (MSE) of the parameter
322
estimates as well as the computational time have been evaluated for each re-
323
gression method (ETKRR, WLS, M-Estimator, MM-Estimator, L1 and OLS).
324
Outliers scenario 0
325
Table 11 presents the performance of each method in scenario 0, where only
326
clean observations are considered. The ETKRR approach presents the closest
327
performance to OLS method. In terms of bias and MSE, the other competing
328
models present a worse performance, when compared with the ETKRR and
329
OLS methods. The ETKRR method also has the second lowest computational
330
time (Comp. Time) among the robust approaches evaluated. Thus, the simula-
331
tion results suggest that the ETKRR method is appropriate for data sets without
332
outliers.
21
Table 8: Bias and MSE of the parameter estimates according to the methods for estimate the gaussian kernel hyperparameter (S1, S2 and S3) for the scenario 1 (Y-space outliers). n
Perc.out
5%
10%
50
15%
20%
30%
5%
10%
100
15%
20%
30%
βˆ 0
γ2 estimator
βˆ 1
Average
Bias
MSE
Average
Bias
MSE
S1
1.691
0.616940
0.778125
2.005
0.041004
0.051465
S2
1.673
0.614689
0.774935
2.002
0.040409
0.050930
S3
1.676
0.626423
0.789505
2.001
0.041172
0.051859
S1
1.785
0.661896
0.827249
2.012
0.044802
0.055222
S2
1.682
0.639721
0.803219
2.002
0.041935
0.052606
S3
1.695
0.640378
0.803971
2.000
0.041843
0.052523
S1
1.833
0.704721
0.875125
2.018
0.048472
0.058413
S2
1.657
0.658064
0.824814
2.005
0.043340
0.054177
S3
1.672
0.656388
0.823248
2.003
0.042931
0.053862
S1
1.950
0.784339
0.953450
2.024
0.053968
0.064074
S2
1.666
0.681434
0.859305
2.007
0.045366
0.056792
S3
1.683
0.678400
0.856095
2.003
0.044601
0.056179
S1
2.241
1.028277
1.173735
2.045
0.070507
0.077524
S2
1.613
0.748447
0.938207
2.021
0.052560
0.062853
S3
1.621
0.742235
0.930869
2.015
0.050263
0.061413
S1
1.727
0.433235
0.542686
2.005
0.028892
0.035997
S2
1.697
0.428589
0.537861
2.001
0.027976
0.035235
S3
1.701
0.431702
0.541973
2.000
0.028123
0.035458
S1
1.766
0.462638
0.576972
2.010
0.031552
0.038505
S2
1.687
0.447802
0.562343
2.002
0.029388
0.036911
S3
1.696
0.447701
0.562767
2.000
0.029325
0.036866
S1
1.868
0.509394
0.616437
2.017
0.035653
0.041469
S2
1.685
0.461308
0.577899
2.003
0.030539
0.038096
S3
1.698
0.460436
0.576985
2.001
0.030254
0.037873
S1
1.964
0.575540
0.671378
2.024
0.040258
0.044771
S2
1.670
0.484889
0.605958
2.006
0.032166
0.039837
S3
1.682
0.483067
0.603838
2.003
0.031606
0.039439
S1
2.233
0.786075
0.828176
2.044
0.055518
0.053370
S2
1.620
0.523976
0.650317
2.019
0.037784
0.043411
S3
1.624
0.520282
0.645929
2.014
0.035772
0.042565
22
Table 9: Bias and MSE of the parameter estimates according to the methods for estimate the gaussian kernel hyperparameter (S1, S2 and S3) for the scenario 2 (X-space outliers). n
Perc.out
5%
10%
50
15%
20%
30%
5%
10%
100
15%
20%
30%
βˆ 0
γ2 estimator
βˆ 1
Average
Bias
MSE
Average
Bias
MSE
S1
1.816
0.639645
0.798985
1.991
0.042330
0.052729
S2
1.677
0.615404
0.775885
2.001
0.040417
0.050962
S3
1.677
0.624343
0.786545
2.001
0.041036
0.051671
S1
1.886
0.699032
0.907273
1.987
0.046295
0.060616
S2
1.700
0.642099
0.806081
2.000
0.041966
0.052667
S3
1.699
0.642952
0.807201
2.000
0.042024
0.052740
S1
5.312
4.051649
7.730725
1.734
0.294069
0.569120
S2
2.208
1.167070
3.544921
1.962
0.081581
0.264678
S3
2.918
1.852625
5.248582
1.908
0.133052
0.391537
S1
22.112
20.432730
5.028495
0.518
1.483233
0.350089
S2
22.127
20.488680
7.748253
0.484
1.519741
0.517908
S3
22.018
20.379950
7.536004
0.495
1.509371
0.509895
S1
24.728
23.027820
2.881520
0.323
1.677190
0.110917
S2
25.036
23.336080
5.506787
0.278
1.721735
0.211789
S3
24.760
23.060120
4.607061
0.297
1.702578
0.174031
S1
1.834
0.454470
0.556272
1.991
0.029957
0.036672
S2
1.703
0.429001
0.538578
2.000
0.027956
0.035244
S3
1.702
0.432174
0.542490
2.000
0.028150
0.035491
S1
1.836
0.482659
0.594037
1.990
0.031966
0.039271
S2
1.698
0.448020
0.563317
2.000
0.029344
0.036900
S3
1.697
0.449180
0.564835
2.000
0.029421
0.037002
S1
4.435
3.034044
6.779746
1.799
0.220067
0.498810
S2
1.889
0.635219
2.068811
1.986
0.043608
0.156031
S3
2.229
0.967306
3.432958
1.960
0.068413
0.256715
S1
22.750
21.052830
2.532885
0.472
1.527942
0.174635
S2
22.929
21.246200
5.068086
0.407
1.593688
0.324366
S3
22.841
21.158770
4.891182
0.417
1.583701
0.321615
S1
24.417
22.717200
1.684653
0.338
1.662079
0.064995
S2
24.198
22.497810
4.300836
0.311
1.688568
0.165900
S3
24.108
22.408390
3.275091
0.323
1.676774
0.123583
23
Table 10: Bias and MSE of the parameter estimates according to the methods for estimate the gaussian kernel hyperparameter (S1, S2 and S3) for the scenario 3 (Leverage points). n
Perc.out
5%
10%
50
15%
20%
30%
5%
10%
100
15%
20%
30%
βˆ 0
γ2 estimator
βˆ 1
Average
Bias
MSE
Average
Bias
MSE
S1
3.541
1.866300
0.978264
1.870
0.131455
0.066192
S2
3.162
1.500139
0.867279
1.897
0.105596
0.058168
S3
1.805
0.884509
1.117948
1.992
0.058743
0.074350
S1
5.403
3.703708
1.135771
1.740
0.260268
0.077082
S2
5.221
3.521295
1.097968
1.752
0.247708
0.074500
S3
2.323
1.117845
1.363483
1.956
0.076204
0.092876
S1
5.901
4.201088
1.136392
1.705
0.294613
0.076738
S2
5.780
4.079860
1.119582
1.714
0.286375
0.075645
S3
2.793
1.469331
1.672238
1.922
0.101831
0.115509
S1
6.387
4.687066
1.175533
1.672
0.327865
0.079226
S2
6.306
4.605586
1.169140
1.678
0.322446
0.078826
S3
3.675
2.194258
2.042215
1.860
0.154099
0.141603
S1
7.086
5.386258
1.198691
1.625
0.375446
0.079545
S2
7.045
5.344782
1.198882
1.627
0.372847
0.079542
S3
5.651
3.970430
2.039129
1.720
0.280805
0.136341
S1
4.042
2.342226
0.766873
1.835
0.164697
0.052045
S2
3.733
2.033590
0.682655
1.857
0.143148
0.046050
S3
1.899
0.632342
0.783932
1.986
0.042188
0.052137
S1
4.917
3.217475
0.842000
1.774
0.225941
0.057509
S2
4.711
3.010908
0.797985
1.788
0.211651
0.054427
S3
2.118
0.751452
0.882339
1.970
0.050994
0.059599
S1
5.698
3.997522
0.884449
1.720
0.280053
0.060375
S2
5.581
3.880767
0.867923
1.728
0.272123
0.059258
S3
2.630
1.105629
1.115076
1.934
0.076810
0.076805
S1
6.103
4.402573
0.914198
1.692
0.307816
0.062141
S2
6.021
4.320764
0.905684
1.698
0.302365
0.061571
S3
3.290
1.685910
1.411286
1.887
0.118486
0.098083
S1
6.699
4.998802
0.972223
1.652
0.348159
0.066043
S2
6.655
4.954697
0.970150
1.655
0.345377
0.065893
S3
5.084
3.390093
1.538221
1.760
0.240278
0.106280
24
Table 11: Outliers scenario 0: bias and MSE for the parameter estimates.
Method
βˆ 0
n
βˆ 1
Comp. Time
Average
Bias
MSE
Average
Bias
MSE
OLS
1.706
0.605
0.758
1.999
0.040
0.050
0.0014
WLS
1.706
0.645
0.807
1.999
0.042
0.053
0.0015
M
1.707
0.622
0.780
1.999
0.041
0.051
0.0031
1.708
0.627
0.787
1.999
0.041
0.052
0.0079
L1
1.707
0.760
0.954
1.999
0.050
0.063
0.0019
ETKRR(S1)
1.706
0.605
0.758
1.999
0.040
0.050
0.0018
OLS
1.696
0.428
0.537
2.000
0.028
0.035
0.0016
WLS
1.692
0.455
0.570
2.001
0.030
0.037
0.0016
M
1.696
0.440
0.551
2.000
0.029
0.036
0.0031
1.696
0.442
0.554
2.000
0.029
0.036
0.0122
L1
1.699
0.538
0.676
2.000
0.035
0.044
0.0019
ETKRR(S1)
1.696
0.428
0.537
2.000
0.028
0.035
0.0027
MM
MM
50
100
25
333
Outliers scenario 1
334
Scenario 1 considers outliers observations in Y-space. Table 12 presents
335
a comparative study between the methods WLS, L1 and ETKRR(S2,S3) . The
336
ETKRR method outperforms both approaches, WLS and L1, since it presents
337
the lower bias and the higher precision (lower MSE) for the parameter esti-
338
mates in all sample sizes. The ETKRR method exhibits satisfactory results be-
339
tween 5% and 20% of outliers in the sample. Still in this outlier range, the
340
method ETKRR presents an average value to the intercept (β 0 ) close to the true
341
parameter, in contrast with the methods WLS and L1.
342
Table 13 gives the comparison between the methods ETKRR(S2,S3) , M-Estimator
343
and MM-Estimator. Note that the ETKRR outperforms the M-Estimator in the
344
larger part of configurations, specially for the coefficient β 0 . At the level of
345
30% of outliers, the ETKRR presents a lower bias and a higher precision in the
346
parameter estimates when compared with the M-Estimator method. The com-
347
parative study between the MM-Estimator and ETKRR methods demonstrates
348
that both methods present a similar performance until 20% of outliers, with
349
ETKRR exhibiting a slightly small bias and MSE. Particularly, until the level of
350
20% of outliers, the ETKRR method is very competitive. Moreover, the ETKRR
351
has about half of computational time in comparison with the MM-Estimator.
352
Outliers scenario 2
353
Scenario 2 represents outlier observations in X-space. Based on the results
354
presented in the tables 14 and 15, we conclude that the method ETKRR(S2) out-
355
performs the methods WLS, M-Estimator and L1 regression based on the bias
356
and the MSE of the parameter estimates until the level of 15% of outliers. More-
357
over, the method ETKRR presented better results than method MM-Estimator
358
unitl 10% of outliers. This level represents a frequent outlier percentage in real
359
problems. After 15% of outliers just the MM-Estimator showed a good perfor-
360
mance.
26
Table 12: Outliers scenario 1: bias and MSE for the parameter estimates. Comparison between WLS, L1 and ETKRR(S2,S3) methods. n
Perc.out
βˆ 0
Method
βˆ 1
Comp. Time
Average
Bias
MSE
Average
Bias
MSE
WLS
2.912
1.784
1.858
1.980
0.098
0.121
0.0016
L1
1.778
0.790
0.985
1.998
0.052
0.065
0.0020
ETKRR(S2)
1.712
0.619
0.777
1.999
0.040
0.051
0.0025
WLS
5.144
3.724
2.807
1.941
0.154
0.184
0.0016
L1
1.874
0.825
1.024
1.999
0.053
0.067
0.0020
ETKRR(S3)
1.702
0.643
0.807
2.000
0.042
0.052
0.0032
WLS
6.209
4.740
3.207
1.923
0.179
0.212
0.0016
L1
1.914
0.852
1.050
2.001
0.055
0.069
0.0020
ETKRR(S3)
1.680
0.652
0.818
2.002
0.042
0.053
0.0034
WLS
7.668
6.124
3.632
1.887
0.212
0.241
0.0016
L1
2.032
0.924
1.115
1.999
0.058
0.073
0.0022
ETKRR(S3)
1.676
0.686
0.863
2.003
0.045
0.056
0.0039
WLS
11.009
9.375
4.435
1.800
0.289
0.300
0.0017
L1
2.276
1.100
1.260
2.000
0.066
0.083
0.0021
ETKRR(S3)
1.619
0.732
0.921
2.015
0.049
0.060
0.0052
WLS
3.517
1.990
1.536
1.970
0.083
0.100
0.0018
L1
1.784
0.551
0.691
2.000
0.036
0.045
0.0021
ETKRR(S2)
1.698
0.435
0.546
2.000
0.028
0.035
0.0041
WLS
4.738
3.133
1.936
1.950
0.109
0.126
0.0017
L1
1.842
0.573
0.709
2.000
0.037
0.046
0.0020
ETKRR(S3)
1.699
0.449
0.565
2.000
0.029
0.037
0.0045
WLS
6.459
4.797
2.316
1.917
0.139
0.152
0.0020
L1
1.949
0.624
0.743
1.999
0.039
0.049
0.0021
ETKRR(S3)
1.700
0.466
0.582
2.000
0.030
0.038
0.0058
WLS
7.820
6.136
2.598
1.886
0.167
0.172
0.0018
L1
2.027
0.670
0.778
1.999
0.041
0.051
0.0020
ETKRR(S3)
1.679
0.476
0.598
2.003
0.031
0.039
0.0061
WLS
10.996
9.302
3.196
1.804
0.238
0.214
0.0017
L1
2.266
0.842
0.885
2.000
0.046
0.058
0.0021
ETKRR(S3)
1.636
0.523
0.655
2.013
0.035
0.043
0.0080
5%
10%
50 15%
20%
30%
5%
10%
100 15%
20%
30%
27
Table 13: Scenario 1: bias and MSE for the parameter estimates. Comparison between M-Estimator, MM-Estimator and ETKRR(S2,S3) methods. n
Perc.out
5%
10%
50 15%
20%
30%
5%
10%
100 15%
20%
30%
βˆ 0
Method
βˆ 1
Comp. Time
Average
Bias
MSE
Average
Bias
MSE
M
1.789
0.642
0.799
1.999
0.042
0.052
0.0034
MM
1.719
0.635
0.796
1.999
0.042
0.052
0.0084
ETKRR(S2)
1.712
0.619
0.777
1.999
0.041
0.051
0.0025
M
1.939
0.696
0.840
2.000
0.044
0.055
0.0037
MM
1.706
0.651
0.817
2.000
0.043
0.053
0.0088
ETKRRS3)
1.702
0.643
0.807
2.000
0.042
0.052
0.0032
M
2.030
0.735
0.860
2.001
0.045
0.056
0.0037
MM
1.690
0.658
0.825
2.001
0.043
0.054
0.0087
ETKRR(S3)
1.680
0.652
0.818
2.001
0.042
0.053
0.0034
M
2.231
0.857
0.931
2.000
0.048
0.061
0.0043
MM
1.703
0.688
0.866
2.000
0.045
0.057
0.0082
ETKRR(S3)
1.676
0.686
0.863
2.003
0.045
0.056
0.0039
M
5.926
4.258
2.328
1.975
0.110
0.140
0.0061
MM
1.691
0.722
0.911
2.001
0.047
0.060
0.0090
ETKRR(S3)
1.618
0.732
0.921
2.015
0.049
0.060
0.0052
M
1.811
0.458
0.563
2.000
0.029
0.037
0.0038
MM
1.704
0.443
0.556
2.000
0.029
0.036
0.0141
ETKRR(S2)
1.698
0.435
0.546
2.000
0.028
0.035
0.0041
M
1.896
0.491
0.584
2.000
0.030
0.038
0.0038
MM
1.701
0.453
0.571
2.000
0.030
0.037
0.0127
ETKRR(S3)
1.699
0.449
0.565
2.000
0.029
0.037
0.0045
M
2.064
0.574
0.613
1.999
0.032
0.040
0.0042
MM
1.711
0.470
0.588
1.999
0.031
0.039
0.0132
ETKRR(S3)
1.700
0.466
0.582
2.000
0.030
0.038
0.0058
M
2.235
0.682
0.643
2.000
0.033
0.042
0.0045
MM
1.705
0.479
0.601
2.000
0.031
0.039
0.0125
ETKRR(S3)
1.679
0.476
0.598
2.003
0.031
0.039
0.0061
M
5.115
3.419
1.498
1.994
0.070
0.089
0.0073
MM
1.703
0.515
0.648
2.000
0.034
0.043
0.0126
ETKRR(S3)
1.636
0.523
0.655
2.013
0.035
0.043
0.0080
28
Table 14: Outliers scenario 2: bias and MSE for the parameter estimates. n
Perc.out
5%
10%
50
15%
20%
30%
βˆ 0
Method
βˆ 1
Comp. Time
Average
Bias
MSE
Average
Bias
MSE
WLS
4.796
3.096
0.782
1.787
0.212
0.050
0.0010
M
2.944
1.291
0.833
1.912
0.090
0.054
0.0022
MM
1.720
0.634
0.795
1.998
0.041
0.052
0.0062
L1
2.632
1.120
1.009
1.934
0.077
0.066
0.0012
ETKRR(S2)
1.717
0.620
0.778
1.999
0.040
0.051
0.0165
WLS
11.408
9.708
0.951
1.344
0.655
0.060
0.0012
M
10.692
8.992
4.202
1.363
0.636
0.297
0.0037
MM
1.706
0.650
0.817
1.999
0.042
0.053
0.0062
L1
4.928
3.230
1.335
1.770
0.229
0.090
0.0012
ETKRR(S2)
1.706
0.644
0.808
2.000
0.042
0.053
0.0194
WLS
14.669
12.969
1.073
1.128
0.871
0.066
0.0011
M
20.018
18.318
1.318
0.705
1.294
0.091
0.0021
MM
1.690
0.657
0.825
2.001
0.043
0.053
0.0067
L1
7.853
6.153
3.453
1.559
0.440
0.251
0.0013
ETKRR(S2)
2.265
1.210
3.593
1.957
0.084
0.276
0.0229
WLS
18.179
16.479
1.176
0.899
1.100
0.069
0.0010
M
21.700
20.000
1.198
0.585
1.414
0.082
0.0018
MM
1.703
0.688
0.866
1.999
0.045
0.057
0.0060
L1
21.707
20.007
5.979
0.556
1.444
0.422
0.0012
ETKRR(S2)
22.143
20.461
4.881
0.514
1.486
0.340
0.0305
WLS
24.608
22.908
1.259
0.489
1.510
0.061
0.0010
M
24.051
22.351
1.199
0.411
1.588
0.071
0.0021
MM
1.714
0.738
1.096
1.998
0.048
0.076
0.0062
L1
25.707
24.007
2.241
0.270
1.729
0.086
0.0012
ETKRR(S2)
24.751
23.051
2.900
0.321
1.678
0.112
0.0304
29
Table 15: Outliers scenario 2: bias and MSE for the parameter estimates. n
Perc.out
5%
10%
100
15%
20%
30%
βˆ 0
Method
βˆ 1
Comp. Time
Average
Bias
MSE
Average
Bias
MSE
WLS
6.381
4.681
0.557
1.680
0.319
0.036
0.0013
M
3.632
1.932
0.620
1.863
0.136
0.041
0.0026
MM
1.704
0.443
0.556
1.999
0.029
0.036
0.0091
L1
3.069
1.382
0.716
1.903
0.097
0.047
0.0015
ETKRR(S2)
1.703
0.436
0.547
2.000
0.028
0.035
0.0186
WLS
10.047
8.347
0.636
1.434
0.565
0.040
0.0016
M
6.314
4.614
1.179
1.673
0.326
0.082
0.0040
MM
1.701
0.452
0.570
1.999
0.030
0.037
0.0089
L1
4.301
2.601
0.827
1.815
0.184
0.055
0.0014
ETKRR(S2)
1.701
0.449
0.566
2.000
0.029
0.037
0.0268
WLS
14.914
13.214
0.744
1.113
0.887
0.046
0.0013
M
20.220
18.520
0.883
0.691
1.308
0.062
0.0022
MM
1.710
0.470
0.587
1.999
0.031
0.039
0.0089
L1
7.737
6.037
2.181
1.568
0.431
0.158
0.0014
ETKRR(S2)
1.890
0.642
2.073
1.986
0.044
0.156
0.0286
WLS
18.205
16.505
0.836
0.899
1.101
0.049
0.0010
M
21.741
20.041
0.845
0.584
1.415
0.058
0.0018
MM
1.704
0.478
0.601
1.999
0.031
0.039
0.0087
L1
21.578
20.878
4.456
0.492
1.507
0.315
0.0012
ETKRR(S2)
22.671
20.973
2.625
0.476
1.523
0.183
0.0508
WLS
24.282
22.582
0.893
0.510
1.489
0.044
0.0010
M
23.951
22.251
0.824
0.420
1.576
0.052
0.0022
MM
1.704
0.515
0.648
1.999
0.033
0.042
0.0088
L1
25.658
23.958
1.597
0.273
1.726
0.061
0.0012
ETKRR(S2)
24.421
22.721
1.654
0.338
1.661
0.064
0.0501
30
361
Outliers scenario 3
362
Tables 16 and 17 present the results for scenario 3, where leverage points are
363
considered. This is the hardest configuration for the methods evaluated in this
364
paper. Despite that, it can be observed that the ETKRR(S3) method presented
365
the best performance. For the ETKRR method, observe that the average values
366
of the parameter estimates βˆ 0 and βˆ 1 are the closest to the true parameters until
367
the percentage of outliers equal to 15%, specially for the parameter β 1 . The
368
ETKRR method also demonstrated the best average value and smallest bias in
369
comparison with the other methods when the percentage of outliers is at most
370
equal to 20%.
371
6. Applications on real data sets
372
This section considers applications on real data sets. The aim is to illus-
373
trate the usefulness of the ETKRR method and compare it with other robust
374
methods in real world applications. The robustness of each method will be
375
evaluated based on the percentage of change of the parameter estimates when
376
the outliers are suppressed from the data set. Based on this criterion it will
377
be possible to verify which approach will demonstrate more robustness to the
378
presence of outliers. Most of the real databases considered in this section are
379
available in the software R. It is important to emphasize that the outliers present
380
in the real data sets were not inputed, they can be considered as natural outliers
381
observations.
382
6.1. U.S. navy bachelors officer data
383
The main purpose of a robust regression method is to provide estimates for
384
the regression model that are less sensitive to the presence of outliers. This
385
data set is available in Table 5.2 of Ref. [21] and has 25 observation and 9 vari-
386
ables. We consider as response variable the average daily occupancy (Y). The
387
explanatory variables are the number of building wings (X1 ) and the monthly
388
man hours (X2 ).
31
Table 16: Outliers scenario 3: bias and MSE for the parameter estimates. n
Perc.out
5%
10%
50
15%
20%
30%
βˆ 0
Method
βˆ 1
Comp. Time
Average
Bias
MSE
Average
Bias
MSE
WLS
2.387
0.868
0.811
1.953
0.057
0.052
0.0010
M
2.806
1.185
0.854
1.922
0.082
0.056
0.0023
MM
2.143
0.830
0.942
1.968
0.056
0.063
0.0063
L1
2.561
1.079
1.011
1.939
0.073
0.066
0.0012
ETKRR(S3)
1.847
0.899
1.132
1.990
0.059
0.075
0.0028
WLS
3.818
2.213
0.902
1.857
0.142
0.058
0.0012
M
4.781
3.082
1.114
1.782
0.217
0.076
0.0023
MM
3.628
2.025
1.406
1.863
0.142
0.096
0.0063
L1
4.178
2.488
1.188
1.824
0.175
0.079
0.0011
ETKRR(S3)
2.319
1.122
1.380
1.956
0.076
0.094
0.0031
WLS
4.552
2.854
1.009
1.809
0.190
0.064
0.0013
M
5.476
3.777
1.200
1.734
0.265
0.081
0.0023
MM
4.570
2.909
1.517
1.797
0.204
0.104
0.0064
L1
4.938
3.240
1.297
1.771
0.228
0.086
0.0012
ETKRR(S3)
2.827
1.501
1.714
1.920
0.104
0.118
0.0036
WLS
5.361
3.661
1.106
1.756
0.243
0.071
0.0010
M
6.050
4.350
1.210
1.694
0.305
0.081
0.0022
MM
5.523
3.835
1.456
1.731
0.270
0.100
0.0062
L1
5.622
3.923
1.351
1.722
0.277
0.090
0.0012
ETKRR(S3)
3.679
2.199
2.029
1.858
0.154
0.141
0.0038
WLS
6.910
5.210
1.311
1.659
0.340
0.082
0.0010
M
6.887
5.187
1.260
1.637
0.362
0.083
0.0020
MM
6.774
5.074
1.314
1.644
0.355
0.087
0.0062
L1
6.626
4.926
1.424
1.653
0.346
0.092
0.0011
ETKRR(S3)
5.659
3.979
2.032
1.719
0.281
0.136
0.0044
32
Table 17: Outliers scenario 3 (n = 100): bias and MSE of the parameter estimates.

Perc. out  Method      Average (β̂0)  Bias (β̂0)  MSE (β̂0)  Average (β̂1)  Bias (β̂1)  MSE (β̂1)  Comp. time
5%         WLS         2.637          0.964       0.583      1.936          0.065       0.038      0.0014
5%         M           3.242          1.546       0.648      1.891          0.109       0.043      0.0024
5%         MM          2.437          0.851       0.730      1.947          0.059       0.049      0.0092
5%         L1          2.895          1.223       0.720      1.915          0.085       0.047      0.0011
5%         ETKRR(S3)   1.899          0.640       0.793      1.986          0.042       0.052      0.0054
10%        WLS         3.392          1.695       0.647      1.885          0.114       0.042      0.0016
10%        M           4.262          2.562       0.792      1.819          0.180       0.053      0.0025
10%        MM          3.271          1.606       0.941      1.889          0.112       0.064      0.0091
10%        L1          3.737          2.039       0.798      1.856          0.143       0.053      0.0014
10%        ETKRR(S3)   2.114          0.736       0.864      1.971          0.049       0.058      0.0064
15%        WLS         4.402          2.702       0.760      1.819          0.180       0.049      0.0013
15%        M           5.249          3.549       0.905      1.750          0.249       0.061      0.0024
15%        MM          4.616          2.919       1.039      1.794          0.205       0.071      0.0092
15%        L1          4.758          3.058       0.941      1.784          0.215       0.063      0.0012
15%        ETKRR(S3)   2.663          1.119       1.130      1.934          0.077       0.077      0.0078
20%        WLS         5.104          3.404       0.853      1.774          0.225       0.055      0.0010
20%        M           5.748          4.048       0.947      1.715          0.284       0.064      0.0023
20%        MM          5.388          3.688       1.032      1.740          0.259       0.070      0.0091
20%        L1          5.342          3.642       1.000      1.743          0.256       0.067      0.0011
20%        ETKRR(S3)   3.237          1.399       1.641      1.890          0.115       0.097      0.0090
30%        WLS         6.474          4.774       1.020      1.687          0.312       0.065      0.0009
30%        M           6.507          4.807       0.990      1.664          0.335       0.067      0.0022
30%        MM          6.399          4.699       1.037      1.671          0.328       0.070      0.0091
30%        L1          6.231          4.531       1.074      1.681          0.318       0.071      0.0012
30%        ETKRR(S3)   5.104          3.410       1.525      1.758          0.241       0.105      0.0099
Figure 3 presents the 3D scatter plot of the data and the fitted plane for each regression method. The fitted plane for the ETKRR method considers the estimator S3 for the width hyper-parameter γ2; the presence of Y-space outliers can justify this choice. Notice that, due to the presence of the top-middle point, some regression planes are pulled upwards. Table 18 shows that the ETKRR method presented the lowest percentage of change in the parameter estimates, followed by L1 regression. In particular, the MM-Estimator performed poorly on this data set.
Figure 3: U.S. Navy bachelors data set: 3D scatter plot between the response and explanatory variables.
Table 18: U.S. Navy bachelors data set: Percentage change (%) in the parameter estimates after suppression of the outlier observations.

Method      β̂0      β̂1      β̂2
OLS         823.45   205.48   32.10
WLS         82.04    10.80    43.94
M           30.39    141.85   14.16
MM          42.40    142.47   15.68
L1          22.23    82.17    12.83
ETKRR(S3)   18.84    44.61    8.77

6.2. International calls from Belgium data

This data set, available in the R library robustbase, considers the number of international calls from Belgium between 1950 and 1973, taken from the Belgian Statistical Survey, published by the Ministry of Economy.
The response variable represents the number of calls (in tens of millions) and the covariate is the year. Figure 4 presents the scatter plot of the data and the straight lines estimated for each method.
Figure 4: International calls from Belgium data: Estimated straight line according to method.
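The classical comparison lines in Figure 4 can be reproduced with standard R packages. The fragment below is a hedged sketch, not the authors' code: it assumes the robustbase data set is exposed as telef with columns Year and Calls (check ?robustbase::telef before running), and it omits the WLS and ETKRR fits, whose weighting schemes are specific to this paper.

library(MASS)        # rlm(): M- and MM-estimators
library(quantreg)    # rq(): L1 (median) regression
library(robustbase)  # real data sets used in this section

data(telef, package = "robustbase")  # assumed name for the Belgium calls data
fits <- list(
  OLS = lm(Calls ~ Year, data = telef),
  M   = rlm(Calls ~ Year, data = telef, method = "M"),
  MM  = rlm(Calls ~ Year, data = telef, method = "MM"),
  L1  = rq(Calls ~ Year, tau = 0.5, data = telef)
)
sapply(fits, coef)   # compare the intercept and slope across methods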
The presence of some Y-space outliers can be observed. The ETKRR and MM-Estimator methods (red and yellow lines, respectively) exhibit very close lines, whereas the OLS and WLS methods (black and blue lines, respectively) are strongly affected by the outliers.

From Table 19, it is verified that the MM-Estimator and ETKRR methods presented the lowest percentages of change in the parameter estimates. The estimator S3 for the width hyper-parameter γ2 was considered for the ETKRR method due to the presence of Y-space outliers (scenario 1).

Table 19: International calls from Belgium data set: Percentage change (%) in the parameter estimates after suppression of the outlier observation.
Method      β̂0      β̂1
OLS         79.77    78.07
WLS         85.18    83.34
M           50.55    47.79
MM          0.74     0.92
L1          29.13    27.12
ETKRR(S3)   16.81    15.94
6.3. Cloud point data

This data set, also available in the R library robustbase, contains measurements concerning the cloud point of a liquid. The cloud point is a measure of the degree of crystallization in a stock. The response variable is the cloud point (Y) and the covariate is Percentage I-8. Figure 5 presents the scatter plot of the data and the straight lines estimated for each method.

The presence of three leverage points can be observed. Moreover, ETKRR (red line) was the method least affected by these leverage points. Finally, according to Table 20, the ETKRR method presented the lowest percentage of change in the parameter estimates, taking into account the estimator S3 for the width hyper-parameter γ2, as expected from the results obtained with the synthetic data sets of Section 5.3.
Figure 5: Cloud point data: Estimated straight line according to method.

Table 20: Cloud point data set: Percentage change (%) in the parameter estimates after suppression of the outlier observations.

Method      β̂0     β̂1
OLS         4.37    14.92
WLS         3.64    14.40
M           4.39    15.18
MM          1.18    4.72
L1          0.69    3.63
ETKRR(S3)   0.11    0.61

6.4. Delivery time data
This data set is available in the R library robustbase, as well as in Ref. [22], and considers 25 observations and 3 variables. We consider as response variable the delivery time (Y) and as explanatory variables the number of products (X1) and the distance (X2). Figure 6 presents the 3D scatter plot of the data as well as the fitted regression planes.
Figure 6: Delivery time data: 3D scatter plot between the response and explanatory variables.
The presence of leverage points located in the top-left corner of the graphic has pulled some regression planes upwards. The fitted plane for the ETKRR method considered the estimator S3 for the width hyper-parameter. This is in accordance with the results provided in the simulation study (Section 5.3), due to the presence of leverage points. In general, Table 21 shows that the ETKRR method and L1 regression presented the lowest percentages of change in the parameter estimates. It is worth noting that the MM-Estimator performed poorly on this data set in comparison with the L1 and ETKRR methods.
Table 21: Delivery time data set: Percentage change (%) in the parameter estimates after suppression of the outlier observations.

Method      β̂0      β̂1     β̂2
OLS         142.32   21.73   26.18
WLS         61.62    18.79   10.88
M           60.13    13.68   24.07
MM          17.19    12.55   8.77
L1          16.34    4.19    3.65 × 10⁻¹⁴
ETKRR(S3)   2.42     6.07    13.82
6.5. Kootenay river data

This data set considers the water-flow measurements of the Kootenay river in the cities of Libby (Montana, USA) and Newgate (British Columbia, Canada). It is available in the R library robustbase. Figure 7 presents the scatter plot of the data and the straight lines estimated for each method.
Figure 7: Kootenay data: Estimated straight line according to method.
The ETKRR and MM-Estimator methods (red and yellow lines, respectively) exhibit the lines closest to the sample points. The M-Estimator and OLS methods (black and magenta lines, respectively) are strongly affected by the point located at the right of the graph (X-space outlier). The estimator S1 for the width hyper-parameter γ2 presented the best results for the ETKRR method on this data set. Moreover, Table 22 shows that the ETKRR method presents the lowest percentage of change in the parameter estimates after the suppression of the outlier observation.

Table 22: Kootenay data set: Percentage change (%) in the parameter estimates after suppression of the outlier observation.
Method      β̂0      β̂1
OLS         76.29    4.5782 × 10³
WLS         56.25    8.7152 × 10¹
M           76.69    3.4564 × 10³
MM          0.20     0.1276
L1          72.92    1.1183 × 10²
ETKRR(S1)   0.11     0.0671
6.6. Public schools data

The data set named "Public Schools" is available in the R library sandwich. The response variable represents the per capita spending on public schools (Expenditure). The covariate is the per capita income, by state, in the USA [23]. We have removed the state of Wisconsin because its information was missing. Figure 8 illustrates the scatter plot, including the straight lines estimated for each method assessed in this paper.
Notice that the ETKRR and MM-Estimator methods (red and yellow lines, respectively) present very close regression lines and are less affected by the leverage point located in the top-right area and the X-space outlier located in the middle-right area of Figure 8.
Figure 8: Public school data: Estimated straight line according to method.
This can also be verified in Table 23, where the MM-Estimator and the ETKRR method present the lowest percentages of change in the parameter estimates; the ETKRR method presented the second best performance, using the estimator S1. Moreover, the OLS method (black line) exhibits the worst regression equation due to its higher percentage of change in the parameter estimates. It is important to mention that this particular data set presents outliers in two parts of the scatter plot, a scenario that was not considered in the simulation settings. Nevertheless, the ETKRR method presented a competitive performance.
Table 23: Public school data set: Percentage change (%) in the parameter estimates after suppression of the outlier observation.

Method      β̂0      β̂1
OLS         82.28    24.82
WLS         30.63    7.12
M           66.04    12.80
MM          0.99     0.06
L1          20.12    3.96
ETKRR(S1)   6.35     0.53

6.7. ETKRR overall performance on the real data sets

The performance of the ETKRR method on the six real data sets demonstrates that the proposed approach is competitive when compared with the classical robust methods. Table 24 ranks the methods for each data set based on the percentage of change of the parameter estimates (rank 1 for the lowest percentage of change). As expected, OLS is the most sensitive method to the presence of outliers. The WLS and M-Estimator methods exhibited the worst performance among the classical robust methods.
The MM-Estimator and L1 methods outperformed the previous approaches and demonstrated a similar performance. Moreover, the ETKRR method exhibited the best performance based on the average rank, corroborating the results presented in the simulation section.
Table 24: Overall performance of the methods based on the rank of the percentage of change.

Method   Public   Kootenay   US Navy   Int. calls   Cloud   Delivery   Average rank
OLS      6        6          6         5            5       6          5.66
WLS      4        3          5         6            4       5          4.50
M        5        5          3         4            6       4          4.50
MM       1        2          4         1            3       3          2.33
L1       3        4          2         3            2       1          2.50
ETKRR    2        1          1         2            1       2          1.50
7. Concluding remarks

A regression method based on kernel functions was proposed in this paper. The Exponential-Type Kernel function based Robust Regression (ETKRR) method uses exponential-type kernel functions to penalize badly fitted observations. The weights given to each observation are updated at each iteration of the parameter estimation algorithm, and a proof of the convergence of this algorithm was given. For mathematical convenience, we considered the Gaussian kernel, which belongs to the family of "exponential-type" kernels, in the parameter estimation algorithm.
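To make the structure of the estimation procedure concrete, the fragment below gives a minimal, illustrative sketch of an iteratively reweighted least-squares loop of this kind. It is not the authors' implementation: the Gaussian-kernel weight expression exp(-r²/(2 γ2)) is an assumption standing in for the objective function and weight formula defined earlier in the paper, and γ2 must be supplied by one of the width hyper-parameter estimators discussed there; only the OLS initialization follows the paper's recommendation.

## Illustrative IRLS-style sketch (assumed weight form, not the paper's exact algorithm).
etkrr_sketch <- function(X, y, gamma2, max_iter = 100, tol = 1e-8) {
  X1   <- cbind(1, X)                                # add intercept column
  beta <- solve(crossprod(X1), crossprod(X1, y))     # OLS initial values
  for (it in seq_len(max_iter)) {
    r <- as.vector(y - X1 %*% beta)                  # current residuals
    w <- exp(-r^2 / (2 * gamma2))                    # Gaussian-kernel weights (assumed form)
    beta_new <- solve(crossprod(X1, w * X1), crossprod(X1, w * y))  # weighted LS step
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  drop(beta)
}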
The performance of the proposed parameter estimation algorithm is sensitive to the computation of the initial value of the vector of parameters of the regression model, as well as to the width hyper-parameter estimator of the exponential-type kernel functions. A simulation study with synthetic data sets gave insights into the appropriateness of some width hyper-parameter estimators for X-space outliers, Y-space outliers and leverage points, in order to improve the performance of the proposed approach in terms of bias and MSE of the parameter estimates. Moreover, the OLS method is suitable to compute the initial value of the vector of parameters for the proposed regression method.
A comparative study between the ETKRR method and some classical robust regression approaches also demonstrated the usefulness of the proposed method. We considered synthetic data sets with X-space outliers, Y-space outliers and leverage points, in a Monte Carlo simulation framework with 10,000 replications, different sample sizes and increasing percentages of outliers. In particular, the ETKRR method presented a performance similar to that of the OLS method when only clean observations were considered. In the scenario with outlier observations in the Y-space, the ETKRR method presented the best performance up to the level of 15% of outliers; moreover, it required about half of the computational time of the MM-Estimator method. In the scenario with outlier observations in the X-space, the ETKRR method presented the best performance when the proportion of outliers is at most 10% (compared with the MM-Estimator) and 15% (compared with WLS, L1 and the M-Estimator). The ETKRR approach presented the best performance up to 15% of outliers in the scenario where bad leverage observations are considered; for more than 15% of outliers, all robust methods performed poorly in this scenario. Thus, the results suggest that the proposed approach presents a competitive (or the best) performance in scenarios comparable to those often found in real-life problems. Finally, concerning the benchmark real data sets, it was observed that the ETKRR method presented the best overall performance in comparison with the classical robust methods.
Acknowledgments

The authors are grateful to the anonymous referees for their careful revision, valuable suggestions, and comments, which improved this paper. The authors would like to thank CAPES (National Foundation for Post-Graduate Programs, Brazil) and CNPq (National Council for Scientific and Technological Development, Brazil) for their financial support. The first author would also like to thank FACEPE (Research Agency of the State of Pernambuco, Brazil).
References

[1] P. J. Huber, Robust regression: asymptotics, conjectures and Monte Carlo, The Annals of Statistics 1 (1973) 799–821.

[2] P. J. Huber, Robust Statistics, John Wiley and Sons, New York, 1981.

[3] V. J. Yohai, High breakdown-point and high efficiency robust estimates for regression, The Annals of Statistics 15 (1987) 642–656.

[4] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B 58 (1996) 267–288.

[5] P. J. Rousseeuw, A. M. Leroy, Robust Regression and Outlier Detection, John Wiley and Sons, New York, 1987.

[6] T. P. Ryan, Modern Regression Methods, John Wiley and Sons, Hoboken, NJ, 2009.

[7] X. Tong, Y. Jin, L. Li, An improved weighted total least squares method with applications in linear fitting and coordinate transformation, Journal of Surveying Engineering 137 (2011) 120–128.

[8] Y. Fan, G. Qin, Z. Zhu, Variable selection in robust regression models for longitudinal data, Journal of Multivariate Analysis 109 (2012) 156–167.

[9] I. Naseem, R. Togneri, M. Bennamoun, Robust regression for face recognition, Pattern Recognition 45 (2012) 104–118.

[10] B. Schölkopf, A. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.

[11] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, 2004.

[12] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, 2000.

[13] T. Hofmann, B. Schölkopf, A. Smola, A review of kernel methods in machine learning, Technical report, Max-Planck-Institut für biologische Kybernetik, 2006.

[14] P. J. Huber, Robust estimation of a location parameter, The Annals of Mathematical Statistics 35 (1) (1964) 73–101. doi:10.1214/aoms/1177703732.

[15] P. Rousseeuw, V. Yohai, Robust regression by means of S-estimators, in: Robust and Nonlinear Time Series Analysis, Springer US, New York, NY, 1984, pp. 256–272. doi:10.1007/978-1-4615-7821-5_15.

[16] R. A. Maronna, R. D. Martin, V. J. Yohai, Robust Statistics: Theory and Methods, John Wiley and Sons, Chichester, 2006.

[17] J. Mercer, Functions of positive and negative type and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society A 209 (1909) 441–458.

[18] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks 12 (2001) 181–201.

[19] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016. URL https://www.R-project.org/

[20] B. Caputo, K. Sim, F. Furesjo, A. Smola, Appearance-based object recognition using SVMs: which kernel should I use?, in: Proceedings of the NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision, 2002.

[21] R. H. Myers, Classical and Modern Regression with Applications, Duxbury Press, Belmont, CA, 1990.

[22] D. C. Montgomery, Introduction to Linear Regression Analysis, John Wiley and Sons, New York, 1982.

[23] W. H. Greene, Econometric Analysis, Macmillan Publishing Company, New York, 1993.