
Author's Accepted Manuscript. To appear in: Neurocomputing. PII: S0925-2312(16)31550-8. DOI: http://dx.doi.org/10.1016/j.neucom.2016.12.035. Reference: NEUCOM17853. Received 11 March 2016; revised 26 August 2016; accepted 13 December 2016.

Cite this article as: Francisco de A.T. De Carvalho, Eufrásio de A. Lima Neto and Marcelo R.P. Ferreira, A robust regression method based on exponential-type kernel functions, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2016.12.035.

A robust regression method based on exponential-type kernel functions

Francisco de A.T. De Carvalho (Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes s/n - Cidade Universitária, CEP 50740-560, Recife-PE, Brazil; corresponding author: [email protected])

Eufrásio de A. Lima Neto and Marcelo R.P. Ferreira (Departamento de Estatística, Universidade Federal da Paraíba, Cidade Universitária, 58051-900, João Pessoa, PB, Brazil; [email protected], [email protected])

Abstract

Robust regression methods are commonly needed in practice due to the presence of outliers. In this paper we propose a robust regression method that penalizes badly fitted observations (outliers) through exponential-type kernel functions used in the iterative parameter estimation process. The weight given to each observation is updated at each iteration so as to optimize a suitable objective function. The convergence of the parameter estimation algorithm is guaranteed, at a low computational cost. Its performance is sensitive to the choice of the initial values for the vector of parameters of the regression model and to the estimator of the width hyper-parameter of the kernel functions. A simulation study with synthetic data sets revealed that some width hyper-parameter estimators can improve the performance of the proposed approach and that the ordinary least squares (OLS) method is a suitable choice for the initial values of the vector of parameters. A comparative study between the proposed method, some classical robust approaches (WLS, M-Estimator, MM-Estimator and L1 regression) and the OLS method is also presented. The performance of these methods is evaluated based on the bias and mean squared error (MSE) of the parameter estimates, considering synthetic data sets with X-space outliers, Y-space outliers and leverage points, different sample sizes and percentages of outliers, in a Monte Carlo framework. The results suggest that the proposed approach presents a competitive (or better) performance in outlier scenarios comparable to those found in real problems. The proposed method also exhibits a performance similar to the OLS method when no outliers are present and takes about half the computational time of the MM-Estimator method. Applications to real data sets corroborate the usefulness of the proposed method.

Keywords: Exponential-type kernel functions, Regression models, Robust methods, Width hyper-parameter estimators

1. Introduction

Robust regression attempts to cope with outliers and leverage points. Regression outliers are observations deviating from the linear model pattern of the majority of the data, whereas leverage points are outlying in the space of the predictor variables. Observations that are both regression outliers and leverage points are called bad leverage points, and a non-robust analysis of these points typically leads to biased inferences.

The classical linear regression model can be presented in matrix notation as y = Xβ + ε = µ + ε, where y is an n × 1 vector of response variables, X is an n × p matrix of explanatory variables, β is a p × 1 vector of regression parameters, µ = Xβ is an n × 1 vector representing the mean of y, and ε is an n × 1 vector of error terms with zero mean and unknown variance σ². The ordinary least squares (OLS) estimator obtains the regression parameters β by making the sum of squared errors (SSE) as small as possible. The OLS approach provides the optimal estimates of (β, σ²) when the error term is normally distributed. However, it can behave badly when the error distribution is not normal, particularly when the errors are heavy-tailed, because of the sensitivity of the OLS method to residuals of large magnitude. To overcome this issue, robust methods are a suitable alternative because they are less sensitive to outliers.

With this in mind, standard approaches to robust regression have been proposed over the last decades, such as M-estimation [1, 2], MM-estimation [3], and methods based on minimizing an objective function other than the sum of squared residuals, such as least absolute deviation (L1 regression), which gained prominence after the work of Ref. [4]. Even for these models, pronounced outliers can still have a considerable impact, motivating research into even more robust approaches. We recommend the papers [5] and [6], which include outlier detection, to readers interested in surveys of these standard techniques. For further information, see the more recent works [7], [8] and [9]. Note that most of the contributions dealing with robust methods rely on re-weighting the outliers according to some known function.

On the other hand, estimation and learning methods using positive definite kernels have been proposed and have become rather popular, particularly in the statistical learning community. In particular, the idea of using exponential-type kernel functions to measure the similarity between two objects has been successfully applied in a wide range of fields, for instance computer vision, signal processing, clustering, pattern classification and bioinformatics, and there is an extensive literature on the family of kernel-based algorithms. For instance, see the papers [10], [11], [12], [13] and the references therein.

Inspired by the kernel framework, this paper proposes a regression method, hereafter named ETKRR (Exponential-Type Kernel function based Robust Regression), that re-weights outlier observations based on exponential-type kernel functions, in such a way that the weights assigned to outlier observations are made as small as possible in an iterative process that minimizes a suitable objective function. The motivation behind the choice of a particular kernel can be very intuitive and straightforward, depending on what kind of information we expect to extract from the data. The ETKRR method allows the use of "exponential-type" kernel functions that penalize, in different ways, badly fitted observations. The weighting is done by computing the dissimilarities between the observed and predicted values of the response variable, and is updated at each iteration in order to optimize the objective function. Different methods to compute the initial value of the vector of parameters in the estimation algorithm, as well as different estimators for the width hyper-parameter of the "exponential-type" kernel functions, are considered. The convergence of the estimation algorithm is guaranteed, at a low computational cost. In particular, the OLS estimator is a special case of the ETKRR approach when the points are perfectly aligned around a line. Thus, in comparison with other robust methods, the ETKRR is preferable when outliers are not present in a given data set.

A comparative study between the ETKRR method and some classical regression approaches (Weighted Least Squares (WLS), M-Estimator, MM-Estimator, L1 regression and OLS) is considered. These methods are evaluated in terms of the bias and mean squared error (MSE) of the parameter estimates, taking into account X-space outliers, Y-space outliers, leverage points, different sample sizes and percentages of outliers, in a Monte Carlo simulation framework with 10,000 replications. Applications on real data sets also illustrate the usefulness of the ETKRR method.

The paper is organized as follows: the next section gives a brief introduction to some useful robust regression methods and the ordinary least squares method. Section 3 reviews some concepts about exponential-type kernel functions; the ETKRR method and the parameter estimation algorithm are discussed in Section 4. Section 5 presents the Monte Carlo experiments comparing the ETKRR method with some classical regression techniques and discusses the results obtained in the numerical analysis. Section 6 presents applications to real data sets and, finally, Section 7 gives the concluding remarks.

2. Related Methods

This section gives a brief description of the competing methods considered in this paper.

2.1. Ordinary Least Squares

Linear regression has been one of the most important statistical data analysis tools. Given a set of independent and identically distributed (i.i.d.) observations (x_i, y_i), i = 1, ..., n, the linear regression model can be defined by

y_i = x_i^\top \beta + e_i, \quad (1)

where x_i is a p × 1 vector of values of the explanatory variables, β is an unknown p × 1 vector, and the errors e_i are i.i.d. and independent of x_i with E(e_i | x_i) = 0. The most commonly used estimate of β is the ordinary least squares (OLS) estimate, which minimizes the sum of squared errors, i.e.,

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2. \quad (2)

However, the ordinary least squares estimators for linear models are very sensitive to unusual values in the design space or to outliers among the y values. Even a single atypical value may have a large effect on the parameter estimates. Thus, robust regression analysis provides an alternative to least squares regression when fundamental assumptions are unfulfilled by the nature of the data.

2.2. Weighted Least Squares

The weighted least squares (WLS) method, which can be viewed as an extension of the OLS method, estimates β by solving the minimization problem

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} w_i (y_i - x_i^\top \beta)^2, \quad (3)

with w_i = 1/\hat{\sigma}_{ii}^2, where \hat{\sigma}_{ii}^2 is the i-th diagonal element of \hat{\sigma}^2 H = \hat{\sigma}^2 X(X^\top X)^{-1} X^\top and \hat{\sigma}^2 is the estimated variance of the random error.

2.3. L1 Regression

L1 regression estimates the median of y|x instead of the conditional mean. In L1 regression we estimate β by solving the minimization problem

\hat{\beta} = \arg\min_{\beta} \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i^\top \beta|,

i.e., L1 regression minimizes the sum of the absolute errors, while OLS (also known as L2 regression) minimizes the sum of the squared errors. Consequently, L1 gives much less weight to large deviations.

2.4. M-Estimator

The M-estimation method [2] suggests a solution for the normal equations based on an appropriate weight function that penalizes unusual observations. By replacing the least squares criterion (2) with a robust criterion, the M-estimate [14] of β is

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \rho\!\left(\frac{y_i - x_i^\top \beta}{\hat{\sigma}}\right), \quad (4)

where ρ(·) is a robust loss function and \hat{\sigma} is an error scale estimate. The derivative of ρ, denoted by Ψ(·) = ρ'(·), is called the influence function. In particular, if ρ(t) = t²/2, the solution is the OLS estimate.

2.5. MM-Estimator

MM-estimation is a special type of M-estimation developed by Yohai (1987) [3]. MM-estimates have become increasingly popular and are one of the most commonly employed robust regression techniques. The MM-estimates can be found by a three-stage procedure. In the first stage, an initial consistent estimate β̂_0 with high breakdown point, but possibly low normal efficiency, is computed. In the second stage, a robust M-estimate of the scale σ̂ of the residuals is computed based on the initial estimate. In the third stage, an M-estimate β̂ is found starting at β̂_0.

More details about these robust regression methods can be found in [15] and [16] and in the references therein.
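As a quick reference, the sketch below shows how these competing estimators can be obtained in R; it is a minimal illustration, assuming the MASS and quantreg packages are available, and the data set name dat and variables y and x are placeholders rather than objects used in the paper.

# Illustrative R calls for the competing estimators (assumes MASS and quantreg
# are installed; 'dat' is a data frame with response y and predictor x).
library(MASS)       # rlm(): M- and MM-estimation
library(quantreg)   # rq(): L1 (median) regression

fit_ols <- lm(y ~ x, data = dat)                   # ordinary least squares, Eq. (2)
fit_m   <- rlm(y ~ x, data = dat, method = "M")    # M-estimator (Huber loss by default)
fit_mm  <- rlm(y ~ x, data = dat, method = "MM")   # MM-estimator
fit_l1  <- rq(y ~ x, tau = 0.5, data = dat)        # L1 regression (conditional median)
# WLS can be fitted with lm(..., weights = w), with w_i = 1 / sigma^2_ii as in Eq. (3).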

3. Exponential-type kernel functions

Hereafter we briefly recall the basic theory of kernel functions. Let X = {x_1, ..., x_n} be a non-empty set where x_i ∈ R^p. A function K : X × X → R is called a positive definite or Mercer kernel [17] if and only if K is symmetric (i.e., K(x_i, x_k) = K(x_k, x_i)) and the following inequality holds:

\sum_{i=1}^{n}\sum_{k=1}^{n} c_i c_k K(x_i, x_k) \geq 0, \quad \forall n \geq 2,

where c_r ∈ R, for all r = 1, ..., n.

Let Φ : X → F be a non-linear mapping from the input space X to a high-dimensional feature space F. By applying the mapping Φ, the dot product x_i^⊤ x_k in the input space is mapped to Φ(x_i)^⊤ Φ(x_k) in the feature space. The key idea supporting kernel algorithms is that the non-linear mapping Φ does not need to be explicitly specified, because each Mercer kernel can be expressed as K(x_i, x_k) = Φ(x_i)^⊤ Φ(x_k) [18].

One of the most relevant aspects in applications is that it is possible to compute Euclidean distances in F without knowing Φ explicitly. This can be done using the so-called distance kernel trick [18]:

\|\Phi(x_i) - \Phi(x_k)\|^2 = (\Phi(x_i) - \Phi(x_k))^\top(\Phi(x_i) - \Phi(x_k))
 = \Phi(x_i)^\top\Phi(x_i) - 2\,\Phi(x_i)^\top\Phi(x_k) + \Phi(x_k)^\top\Phi(x_k)
 = K(x_i, x_i) - 2\,K(x_i, x_k) + K(x_k, x_k). \quad (5)

Table 1 lists some exponential-type kernel functions that are often used in the literature.

Table 1: Some "exponential-type" kernel functions; γ > 0.

Gaussian:     K_G(x_i, x_k) = exp( −‖x_i − x_k‖² / (2γ²) )
Exponential:  K_E(x_i, x_k) = exp( −‖x_i − x_k‖ / (2γ²) )
Laplacian:    K_L(x_i, x_k) = exp( −‖x_i − x_k‖ / γ )

The width hyper-parameter γ² plays an important role in the performance of these kernels and must be carefully adjusted to the problem in question. Overestimated values of γ² make the exponential function behave almost linearly, so the projection into the high-dimensional space loses its value; on the other hand, for underestimated values of γ² the function lacks regularization and the decision boundary becomes highly sensitive to noise in the data. Moreover, note that the exponential kernel is closely related to the Gaussian kernel, with only the square of the norm left out. The Laplacian kernel, in turn, is equivalent to the exponential kernel except for being less sensitive to changes in γ². Because of the mathematical convenience provided by the Gaussian kernel, we adopt it for the ETKRR model.
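For concreteness, a small sketch of the three kernels in Table 1 is given below in plain R; the function names are ours, not part of any package.

# The three exponential-type kernels of Table 1 (illustrative R functions).
k_gauss   <- function(xi, xk, gamma) exp(-sum((xi - xk)^2) / (2 * gamma^2))
k_exp     <- function(xi, xk, gamma) exp(-sqrt(sum((xi - xk)^2)) / (2 * gamma^2))
k_laplace <- function(xi, xk, gamma) exp(-sqrt(sum((xi - xk)^2)) / gamma)

# Example: similarity between two points for a given width
k_gauss(c(1, 2), c(1.5, 2.5), gamma = 1)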

4. The ETKRR method

The proposed method is inspired by the kernel framework. The estimation of the parameters of the regression model is guided by the minimization of an objective function defined as a sum of squared errors computed not in the original space but in a high-dimensional feature space F, through a non-linear mapping Φ applied to the response variable y_i and its corresponding mean µ_i. Thus, the objective function is defined as follows:

S = \sum_{i=1}^{n} \|\Phi(y_i) - \Phi(\mu_i)\|^2, \quad (6)

where µ_i = β_0 + β_1 x_{i1} + ... + β_p x_{ip} represents the mean of the response variable y_i, which is structured in terms of a set of explanatory variables x_{i1}, ..., x_{ip} and their respective parameters β_0, β_1, ..., β_p to be estimated.

As the non-linear mapping Φ is not explicitly known, the estimation of the parameters of the regression model requires the use of the distance kernel trick (see Equation (5)); thus the objective function becomes

S = \sum_{i=1}^{n} \left\{ K(y_i, y_i) - 2\,K(y_i, \mu_i) + K(\mu_i, \mu_i) \right\}. \quad (7)

Let K_G be the Gaussian kernel function listed in Table 1. Then the following property is verified: K_G(y_i, y_i) = K_G(µ̂_i, µ̂_i) = 1 (i = 1, ..., n), and the functional (7) can be rewritten as

S = \sum_{i=1}^{n} 2\left[1 - K_G(y_i, \mu_i)\right], \quad \gamma > 0. \quad (8)

Thus, the closer y_i and µ_i (i = 1, ..., n) are, the closer to zero the objective function of the ETKRR model becomes.

Differentiating equation (8) with respect to the vector of parameters β = (β_0, ..., β_p), we obtain the normal equations for the ETKRR model, given by

\frac{\partial S}{\partial \beta_r} = 2 \sum_{i=1}^{n} K_G(y_i, \hat{\mu}_i)\, \frac{x_{ir}\,(y_i - \hat{\mu}_i)}{\gamma^2} = 0, \quad r = 1, 2, \ldots, p. \quad (9)

The solution of the normal equation system (9) is obtained through an iteratively re-weighted least squares process, based on the regression model

y^{*} = X^{*} \beta + \varepsilon^{*}, \quad (10)

where y* = K^{1/2} y is an n × 1 weighted vector of the response variable, X* = K^{1/2} X is an n × p weighted matrix of explanatory variables, β is the vector of regression parameters, ε* = K^{1/2} ε is an n × 1 weighted vector of independent errors, and K = diag(k_{11}, k_{22}, ..., k_{nn}) is an n × n diagonal kernel weight matrix with k_{ii} defined as k_{ii} = K_G(y_i, µ_i), for all i = 1, 2, ..., n. Note that 0 ≤ K_G(y_i, µ_i) ≤ 1; thus observations with large residuals receive small weights in the parameter estimation, as illustrated by Figure 1.

4.1. Estimation algorithm

The estimation of the vector of parameters β can be performed through the algorithm presented below. The algorithm starts from an initial solution (for instance, the OLS solution) and alternates between the estimation of the regression model (coefficients and predicted responses) and the estimation of the observation weights until convergence, when the objective function becomes stationary.

input : X, y, a tolerance limit ε, a maximum number of iterations T, the kernel width hyper-parameter γ²
output: β̂, µ̂

Initialization:
  Set t = 0 and K^(0) = I_n;            // I_n is an n × n identity matrix
  Compute β̂^(0);                        // a start value for β̂
  Compute µ̂^(0) = Xβ̂^(0);
  Compute S^(0) as given in Eq. (8);
Fitting steps:
  repeat
    Set t = t + 1;
    Compute K^(t) = diag{k^(t)_11, ..., k^(t)_nn}, where k^(t)_ii = K_G(y_i, µ̂_i^(t−1)) and µ̂_i^(t−1) = x_i^⊤ β̂^(t−1);
    Compute β̂^(t) = (X^⊤ K^(t) X)^{−1} X^⊤ K^(t) y;
    Compute µ̂^(t) = Xβ̂^(t);
    Compute S^(t) as given in Eq. (8);
  until |S^(t) − S^(t−1)| ≤ ε or t ≥ T;

Figure 1: The Gaussian kernel function in the ETKRR method (kernel weight plotted against the residual).
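For concreteness, the following R sketch mirrors the estimation algorithm above. It is a minimal illustration, not the authors' released code; the function name etkrr_fit, the fixed Gaussian kernel and the fixed width hyper-parameter gamma2 are our assumptions.

# Minimal sketch of the ETKRR fitting loop (illustrative, not the authors' code).
# X: n x p matrix of explanatory variables (no intercept column), y: response,
# gamma2: width hyper-parameter, eps / max_iter: stopping rules as in the algorithm.
etkrr_fit <- function(X, y, gamma2, eps = 1e-10, max_iter = 100) {
  Xd   <- cbind(1, as.matrix(X))                        # design matrix with intercept
  kw   <- function(r) exp(-r^2 / (2 * gamma2))          # Gaussian kernel weight K_G
  beta <- solve(t(Xd) %*% Xd, t(Xd) %*% y)              # beta^(0): OLS start value
  mu   <- drop(Xd %*% beta)
  S    <- sum(2 * (1 - kw(y - mu)))                     # objective, Eq. (8)
  for (it in seq_len(max_iter)) {
    k     <- kw(y - mu)                                 # k_ii = K_G(y_i, mu_i^(t-1))
    beta  <- solve(t(Xd) %*% (k * Xd), t(Xd) %*% (k * y))  # weighted LS update
    mu    <- drop(Xd %*% beta)
    S_new <- sum(2 * (1 - kw(y - mu)))
    if (abs(S_new - S) <= eps) { S <- S_new; break }
    S <- S_new
  }
  list(coefficients = drop(beta), fitted = mu, weights = k, objective = S)
}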

As the kernel weight matrix needs to be computed only once at each iteration, the complexity of estimating the coefficient vector β is O(n(p + 1)), where n is the number of observations and p is the number of variables. At each iteration t the kernel weight matrix requires n kernel function evaluations k_ii = K_G(y_i, µ̂_i^(t−1)), i = 1, ..., n, and, since µ̂_i^(t−1) = β_0^(t−1) + β_1^(t−1) x_{i1} + ··· + β_p^(t−1) x_{ip}, i = 1, ..., n, the computational complexity of the ETKRR algorithm is O(n(p + 1)) for a single iteration.

4.2. Convergence properties

The convergence of the proposed algorithm can be studied from two series: w_t = µ̂^(t) and z_t = S(w_t) = S(µ̂^(t)) (t = 0, 1, ...). From an initial term w_0 = µ̂^(0) = Xβ̂^(0), where β̂^(0) represents the start value for β̂ and K^(0) = I_n, the algorithm iteratively computes µ̂^(t) until convergence, when the criterion S reaches a stationary value.

Proposition 4.1. The series z_t = S(w_t) decreases at each iteration and converges.

Proof. First of all, we show that the inequality

S^{(t-1)} = S(\hat{\mu}^{(t-1)}) \geq S^{(t)} = S(\hat{\mu}^{(t)}) \quad (11)

holds, i.e., the series decreases at each iteration. The inequality (11) holds because

S^{(t-1)} = S(\hat{\mu}^{(t-1)}) = \sum_{i=1}^{n} 2\left[1 - K_G\!\left(y_i, \hat{\mu}_i^{(t-1)}\right)\right], \quad \gamma > 0,

where \hat{\mu}_i^{(t-1)} = x_i^\top \hat{\beta}^{(t-1)} and \hat{\beta}^{(t-1)} = (X^\top K^{(t-1)} X)^{-1} X^\top K^{(t-1)} y, and

S^{(t)} = S(\hat{\mu}^{(t)}) = \sum_{i=1}^{n} 2\left[1 - K_G\!\left(y_i, \hat{\mu}_i^{(t)}\right)\right], \quad \gamma > 0,

where \hat{\mu}_i^{(t)} = x_i^\top \hat{\beta}^{(t)}, \hat{\beta}^{(t)} = (X^\top K^{(t)} X)^{-1} X^\top K^{(t)} y and, at the t-th iteration, we have that

\hat{\beta}^{(t)} = \arg\min_{\beta} \sum_{i=1}^{n} 2\left[1 - K_G(y_i, \mu_i)\right], \quad \gamma > 0.

Finally, because the series z_t decreases and is bounded (z_t = S(w_t) ≥ 0), it converges.

Proposition 4.2. The series w_t = µ̂^(t) converges.

Proof. Assume that the stationarity of the series z_t is reached at iteration t = T. Then z_T = z_{T+1}, i.e., S(µ̂^(T)) = S(µ̂^(T+1)). Moreover, µ̂^(T) = µ̂^(T+1), because µ̂ = Xβ̂ and β̂ = (X^⊤ K X)^{−1} X^⊤ K y is the unique minimizer of S. So we can conclude that w_T = w_{T+1}. This conclusion holds for all t ≥ T, so w_t = w_T for all t ≥ T, and it follows that the series w_t = µ̂^(t) converges.

5. Monte Carlo experiments

This section presents a Monte Carlo simulation study to evaluate the behavior of the parameter estimation algorithm of Section 4.1, considering different alternatives for the initial value of the vector of parameters β̂^(0) as well as for the width hyper-parameter estimators, taking into account a wide range of scenarios. Moreover, a comparative study against some classical regression methods (WLS, M-Estimator, MM-Estimator, L1 and OLS) is also considered. All experiments were implemented in the R language [19] and performed on the same machine (OS: OS X El Capitan, version 10.11.1; 64 bits; memory: 8 GB 1600 MHz DDR3; processor: Intel Core i5-4278U (fourth generation); CPU @ 2.6 GHz). The code is available on GitHub at https://github.com/marcelorpf/gkrreg.

5.1. Experimental settings

The parameter estimates for the ETKRR method were obtained with the algorithm presented in Section 4.1, using a tolerance limit ε = 1 × 10⁻¹⁰ and a maximum number of iterations T = 100.

The simulation settings reflect scenarios found in real problems. We fixed a percentage of clean data and added the complementary percentage of outliers. The clean data for the explanatory variable X come from a uniform distribution on the interval [10, 20], while the i-th clean value of the response variable Y is generated according to the linear model y_i = 1.7 + 2x_i + e_i, where e_i is a random error term distributed as N(0, 1).

The outlier observations are generated according to three different scenarios often found in real-life problems (Y-space outliers, X-space outliers and leverage points). Figure 2 illustrates these scenarios, which are based on the bivariate Gaussian distribution N₂(µ = (µ_x, µ_y), Σ = I₂). Scenario 0 considers only clean observations. Scenario 1 considers Y-space outliers, whose y (response) values are significantly different from the rest of the data. Scenario 2 considers X-space outliers, observations which present unusual x values in relation to the rest of the data. Finally, scenario 3 considers a percentage of leverage observations, i.e., observations that are shifted on both the X and Y axes. Table 2 describes the parameters used to generate the outliers from the clean observations.
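The sketch below generates one synthetic data set for scenario 1 (Y-space outliers), following the settings described above. It is a minimal illustration: the exact contamination mechanics, the function name gen_scenario1 and, in particular, the choice of µ_x for the outliers (taken here as the mean of the clean x values, since that cell of Table 2 is not recoverable) are our assumptions.

# One simulated data set for scenario 1 (illustrative; assumptions noted above).
gen_scenario1 <- function(n = 100, p_out = 0.10) {
  n_out   <- round(p_out * n)
  n_clean <- n - n_out
  x <- runif(n_clean, 10, 20)                 # clean X ~ U(10, 20)
  y <- 1.7 + 2 * x + rnorm(n_clean, 0, 1)     # clean Y from the linear model
  # outliers drawn from N2(mu, I2) with mu_y = max(y) + 5 (Table 2, scenario 1);
  # mu_x is an assumption here (mean of the clean x values)
  x_out <- rnorm(n_out, mean = mean(x), sd = 1)
  y_out <- rnorm(n_out, mean = max(y) + 5, sd = 1)
  data.frame(x = c(x, x_out), y = c(y, y_out))
}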

Concerning the width hyper-parameter γ², Ref. [20] proposed to estimate it as the average of the 0.1 and 0.9 percentiles of the terms (y_i − µ̂_i^OLS)², computed on a random sample of the data of size n* = αn, where n is the number of observations, α ∈ (0, 1) (we used α = 0.5), and µ̂_i^OLS is the predicted value for y_i obtained with the OLS method.

Figure 2: The outlier scenarios. Panels: Clean observations, Y-space outliers, X-space outliers and Leverage points (Y plotted against X).

Table 2: Outlier parameters according to the bivariate Gaussian distribution N2(µ, I2)

Scenario   Outlier's scenario   µx           µy
0          No-outliers          -            -
1          Y-space outliers                  max(y)+5
2          X-space outliers     max(x)+5
3          Leverage points      max(x)+5     max(y)+5

In this paper, we consider a total of three estimators for the width hyper-parameter γ², and we evaluate how these estimators affect the performance of the algorithm presented in Section 4.1, based on the bias and MSE of the parameter estimates. The estimator S1 is Caputo's estimator [20]. The estimator S2 is the median of the non-zero values of r_ij = (y_i − µ̂_j^OLS)², i, j = 1, ..., n. Finally, S3 is defined as

S3 = \frac{1}{n - p - 1} \sum_{i=1}^{n} \left(y_i - \hat{\mu}_i^{OLS}\right)^2,

where p is the number of explanatory variables.
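The sketch below computes the three width hyper-parameter estimators described above; it is a minimal illustration under our reading of the definitions, and the function name width_estimators is ours.

# Width hyper-parameter estimators S1, S2 and S3 (illustrative sketch).
# y: response vector; X: matrix of explanatory variables; alpha as in the text.
width_estimators <- function(y, X, alpha = 0.5) {
  fit <- lm(y ~ X)                        # OLS fit used by all three estimators
  mu  <- fitted(fit)                      # predicted values mu_i^OLS
  n   <- length(y); p <- ncol(as.matrix(X))
  # S1 (Caputo's estimator): mean of the 0.1 and 0.9 percentiles of the squared
  # residuals computed on a random subsample of size alpha * n
  idx <- sample(seq_len(n), size = floor(alpha * n))
  r2  <- (y[idx] - mu[idx])^2
  S1  <- mean(quantile(r2, probs = c(0.1, 0.9)))
  # S2: median of the non-zero values of (y_i - mu_j^OLS)^2 over all pairs (i, j)
  rij <- outer(y, mu, function(a, b) (a - b)^2)
  S2  <- median(rij[rij != 0])
  # S3: residual sum of squares divided by (n - p - 1)
  S3  <- sum((y - mu)^2) / (n - p - 1)
  list(S1 = S1, S2 = S2, S3 = S3)
}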

5.2. Width hyper-parameter estimators and computation of the initial value of the vector of parameters

Initially, we aim to evaluate how different initial values for the vector of parameters affect the final parameter estimates of the algorithm given in Section 4.1. We consider three different methods to compute the initial value of the vector of parameter estimates β̂^(0): the ordinary least squares estimate (OLS), the L1 regression [16] and the MM-Estimator [3]. In all these cases the width hyper-parameter γ² was fixed as the S1 estimator. The evaluation was based on the bias and mean squared error (MSE) of the parameter estimates of the ETKRR method according to the choice of initial values (OLS, L1 and MM) for the vector of parameters. A Monte Carlo simulation with 10,000 replications was considered for each configuration.

Tables 3 to 6 present the bias and MSE of the parameter estimates according to the initial value used to compute β̂^(0), taking into account the scenarios described in Table 2. They were computed for the sample sizes n = 50 and n = 100.

Note that all methods (OLS, L1 and MM) used as initial values produced parameter estimates very close to the true parameters in scenario 0 for all sample sizes (Table 3) and in scenario 1 with 5% of outliers (Table 4). Moreover, when the percentage of outliers is increased to 10% in scenario 1, the algorithm presents better results with OLS for both sample sizes, n = 50 and n = 100. It can also be observed that the OLS method still provides the best parameter estimates when the percentage of outliers is increased up to 30%. The OLS method is also recommended in scenario 2 for percentages of outliers up to 15% (Table 5). In this scenario, with 20% or 30% of outliers, the methods presented quite similar performances. Finally, in scenario 3 (Table 6), although the OLS method presented slightly better parameter estimates, we cannot say that the choice of the method used to compute β̂^(0) affects the results. Thus, we recommend the OLS as the method to compute β̂^(0) in scenarios 0, 1 and 2. Moreover, we also recommend the OLS method to compute β̂^(0) in scenario 3 for the sake of simplicity.

Table 3: Bias and MSE of the parameter estimates according to the methods (OLS, L1 and MM) for scenario 0 (Clean observations).

n     Start Value   β̂0 Average   β̂0 Bias    β̂0 MSE     β̂1 Average   β̂1 Bias    β̂1 MSE
50    OLS           1.708        0.609777    0.769106    2.000        0.039932    0.050297
50    L1            1.708        0.609822    0.769217    2.000        0.039938    0.050302
50    MM            1.708        0.609855    0.769238    2.000        0.039946    0.050305
100   OLS           1.706        0.428355    0.536767    2.000        0.028093    0.035180
100   L1            1.706        0.428450    0.536838    2.000        0.028099    0.035184
100   MM            1.706        0.428401    0.536819    2.000        0.028095    0.035182

Now, we aim to evaluate how the estimators of the width hyper-parameter γ² presented in Section 5.1 affect the performance of the parameter estimation algorithm in terms of bias and MSE. Tables 7 to 10 show the results of the simulation study. All results use the OLS method to compute the initial values for the vector of parameters.

Table 7 shows the results observed in scenario 0. We conclude that the estimator S1 presents a slight advantage over S2 and S3, and that the estimator S3 presents the worst performance for the width hyper-parameter γ². Table 8 presents the results for scenario 1 (Y-space outliers). We observed that the estimators S2 and S3 showed a quite similar performance in terms of

Table 4: Bias and MSE of the parameter estimates according to the methods (OLS, L1 and MM) for the scenario 1 (Y-space outliers). n

Perc.out

5%

10%

50

15%

20%

30%

5%

10%

100

15%

20%

30%

βˆ 0

Start Value

βˆ 1

Average

Bias

MSE

Average

Bias

MSE

OLS

1.709

0.634914

0.797223

2.003

0.042032

0.052586

L1

1.712

0.636436

0.798247

2.003

0.042036

0.052593

MM

1.712

0.635412

0.797374

2.003

0.041983

0.052554

OLS

1.792

0.665063

0.829221

2.011

0.044798

0.055236

L1

1.816

0.673648

0.838127

2.011

0.044907

0.055459

MM

1.820

0.670898

0.831721

2.011

0.044800

0.055261

OLS

1.842

0.711230

0.882880

2.017

0.048791

0.059288

L1

1.894

0.733170

0.901370

2.017

0.049140

0.059799

MM

1.904

0.738003

0.906416

2.017

0.049287

0.060004

OLS

1.932

0.781798

0.954722

2.025

0.054010

0.063878

L1

2.059

0.853621

1.013820

2.024

0.055392

0.066023

MM

2.082

0.860618

1.018809

2.024

0.055417

0.066249

OLS

2.226

1.007429

1.154879

2.046

0.069767

0.076472

L1

2.770

1.400957

1.395382

2.041

0.074591

0.085595

MM

2.894

1.500895

1.456148

2.040

0.076278

0.088456

OLS

1.711

0.439884

0.550732

2.006

0.029511

0.036412

L1

1.715

0.440522

0.551257

2.006

0.029493

0.036423

MM

1.715

0.440725

0.551612

2.006

0.029482

0.036399

OLS

1.760

0.462924

0.577045

2.010

0.031645

0.038439

L1

1.775

0.465670

0.579857

2.010

0.031753

0.038625

MM

1.774

0.465432

0.579120

2.010

0.031763

0.038666

OLS

1.856

0.509206

0.620153

2.018

0.035901

0.041548

L1

1.908

0.533588

0.639165

2.018

0.036203

0.041951

MM

1.919

0.538405

0.640340

2.018

0.036258

0.042096

OLS

1.949

0.566382

0.663505

2.025

0.040312

0.044372

L1

2.074

0.637127

0.705277

2.025

0.041079

0.045505

MM

2.106

0.655613

0.716427

2.024

0.041268

0.045955

OLS

2.251

0.801614

0.840609

2.043

0.055116

0.053883

L1

2.763

1.196070

1.007438

2.040

0.057859

0.060187

MM

2.887

1.298831

1.035883

2.039

0.058465

0.061779

17

Table 5: Bias and MSE of the parameter estimates according to the methods (OLS, L1 and MM) for the scenario 2 (X-space outliers). n

Perc.out

5%

10%

50

15%

20%

30%

5%

10%

100

15%

20%

30%

βˆ 0

Start Value

βˆ 1

Average

Bias

MSE

Average

Bias

MSE

OLS

1.838

0.661443

0.818520

1.990

0.043678

0.053866

L1

2.142

0.807842

0.936373

1.969

0.054455

0.062657

MM

2.179

0.830120

0.963676

1.966

0.056080

0.064668

OLS

1.900

0.708920

1.005415

1.986

0.046984

0.068713

L1

6.166

4.584316

5.147559

1.684

0.323673

0.364805

MM

8.738

7.098527

5.970376

1.501

0.502349

0.423401

OLS

5.015

3.769306

7.490316

1.756

0.27277

0.550184

L1

110.437

8.843221

8.051954

1.380

0.627115

0.572677

MM

115.869

14.19444

6.646001

0.995

1.006148

0.471304

OLS

22.047

20.37075

5.122490

0.522

1.479040

0.357634

L1

21.268

19.59096

6.343609

0.580

1.421528

0.450756

MM

20.542

18.85129

4.759383

0.662

1.338261

0.336347

OLS

24.722

23.02172

2.866738

0.323

1.677110

0.110791

L1

24.820

23.11966

3.074839

0.318

1.682449

0.118428

MM

24.155

22.45514

1.533297

0.407

1.593347

0.094365

OLS

1.820

0.459912

0.565744

1.991

0.030415

0.037191

L1

2.459

0.876644

0.875098

1.946

0.060890

0.060080

MM

2.621

1.018729

0.971732

1.935

0.071117

0.067128

OLS

1.835

0.481912

0.592555

1.990

0.031976

0.039179

L1

4.508

2.849042

3.001795

1.801

0.201301

0.212278

MM

6.101

4.421047

3.884832

1.689

0.312696

0.274986

OLS

4.457

3.0705940

6.825146

1.798

0.222514

0.501557

L1

111.351

9.6816040

7.945892

1.315

0.686950

0.564497

MM

118.030

16.331596

4.785727

0.841

1.158690

0.339760

OLS

22.738

21.04012

2.414839

0.472

1.527747

0.166687

L1

21.869

20.17852

4.959773

0.534

1.466850

0.356829

MM

21.609

19.90915

2.170408

0.587

1.412763

0.152295

OLS

24.419

22.71850

1.694834

0.338

1.661689

0.066461

L1

24.432

22.73208

1.812080

0.336

1.664272

0.070509

MM

24.101

22.40098

0.874208

0.411

1.589396

0.050820

18

Table 6: Bias and MSE of the parameter estimates according to the methods (OLS, L1 and MM) for the scenario 3 (Leverage points). n

Perc.out

5%

10%

50

15%

20%

30%

5%

10%

100

15%

20%

30%

βˆ 0

Start Value

βˆ 1

Average

Bias

MSE

Average

Bias

MSE

OLS

3.565

1.890510

0.998692

1.869

0.132862

0.067174

L1

3.585

1.910962

1.009286

1.867

0.134296

0.067945

MM

3.589

1.914571

1.012603

1.867

0.134546

0.068170

OLS

5.406

3.705959

1.132789

1.740

0.260312

0.076817

L1

5.413

3.713813

1.136892

1.739

0.260852

0.077082

MM

5.418

3.718180

1.139170

1.739

0.261154

0.077245

OLS

5.917

4.217446

1.137684

1.704

0.295867

0.077259

L1

5.923

4.222865

1.139960

1.704

0.296231

0.077403

MM

5.925

4.225435

1.141311

1.704

0.296409

0.077495

OLS

6.372

4.671863

1.175514

1.673

0.327273

0.079162

L1

6.374

4.674489

1.176368

1.673

0.327446

0.079208

MM

6.374

4.674062

1.175688

1.673

0.327422

0.079173

OLS

7.075

5.374844

1.189473

1.625

0.374564

0.079409

L1

7.076

5.375981

1.189593

1.625

0.374634

0.079414

MM

7.076

5.375793

1.189656

1.625

0.374622

0.079415

OLS

4.030

2.330987

0.771953

1.836

0.163987

0.052493

L1

4.042

2.342148

0.777821

1.835

0.164764

0.052913

MM

4.044

2.344828

0.779386

1.835

0.164952

0.053036

OLS

4.940

3.240474

0.846693

1.772

0.227690

0.057890

L1

4.948

3.248258

0.848942

1.772

0.228227

0.058052

MM

4.951

3.250806

0.850860

1.772

0.228405

0.058184

OLS

5.691

3.990905

0.892398

1.720

0.279769

0.060982

L1

5.694

3.993894

0.893858

1.720

0.279971

0.061079

MM

5.695

3.994590

0.894313

1.720

0.280018

0.061102

OLS

6.078

4.377678

0.914409

1.694

0.306134

0.062323

L1

6.080

4.379860

0.914634

1.694

0.306279

0.062336

MM

6.080

4.379808

0.914768

1.694

0.306276

0.062350

OLS

6.702

5.001932

0.973949

1.652

0.348264

0.066097

L1

6.703

5.002987

0.974001

1.652

0.348330

0.066098

MM

6.703

5.002586

0.973995

1.652

0.348306

0.066100

19

288

bias and MSE up to 10% of outliers. Moreover, when the percentage of outliers was 15% or higher, the estimator S3 presented a slightly better performance. Still in this scenario, the estimator S1 presented the worst performance. It can be observed from Table 9 that the estimator S2 presented the best performance in scenario 2 (X-space outliers) up to 15% of outliers. In this scenario, when the percentage of outliers is greater than 15%, the results are inconclusive due to the poor performance of all considered estimators. Finally, according to Table 10, the estimator S3 is the best choice in scenario 3, whatever the percentage of bad leverage points.

Based on the simulation results, we recommend the following choices of the width hyper-parameter γ² for the ETKRR method, according to the outlier scenario:

• for scenario 0 (data sets with only clean observations), S1 is the recommended estimator;
• for scenario 1 (data sets with Y-space outliers), whatever the percentage of outliers, S3 is the recommended estimator (up to 10% of outliers, the estimator S2 can also be used);
• for scenario 2 (data sets with X-space outliers), S2 is the recommended estimator;
• finally, for scenario 3 (data sets with leverage points), S3 is the recommended estimator.

5.3. Comparative study with robust methods

This section considers a comparative study between the ETKRR method and some classical robust regression methods (WLS, M-Estimator, MM-Estimator and L1), as well as the OLS method. The parameter estimation algorithm for the ETKRR method uses the OLS to estimate β̂^(0) and the estimator S1 for the width hyper-parameter γ² in scenario 0, the estimators S2 and S3 in

Table 7: Bias and MSE of the parameter estimates according to the estimators of the Gaussian kernel width hyper-parameter (S1, S2 and S3) for scenario 0 (Clean observations).

n     γ² estimator   β̂0 Average   β̂0 Bias    β̂0 MSE     β̂1 Average   β̂1 Bias    β̂1 MSE
50    S1             1.706        0.606571    0.762750    2.000        0.039663    0.049894
50    S2             1.706        0.607502    0.763946    2.000        0.039720    0.049972
50    S3             1.721        0.950831    1.197118    1.998        0.062221    0.078308
100   S1             1.696        0.427941    0.535824    2.000        0.027951    0.035088
100   S2             1.696        0.428609    0.537025    2.000        0.028000    0.035169
100   S3             1.688        0.666771    0.840075    2.001        0.043898    0.055167

scenario 1, the estimator S2 in scenario 2 and, finally, the estimator S3 in scenario 3.

We considered the outlier scenarios of Table 2, as well as two different sample sizes, n = {50, 100}, and five different percentages of outliers, p.out = {5%, 10%, 15%, 20%, 30%}, for each outlier scenario, giving a total of 32 different configurations. A Monte Carlo simulation based on 10,000 replicates was considered for each configuration. The bias and mean squared error (MSE) of the parameter estimates, as well as the computational time, were evaluated for each regression method (ETKRR, WLS, M-Estimator, MM-Estimator, L1 and OLS).

Outliers scenario 0

Table 11 presents the performance of each method in scenario 0, where only clean observations are considered. The ETKRR approach presents the performance closest to the OLS method. In terms of bias and MSE, the other competing models present a worse performance when compared with the ETKRR and OLS methods. The ETKRR method also has the second lowest computational time (Comp. Time) among the robust approaches evaluated. Thus, the simulation results suggest that the ETKRR method is appropriate for data sets without outliers.

Table 8: Bias and MSE of the parameter estimates according to the methods for estimate the gaussian kernel hyperparameter (S1, S2 and S3) for the scenario 1 (Y-space outliers). n

Perc.out

5%

10%

50

15%

20%

30%

5%

10%

100

15%

20%

30%

βˆ 0

γ2 estimator

βˆ 1

Average

Bias

MSE

Average

Bias

MSE

S1

1.691

0.616940

0.778125

2.005

0.041004

0.051465

S2

1.673

0.614689

0.774935

2.002

0.040409

0.050930

S3

1.676

0.626423

0.789505

2.001

0.041172

0.051859

S1

1.785

0.661896

0.827249

2.012

0.044802

0.055222

S2

1.682

0.639721

0.803219

2.002

0.041935

0.052606

S3

1.695

0.640378

0.803971

2.000

0.041843

0.052523

S1

1.833

0.704721

0.875125

2.018

0.048472

0.058413

S2

1.657

0.658064

0.824814

2.005

0.043340

0.054177

S3

1.672

0.656388

0.823248

2.003

0.042931

0.053862

S1

1.950

0.784339

0.953450

2.024

0.053968

0.064074

S2

1.666

0.681434

0.859305

2.007

0.045366

0.056792

S3

1.683

0.678400

0.856095

2.003

0.044601

0.056179

S1

2.241

1.028277

1.173735

2.045

0.070507

0.077524

S2

1.613

0.748447

0.938207

2.021

0.052560

0.062853

S3

1.621

0.742235

0.930869

2.015

0.050263

0.061413

S1

1.727

0.433235

0.542686

2.005

0.028892

0.035997

S2

1.697

0.428589

0.537861

2.001

0.027976

0.035235

S3

1.701

0.431702

0.541973

2.000

0.028123

0.035458

S1

1.766

0.462638

0.576972

2.010

0.031552

0.038505

S2

1.687

0.447802

0.562343

2.002

0.029388

0.036911

S3

1.696

0.447701

0.562767

2.000

0.029325

0.036866

S1

1.868

0.509394

0.616437

2.017

0.035653

0.041469

S2

1.685

0.461308

0.577899

2.003

0.030539

0.038096

S3

1.698

0.460436

0.576985

2.001

0.030254

0.037873

S1

1.964

0.575540

0.671378

2.024

0.040258

0.044771

S2

1.670

0.484889

0.605958

2.006

0.032166

0.039837

S3

1.682

0.483067

0.603838

2.003

0.031606

0.039439

S1

2.233

0.786075

0.828176

2.044

0.055518

0.053370

S2

1.620

0.523976

0.650317

2.019

0.037784

0.043411

S3

1.624

0.520282

0.645929

2.014

0.035772

0.042565

22

Table 9: Bias and MSE of the parameter estimates according to the methods for estimate the gaussian kernel hyperparameter (S1, S2 and S3) for the scenario 2 (X-space outliers). n

Perc.out

5%

10%

50

15%

20%

30%

5%

10%

100

15%

20%

30%

βˆ 0

γ2 estimator

βˆ 1

Average

Bias

MSE

Average

Bias

MSE

S1

1.816

0.639645

0.798985

1.991

0.042330

0.052729

S2

1.677

0.615404

0.775885

2.001

0.040417

0.050962

S3

1.677

0.624343

0.786545

2.001

0.041036

0.051671

S1

1.886

0.699032

0.907273

1.987

0.046295

0.060616

S2

1.700

0.642099

0.806081

2.000

0.041966

0.052667

S3

1.699

0.642952

0.807201

2.000

0.042024

0.052740

S1

5.312

4.051649

7.730725

1.734

0.294069

0.569120

S2

2.208

1.167070

3.544921

1.962

0.081581

0.264678

S3

2.918

1.852625

5.248582

1.908

0.133052

0.391537

S1

22.112

20.432730

5.028495

0.518

1.483233

0.350089

S2

22.127

20.488680

7.748253

0.484

1.519741

0.517908

S3

22.018

20.379950

7.536004

0.495

1.509371

0.509895

S1

24.728

23.027820

2.881520

0.323

1.677190

0.110917

S2

25.036

23.336080

5.506787

0.278

1.721735

0.211789

S3

24.760

23.060120

4.607061

0.297

1.702578

0.174031

S1

1.834

0.454470

0.556272

1.991

0.029957

0.036672

S2

1.703

0.429001

0.538578

2.000

0.027956

0.035244

S3

1.702

0.432174

0.542490

2.000

0.028150

0.035491

S1

1.836

0.482659

0.594037

1.990

0.031966

0.039271

S2

1.698

0.448020

0.563317

2.000

0.029344

0.036900

S3

1.697

0.449180

0.564835

2.000

0.029421

0.037002

S1

4.435

3.034044

6.779746

1.799

0.220067

0.498810

S2

1.889

0.635219

2.068811

1.986

0.043608

0.156031

S3

2.229

0.967306

3.432958

1.960

0.068413

0.256715

S1

22.750

21.052830

2.532885

0.472

1.527942

0.174635

S2

22.929

21.246200

5.068086

0.407

1.593688

0.324366

S3

22.841

21.158770

4.891182

0.417

1.583701

0.321615

S1

24.417

22.717200

1.684653

0.338

1.662079

0.064995

S2

24.198

22.497810

4.300836

0.311

1.688568

0.165900

S3

24.108

22.408390

3.275091

0.323

1.676774

0.123583

23

Table 10: Bias and MSE of the parameter estimates according to the methods for estimate the gaussian kernel hyperparameter (S1, S2 and S3) for the scenario 3 (Leverage points). n

Perc.out

5%

10%

50

15%

20%

30%

5%

10%

100

15%

20%

30%

βˆ 0

γ2 estimator

βˆ 1

Average

Bias

MSE

Average

Bias

MSE

S1

3.541

1.866300

0.978264

1.870

0.131455

0.066192

S2

3.162

1.500139

0.867279

1.897

0.105596

0.058168

S3

1.805

0.884509

1.117948

1.992

0.058743

0.074350

S1

5.403

3.703708

1.135771

1.740

0.260268

0.077082

S2

5.221

3.521295

1.097968

1.752

0.247708

0.074500

S3

2.323

1.117845

1.363483

1.956

0.076204

0.092876

S1

5.901

4.201088

1.136392

1.705

0.294613

0.076738

S2

5.780

4.079860

1.119582

1.714

0.286375

0.075645

S3

2.793

1.469331

1.672238

1.922

0.101831

0.115509

S1

6.387

4.687066

1.175533

1.672

0.327865

0.079226

S2

6.306

4.605586

1.169140

1.678

0.322446

0.078826

S3

3.675

2.194258

2.042215

1.860

0.154099

0.141603

S1

7.086

5.386258

1.198691

1.625

0.375446

0.079545

S2

7.045

5.344782

1.198882

1.627

0.372847

0.079542

S3

5.651

3.970430

2.039129

1.720

0.280805

0.136341

S1

4.042

2.342226

0.766873

1.835

0.164697

0.052045

S2

3.733

2.033590

0.682655

1.857

0.143148

0.046050

S3

1.899

0.632342

0.783932

1.986

0.042188

0.052137

S1

4.917

3.217475

0.842000

1.774

0.225941

0.057509

S2

4.711

3.010908

0.797985

1.788

0.211651

0.054427

S3

2.118

0.751452

0.882339

1.970

0.050994

0.059599

S1

5.698

3.997522

0.884449

1.720

0.280053

0.060375

S2

5.581

3.880767

0.867923

1.728

0.272123

0.059258

S3

2.630

1.105629

1.115076

1.934

0.076810

0.076805

S1

6.103

4.402573

0.914198

1.692

0.307816

0.062141

S2

6.021

4.320764

0.905684

1.698

0.302365

0.061571

S3

3.290

1.685910

1.411286

1.887

0.118486

0.098083

S1

6.699

4.998802

0.972223

1.652

0.348159

0.066043

S2

6.655

4.954697

0.970150

1.655

0.345377

0.065893

S3

5.084

3.390093

1.538221

1.760

0.240278

0.106280

24

Table 11: Outliers scenario 0: bias and MSE for the parameter estimates.

n     Method      β̂0 Average   β̂0 Bias   β̂0 MSE   β̂1 Average   β̂1 Bias   β̂1 MSE   Comp. Time
50    OLS         1.706        0.605      0.758     1.999        0.040      0.050     0.0014
50    WLS         1.706        0.645      0.807     1.999        0.042      0.053     0.0015
50    M           1.707        0.622      0.780     1.999        0.041      0.051     0.0031
50    MM          1.708        0.627      0.787     1.999        0.041      0.052     0.0079
50    L1          1.707        0.760      0.954     1.999        0.050      0.063     0.0019
50    ETKRR(S1)   1.706        0.605      0.758     1.999        0.040      0.050     0.0018
100   OLS         1.696        0.428      0.537     2.000        0.028      0.035     0.0016
100   WLS         1.692        0.455      0.570     2.001        0.030      0.037     0.0016
100   M           1.696        0.440      0.551     2.000        0.029      0.036     0.0031
100   MM          1.696        0.442      0.554     2.000        0.029      0.036     0.0122
100   L1          1.699        0.538      0.676     2.000        0.035      0.044     0.0019
100   ETKRR(S1)   1.696        0.428      0.537     2.000        0.028      0.035     0.0027

Outliers scenario 1

Scenario 1 considers outlier observations in the Y-space. Table 12 presents a comparative study between the methods WLS, L1 and ETKRR(S2,S3). The ETKRR method outperforms both WLS and L1, since it presents the lower bias and the higher precision (lower MSE) for the parameter estimates for all sample sizes. The ETKRR method exhibits satisfactory results between 5% and 20% of outliers in the sample. Still in this outlier range, the ETKRR method presents an average value of the intercept (β0) close to the true parameter, in contrast with the WLS and L1 methods.

Table 13 gives the comparison between the methods ETKRR(S2,S3), M-Estimator and MM-Estimator. Note that the ETKRR outperforms the M-Estimator in most configurations, especially for the coefficient β0. At the level of 30% of outliers, the ETKRR presents a lower bias and a higher precision in the parameter estimates when compared with the M-Estimator method. The comparison between the MM-Estimator and ETKRR methods shows that both present a similar performance up to 20% of outliers, with the ETKRR exhibiting a slightly smaller bias and MSE. In particular, up to the level of 20% of outliers, the ETKRR method is very competitive. Moreover, the ETKRR requires about half the computational time of the MM-Estimator.

Outliers scenario 2

Scenario 2 represents outlier observations in the X-space. Based on the results presented in Tables 14 and 15, we conclude that the method ETKRR(S2) outperforms the methods WLS, M-Estimator and L1 regression, based on the bias and MSE of the parameter estimates, up to the level of 15% of outliers. Moreover, the ETKRR method presented better results than the MM-Estimator up to 10% of outliers, a level that represents a frequent outlier percentage in real problems. Beyond 15% of outliers, only the MM-Estimator showed a good performance.

Table 12: Outliers scenario 1: bias and MSE for the parameter estimates. Comparison between WLS, L1 and ETKRR(S2,S3) methods. n

Perc.out

βˆ 0

Method

βˆ 1

Comp. Time

Average

Bias

MSE

Average

Bias

MSE

WLS

2.912

1.784

1.858

1.980

0.098

0.121

0.0016

L1

1.778

0.790

0.985

1.998

0.052

0.065

0.0020

ETKRR(S2)

1.712

0.619

0.777

1.999

0.040

0.051

0.0025

WLS

5.144

3.724

2.807

1.941

0.154

0.184

0.0016

L1

1.874

0.825

1.024

1.999

0.053

0.067

0.0020

ETKRR(S3)

1.702

0.643

0.807

2.000

0.042

0.052

0.0032

WLS

6.209

4.740

3.207

1.923

0.179

0.212

0.0016

L1

1.914

0.852

1.050

2.001

0.055

0.069

0.0020

ETKRR(S3)

1.680

0.652

0.818

2.002

0.042

0.053

0.0034

WLS

7.668

6.124

3.632

1.887

0.212

0.241

0.0016

L1

2.032

0.924

1.115

1.999

0.058

0.073

0.0022

ETKRR(S3)

1.676

0.686

0.863

2.003

0.045

0.056

0.0039

WLS

11.009

9.375

4.435

1.800

0.289

0.300

0.0017

L1

2.276

1.100

1.260

2.000

0.066

0.083

0.0021

ETKRR(S3)

1.619

0.732

0.921

2.015

0.049

0.060

0.0052

WLS

3.517

1.990

1.536

1.970

0.083

0.100

0.0018

L1

1.784

0.551

0.691

2.000

0.036

0.045

0.0021

ETKRR(S2)

1.698

0.435

0.546

2.000

0.028

0.035

0.0041

WLS

4.738

3.133

1.936

1.950

0.109

0.126

0.0017

L1

1.842

0.573

0.709

2.000

0.037

0.046

0.0020

ETKRR(S3)

1.699

0.449

0.565

2.000

0.029

0.037

0.0045

WLS

6.459

4.797

2.316

1.917

0.139

0.152

0.0020

L1

1.949

0.624

0.743

1.999

0.039

0.049

0.0021

ETKRR(S3)

1.700

0.466

0.582

2.000

0.030

0.038

0.0058

WLS

7.820

6.136

2.598

1.886

0.167

0.172

0.0018

L1

2.027

0.670

0.778

1.999

0.041

0.051

0.0020

ETKRR(S3)

1.679

0.476

0.598

2.003

0.031

0.039

0.0061

WLS

10.996

9.302

3.196

1.804

0.238

0.214

0.0017

L1

2.266

0.842

0.885

2.000

0.046

0.058

0.0021

ETKRR(S3)

1.636

0.523

0.655

2.013

0.035

0.043

0.0080

5%

10%

50 15%

20%

30%

5%

10%

100 15%

20%

30%

27

Table 13: Scenario 1: bias and MSE for the parameter estimates. Comparison between M-Estimator, MM-Estimator and ETKRR(S2,S3) methods. n

Perc.out

5%

10%

50 15%

20%

30%

5%

10%

100 15%

20%

30%

βˆ 0

Method

βˆ 1

Comp. Time

Average

Bias

MSE

Average

Bias

MSE

M

1.789

0.642

0.799

1.999

0.042

0.052

0.0034

MM

1.719

0.635

0.796

1.999

0.042

0.052

0.0084

ETKRR(S2)

1.712

0.619

0.777

1.999

0.041

0.051

0.0025

M

1.939

0.696

0.840

2.000

0.044

0.055

0.0037

MM

1.706

0.651

0.817

2.000

0.043

0.053

0.0088

ETKRRS3)

1.702

0.643

0.807

2.000

0.042

0.052

0.0032

M

2.030

0.735

0.860

2.001

0.045

0.056

0.0037

MM

1.690

0.658

0.825

2.001

0.043

0.054

0.0087

ETKRR(S3)

1.680

0.652

0.818

2.001

0.042

0.053

0.0034

M

2.231

0.857

0.931

2.000

0.048

0.061

0.0043

MM

1.703

0.688

0.866

2.000

0.045

0.057

0.0082

ETKRR(S3)

1.676

0.686

0.863

2.003

0.045

0.056

0.0039

M

5.926

4.258

2.328

1.975

0.110

0.140

0.0061

MM

1.691

0.722

0.911

2.001

0.047

0.060

0.0090

ETKRR(S3)

1.618

0.732

0.921

2.015

0.049

0.060

0.0052

M

1.811

0.458

0.563

2.000

0.029

0.037

0.0038

MM

1.704

0.443

0.556

2.000

0.029

0.036

0.0141

ETKRR(S2)

1.698

0.435

0.546

2.000

0.028

0.035

0.0041

M

1.896

0.491

0.584

2.000

0.030

0.038

0.0038

MM

1.701

0.453

0.571

2.000

0.030

0.037

0.0127

ETKRR(S3)

1.699

0.449

0.565

2.000

0.029

0.037

0.0045

M

2.064

0.574

0.613

1.999

0.032

0.040

0.0042

MM

1.711

0.470

0.588

1.999

0.031

0.039

0.0132

ETKRR(S3)

1.700

0.466

0.582

2.000

0.030

0.038

0.0058

M

2.235

0.682

0.643

2.000

0.033

0.042

0.0045

MM

1.705

0.479

0.601

2.000

0.031

0.039

0.0125

ETKRR(S3)

1.679

0.476

0.598

2.003

0.031

0.039

0.0061

M

5.115

3.419

1.498

1.994

0.070

0.089

0.0073

MM

1.703

0.515

0.648

2.000

0.034

0.043

0.0126

ETKRR(S3)

1.636

0.523

0.655

2.013

0.035

0.043

0.0080

28

Table 14: Outliers scenario 2: bias and MSE for the parameter estimates. n

Perc.out

5%

10%

50

15%

20%

30%

βˆ 0

Method

βˆ 1

Comp. Time

Average

Bias

MSE

Average

Bias

MSE

WLS

4.796

3.096

0.782

1.787

0.212

0.050

0.0010

M

2.944

1.291

0.833

1.912

0.090

0.054

0.0022

MM

1.720

0.634

0.795

1.998

0.041

0.052

0.0062

L1

2.632

1.120

1.009

1.934

0.077

0.066

0.0012

ETKRR(S2)

1.717

0.620

0.778

1.999

0.040

0.051

0.0165

WLS

11.408

9.708

0.951

1.344

0.655

0.060

0.0012

M

10.692

8.992

4.202

1.363

0.636

0.297

0.0037

MM

1.706

0.650

0.817

1.999

0.042

0.053

0.0062

L1

4.928

3.230

1.335

1.770

0.229

0.090

0.0012

ETKRR(S2)

1.706

0.644

0.808

2.000

0.042

0.053

0.0194

WLS

14.669

12.969

1.073

1.128

0.871

0.066

0.0011

M

20.018

18.318

1.318

0.705

1.294

0.091

0.0021

MM

1.690

0.657

0.825

2.001

0.043

0.053

0.0067

L1

7.853

6.153

3.453

1.559

0.440

0.251

0.0013

ETKRR(S2)

2.265

1.210

3.593

1.957

0.084

0.276

0.0229

WLS

18.179

16.479

1.176

0.899

1.100

0.069

0.0010

M

21.700

20.000

1.198

0.585

1.414

0.082

0.0018

MM

1.703

0.688

0.866

1.999

0.045

0.057

0.0060

L1

21.707

20.007

5.979

0.556

1.444

0.422

0.0012

ETKRR(S2)

22.143

20.461

4.881

0.514

1.486

0.340

0.0305

WLS

24.608

22.908

1.259

0.489

1.510

0.061

0.0010

M

24.051

22.351

1.199

0.411

1.588

0.071

0.0021

MM

1.714

0.738

1.096

1.998

0.048

0.076

0.0062

L1

25.707

24.007

2.241

0.270

1.729

0.086

0.0012

ETKRR(S2)

24.751

23.051

2.900

0.321

1.678

0.112

0.0304

29

Table 15: Outliers scenario 2: bias and MSE for the parameter estimates (n = 100).

                              βˆ0                              βˆ1
Perc.out   Method       Average    Bias      MSE      Average    Bias      MSE      Comp. Time
5%         WLS            6.381     4.681    0.557      1.680     0.319    0.036      0.0013
           M              3.632     1.932    0.620      1.863     0.136    0.041      0.0026
           MM             1.704     0.443    0.556      1.999     0.029    0.036      0.0091
           L1             3.069     1.382    0.716      1.903     0.097    0.047      0.0015
           ETKRR(S2)      1.703     0.436    0.547      2.000     0.028    0.035      0.0186
10%        WLS           10.047     8.347    0.636      1.434     0.565    0.040      0.0016
           M              6.314     4.614    1.179      1.673     0.326    0.082      0.0040
           MM             1.701     0.452    0.570      1.999     0.030    0.037      0.0089
           L1             4.301     2.601    0.827      1.815     0.184    0.055      0.0014
           ETKRR(S2)      1.701     0.449    0.566      2.000     0.029    0.037      0.0268
15%        WLS           14.914    13.214    0.744      1.113     0.887    0.046      0.0013
           M             20.220    18.520    0.883      0.691     1.308    0.062      0.0022
           MM             1.710     0.470    0.587      1.999     0.031    0.039      0.0089
           L1             7.737     6.037    2.181      1.568     0.431    0.158      0.0014
           ETKRR(S2)      1.890     0.642    2.073      1.986     0.044    0.156      0.0286
20%        WLS           18.205    16.505    0.836      0.899     1.101    0.049      0.0010
           M             21.741    20.041    0.845      0.584     1.415    0.058      0.0018
           MM             1.704     0.478    0.601      1.999     0.031    0.039      0.0087
           L1            21.578    20.878    4.456      0.492     1.507    0.315      0.0012
           ETKRR(S2)     22.671    20.973    2.625      0.476     1.523    0.183      0.0508
30%        WLS           24.282    22.582    0.893      0.510     1.489    0.044      0.0010
           M             23.951    22.251    0.824      0.420     1.576    0.052      0.0022
           MM             1.704     0.515    0.648      1.999     0.033    0.042      0.0088
           L1            25.658    23.958    1.597      0.273     1.726    0.061      0.0012
           ETKRR(S2)     24.421    22.721    1.654      0.338     1.661    0.064      0.0501


Outliers scenario 3

Tables 16 and 17 present the results for scenario 3, where leverage points are considered. This is the hardest configuration for the methods evaluated in this paper. Even so, the ETKRR(S3) method presented the best performance. For the ETKRR method, the average values of the parameter estimates βˆ0 and βˆ1 are the closest to the true parameters up to 15% of outliers, especially for the parameter β1. The ETKRR method also achieved the best average value and the smallest bias, in comparison with the other methods, when the percentage of outliers is at most 20%.

6. Applications on real data sets

This section considers applications to real data sets. The aim is to illustrate the usefulness of the ETKRR method and to compare it with other robust methods in real-world applications. The robustness of each method is evaluated through the percentage of change of the parameter estimates when the outliers are suppressed from the data set. Based on this criterion, it is possible to verify which approach is more robust to the presence of outliers. Most of the real data sets considered in this section are available in the software R. It is important to emphasize that the outliers present in the real data sets were not artificially introduced; they can be regarded as naturally occurring outlying observations.
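To make this criterion concrete, the sketch below (not taken from the original study; the fitting routine fit_fun and the outlier indices are hypothetical placeholders) computes, in R, the percentage of change of each estimated coefficient when the model is refitted without the flagged observations.

# Percentage of change (%) of each coefficient after suppressing the
# flagged outliers; fit_fun is any fitting routine with a coef() method,
# e.g. lm() for OLS or MASS::rlm() for the M-Estimator.
pct_change <- function(fit_fun, formula, data, outlier_idx) {
  full  <- coef(fit_fun(formula, data = data))
  clean <- coef(fit_fun(formula, data = data[-outlier_idx, ]))
  100 * abs((clean - full) / full)
}

# Hypothetical usage: pct_change(lm, y ~ x1 + x2, data = dat, outlier_idx = c(3, 17))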

6.1. U.S. navy bachelors officer data

The main purpose of a robust regression method is to provide estimates for the regression model that are less sensitive to the presence of outliers. This data set is available in Table 5.2 of Ref. [21] and has 25 observations and 9 variables. We consider as response variable the average daily occupancy (Y). The explanatory variables are the number of building wings (X1) and the monthly man hours (X2).


Table 16: Outliers scenario 3: bias and MSE for the parameter estimates (n = 50).

                              βˆ0                              βˆ1
Perc.out   Method       Average    Bias      MSE      Average    Bias      MSE      Comp. Time
5%         WLS            2.387     0.868    0.811      1.953     0.057    0.052      0.0010
           M              2.806     1.185    0.854      1.922     0.082    0.056      0.0023
           MM             2.143     0.830    0.942      1.968     0.056    0.063      0.0063
           L1             2.561     1.079    1.011      1.939     0.073    0.066      0.0012
           ETKRR(S3)      1.847     0.899    1.132      1.990     0.059    0.075      0.0028
10%        WLS            3.818     2.213    0.902      1.857     0.142    0.058      0.0012
           M              4.781     3.082    1.114      1.782     0.217    0.076      0.0023
           MM             3.628     2.025    1.406      1.863     0.142    0.096      0.0063
           L1             4.178     2.488    1.188      1.824     0.175    0.079      0.0011
           ETKRR(S3)      2.319     1.122    1.380      1.956     0.076    0.094      0.0031
15%        WLS            4.552     2.854    1.009      1.809     0.190    0.064      0.0013
           M              5.476     3.777    1.200      1.734     0.265    0.081      0.0023
           MM             4.570     2.909    1.517      1.797     0.204    0.104      0.0064
           L1             4.938     3.240    1.297      1.771     0.228    0.086      0.0012
           ETKRR(S3)      2.827     1.501    1.714      1.920     0.104    0.118      0.0036
20%        WLS            5.361     3.661    1.106      1.756     0.243    0.071      0.0010
           M              6.050     4.350    1.210      1.694     0.305    0.081      0.0022
           MM             5.523     3.835    1.456      1.731     0.270    0.100      0.0062
           L1             5.622     3.923    1.351      1.722     0.277    0.090      0.0012
           ETKRR(S3)      3.679     2.199    2.029      1.858     0.154    0.141      0.0038
30%        WLS            6.910     5.210    1.311      1.659     0.340    0.082      0.0010
           M              6.887     5.187    1.260      1.637     0.362    0.083      0.0020
           MM             6.774     5.074    1.314      1.644     0.355    0.087      0.0062
           L1             6.626     4.926    1.424      1.653     0.346    0.092      0.0011
           ETKRR(S3)      5.659     3.979    2.032      1.719     0.281    0.136      0.0044


Table 17: Outliers scenario 3: bias and MSE for the parameter estimates (n = 100).

                              βˆ0                              βˆ1
Perc.out   Method       Average    Bias      MSE      Average    Bias      MSE      Comp. Time
5%         WLS            2.637     0.964    0.583      1.936     0.065    0.038      0.0014
           M              3.242     1.546    0.648      1.891     0.109    0.043      0.0024
           MM             2.437     0.851    0.730      1.947     0.059    0.049      0.0092
           L1             2.895     1.223    0.720      1.915     0.085    0.047      0.0011
           ETKRR(S3)      1.899     0.640    0.793      1.986     0.042    0.052      0.0054
10%        WLS            3.392     1.695    0.647      1.885     0.114    0.042      0.0016
           M              4.262     2.562    0.792      1.819     0.180    0.053      0.0025
           MM             3.271     1.606    0.941      1.889     0.112    0.064      0.0091
           L1             3.737     2.039    0.798      1.856     0.143    0.053      0.0014
           ETKRR(S3)      2.114     0.736    0.864      1.971     0.049    0.058      0.0064
15%        WLS            4.402     2.702    0.760      1.819     0.180    0.049      0.0013
           M              5.249     3.549    0.905      1.750     0.249    0.061      0.0024
           MM             4.616     2.919    1.039      1.794     0.205    0.071      0.0092
           L1             4.758     3.058    0.941      1.784     0.215    0.063      0.0012
           ETKRR(S3)      2.663     1.119    1.130      1.934     0.077    0.077      0.0078
20%        WLS            5.104     3.404    0.853      1.774     0.225    0.055      0.0010
           M              5.748     4.048    0.947      1.715     0.284    0.064      0.0023
           MM             5.388     3.688    1.032      1.740     0.259    0.070      0.0091
           L1             5.342     3.642    1.000      1.743     0.256    0.067      0.0011
           ETKRR(S3)      3.237     1.399    1.641      1.890     0.115    0.097      0.0090
30%        WLS            6.474     4.774    1.020      1.687     0.312    0.065      0.0009
           M              6.507     4.807    0.990      1.664     0.335    0.067      0.0022
           MM             6.399     4.699    1.037      1.671     0.328    0.070      0.0091
           L1             6.231     4.531    1.074      1.681     0.318    0.071      0.0012
           ETKRR(S3)      5.104     3.410    1.525      1.758     0.241    0.105      0.0099


Figure 3 presents the 3D scatter plot of the data and the fitted plane for each regression method. The fitted plane for the ETKRR method considers the estimator S3 for the width hyper-parameter γ2, a choice justified by the presence of Y-space outliers. Notice that, due to the presence of the top-middle point, some regression planes are pulled upwards. Table 18 shows that the ETKRR method presented the lowest percentage of change in the parameter estimates, followed by the L1 regression. In particular, the MM-Estimator performed poorly on this data set.


Figure 3: U.S. navy bachelors data set: 3D scatter plot between the response and explanatory variables.

6.2. International calls from Belgium data

This data set, available in the R library robustbase, considers the number of international calls from Belgium between 1950 and 1973, taken from the Belgian Statistical Survey published by the Ministry of Economy.


Table 18: US navy bachelors data set: Percentage change (%) in the parameter estimates after suppression of the outlier observations.

Method        βˆ0        βˆ1        βˆ2
OLS          823.45    205.48     32.10
WLS           82.04     10.80     43.94
M             30.39    141.85     14.16
MM            42.40    142.47     15.68
L1            22.23     82.17     12.83
ETKRR(S3)     18.84     44.61      8.77

The response variable represents the number of calls (in tens of millions) and the covariate is the year. Figure 4 presents the scatter plot of the data and the straight lines estimated by each method.


Figure 4: International calls from Belgium data: Estimated straight line according to method.

The presence of some Y-space outliers can be observed. The ETKRR and MM-Estimator methods (red and yellow lines, respectively) produce very similar lines, whereas the OLS and WLS methods (black and blue lines, respectively) are strongly affected by the outliers.
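For reproducibility, a minimal R sketch of how the classical fits shown in Figure 4 could be obtained is given below; it is not the code used in the paper, and the column names Year and Calls of the telef data frame are assumptions.

# Sketch: fit the compared classical methods on the Belgium calls data.
library(robustbase)   # telef data set and lmrob() (MM-Estimator)
library(MASS)         # rlm() (M-Estimator)
library(quantreg)     # rq() (L1 / median regression)

data(telef)                                           # assumed columns: Year, Calls
fit_ols <- lm(Calls ~ Year, data = telef)             # ordinary least squares
fit_m   <- rlm(Calls ~ Year, data = telef)            # Huber M-Estimator
fit_mm  <- lmrob(Calls ~ Year, data = telef)          # MM-Estimator
fit_l1  <- rq(Calls ~ Year, tau = 0.5, data = telef)  # L1 regression

plot(Calls ~ Year, data = telef)
abline(fit_ols); abline(fit_m, lty = 2)
abline(fit_mm, lty = 3); abline(fit_l1, lty = 4)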

From Table 19, it can be verified that the MM-Estimator and ETKRR methods presented the lowest percentages of change in the parameter estimates. The estimator S3 for the width hyper-parameter γ2 was considered for the ETKRR method due to the presence of Y-space outliers (scenario 1).

Table 19: International calls from Belgium data set: Percentage change (%) in the parameter estimates after suppression of the outlier observation.

Method        βˆ0       βˆ1
OLS          79.77    78.07
WLS          85.18    83.34
M            50.55    47.79
MM            0.74     0.92
L1           29.13    27.12
ETKRR(S3)    16.81    15.94

6.3. Cloud point data

This data set, also available in the R library robustbase, contains measurements concerning the cloud point of a liquid; the cloud point is a measure of the degree of crystallization in a stock. The response variable is the cloud point (Y) and the covariate is Percentage I-8. Figure 5 presents the scatter plot of the data and the straight lines estimated by each method.

The presence of three leverage points can be observed, and the ETKRR method (red line) was the least affected by them. Finally, according to Table 20, the ETKRR method presented the lowest percentage of change in the parameter estimates, taking into account the estimator S3 for the width hyper-parameter γ2, as expected from the results obtained with the synthetic data sets of Section 5.3.

6.4. Delivery time data

This data set is available in the R library robustbase, as well as in Ref. [22], and considers 25 observations and 3 variables.


Figure 5: Cloud point data: Estimated straight line according to method.

Table 20: Cloud point data set: Percentage change (%) in the parameter estimates after suppression of the outlier observations.

Method        βˆ0      βˆ1
OLS           4.37    14.92
WLS           3.64    14.40
M             4.39    15.18
MM            1.18     4.72
L1            0.69     3.63
ETKRR(S3)     0.11     0.61


We consider as response variable the delivery time (Y) and as explanatory variables the number of products (X1) and the distance (X2). Figure 6 presents the 3D scatter plot of the data as well as the fitted regression planes.




Figure 6: Delivery time data: 3D scatter plot between the response and explanatory variables.

The leverage points located in the top-left corner of the graphic pull some regression planes upwards. The fitted plane for the ETKRR method considered the estimator S3 for the width hyper-parameter, in accordance with the results of the simulation study (Section 5.3), due to the presence of leverage points. Overall, Table 21 shows that the ETKRR method and the L1 regression presented the lowest percentages of change in the parameter estimates. It is worth noting that the MM-Estimator performed poorly on this data set in comparison with the L1 and ETKRR methods.


Table 21: Delivery time data set: Percentage change (%) in the parameter estimates after suppression of the outlier observations.

Method        βˆ0       βˆ1       βˆ2
OLS          142.32    21.73    26.18
WLS           61.62    18.79    10.88
M             60.13    13.68    24.07
MM            17.19    12.55     8.77
L1            16.34     4.19     3.65 × 10^-14
ETKRR(S3)      2.42     6.07    13.82

6.5. Kootenay river data

This data set considers the water-flow measurements of the Kootenay river in the cities of Libby (Montana, USA) and Newgate (British Columbia, Canada). It is available in the R library robustbase. Figure 7 presents the scatter plot of the data and the straight lines estimated by each method.


Figure 7: Kootenay data: Estimated straight line according to method.

The ETKRR and MM-Estimator methods (red and yellow lines, respectively) exhibit the lines closest to the bulk of the sample points. The M-Estimator and OLS methods (black and magenta lines, respectively) are strongly affected by the point located at the right of the graph (an X-space outlier). The estimator S1 for the width hyper-parameter γ2 presented the best results for the ETKRR method on this data set. Moreover, Table 22 shows that the ETKRR method presents the lowest percentage of change in the parameter estimates after the suppression of the outlier observation.

Table 22: Kootenay data set: Percentage change (%) in the parameter estimates after suppression of the outlier observation.

Method        βˆ0       βˆ1
OLS          76.29    4.5782 × 10^3
WLS          56.25    8.7152 × 10^1
M            76.69    3.4564 × 10^3
MM            0.20    0.1276
L1           72.92    1.1183 × 10^2
ETKRR(S1)     0.11    0.0671

6.6. Public schools data

The data set named "Public Schools" is available in the R library sandwich. The response variable represents the per capita spending on public schools (Expenditure) and the covariate is the per capita income, by state, in the USA [23]. The state of Wisconsin was removed because its expenditure information was missing. Figure 8 illustrates the scatter plot, including the straight lines estimated by each method assessed in this paper.
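The data handling described above can be reproduced in R roughly as follows (a sketch, not the authors' code; it assumes that the PublicSchools data frame has the columns Expenditure and Income and that Wisconsin is the only row with a missing value).

# Sketch: load the Public Schools data and drop the state with missing
# expenditure information (Wisconsin) before fitting the models.
library(sandwich)
data("PublicSchools")
ps <- na.omit(PublicSchools)                    # removes the Wisconsin row
fit_ols <- lm(Expenditure ~ Income, data = ps)  # baseline OLS fit
coef(fit_ols)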

Notice that the ETKRR and MM-Estimator methods (red and yellow lines, respectively) present very close regression lines and are less affected by the leverage point located in the top-right area and by the X-space outlier located in the middle-right area of Figure 8.


Figure 8: Public school data: Estimated straight line according to method.

This can also be verified in Table 23, where the MM-Estimator and the ETKRR method present the lowest percentages of change in the parameter estimates; the ETKRR method achieved the second best performance using the estimator S1. Moreover, the OLS method (black line) exhibits the worst regression equation due to its higher percentage of change in the parameter estimates. It is important to mention that this particular data set presents outliers in two parts of the scatter plot, a scenario that was not considered in the simulation settings. Nevertheless, the ETKRR method presented a competitive performance.

6.7. ETKRR overall performance on the real data sets

The performance of the ETKRR method on the six real data sets demonstrates that the proposed approach is competitive with the classical robust methods. Table 24 ranks the methods for each data set based on the percentage of change of the parameter estimates (rank 1 for the lowest percentage of change). As expected, OLS is the method most sensitive to the presence of outliers. The WLS and M-Estimator methods exhibited the worst performance among the classical robust methods.


Table 23: Public school data set: Percentage change (%) in the parameter estimates after suppression of the outlier observation.

Method        βˆ0      βˆ1
OLS          82.28    24.82
WLS          30.63     7.12
M            66.04    12.80
MM            0.99     0.06
L1           20.12     3.96
ETKRR(S1)     6.35     0.53

The MM-Estimator and L1 methods outperformed the previous approaches and showed similar performance. Moreover, the ETKRR method exhibited the best performance based on the average rank, corroborating the results presented in the simulation section.

Table 24: Overall performance of the methods based on the rank of the percentage of change.

Method    Public   Kootenay   US Navy   Int. calls   Cloud   Delivery   Average Rank
OLS          6         6          6          5          5         6          5.66
WLS          4         3          5          6          4         5          4.50
M            5         5          3          4          6         4          4.50
MM           1         2          4          1          3         3          2.33
L1           3         4          2          3          2         1          2.50
ETKRR        2         1          1          2          1         2          1.50

7. Concluding remarks

A regression method based on kernel functions is proposed in this paper. The Exponential-Type Kernel function based Robust Regression (ETKRR) method uses exponential-type kernel functions to penalize badly fitted observations. The weights given to each observation are updated at each iteration of the parameter estimation algorithm, and a proof of the convergence of this algorithm was given. For mathematical convenience, we considered the Gaussian kernel, which belongs to the family of "exponential-type" kernels, in the parameter estimation algorithm.
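As a rough illustration of this idea, the R sketch below implements an iteratively reweighted least-squares loop in which each observation is down-weighted by a Gaussian kernel of its current residual. It is only a simplified approximation under stated assumptions, not the estimator defined in the paper: the width hyper-parameter gamma2 is treated as fixed, whereas the paper estimates it through the estimators S1-S3, and the exact weight update and stopping rule are those given in the earlier sections.

# Sketch only: Gaussian-kernel reweighted least squares with a fixed
# width hyper-parameter gamma2 (assumed known here).
etkrr_sketch <- function(X, y, gamma2, max_iter = 100, tol = 1e-8) {
  X <- cbind(1, X)                               # add intercept column
  beta <- solve(crossprod(X), crossprod(X, y))   # OLS initialization
  for (it in seq_len(max_iter)) {
    r <- as.vector(y - X %*% beta)               # current residuals
    w <- exp(-r^2 / (2 * gamma2))                # Gaussian-kernel weights
    beta_new <- solve(crossprod(X, w * X), crossprod(X, w * y))
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  drop(beta)
}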

The performance of the proposed parameter estimation algorithm is sensitive to the computation of the initial value of the vector of parameters of the regression model, as well as to the width hyper-parameter estimator of the exponential-type kernel functions. A simulation study with synthetic data sets gave insights into the appropriateness of the different width hyper-parameter estimators for X-space outliers, Y-space outliers and leverage points, in order to improve the performance of the proposed approach in terms of bias and MSE of the parameter estimates. Moreover, the OLS method is suitable to compute the initial value of the vector of parameters for the proposed regression method.

A comparative study between the ETKRR method and some classical robust regression approaches also demonstrated the usefulness of the proposed method. We considered synthetic data sets with X-space outliers, Y-space outliers and leverage points, in a Monte Carlo simulation framework with 10,000 replications, different sample sizes and increasing percentages of outliers. In particular, the ETKRR method presented a performance similar to that of the OLS method when only clean observations were considered. In the scenario with outlying observations in the Y-space, the ETKRR method presented the best performance up to 15% of outliers. Moreover, the ETKRR method required about half of the computational time of the MM-Estimator method. In the scenario with outlying observations in the X-space, the ETKRR method presented the best performance when the proportion of outliers is at most 10% (compared with the MM-Estimator) and 15% (compared with WLS, L1 and the M-Estimator). The ETKRR approach presented the best performance up to 15% of outliers in the scenario where bad leverage observations are considered; for more than 15% of outliers, all robust methods performed poorly in this scenario. Thus, the results suggest that the proposed approach presents a competitive (or the best) performance in scenarios that are comparable to those often found in real-life problems. Finally, concerning the benchmark real data sets, the ETKRR method presented the best overall performance in comparison with the classical robust methods.

Acknowledgments

The authors are grateful to the anonymous referees for their careful revision, valuable suggestions, and comments which improved this paper. The authors would like to thank CAPES (National Foundation for Post-Graduate Programs, Brazil) and CNPq (National Council for Scientific and Technological Development, Brazil) for their financial support. The first author would also like to thank FACEPE (Research Agency of the State of Pernambuco, Brazil).

References

[1] P. J. Huber, Robust regression: asymptotics, conjectures and Monte Carlo, The Annals of Statistics 1 (1973) 799–821.

[2] P. J. Huber, Robust Statistics, John Wiley and Sons Inc., New York, 1981.

[3] V. J. Yohai, High breakdown point and high efficiency robust estimates for regression, The Annals of Statistics 15 (1987) 642–656.

[4] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society - Series B 58 (1996) 267–288.

[5] P. J. Rousseeuw, A. M. Leroy, Robust Regression and Outlier Detection, John Wiley and Sons Inc., New York, 1987.

[6] T. P. Ryan, Modern Regression Methods, John Wiley and Sons Inc., Hoboken, NJ, 2009.

[7] X. Tong, Y. Jin, L. Li, An improved weighted total least squares method with applications in linear fitting and coordinate transformation, Journal of Surveying Engineering 137 (2011) 120–128.

[8] Y. Fan, G. Qin, Z. Zhua, Variable selection in robust regression models for longitudinal data, Journal of Multivariate Analysis 109 (2012) 156–167.

[9] I. Naseem, R. Togneri, M. Bennamoun, Robust regression for face recognition, Pattern Recognition 45 (2012) 104–188.

[10] B. Scholkoepf, A. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.

[11] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, 2004.

[12] N. Cristianini, J. Shawe-Taylor, Introduction to Support Vector Machines, Cambridge University Press, Cambridge, 2000.

[13] T. Hofmann, B. Scholkoepf, A. Smola, A review of kernel methods in machine learning, Tech. rep., Max-Planck-Institut fuer biologische Kybernetik, 2006.

[14] P. J. Huber, Robust estimation of a location parameter, The Annals of Mathematical Statistics 35 (1) (1964) 73–101. doi:10.1214/aoms/1177703732. URL http://dx.doi.org/10.1214/aoms/1177703732

[15] P. Rousseeuw, V. Yohai, Robust Regression by Means of S-Estimators, Springer US, New York, NY, 1984, pp. 256–272. doi:10.1007/978-1-4615-7821-5_15. URL http://dx.doi.org/10.1007/978-1-4615-7821-5_15

[16] R. A. Maronna, R. D. Martin, V. J. Yohai, Robust Statistics: Theory and Methods, John Wiley and Sons Inc., Chichester, 2006.

[17] J. Mercer, Functions of positive and negative type and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society A 209 (1909) 441–458.

[18] K. R. Mueller, S. Mika, G. R. Raetsch, K. Tsuda, B. Schoelkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks 12 (2001) 181–202.

[19] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016. URL https://www.R-project.org/

[20] B. Caputo, K. Sim, F. Furesjo, A. Mola, Appearance-based object recognition using SVMs: which kernel should I use?, in: Proceedings of the NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision, 2002.

[21] R. H. Meyers, Classical and Modern Regression with Applications, Duxbury Press, Belmont, CA, 1990.

[22] D. C. Montgomery, Introduction to Linear Regression Analysis, John Wiley and Sons Inc., New York, 1982.

[23] W. H. Greene, Econometric Analysis, Macmillan Publishing Company, New York, 1993.