Nonparametric bivariate copula estimation based on shape-restricted support vector regression

Knowledge-Based Systems 35 (2012) 235–244
Yongqiao Wang a,*, He Ni a, Shouyang Wang b

a School of Finance, Zhejiang Gongshang University, Hangzhou 310018, Zhejiang, China
b Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China

* Corresponding author. E-mail addresses: [email protected] (Y. Wang), [email protected] (H. Ni), [email protected] (S. Wang).

Article history: Received 8 December 2011; Received in revised form 9 April 2012; Accepted 8 May 2012; Available online 17 May 2012.

Keywords: Support vector regression; Copula; Nonparametric estimation; Dependence; Shape-restriction.

Abstract

Copula has become a standard tool for describing dependence relations between random variables. This paper proposes a nonparametric bivariate copula estimation method based on shape-restricted ε-support vector regression (ε-SVR). The method explicitly supplements the classical ε-SVR with constraints related to three shape restrictions: grounded, marginal and 2-increasing, which are the necessary and sufficient conditions for a bivariate function to be a copula. The resulting nonparametric estimation problem can be reformulated as a convex quadratic programming, which is computationally tractable. Experiments on five artificial data sets and on three international stock indexes clearly show that the method achieves significantly better performance than common parametric models and the kernel smoother.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Since [1], copula has become a standard tool of dependence modeling in multivariate statistical analysis [2,3]. A copula summarizes all dependence information between random variables and separates the marginal components of a joint distribution from its dependence structure. In the last 20 years, copulas have been widely applied in a variety of areas, such as engineering, finance, insurance and economics; see, for example, the recent monographs [4–6] and the references therein. In what follows we focus on bivariate copulas only, for simplicity. Let X1 and X2 be two continuous random variables of interest with joint distribution function H and marginal distributions F1 and F2, respectively. When the marginal distributions F1 and F2 are continuous, Sklar's theorem [1] ensures that there exists a unique copula function C: [0,1]² → [0,1] which satisfies

$$H(x_1, x_2) = C(F_1(x_1), F_2(x_2)), \qquad \forall (x_1, x_2) \in \mathbb{R}^2. \tag{1}$$

By the copula technique, the estimation of the joint distribution H can be separated into two steps: marginal distribution construction and copula estimation. Assume we must estimate H based on i.i.d. observations from the distribution H: (x_{i1}, x_{i2}) ∈ ℝ², i = 1, ..., T. The first step is to estimate the two marginal distributions, F1 and F2, based on the data sets x_{i1}, i = 1, ..., T, and x_{i2}, i = 1, ..., T, respectively. Assume F̂1 and F̂2 are the estimated cumulative distributions of X1 and X2. The second step is to estimate the copula function C based on the data set (F̂1(x_{i1}), F̂2(x_{i2})), i = 1, ..., T. Let Ĉ denote the estimate of C; then the estimate of H is

$$\hat{H}(x_1, x_2) = \hat{C}(\hat{F}_1(x_1), \hat{F}_2(x_2)), \qquad \forall (x_1, x_2) \in \mathbb{R}^2. \tag{2}$$

Because the univariate cumulative distribution estimation problem has been extensively researched in statistics, this paper focuses on copula estimation and neglects the first step. Most often the copula is obtained by Maximum Likelihood Estimation (MLE). Assume C comes from a copula family indexed by a real-valued parameter θ. The MLE for θ is

$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{T} C(\hat{F}_1(x_{i1}), \hat{F}_2(x_{i2}); \theta). \tag{3}$$
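For illustration, here is a minimal sketch of parametric fitting for one family, the Clayton copula of Eq. (6) below; in practice the likelihood is written with the copula density c = ∂²C/∂u₁∂u₂ rather than C itself, and the pseudo-observations used here are hypothetical.

```python
# A minimal sketch of parametric copula fitting by MLE, assuming the Clayton
# family and hypothetical pseudo-observations u1, u2 in (0, 1).
import numpy as np
from scipy.optimize import minimize_scalar

def clayton_log_density(u1, u2, alpha):
    # log c(u1, u2; alpha) for the Clayton copula, alpha > 0
    s = u1 ** (-alpha) + u2 ** (-alpha) - 1.0
    return (np.log1p(alpha)
            - (alpha + 1.0) * (np.log(u1) + np.log(u2))
            - (2.0 + 1.0 / alpha) * np.log(s))

def fit_clayton_mle(u1, u2):
    # Maximize the sample log-likelihood over alpha on a bounded interval.
    neg_ll = lambda a: -np.sum(clayton_log_density(u1, u2, a))
    return minimize_scalar(neg_ll, bounds=(1e-3, 20.0), method="bounded").x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical pseudo-observations; real ones come from F1_hat, F2_hat.
    u1 = rng.uniform(0.01, 0.99, size=200)
    u2 = np.clip(u1 + 0.1 * rng.standard_normal(200), 0.01, 0.99)
    print("estimated alpha:", fit_clayton_mle(u1, u2))
```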

Commonly used copula families in practice are:

- Gaussian:
$$C(u_1, u_2; \rho) = \Phi_\rho\left(\Phi^{-1}(u_1), \Phi^{-1}(u_2)\right) \tag{4}$$
where ρ ∈ (−1, 1), Φ(·) is the cumulative distribution function of the standard normal distribution, and Φ_ρ(·,·) is the bivariate normal distribution function with standard normal marginals and correlation ρ.


- Student's t:
$$C(u_1, u_2; \rho, \nu) = t_{\rho,\nu}\left(t_\nu^{-1}(u_1), t_\nu^{-1}(u_2)\right) \tag{5}$$
where ρ ∈ (−1, 1), ν ∈ ℕ, t_ν(·) is the Student's t distribution function with ν degrees of freedom, and t_{ρ,ν} is the bivariate Student's t distribution function with correlation ρ and ν degrees of freedom.

- Clayton:
$$C(u_1, u_2; \alpha) = \max\left[\left(u_1^{-\alpha} + u_2^{-\alpha} - 1\right)^{-1/\alpha}, 0\right] \tag{6}$$
where α ∈ [−1, 0) ∪ (0, +∞).

- Frank:
$$C(u_1, u_2; \alpha) = -\frac{1}{\alpha}\ln\left(1 + \frac{(e^{-\alpha u_1} - 1)(e^{-\alpha u_2} - 1)}{e^{-\alpha} - 1}\right) \tag{7}$$
where α ∈ ℝ, α ≠ 0.

- Gumbel:
$$C(u_1, u_2; \alpha) = \exp\left\{-\left[(-\ln u_1)^\alpha + (-\ln u_2)^\alpha\right]^{1/\alpha}\right\} \tag{8}$$
where α ∈ [1, +∞).

Though the MLE method has tractable computational complexity and nice asymptotic statistical properties, its performance depends severely on the guessed copula family. For example, financial risk will be greatly underestimated if the dependence between financial returns is assumed to follow the Gaussian copula, which has zero lower tail dependence; the Gaussian copula assumption in Collateralized Debt Obligation (CDO) pricing [7] has been criticized as one of the key reasons behind the 2008–2009 Subprime Crisis.¹ To overcome the model specification error of MLE, some nonparametric methods have been proposed to estimate the underlying copula. The empirical copula, introduced by [8,9], extends the idea of the empirical distribution of a univariate variable to copulas as follows:

$$\hat{C}(u_1, u_2) = \frac{1}{T}\sum_{j=1}^{T} I\{\hat{F}_1(x_{1j}) \le u_1, \hat{F}_2(x_{2j}) \le u_2\}. \tag{9}$$
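For illustration, a minimal sketch of the empirical copula (9) on hypothetical data, with the marginals estimated by empirical distribution functions:

```python
# A minimal sketch of the empirical copula (9), assuming hypothetical
# observations x1, x2 and empirical CDFs as the marginal estimates.
import numpy as np

def empirical_cdf_values(x):
    # F_hat(x_i) evaluated at the sample points via ranks (scaled to (0, 1]).
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / len(x)

def empirical_copula(u1, u2, v1, v2):
    # C_hat(u1, u2) = (1/T) * #{j : v1_j <= u1 and v2_j <= u2},
    # where (v1_j, v2_j) are the pseudo-observations of the sample.
    return np.mean((v1 <= u1) & (v2 <= u2))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x1 = rng.standard_normal(40)
    x2 = 0.5 * x1 + rng.standard_normal(40)       # dependent toy data
    v1, v2 = empirical_cdf_values(x1), empirical_cdf_values(x2)
    print(empirical_copula(0.5, 0.5, v1, v2))      # estimate of C(0.5, 0.5)
```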

Because the empirical copula is highly discontinuous and wiggly, several methods have been proposed to smooth it, including the kernel smoother [10], splines [11–14] and wavelets [15–17]. However, nonparametric copula estimation with an explicit 2-increasing shape restriction can hardly be found. One obvious obstacle for nonparametric copula estimation is that the estimators are often not valid copula functions. The necessary and sufficient conditions for a bivariate function C: [0,1]² → [0,1] to be a copula are:

1. Grounded: ∀u1, u2 ∈ [0,1], C satisfies
$$C(u_1, 0) = 0, \qquad C(0, u_2) = 0. \tag{10}$$

2. Marginal: ∀u1, u2 ∈ [0,1], C satisfies
$$C(u_1, 1) = u_1, \qquad C(1, u_2) = u_2. \tag{11}$$

3. 2-increasing: ∀u1, u2, v1, v2 ∈ [0,1] such that u1 ≤ v1 and u2 ≤ v2, C satisfies
$$C(v_1, v_2) - C(v_1, u_2) - C(u_1, v_2) + C(u_1, u_2) \ge 0. \tag{12}$$
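For illustration, a small numerical check of the three conditions (10)–(12) on a finite grid; the candidate function and grid size are illustrative assumptions.

```python
# A minimal sketch: check the grounded, marginal and 2-increasing conditions
# of an arbitrary candidate bivariate function C on a finite grid.
import numpy as np

def check_copula_shape(C, n=21, tol=1e-8):
    g = np.linspace(0.0, 1.0, n)
    grounded = all(abs(C(u, 0.0)) < tol and abs(C(0.0, u)) < tol for u in g)
    marginal = all(abs(C(u, 1.0) - u) < tol and abs(C(1.0, u) - u) < tol for u in g)
    # 2-increasing: every grid rectangle must have nonnegative C-volume.
    two_increasing = True
    for i in range(n - 1):
        for j in range(n - 1):
            vol = (C(g[i + 1], g[j + 1]) - C(g[i + 1], g[j])
                   - C(g[i], g[j + 1]) + C(g[i], g[j]))
            two_increasing &= vol >= -tol
    return grounded, marginal, two_increasing

if __name__ == "__main__":
    product = lambda u, v: u * v                   # the independence copula
    print(check_copula_shape(product))             # (True, True, True)
```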

Different from MLE, whose assumed functional form automatically ensures a valid estimator, nonparametric methods must explicitly append constraints related to the above three shape restrictions.

These shape restrictions can be regarded as prior knowledge, which can be exploited to improve fitting performance. Their contribution is especially obvious when the training data set is small, in which case common nonparametric estimators have a high probability of violating these shape restrictions.

Shape-restricted regression dates back to the literature on isotonic regression [18,19]. There exists a large literature on the problem of estimating monotone, concave or convex regression functions. Because some of these estimators are not smooth, many efforts have been devoted to the search for a simple, smooth and efficient estimator of a shape-restricted regression function. Typical applications of shape-restricted regression include dose–response experiments in medicine and the study of utility, production, profit and cost functions in microeconomics. Shape restrictions have been incorporated into wavelets [20], splines [21,22] and Bernstein polynomials [23,24], among others. In the machine learning area, shape restrictions have also been incorporated into support vector regression, such as monotone least squares support vector regression [25], monotone kernel quantile regression [26], kernel regression with boundary-derivative constraints [27] and convex support vector regression [28]. Among them, [28] can tackle multivariate regression, while the others are applicable only to univariate regression. [29] solved a support vector regression problem with prior knowledge, but it can only handle monotonicity or concavity at certain points.

The main contribution of this paper is a nonparametric copula estimation method based on shape-restricted ε-support vector regression, called Kernel Copula Regression (KCR). In this method the estimator is obtained by fitting a bivariate function with ε-support vector regression to the samples (u_i, C_i), i = 1, ..., T, where u_i = (F̂1(x_{1i}), F̂2(x_{2i})) and C_i is the corresponding empirical copula value (9). To make the estimator satisfy the three shape restrictions (10)–(12), additional constraints are explicitly imposed on the classical ε-support vector regression [30,31]. To make the KCR estimator grounded and marginal, equality constraints are imposed on points spaced evenly on the four boundaries; to make it 2-increasing, nonnegative second-order mixed-derivative constraints are imposed on equidistant grid points of [0,1]². We expect to obtain a better estimator by spanning a network of grounded, marginal and 2-increasing points.

The advantages of KCR are multifold. First, it is a nonparametric estimation method, which can handle any complex dependence structure. Second, the estimator is smooth, an obvious superiority over the empirical copula estimator. Third, the estimator satisfies the three shape restrictions of a copula, grounded, marginal and 2-increasing, provided that sufficient constraints related to the shape restrictions are appended. Fourth, its training involves a convex quadratic programming, which is computationally tractable.

The structure of the paper is as follows. Section 2 introduces how to apply the classical ε-support vector regression (ε-SVR) to estimate a copula; a toy data set is used to show its qualitative shortcomings. Section 3 presents the novel nonparametric method for copula estimation, detailing how to impose additional constraints related to the three shape restrictions and how to transform the problem into a convex quadratic programming. Section 4 presents numerical comparisons of KCR with other state-of-the-art methods. The paper is concluded in Section 5.

2. ε-Support vector regression estimator

¹ Please refer to the two reports: Sam Jones, "The formula that felled Wall Street", Financial Times, April 24, 2009; and Felix Salmon, "Recipe for disaster: the formula that killed Wall Street", Wired Magazine, March 2009.

This section discusses how to apply the classical ε-support vector regression to copula estimation based on observations (x_{1i}, x_{2i}), i = 1, ..., T. According to copula theory, the points (F_1(x_{1i}), F_2(x_{2i})), i = 1, ..., T, are i.i.d. samples from the joint distribution C with support [0,1]². If F̂1 and F̂2 are the estimated univariate cumulative distributions of X1 and X2, respectively, copula estimation can be regarded as a bivariate cumulative distribution function estimation problem based on the data set (u_i, C_i), i = 1, ..., T, where u_i = (u_{1i}, u_{2i})′, u_{1i} = F̂1(x_{1i}), u_{2i} = F̂2(x_{2i}) and

$$C_i = \frac{1}{T}\sum_{j=1}^{T} I\{\hat{F}_1(x_{1j}) \le u_{1i}, \hat{F}_2(x_{2j}) \le u_{2i}\}, \tag{13}$$

where I(·) is the indicator function. A natural first attempt to make the estimator grounded and marginal is to append some points from the four boundaries of the support [0,1]²:

- boundary u2 = 0: (u_{1t}, 0), u_{1t} ∈ [0,1]
- boundary u1 = 0: (0, u_{2t}), u_{2t} ∈ [0,1]
- boundary u2 = 1: (u_{1t}, 1), u_{1t} ∈ [0,1]
- boundary u1 = 1: (1, u_{2t}), u_{2t} ∈ [0,1]

By intentionally appending these boundary points to the training data set of ε-SVR, we expect that the estimator can better fit the four boundaries. Though these points are not obtained directly by observation, they are certainly correct by copula theory; their copula values are known exactly from the grounded and marginal conditions (10) and (11). If 4M points spaced evenly on the four boundaries are drawn, the training data set has size T + 4M. M controls the degree to which the four boundaries are fitted: a larger M makes the four boundaries fit better, but requires more computational effort.

In ε-SVR, nonlinear regression power is realized by an implicitly specified mapping function φ(·), which maps the input space [0,1]² to a higher, possibly infinite-dimensional, feature space F. A weighted twin version of ε-SVR can be found in [32]. In this feature space we look for the linear function

$$C(u_t) = w'\phi(u_t) + b \tag{14}$$

that minimizes the regularized risk functional

$$R_{reg}[C] := c\,R_{emp}[C] + \frac{1}{2}w'w, \tag{15}$$

where R_emp[C] is the empirical risk functional

$$R_{emp}[C] := \frac{1}{T+4M}\sum_{i=1}^{T+4M} r(u_i, C_i, C(u_i)). \tag{16}$$

Same as in ridge regression, the regularization term w′w/2 is added to overcome over-fitting. According to statistical learning theory [33], it is necessary to control model capacity with this regularization term when dealing with few samples and a high-dimensional feature space F; otherwise over-fitting and thus bad generalization will result. The parameter c trades off the empirical error term R_emp[C] against the regularization term w′w/2. If r is the ε-insensitive loss

$$r(u_i, C_i, C(u_i)) = \begin{cases} 0 & \text{if } |C_i - C(u_i)| \le \epsilon, \\ |C_i - C(u_i)| - \epsilon & \text{otherwise}, \end{cases} \tag{17}$$

the copula estimation problem (15) can be described by the following ε-SVR:

$$\begin{aligned}
\min_{w,b,\xi,\xi^*}\;& \frac{c}{T+4M}\sum_{i=1}^{T+4M}(\xi_i + \xi_i^*) + \frac{1}{2}w'w \\
\text{s.t.}\;& C_i - w'\phi(u_i) - b \le \epsilon + \xi_i, \quad i = 1, \ldots, T+4M \\
& -C_i + w'\phi(u_i) + b \le \epsilon + \xi_i^*, \quad i = 1, \ldots, T+4M \\
& \xi_i \ge 0, \;\; \xi_i^* \ge 0, \quad i = 1, \ldots, T+4M.
\end{aligned} \tag{18}$$

Let α_i and α_i* denote the Lagrangian multipliers of the first and second constraints corresponding to sample i, respectively. The dual problem is a convex quadratic programming:

$$\begin{aligned}
\min_{\alpha,\alpha^*}\;& \frac{1}{2}\sum_{i=1}^{T+4M}\sum_{j=1}^{T+4M}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\phi(u_i)'\phi(u_j) + \epsilon\sum_{i=1}^{T+4M}(\alpha_i + \alpha_i^*) - \sum_{i=1}^{T+4M} C_i(\alpha_i - \alpha_i^*) \\
\text{s.t.}\;& \sum_{i=1}^{T+4M}(\alpha_i - \alpha_i^*) = 0 \\
& \alpha_i, \alpha_i^* \in [0, c/(T+4M)], \quad i = 1, \ldots, T+4M.
\end{aligned} \tag{19}$$

According to the optimality conditions,

$$w = \sum_{i=1}^{T+4M}(\alpha_i - \alpha_i^*)\phi(u_i), \tag{20}$$

and the final ε-SVR estimator is

$$\hat{C}(u) = \sum_{i=1}^{T+4M}(\alpha_i - \alpha_i^*)\phi(u_i)'\phi(u) + b. \tag{21}$$

Note that the above complete algorithm can be described in terms of inner products between the mapped points; even when evaluating C(u), we need not compute w explicitly. These results come in handy for the formulation of a nonlinear extension: we can directly replace the inner product with a kernel function

$$K(u_i, u_j) = \phi(u_i)'\phi(u_j). \tag{22}$$

Typical kernels include

- Gaussian or Radial Basis Function (RBF) kernel
$$K(u_i, u_j) = \exp\left\{-\frac{\|u_i - u_j\|^2}{2\sigma^2}\right\}, \qquad \sigma \in \mathbb{R}_+ \tag{23}$$

- Polynomial kernel
$$K(u_i, u_j) = \left(1 + u_i'u_j\right)^p, \qquad p \in \mathbb{N}. \tag{24}$$
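As a concrete illustration of this plain ε-SVR baseline, the following sketch fits an RBF-kernel SVR to empirical-copula targets plus boundary points using scikit-learn; the data, M and all hyper-parameter values are illustrative assumptions, and scikit-learn's C and gamma parameters parametrize the trade-off and RBF width slightly differently from c and σ above.

```python
# A minimal sketch (not the paper's exact formulation): fit epsilon-SVR to
# (u_i, C_i) pairs plus evenly spaced boundary points, using scikit-learn.
import numpy as np
from sklearn.svm import SVR

def empirical_copula_values(V):
    # C_i per Eq. (13): fraction of pseudo-observations dominated by u_i.
    return np.array([np.mean(np.all(V <= V[i], axis=1)) for i in range(len(V))])

def boundary_points(M):
    # 4M boundary points with their exact copula values (Eqs. (10)-(11)).
    t = np.linspace(0.0, 1.0, M)
    pts = np.vstack([np.c_[t, np.zeros(M)], np.c_[np.zeros(M), t],
                     np.c_[t, np.ones(M)], np.c_[np.ones(M), t]])
    vals = np.concatenate([np.zeros(M), np.zeros(M), t, t])
    return pts, vals

rng = np.random.default_rng(0)
x = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=40)
V = np.argsort(np.argsort(x, axis=0), axis=0) / 40.0 + 1.0 / 40.0   # pseudo-obs
C_i = empirical_copula_values(V)
B, Cb = boundary_points(5)

X = np.vstack([V, B])
y = np.concatenate([C_i, Cb])
model = SVR(kernel="rbf", gamma=0.005, C=0.1, epsilon=0.01).fit(X, y)
print(model.predict([[0.5, 0.5]]))   # estimated C(0.5, 0.5); shape not enforced
```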

To get a qualitative impression of the performance of the classical ε-SVR estimator, a size-40 toy data set was randomly drawn from the Gaussian copula (4) with parameter ρ = 0.5. Twenty points spaced evenly on the four boundaries were appended for boundary fitting. A Gaussian kernel with parameter σ = 10 was applied in training, and the other parameters were set as c = 0.1 and ε = 0.01; these parameter settings were arbitrary. The contour graph of the true Gaussian copula is shown in Fig. 1, and the boundaries and contour graph of the estimated copula are shown in Figs. 2 and 3, respectively. These figures clearly show that the estimate deviates far from the true copula. Moreover, from Fig. 2 it is obvious that the estimator is neither grounded nor marginal, and the contour lines in Fig. 3 show that it is not 2-increasing.

From this comparison one can immediately conclude that copula estimation is not an ordinary nonparametric regression problem: extra constraints related to the three shape restrictions must be appended to make the estimator a valid copula. To the best of our knowledge, there is no publication on 2-increasing nonparametric regression, which is the point that greatly retards the application of nonparametric regression to copula estimation. From the statistical learning angle, these shape restrictions can be regarded as prior knowledge, which can be exploited to improve learning performance in the finite-sample case.

Fig. 1. Contour graph of the true copula.
Fig. 2. Four boundaries of the estimated copula. Dashed: true copula. Solid: ε-SVR estimator.
Fig. 3. Contour graph of the ε-SVR estimator.

3. Kernel copula regression

This section puts forward the novel nonparametric copula estimation method based on shape-restricted ε-SVR, namely KCR. From the above section, we know that nonparametric copula estimation is not a simple nonparametric regression problem. Actually, we should look for a linear function in the feature space F that solves

$$\min_{C \in \mathcal{C}}\; R_{reg}[C] = \frac{c}{T}\sum_{i=1}^{T} r(u_i, C_i, C(u_i)) + \frac{1}{2}w'w, \tag{25}$$

where 𝒞 is the set of all bivariate functions C: [0,1]² → [0,1] that are grounded, marginal and 2-increasing. Constraints related to the three shape restrictions must be explicitly imposed to make the estimator a valid copula.

3.1. Problem formulation

The problem (25) is computationally intractable, owing to the infinite number of constraints involved in the shape restriction C ∈ 𝒞: the grounded and marginal restrictions must hold at every point of the four boundaries, while the 2-increasing restriction must hold on every rectangle in the domain [0,1]². Even verifying the feasibility of a specified solution is not trivial. Hence, as in [25,26,28], this infinite programming has to be approximated by a (sequence of) finite programming.

To make the estimator grounded and marginal, 4M points spaced evenly on the four boundaries are appended, as in Section 2. Denote this data set by {(ū_i, C̄_i), i = 1, ..., 4M}. Different from common samples, which are subject to sampling error, these 4M points are exact in every scenario, so no violation is allowed for the appended points, i.e.

$$\bar{C}_i = w'\phi(\bar{u}_i) + b. \tag{26}$$

To tackle the 2-increasing shape restriction, we resort to the following result [6, page 2]: if a bivariate function C has second-order derivatives, the 2-increasing property is equivalent to

$$\frac{\partial^2 C(u_1, u_2)}{\partial u_1\, \partial u_2} \ge 0. \tag{27}$$

We construct an equidistant grid on [0,1]²,

$$\hat{u}_{iN+j+1} = \left(\frac{i}{N-1}, \frac{j}{N-1}\right), \qquad i, j \in \{0, 1, \ldots, N-1\}, \tag{28}$$

with N ∈ ℕ, and impose Eq. (27) on each grid point. Our motivation is to span a network of 2-increasing knots into a 2-increasing surface. For each grid point û_i, we have

$$\frac{\partial^2 C(\hat{u}_i)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}} = \frac{\partial^2\left(w'\phi(\hat{u}_i) + b\right)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}} = w'\frac{\partial^2\phi(\hat{u}_i)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}} \ge 0. \tag{29}$$

Based on the above arguments, the copula estimation problem (25) can be approximated as

$$\begin{aligned}
\min_{w,b,\xi,\xi^*}\;& \frac{c}{T}\sum_{i=1}^{T}(\xi_i + \xi_i^*) + \frac{1}{2}w'w \\
\text{s.t.}\;& C_i - w'\phi(u_i) - b \le \epsilon + \xi_i, \quad i = 1, \ldots, T \\
& -C_i + w'\phi(u_i) + b \le \epsilon + \xi_i^*, \quad i = 1, \ldots, T \\
& \bar{C}_i = w'\phi(\bar{u}_i) + b, \quad i = 1, \ldots, 4M \\
& w'\frac{\partial^2\phi(\hat{u}_i)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}} \ge 0, \quad i = 1, \ldots, N^2 \\
& \xi_i \ge 0, \;\; \xi_i^* \ge 0, \quad i = 1, \ldots, T.
\end{aligned} \tag{30}$$
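For illustration, a small sketch of how the 4M boundary points and the N² grid points entering (30) might be generated; the sizes M and N are illustrative assumptions.

```python
# A minimal sketch of the constraint points used above: 4M evenly spaced
# boundary points with their exact copula values, and the N x N equidistant
# grid of Eq. (28) where the mixed-derivative constraint is imposed.
import numpy as np

def boundary_constraints(M):
    t = np.linspace(0.0, 1.0, M)
    u_bar = np.vstack([np.c_[t, np.zeros(M)], np.c_[np.zeros(M), t],
                       np.c_[t, np.ones(M)], np.c_[np.ones(M), t]])
    c_bar = np.concatenate([np.zeros(M), np.zeros(M), t, t])
    return u_bar, c_bar            # shapes (4M, 2) and (4M,)

def derivative_grid(N):
    g = np.linspace(0.0, 1.0, N)   # i/(N-1), i = 0, ..., N-1
    uu, vv = np.meshgrid(g, g, indexing="ij")
    return np.c_[uu.ravel(), vv.ravel()]    # shape (N^2, 2)

u_bar, c_bar = boundary_constraints(M=5)
u_hat = derivative_grid(N=6)
print(u_bar.shape, u_hat.shape)    # (20, 2) (36, 2)
```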


3.2. Dual program

The Lagrangian of the programming (30) is

$$\begin{aligned}
L(w, b, \xi, \xi^*, \alpha, \alpha^*, \beta, \kappa, \eta, \eta^*) ={}& \frac{c}{T}\sum_{i=1}^{T}(\xi_i + \xi_i^*) + \frac{1}{2}w'w
+ \sum_{i=1}^{T}\alpha_i\left(C_i - w'\phi(u_i) - b - \epsilon - \xi_i\right) \\
&+ \sum_{i=1}^{T}\alpha_i^*\left(-C_i + w'\phi(u_i) + b - \epsilon - \xi_i^*\right)
+ \sum_{i=1}^{4M}\beta_i\left(\bar{C}_i - w'\phi(\bar{u}_i) - b\right) \\
&- \sum_{i=1}^{N^2}\kappa_i\, w'\frac{\partial^2\phi(\hat{u}_i)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}}
- \sum_{i=1}^{T}\eta_i\xi_i - \sum_{i=1}^{T}\eta_i^*\xi_i^*,
\end{aligned} \tag{31}$$

where the Lagrangian multipliers are α = (α_1, ..., α_T)′ ∈ ℝ_+^T, α* = (α_1*, ..., α_T*)′ ∈ ℝ_+^T, β = (β_1, ..., β_{4M})′ ∈ ℝ^{4M} and κ = (κ_1, ..., κ_{N²})′ ∈ ℝ_+^{N²}, with ℝ_+^d denoting the set of d-dimensional column vectors with nonnegative elements. According to the Karush–Kuhn–Tucker conditions, we have

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{T}(\alpha_i - \alpha_i^*)\phi(u_i) + \sum_{i=1}^{4M}\beta_i\phi(\bar{u}_i) + \sum_{i=1}^{N^2}\kappa_i\frac{\partial^2\phi(\hat{u}_i)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}} \tag{32}$$

$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{T}(\alpha_i - \alpha_i^*) + \sum_{i=1}^{4M}\beta_i = 0 \tag{33}$$

$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \frac{c}{T} - \alpha_i - \eta_i = 0 \tag{34}$$

$$\frac{\partial L}{\partial \xi_i^*} = 0 \;\Rightarrow\; \frac{c}{T} - \alpha_i^* - \eta_i^* = 0. \tag{35}$$

So we have the following dual program:

$$\begin{aligned}
\min_{\alpha,\alpha^*,\beta,\kappa}\;& -\sum_{i=1}^{T}(\alpha_i - \alpha_i^*)C_i - \sum_{i=1}^{4M}\beta_i\bar{C}_i + \epsilon\sum_{i=1}^{T}(\alpha_i + \alpha_i^*)
+ \frac{1}{2}\begin{pmatrix}\alpha' - \alpha^{*\prime} & \beta' & \kappa'\end{pmatrix}
\begin{pmatrix} K_{11} & K_{12} & K_{13}\\ K_{21} & K_{22} & K_{23}\\ K_{31} & K_{32} & K_{33}\end{pmatrix}
\begin{pmatrix} \alpha - \alpha^*\\ \beta\\ \kappa \end{pmatrix} \\
\text{s.t.}\;& \sum_{i=1}^{T}(\alpha_i - \alpha_i^*) + \sum_{i=1}^{4M}\beta_i = 0 \\
& \alpha_i, \alpha_i^* \in [0, c/T], \quad i = 1, \ldots, T \\
& \kappa_i \ge 0, \quad i = 1, \ldots, N^2,
\end{aligned} \tag{36}$$

where the block matrices are

$$(K_{11})_{ij} = K(u_i, u_j), \qquad (K_{22})_{ij} = K(\bar{u}_i, \bar{u}_j), \qquad (K_{33})_{ij} = \frac{\partial^2\phi(\hat{u}_i)'}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}}\,\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}},$$

$$(K_{12})_{ij} = (K_{21})_{ji} = K(u_i, \bar{u}_j), \qquad (K_{13})_{ij} = (K_{31})_{ji} = \phi(u_i)'\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}}, \qquad (K_{23})_{ij} = (K_{32})_{ji} = \phi(\bar{u}_i)'\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}}.$$

The dual problem (36) is a convex quadratic programming with 2T + 4M + N² decision variables and 1 + 4T + N² linear constraints. Convex quadratic programming can be solved in polynomial time by interior point algorithms. Compared with the dual problem of ε-SVR (19), which has 2T + 8M decision variables and 1 + 4T + 16M linear constraints, the extra computational complexity mainly depends on the number N². Due to the implicit mapping function, two expressions still need to be evaluated:

$$\phi(u_i)'\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}} = \frac{\partial^2\left[\phi(u_i)'\phi(\hat{u}_j)\right]}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}} = \frac{\partial^2 K(u_i, \hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}}, \tag{37}$$

$$\frac{\partial^2\phi(\hat{u}_i)'}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}}\,\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}} = \frac{\partial^4 K(\hat{u}_i, \hat{u}_j)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}\,\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}}. \tag{38}$$

If the Gaussian kernel (23) is applied, it is easy to verify that

$$\phi(u_i)'\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}} = \exp\left\{-\frac{\|u_i - \hat{u}_j\|^2}{2\sigma^2}\right\}\frac{(u_{1i} - \hat{u}_{1j})(u_{2i} - \hat{u}_{2j})}{\sigma^4}, \tag{39}$$

$$\frac{\partial^2\phi(\hat{u}_i)'}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}}\,\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}} = \exp\left\{-\frac{\|\hat{u}_i - \hat{u}_j\|^2}{2\sigma^2}\right\}\left\{\frac{(\hat{u}_{1i} - \hat{u}_{1j})^2(\hat{u}_{2i} - \hat{u}_{2j})^2}{\sigma^8} - \frac{(\hat{u}_{1i} - \hat{u}_{1j})^2}{\sigma^6} - \frac{(\hat{u}_{2i} - \hat{u}_{2j})^2}{\sigma^6} + \frac{1}{\sigma^4}\right\}. \tag{40}$$

For the polynomial kernel (24), we have

$$\phi(u_i)'\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}} = p(p-1)\left(1 + u_i'\hat{u}_j\right)^{p-2}u_{1i}u_{2i}, \tag{41}$$

$$\frac{\partial^2\phi(\hat{u}_i)'}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}}\,\frac{\partial^2\phi(\hat{u}_j)}{\partial\hat{u}_{1j}\,\partial\hat{u}_{2j}} = p(p-1)\left(1 + \hat{u}_i'\hat{u}_j\right)^{p-2}\left[1 + \frac{(p-2)\,\hat{u}_i'\hat{u}_j}{1 + \hat{u}_i'\hat{u}_j} + \frac{(p-2)(p-3)\,\hat{u}_{1i}\hat{u}_{2i}\hat{u}_{1j}\hat{u}_{2j}}{\left(1 + \hat{u}_i'\hat{u}_j\right)^2}\right]. \tag{42}$$

The final estimator is

$$\hat{C}(u) = \sum_{i=1}^{T}(\alpha_i - \alpha_i^*)K(u, u_i) + \sum_{i=1}^{4M}\beta_i K(u, \bar{u}_i) + \sum_{i=1}^{N^2}\kappa_i\,\phi(u)'\frac{\partial^2\phi(\hat{u}_i)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}} + b. \tag{43}$$

There is still one unknown, b, in Eq. (43). It can be easily determined from the equality constraint for any ū_j, j ∈ {1, ..., 4M}:

$$b = \bar{C}_j - \sum_{i=1}^{T}(\alpha_i - \alpha_i^*)K(\bar{u}_j, u_i) - \sum_{i=1}^{4M}\beta_i K(\bar{u}_j, \bar{u}_i) - \sum_{i=1}^{N^2}\kappa_i\,\phi(\bar{u}_j)'\frac{\partial^2\phi(\hat{u}_i)}{\partial\hat{u}_{1i}\,\partial\hat{u}_{2i}}. \tag{44}$$
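To make the pieces above concrete, here is a minimal sketch (not the authors' code) that assembles the block kernel matrix from Eqs. (39)–(40) and solves the dual (36) with cvxpy under the Gaussian kernel; the data and hyper-parameter values are illustrative assumptions, and a small diagonal jitter plus cvxpy's psd_wrap are used purely for numerical robustness.

```python
# A minimal sketch of the dual QP (36) with the Gaussian kernel, solved with cvxpy.
import numpy as np
import cvxpy as cp

def rbf(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def cross_deriv(A, H, sigma):
    # Eq. (39): phi(a)' d^2 phi(h) / dh1 dh2 for the Gaussian kernel.
    k = rbf(A, H, sigma)
    return (k * np.subtract.outer(A[:, 0], H[:, 0])
              * np.subtract.outer(A[:, 1], H[:, 1]) / sigma ** 4)

def deriv_deriv(H, sigma):
    # Eq. (40): d^4 K(h_i, h_j) / dh_i1 dh_i2 dh_j1 dh_j2 for the Gaussian kernel.
    a = np.subtract.outer(H[:, 0], H[:, 0])
    b = np.subtract.outer(H[:, 1], H[:, 1])
    k = rbf(H, H, sigma)
    return k * (a**2 * b**2 / sigma**8 - a**2 / sigma**6 - b**2 / sigma**6 + 1 / sigma**4)

def kcr_fit(U, C, U_bar, C_bar, U_hat, sigma=10.0, c=0.1, eps=0.01):
    T, B, G = len(U), len(U_bar), len(U_hat)
    K = np.block([
        [rbf(U, U, sigma),     rbf(U, U_bar, sigma),     cross_deriv(U, U_hat, sigma)],
        [rbf(U_bar, U, sigma), rbf(U_bar, U_bar, sigma), cross_deriv(U_bar, U_hat, sigma)],
        [cross_deriv(U, U_hat, sigma).T, cross_deriv(U_bar, U_hat, sigma).T,
         deriv_deriv(U_hat, sigma)],
    ]) + 1e-8 * np.eye(T + B + G)

    al, als = cp.Variable(T, nonneg=True), cp.Variable(T, nonneg=True)
    beta, kap = cp.Variable(B), cp.Variable(G, nonneg=True)
    v = cp.hstack([al - als, beta, kap])
    obj = (-C @ (al - als) - C_bar @ beta + eps * cp.sum(al + als)
           + 0.5 * cp.quad_form(v, cp.psd_wrap(K)))
    cons = [cp.sum(al - als) + cp.sum(beta) == 0, al <= c / T, als <= c / T]
    cp.Problem(cp.Minimize(obj), cons).solve()

    coef = np.concatenate([al.value - als.value, beta.value, kap.value])
    def C_hat(u):                         # evaluate Eq. (43) with b from Eq. (44)
        u = np.atleast_2d(u)
        feats = np.hstack([rbf(u, U, sigma), rbf(u, U_bar, sigma),
                           cross_deriv(u, U_hat, sigma)])
        b0 = C_bar[0] - np.hstack([rbf(U_bar[:1], U, sigma),
                                   rbf(U_bar[:1], U_bar, sigma),
                                   cross_deriv(U_bar[:1], U_hat, sigma)]) @ coef
        return feats @ coef + b0
    return C_hat
```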

Compared with ε-SVR, KCR needs two additional hyper-parameters, M and N. It is self-evident that the larger M is, the better the four boundaries are fitted, and the larger N is, the better the 2-increasing shape restriction is satisfied. Large M and N can improve fitting performance, but bring an extra computational burden. In applications, one can start with small M and N, such as M = N = 5, and then iteratively increase them until the marginal performance improvement falls below a specified level.

The copula estimation results of KCR based on the same data set as in Section 2 are shown in Figs. 4 and 5. The 2-increasing constraints were imposed on an equidistant grid with N = 6; the other hyper-parameters were identical to those of ε-SVR in Section 2. Fig. 4 clearly shows that the four boundaries of the estimator almost coincide with those of the true copula. We strongly believe that this nice result comes from the equality constraints on the appended boundary points. The level curves in Fig. 5 also display the desired 2-increasing property.

Fig. 4. Four boundaries of the KCR estimator. Dashed: true copula. Solid: estimated copula.
Fig. 5. Contour graph of the KCR estimator.

4. Experiments

Qualitative success of the KCR estimator has been clearly demonstrated by the figures in Sections 2 and 3. This section illustrates its quantitative advantage by comparing its performance with that of a benchmark nonparametric estimator, the kernel smoother, which has also been used as a benchmark in [11,16]. The kernel smoother, also known as the Nadaraya–Watson estimator [34,35], gives larger weights to samples near the given point and estimates the copula value at u by

$$\hat{C}(u) = \frac{\sum_{i=1}^{T} C_i\, K_H(u, u_i)}{\sum_{i=1}^{T} K_H(u, u_i)}, \tag{45}$$

where K is a kernel function, i.e. a nonnegative, symmetric, unimodal multivariate probability density function, and H is a 2×2 bandwidth (or smoothing) matrix that is symmetric and positive definite. (The kernel function K of the kernel smoother should not be confused with the kernel function K of support vector regression.) Same as [10,16], we choose to work here with products of univariate kernels, i.e.

$$K_H(u, u_i) = K_{h_1}(u_1 - u_{1i})\, K_{h_2}(u_2 - u_{2i}), \tag{46}$$

where u_1 and u_2 are the coordinates of u, and K_{h_1} and K_{h_2} are univariate kernels corresponding to the first and second coordinates, respectively. Commonly used univariate kernel functions include

- Gaussian kernel
$$K_h(s) = \frac{1}{\sqrt{2\pi}\,h}\exp\left(-\frac{s^2}{2h^2}\right) \tag{47}$$

- Epanechnikov kernel [36]
$$K_h(s) = \frac{0.75}{h}\left(1 - \frac{s^2}{h^2}\right) I\left\{\left|\frac{s}{h}\right| \le 1\right\} \tag{48}$$
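For illustration, a minimal sketch of the Nadaraya–Watson copula smoother (45)–(47) with product Gaussian kernels; the data and bandwidths are illustrative assumptions.

```python
# A minimal sketch of the Nadaraya-Watson copula smoother with a product of
# univariate Gaussian kernels.
import numpy as np

def gaussian_k(s, h):
    return np.exp(-s**2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)

def nw_copula(u, U, C, h1, h2):
    # U: (T, 2) pseudo-observations, C: (T,) empirical copula values at U.
    w = gaussian_k(u[0] - U[:, 0], h1) * gaussian_k(u[1] - U[:, 1], h2)
    return np.sum(w * C) / np.sum(w)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    U = rng.uniform(size=(40, 2))
    C = np.array([np.mean(np.all(U <= U[i], axis=1)) for i in range(len(U))])
    print(nw_copula(np.array([0.5, 0.5]), U, C, h1=0.1, h2=0.1))
```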

4.1. Monte Carlo analysis

In this experiment five size-40 artificial data sets were generated from five copula functions: the product copula, the Gaussian copula with ρ = −0.5, the Gaussian copula with ρ = 0.5, the Student's t copula with (ρ, ν) = (−0.5, 3) and the Student's t copula with (ρ, ν) = (0.5, 5). In ε-SVR and KCR, both the Gaussian kernel (23) and the polynomial kernel (24) were used for evaluation. In ε-SVR, M was arbitrarily set to 5; all other hyper-parameters c, ε, σ² and p were selected by ordinary leave-one-out cross-validation, with candidate sets {10^i | i = −10, ..., 10}, {2^i | i = −10, ..., −1}, {2^i | i = −10, ..., 10} and {1, ..., 20}, respectively. In KCR, N was also arbitrarily set to 5, and the other hyper-parameters M, c, ε, σ² and p were identical to those of ε-SVR. In the kernel smoother, the Gaussian kernel (47) and the Epanechnikov kernel (48) were used. The kernel smoother has two hyper-parameters, h1 and h2, which play a crucial role in its training: a small h leads to an under-smoothed regression and a large h to an over-smoothed regression. Same as for ε-SVR, the two bandwidths were selected by ordinary leave-one-out cross-validation, with the same candidate set {2^i | i = −10, ..., −1}. Model performance was measured by the out-of-sample Root Mean Square Error (RMSE)

$$\text{RMSE} = \sqrt{\frac{1}{I^2}\sum_{i=0}^{I}\sum_{j=0}^{I}\left(\hat{C}(i/I, j/I) - C(i/I, j/I)\right)^2}, \tag{49}$$

where Ĉ is an estimator and C is the true copula. In this experiment, I = 1000. To decrease sampling error, 100 independent data sets were generated for each copula, and model comparisons were based on the average and standard deviation of the 100 RMSEs corresponding to these 100 data sets.
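As an illustration of the evaluation criterion, the following sketch computes the grid RMSE (49) against a true Gaussian copula obtained from scipy's bivariate normal CDF; the coarse grid size, the correlation and the stand-in estimator are illustrative assumptions (the paper uses I = 1000).

```python
# A minimal sketch of the grid RMSE (49) against a true Gaussian copula.
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula(u1, u2, rho):
    z = norm.ppf(np.clip(np.column_stack([u1, u2]), 1e-12, 1 - 1e-12))
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return mvn.cdf(z)

def grid_rmse(C_hat, rho=0.5, I=20):
    g = np.linspace(0.0, 1.0, I + 1)
    uu, vv = np.meshgrid(g, g, indexing="ij")
    truth = gaussian_copula(uu.ravel(), vv.ravel(), rho)
    est = C_hat(uu.ravel(), vv.ravel())
    # Normalized by I^2 as in Eq. (49).
    return np.sqrt(np.sum((est - truth) ** 2) / I ** 2)

if __name__ == "__main__":
    product = lambda u1, u2: u1 * u2          # independence copula as a stand-in
    print(grid_rmse(product))
```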

Table 1
Experiments on artificial data (average RMSE ± standard deviation over 100 data sets, in %).

Method               Product       Gaussian (ρ = −0.5)   Gaussian (ρ = 0.5)   Student's t (−0.5, 3)   Student's t (0.5, 5)   Mean   Rank
Kernel smoother
  Gaussian           6.59 ± 1.21   9.44 ± 2.20           5.71 ± 1.82          8.53 ± 2.43             6.62 ± 1.10            7.40   6
  Epanechnikov       7.24 ± 1.23   8.43 ± 2.61           6.13 ± 1.54          8.25 ± 2.32             6.18 ± 1.41            7.25   5
ε-SVR
  Gaussian           5.25 ± 1.59   5.87 ± 2.15           5.18 ± 1.70          6.38 ± 2.61             5.03 ± 1.16            5.54   3
  Polynomial         5.31 ± 1.47   6.47 ± 2.15           5.15 ± 1.67          6.62 ± 2.67             4.93 ± 1.18            5.70   4
KCR
  Gaussian           4.74 ± 0.84   5.82 ± 1.13           4.85 ± 0.71          5.39 ± 1.20             4.88 ± 0.94            5.13   2
  Polynomial         4.82 ± 0.73   5.43 ± 0.93           5.11 ± 0.91          5.20 ± 1.47             4.66 ± 0.86            5.04   1


Experimental results on these artificial data sets are shown in Table 1. In each notation a ± b, a and b are the average and standard deviation of the 100 RMSEs corresponding to these 100 data sets, respectively. In the table, the first and second best scores are shown in bold and italic, respectively. The experimental results in Table 1 clearly verify the advantage of KCR over the other methods.


Across the five copula functions, KCR uniformly achieved the best results. KCR with the polynomial kernel obtained three first places and two second places and achieved the overall best average RMSE of 5.04%; KCR with the Gaussian kernel obtained two first places and three second places and achieved the overall second best average RMSE of 5.13%. ε-SVR with the Gaussian kernel and with the polynomial kernel achieved the third and fourth best average RMSEs. Two-tailed t-tests with unequal variances were employed to test significance; owing to the large sample size (100) of the t-test, all other models were surpassed by the best model at the 5% significance level.

4.2. Empirical analysis

This subsection illustrates the experiment on weekly returns of three major stock indexes: the United States S&P500, the United Kingdom FTSE100 and the Hong Kong HSI, which are usually regarded as representative of the American, European and Asian stock markets. The data include 518 weekly logarithmic returns recorded from January 2000 to December 2010; these time series can be freely downloaded from finance.google.com or finance.yahoo.com. We estimate three copula functions corresponding to the three pairs of returns: S&P500–FTSE100, S&P500–HSI and FTSE100–HSI. Their pair scatter plots with marginal histograms are shown in Fig. 6a–c, which clearly show that S&P500–FTSE100 has the highest dependence while S&P500–HSI has the lowest. This is also confirmed by their linear correlation coefficients: S&P500–FTSE100, S&P500–HSI and FTSE100–HSI have linear correlations 0.414, 0.113 and 0.367, respectively.

Fig. 6. Pair returns (S&P500–FTSE100, S&P500–HSI, FTSE100–HSI).

Different from the experiment on artificial data sets, this experiment on real data must estimate the marginal distributions before copula estimation. For simplicity, this paper used the empirical distribution to estimate the marginal distributions. If the estimated marginal distribution for a return time series r_t, t = 1, ..., T, is F̂(·), the probability time series is F̂(r_t), t = 1, ..., T. The pair scatter plots of the probability series with marginal histograms are shown in Fig. 7a–c. Kolmogorov–Smirnov tests showed that all three probability series follow the uniform distribution on [0, 1] at the 1% significance level, and Ljung–Box tests showed that they are independently distributed at the 1% significance level, so the three probability time series may be regarded as close to i.i.d. uniform on [0, 1]. S&P500–FTSE100, S&P500–HSI and FTSE100–HSI have Spearman's ρ_S values of 0.423, 0.136 and 0.353, respectively.

Fig. 7. Pair probabilities (S&P500–FTSE100, S&P500–HSI, FTSE100–HSI).

Each total data set was partitioned into two parts: one of size 100 for training and the other of size 418 for testing. To decrease the error caused by this random partition, we executed the partition 10 times and measured model performance with the average RMSE of the 10 experiments corresponding to these 10 partitions. Because the true copula underlying the observations is never known, we cannot directly use Eq. (49) to measure performance; since the test data have a large size of 418, this experiment replaced every true copula with its empirical copula based on the size-418 test data, which is why the test data were set much larger than the training data. Thus the RMSE is defined as follows in this experiment:

$$\text{RMSE} = \sqrt{\frac{1}{T_{test}}\sum_{i=1}^{T_{test}}\left(\hat{C}(\check{u}_i) - \check{C}_i\right)^2}, \tag{50}$$

where T_test is the number of samples in the test data, Ĉ is the copula estimator, and Č_i is the empirical copula value of the test point ǔ_i computed from the test data.

In ε-SVR, M is 10 and the hyper-parameters c, ε, σ² and p were determined by 10-fold cross-validation: each size-100 training set was divided into ten equal parts, one for validation and the other nine for estimation, and this procedure was repeated for all 10 possible choices of the held-out group. The hyper-parameters were chosen to minimize the average RMSE of the ten-round trials. In both the estimation and validation steps, the empirical copula value of each point was computed from the total training data. The candidate sets for the hyper-parameters c, ε, σ² and p are the same as in Section 4.1. In KCR, the two hyper-parameters for shape control were set to M = N = 10; we found that larger M and N achieved only minor improvements in validation performance (below 10⁻⁴). The other hyper-parameters were set the same as for ε-SVR. In this experiment, we also compared KCR with six parametric models: product, Gaussian, Student's t, Clayton, Frank and Gumbel. Except for the product copula, the parameters of all copula families were obtained by MLE; for these six parametric models, the total training data were used for MLE, because no hyper-parameter needs cross-validation.

Experimental results are shown in Table 2. As in Table 1, the notation a ± b means that a and b are the average and standard deviation of the 10 RMSEs corresponding to the 10 partitions, respectively; the first and second best scores are shown in bold and italic, and all scores that differ from the best performance at the 5% significance level (two-tailed t-tests with unequal variances) are underlined. In the experiment on the S&P500–FTSE100 pair, KCR with the Gaussian kernel achieved the best performance of 1.95%, and the second and third best were achieved by the Clayton model and KCR with the polynomial kernel, respectively. On the S&P500–HSI pair, KCR with the Gaussian kernel, KCR with the polynomial kernel and the Clayton model achieved the first, second and third best performances, respectively. On the FTSE100–HSI pair, KCR with the polynomial kernel achieved the best performance, and the second best was achieved by both the Clayton model and KCR with the Gaussian kernel with a score of 1.89%. Overall, KCR with the Gaussian kernel achieved two first places and one second place and obtained the minimum average RMSE among the 12 methods; KCR with the polynomial kernel gained the second best average RMSE, and the Clayton model achieved two second places and the overall third place. Because the Clayton copula has nonzero lower tail dependence, which reflects the possibility of a joint crash among stock markets, it is widely accepted in financial dependence modeling.
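For illustration, a minimal sketch of the empirical-distribution transform and the test-set RMSE of Eq. (50); the simulated returns, the split sizes and the stand-in estimator are illustrative assumptions.

```python
# A minimal sketch: pseudo-observations via the empirical marginal transform,
# and RMSE (50) against empirical copula values computed on the test data.
import numpy as np

def pseudo_obs(returns):
    # Empirical-distribution transform F_hat(r_t), column by column.
    ranks = np.argsort(np.argsort(returns, axis=0), axis=0) + 1
    return ranks / (len(returns) + 1.0)

def test_rmse(C_hat, U_test):
    C_check = np.array([np.mean(np.all(U_test <= u, axis=1)) for u in U_test])
    return np.sqrt(np.mean((C_hat(U_test) - C_check) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    r = rng.multivariate_normal([0, 0], [[1, 0.4], [0.4, 1]], size=518)
    U = pseudo_obs(r)
    U_train, U_test = U[:100], U[100:]
    product = lambda V: V[:, 0] * V[:, 1]          # a stand-in estimator
    print(test_rmse(product, U_test))
```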

Table 2
Experiments on financial data (average RMSE ± standard deviation over 10 partitions, in %).

Method                 S&P500–FTSE100   S&P500–HSI    FTSE100–HSI   Mean   Rank
Parametric
  Product              4.89 ± 0.27      1.74 ± 0.21   4.04 ± 0.41   3.56   12
  Gaussian             2.61 ± 1.06      1.76 ± 0.61   2.18 ± 0.66   2.18   7
  Student's t          2.44 ± 0.61      1.67 ± 0.55   1.96 ± 0.32   2.02   4
  Clayton              2.19 ± 0.63      1.41 ± 0.30   1.89 ± 0.55   1.83   3
  Frank                2.60 ± 0.78      1.55 ± 0.53   2.02 ± 0.61   2.06   5
  Gumbel               2.42 ± 0.95      1.65 ± 0.31   2.38 ± 0.51   2.15   6
Kernel smoother
  Gaussian             2.80 ± 0.60      2.73 ± 0.93   2.64 ± 0.73   2.72   8
  Epanechnikov         2.84 ± 0.66      3.21 ± 0.69   2.99 ± 0.81   3.01   10
ε-SVR
  Gaussian             2.71 ± 0.71      2.61 ± 0.87   3.78 ± 1.48   3.03   11
  Polynomial           2.81 ± 0.79      3.16 ± 0.73   2.54 ± 0.77   2.84   9
KCR
  Gaussian             1.95 ± 0.29      1.26 ± 0.33   1.89 ± 0.51   1.70   1
  Polynomial           2.21 ± 0.36      1.32 ± 0.60   1.82 ± 0.77   1.78   2


5. Conclusion and future work

5.1. Conclusion

This paper proposes a novel nonparametric bivariate copula estimation method based on shape-restricted ε-support vector regression. The estimator smooths the empirical copula with ε-SVR but, different from the classical ε-SVR, the method has additional constraints related to the three shape restrictions of copula functions. The additional shape constraints mainly determine the shape of the estimated surface, while the classical constraints determine its location. The novelty of the estimator lies in three points. First, it has nonlinear smooth regression capability with the help of the kernel trick. Second, it imposes equality constraints on boundary points to make the estimator grounded and marginal. Third, it imposes 2-increasing constraints on equidistant grid points to make the surface 2-increasing. Its dual problem is a convex quadratic programming, which is computationally tractable.

Qualitative results on a toy data set clearly showed that the KCR estimator nearly satisfies the three shape restrictions of copula functions: grounded, marginal and 2-increasing. We also compared its performance, measured by average RMSE, with the kernel smoother, which is regarded as a state-of-the-art copula estimator; the out-of-sample results showed that KCR achieves significantly better performance. The pair returns of three major stock indexes, S&P500, FTSE100 and HSI, were also used to test its applicability in financial risk management, and the experimental results clearly demonstrated that the estimator achieves significantly better performance than six parametric models and the kernel smoother.

5.2. Future work

Possible future work includes three directions. The first is the extension to multivariate copulas. A d-dimensional copula must satisfy the d-increasing condition, i.e. for each hyper-rectangle B = ×_{i=1}^{d} [u_i, v_i] ⊆ [0,1]^d, the C-volume of B is nonnegative:

$$\sum_{z \in \times_{i=1}^{d}\{u_i, v_i\}} (-1)^{N(z)}\, C(z) \ge 0, \tag{51}$$

where N(z) = #{k : z_k = u_k}. According to [6], if the joint distribution function C has dth-order derivatives, the d-increasing restriction is equivalent to the nonnegativity of the d-order mixed derivative

$$\frac{\partial^d C(u_1, \ldots, u_d)}{\partial u_1 \cdots \partial u_d} \ge 0. \tag{52}$$
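For illustration, a small sketch of the C-volume condition (51), computed by inclusion–exclusion over the 2^d corners of a hyper-rectangle; the candidate copula is an illustrative assumption.

```python
# A minimal sketch of the C-volume of a hyper-rectangle, Eq. (51).
from itertools import product
import numpy as np

def c_volume(C, u, v):
    # u, v: lower/upper corners of the hyper-rectangle, u[k] <= v[k].
    d, vol = len(u), 0.0
    for corner in product([0, 1], repeat=d):
        z = [v[k] if corner[k] else u[k] for k in range(d)]
        n_lower = d - sum(corner)          # N(z) = #{k : z_k = u_k}
        vol += (-1) ** n_lower * C(z)
    return vol

if __name__ == "__main__":
    independence = lambda z: float(np.prod(z))     # d-dimensional product copula
    print(c_volume(independence, u=[0.2, 0.1, 0.3], v=[0.6, 0.5, 0.9]) >= 0)
```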

It is therefore straightforward in principle to extend KCR from bivariate to multivariate copulas, and the two expressions (37) and (38) are also easy to evaluate for both the RBF kernel and the polynomial kernel; the performance of such an extension, however, is open for further experimental research.

The second direction is copula density estimation. The density c(u_1, u_2) associated with a copula C(u_1, u_2) is

$$c(u_1, u_2) = \frac{\partial^2 C(u_1, u_2)}{\partial u_1\, \partial u_2}. \tag{53}$$

Though a copula density estimate can be obtained directly from the KCR estimator Ĉ, i.e. by differentiating Ĉ with respect to u_1 and u_2, it is widely believed in nonparametric statistics that density estimation is usually more involved than cumulative probability function estimation [37]. Efficient copula density estimation algorithms based on kernel regression call for further exploration.
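For illustration, a minimal sketch of recovering a density surface from a fitted copula estimator by finite differences of the mixed derivative (53); the stand-in estimator and the step size are illustrative assumptions.

```python
# A minimal sketch: numerical mixed derivative of a fitted copula estimator.
import numpy as np

def copula_density(C_hat, u1, u2, h=1e-4):
    # c(u1, u2) ~ [C(u1+h, u2+h) - C(u1+h, u2-h) - C(u1-h, u2+h) + C(u1-h, u2-h)] / (4 h^2)
    return (C_hat(u1 + h, u2 + h) - C_hat(u1 + h, u2 - h)
            - C_hat(u1 - h, u2 + h) + C_hat(u1 - h, u2 - h)) / (4 * h * h)

if __name__ == "__main__":
    independence = lambda a, b: a * b
    print(copula_density(independence, 0.3, 0.7))   # ~1.0 for the product copula
```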


The third direction is computational complexity. For bivariate copula estimation, the training of KCR involves solving a convex quadratic programming with 2T + 4M + N² decision variables and 1 + 4T + N² linear constraints; the additional computational complexity over the classical ε-SVR mainly comes from the term N², the number of equidistant grid points. When KCR is extended to d-dimensional copula estimation, however, the number of equidistant grid points becomes N^d. This exponentially increasing computational complexity calls for adaptive or iterative algorithms.

Acknowledgments

The authors thank the anonymous referees for their valuable comments and suggestions, which improved the technical content and the presentation of the paper. The work is supported by the National Natural Science Foundation of China (71101127), the Social Sciences Foundation of the Chinese Ministry of Education (10YJC790265), the Zhejiang Natural Science Foundation (Y7080205) and the Zhejiang Province Universities Social Sciences Key Base (Finance).

References

[1] A. Sklar, Fonctions de répartition à n dimensions et leurs marges, Publications de l'Institut de Statistique de l'Université de Paris 8 (1959) 229–231.
[2] E. Klement, R. Mesiar, On the axiomatization of some classes of discrete universal integrals, Knowledge-Based Systems (2012) 13–18.
[3] E. Castiñeira, C. Torres-Blanc, S. Cubillo, Measuring contradiction on A-IFS defined in finite universes, Knowledge-Based Systems (2011) 1297–1309.
[4] U. Cherubini, E. Luciano, W. Vecchiato, Copula Methods in Finance, John Wiley & Sons, West Sussex, England, 2004.
[5] A. McNeil, R. Frey, P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, Princeton, New Jersey, 2005.
[6] N. Kolev, U. Anjos, B. Mendes, Copulas: a review and recent developments, Stochastic Models 22 (2006) 617–660.
[7] D. Li, On default correlation: a copula function approach, Journal of Fixed Income 9 (2000) 43–54.
[8] P. Deheuvels, A Kolmogorov–Smirnov type test for independence and multivariate samples, Revue Roumaine de Mathématiques Pures et Appliquées 26 (1981) 213–226.
[9] P. Deheuvels, Non parametric tests of independence, in: J. Raoult (Ed.), Statistique non Paramétrique Asymptotique, Springer, Berlin, Heidelberg, 1980, pp. 95–107.
[10] J. Fermanian, O. Scaillet, Nonparametric estimation of copulas for time series, Journal of Risk 5 (2003) 25–54.
[11] X. Shen, Y. Zhu, L. Song, Linear B-spline copulas with applications to nonparametric estimation of copulas, Computational Statistics & Data Analysis 52 (2008) 3806–3819.
[12] D. Dimitrova, V. Kaishev, S. Penev, GeD spline estimation of multivariate Archimedean copulas, Computational Statistics & Data Analysis 52 (2008) 3570–3582.
[13] P. Lambert, Archimedean copula estimation using Bayesian splines smoothing techniques, Computational Statistics & Data Analysis 51 (2007) 6307–6320.
[14] J. Hernández-Lobato, A. Suárez, Semiparametric bivariate Archimedean copulas, Computational Statistics & Data Analysis 55 (2011) 2038–2058.
[15] C. Genest, E. Masiello, K. Tribouley, Estimating copula densities through wavelets, Insurance: Mathematics and Economics 44 (2009) 170–181.
[16] P. Morettin, C. de Castro Toloi, C. Chiann, J. de Miranda, Wavelet smoothed empirical copula estimators, Brazilian Review of Finance 8 (2010) 263–281.
[17] F. Autin, E. Le Pennec, K. Tribouley, Thresholding methods to estimate copula density, Journal of Multivariate Analysis 101 (2010) 200–222.
[18] H. Brunk, Maximum likelihood estimates of monotone parameters, Annals of Mathematical Statistics 26 (1955) 607–616.
[19] C. Hildreth, Point estimates of ordinates of concave functions, Journal of the American Statistical Association 49 (1954) 598–619.
[20] A. Antoniadis, J. Bigot, I. Gijbels, Penalized wavelet monotone regression, Statistics & Probability Letters 77 (2007) 1608–1621.
[21] E. Mammen, C. Thomas-Agnan, Smoothing splines and shape restrictions, Scandinavian Journal of Statistics 26 (1999) 239–252.
[22] M. Meyer, Inference using shape-restricted regression splines, Annals of Applied Statistics 2 (2008) 1013–1033.
[23] I. Chang, L. Chien, C. Hsiung, C. Wen, Y. Wu, Shape restricted regression with random Bernstein polynomials, in: Complex Datasets and Inverse Problems: Tomography, Networks and Beyond, Institute of Mathematical Statistics, 2007, pp. 187–202.
[24] S. Curtis, S. Ghosh, A variable selection approach to monotonic regression with Bernstein polynomials, Journal of Applied Statistics 38 (2011) 961–976.
[25] K. Pelckmans, M. Espinoza, J. Brabanter, J. Suykens, B. Moor, Primal-dual monotone kernel regression, Neural Processing Letters 22 (2005) 171–182.
[26] I. Takeuchi, Q. Le, T. Sears, A. Smola, Nonparametric quantile estimation, Journal of Machine Learning Research 7 (2006) 1231–1264.


[27] Z. Sun, Z. Zhang, H. Wang, M. Jiang, Cutting plane method for continuously constrained kernel-based regression, IEEE Transactions on Neural Networks 21 (2010) 238–247.
[28] Y. Wang, N. He, Multivariate convex support vector regression with semidefinite programming, Knowledge-Based Systems 30 (2012) 87–94.
[29] F. Lauer, G. Bloch, Incorporating prior knowledge in support vector regression, Machine Learning 70 (2008) 89–118.
[30] H. Drucker, C. Burges, L. Kaufman, A. Smola, V. Vapnik, Support vector regression machines, in: M.C. Mozer, M.I. Jordan, T. Petsche (Eds.), Advances in Neural Information Processing Systems, vol. 9, MIT Press, Cambridge, MA, 1997, pp. 155–161.
[31] A. Smola, B. Schölkopf, A tutorial on support vector regression, Statistics and Computing 14 (2004) 199–222.
[32] Y. Xu, L. Wang, A weighted twin support vector regression, Knowledge-Based Systems (2012).
[33] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, NY, 2000.
[34] E. Nadaraya, On estimating regression, Theory of Probability and its Applications 9 (1964) 157–159.
[35] G. Watson, Smooth regression analysis, Sankhya 26 (1964) 359–372.
[36] V. Epanechnikov, Non-parametric estimation of a multivariate probability density, Theory of Probability and its Applications 14 (1969) 153.
[37] L. Wasserman, All of Nonparametric Statistics, Springer-Verlag, New York, NY, 2006.