Discriminant sparse local spline embedding with application to face recognition


Ying-Ke Lei a,b,c,*, Hui Han a, Xiaojun Hao a


a The State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE), Luoyang, Henan 471003, China
b Science and Technology on Communication Information Security Control Laboratory, Jiaxing, Zhejiang 314000, China
c Electronic Engineering Institute, Hefei, Anhui 230037, China

* Corresponding author at: The State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE), Luoyang, Henan 471003, China. Tel.: +86 0551 65591108. E-mail address: [email protected] (Y.-K. Lei).

Article info

Article history: Received 25 September 2014; Received in revised form 16 June 2015; Accepted 24 June 2015; Available online xxxx.

Keywords: Sparse neighborhood graph; Local spline embedding; Sparse subspace learning; Maximum margin criterion; Discriminant sparse local spline embedding; Manifold learning; Face recognition

Abstract

In this paper, an efficient feature extraction algorithm called discriminant sparse local spline embedding (D-SLSE) is proposed for face recognition. A sparse neighborhood graph of the input data is first constructed based on a sparse representation framework; the low-dimensional embedding of the data is then obtained by faithfully preserving the intrinsic geometry of the data samples on this sparse neighborhood graph while best holding the discriminant power derived from the class information of the input data. Finally, an orthogonalization procedure is performed to further improve the discriminant power. Experimental results on two face image databases demonstrate that D-SLSE is effective for face recognition.

© 2015 Published by Elsevier B.V.


1. Introduction


It is well known that large volumes of high-dimensional data arise in numerous real-world applications. Operating directly on such a high-dimensional image space is ineffective and may lead to high computational and storage demands as well as poor performance. From the perspective of pattern recognition, dimensionality reduction is a typical way to circumvent the "curse of dimensionality" [1] and other undesired properties of high-dimensional spaces. The goal of dimensionality reduction is to construct a meaningful low-dimensional representation of high-dimensional data; ideally, the reduced representation should have a dimensionality that corresponds to the intrinsic dimensionality of the data. Researchers have developed many useful dimensionality reduction techniques, which can be broadly categorized into two classes: linear and nonlinear. Classical linear dimensionality reduction approaches seek a meaningful low-dimensional subspace of a high-dimensional input space via a linear transformation.



This subspace can provide a compact representation of the high-dimensional input data when the intrinsic structure of the data embedded in the input space is linear. Among these methods, the most well known are principal component analysis (PCA) [2] and linear discriminant analysis (LDA) [3]. Linear models have been used extensively in pattern recognition and computer vision and have become the most popular techniques for face recognition [4-8]. Linear techniques, however, may fail to discover the intrinsic structures of complex nonlinear data. To address this problem, a number of nonlinear manifold learning techniques have been proposed under the assumption that the input data set lies on or near some low-dimensional manifold embedded in a high-dimensional unorganized Euclidean space [9]. The motivation of manifold learning is straightforward: it seeks to directly find the intrinsic low-dimensional nonlinear data structures hidden in the observation space. Examples include isometric feature mapping (ISOMAP) [10], locally linear embedding (LLE) [11], Laplacian eigenmaps (LE) [12], Hessian-based locally linear embedding (HLLE) [13], maximum variance unfolding (MVU) [14], manifold charting [15], local tangent space alignment (LTSA) [16], Riemannian manifold learning (RML) [17], local spline embedding (LSE) [18], elastic embedding (EE) [19], Cauchy graph embedding (CGE) [20], adaptive manifold learning [21], and neighborhood preserving polynomial embedding (NPPE) [22].



Each manifold learning algorithm attempts to preserve a different geometrical property of the underlying manifold. Local approaches, such as LLE, HLLE, LE, LTSA, and LSE, aim to preserve the proximity relationships among the data, while global approaches such as ISOMAP and LOGMAP aim to preserve the metrics at all scales. Experiments have shown that these methods can find perceptually meaningful embeddings for face or digit images, and they also yield impressive results on other artificial and real-world data sets. However, these manifold learning methods have to confront the out-of-sample problem when they are applied to pattern recognition: they can yield an embedding directly from the training data set, but, because the nonlinear map is implicit, they cannot find the image of a new sample in the embedding space. This limits their applicability to pattern recognition problems. To overcome this drawback, Bengio et al. proposed a kernel method that embeds new data points by exploiting the generalization ability of Mercer kernels [23]. He et al. proposed locality preserving projection (LPP), which approximates the eigenfunctions of the Laplace-Beltrami operator on the manifold so that new testing points can be mapped to the learned subspace without trouble [24]. Yan et al. utilized the graph embedding framework to develop marginal Fisher analysis (MFA), which also solves the out-of-sample problem [25].

Recently, sparse representation has attracted considerable interest in machine learning and pattern recognition, and several new methods integrate the theory of sparse representation with subspace learning. They can be considered a special family of dimensionality reduction methods that exploit "sparsity" and typically have one of the following two characteristics: (1) finding a subspace spanned by sparse basis vectors, where the sparsity is enforced on the projection vectors and is associated with the feature dimensionality; representative techniques are sparse principal component analysis (SPCA) [26] and nonnegative sparse PCA [27]; (2) seeking sparse reconstructive weights, which are associated with the sample size; representative methods include sparse neighborhood preserving embedding (SNPE) [28] and sparsity preserving projections (SPP) [29].

In this paper, inspired by the idea of LSE [18] and sparse representation, we propose a novel sparse subspace learning technique called discriminant sparse local spline embedding (D-SLSE). Specifically, a sparse neighborhood graph of the input data is first constructed based on a sparse representation framework; the low-dimensional embedding of the data is then obtained by faithfully preserving the intrinsic geometry of the data samples on this sparse neighborhood graph while best holding the discriminant power derived from the class information of the input data. Finally, an orthogonalization procedure is performed to further improve the discriminant power. We now enumerate several characteristics of the proposed algorithm:

(1) D-SLSE does not have to set the neighborhood size required for constructing a neighborhood graph in LSE. An unsuitable neighborhood may result in "short-circuit" edges (see Fig. 1a) or a large number of disconnected regions (see Fig. 1b). In contrast, graph construction based on sparse representation makes our proposed method very simple to use in practice.

(2) D-SLSE computes an explicit linear mapping from the input space to the reduced space, which attempts to manage the trade-off between holding discriminant power and preserving the local geometric structure.

(3) D-SLSE seeks a set of orthogonal basis vectors, which significantly improves its recognition accuracy.

Fig. 1. The K-nearest neighborhood graph of the Swiss roll data. (a) K = 8: a short-circuit edge. (b) K = 2: disconnected regions.

The rest of this paper is organized as follows: The D-SLSE algorithm is developed in Section 2. Section 3 demonstrates the experimental results. Finally, conclusions are presented in Section 4.


2. Discriminant sparse local spline embedding


2.1. Local spline embedding


Xiang et al. [18] proposed a general dimensionality reduction framework called compatible mapping and, using it as a platform, developed the local spline embedding (LSE) manifold learning algorithm. The method includes two steps: part optimization and whole alignment. Part optimization represents each data point in its own local coordinate system, while whole spline alignment ensures that each data point is assigned a unique global coordinate. The algorithmic procedure is as follows:


1. Constructing the adjacency graph: Let $G$ denote a graph with $n$ nodes. We use the KNN criterion to construct the adjacency graph, i.e., we put an edge between nodes $i$ and $j$ if $i$ is among the $k$ nearest neighbors of $j$ or $j$ is among the $k$ nearest neighbors of $i$.

2. Obtaining tangent coordinates: For each data point $x_i$, let $X_i = [x_{i_1}, x_{i_2}, \ldots, x_{i_k}] \in \mathbb{R}^{D \times k}$ denote its $k$ nearest neighbors. Performing a singular value decomposition of the centralized matrix of $X_i$, we have

$$X_i H_k = U_i \begin{bmatrix} \Sigma_i \\ 0_{(D-k)\times k} \end{bmatrix} V_i^T, \quad i = 1, \ldots, n, \qquad (1)$$

where $H_k = I - e_k e_k^T / k$ is the centering operator, $I$ is the $k \times k$ identity matrix, $e_k = [1, 1, \ldots, 1]^T \in \mathbb{R}^k$ is the $k$-dimensional vector of all ones, $\Sigma_i = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ contains the singular values in descending order, $U_i$ is a $D \times D$ matrix whose column vectors are the left singular vectors, and $V_i$ is a $k \times k$ matrix whose column vectors are the right singular vectors. The local tangent coordinates $H_i$ of $X_i$ can be obtained from the following formula:

$$H_i = (U_i)^T X_i H_k = [h_1^{(i)}, h_2^{(i)}, \ldots, h_k^{(i)}], \quad i = 1, \ldots, n, \qquad (2)$$

where $h_j^{(i)}$ is the local tangent coordinate of the $j$th nearest neighbor of the data point $x_i$.

3. Aligning global coordinates: For the $i$th local tangent space projection $H_i$, let $Y_i = [y_{i_1}, y_{i_2}, \ldots, y_{i_k}] \in \mathbb{R}^{d \times k}$ contain the corresponding global coordinates of the $k$ data points. Further, denote the $r$th row of $Y_i$ by $[y_{i_1}^{(r)}, y_{i_2}^{(r)}, \ldots, y_{i_k}^{(r)}]$. We determine $d$ spline functions $g_i^{(r)}: \mathbb{R}^d \mapsto \mathbb{R}$, $r = 1, 2, \ldots, d$, such that the coordinate components can be faithfully mapped:

$$y_{i_j}^{(r)} = g_i^{(r)}(h_j^{(i)}), \quad j = 1, 2, \ldots, k. \qquad (3)$$
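To make steps 1 and 2 concrete, the following NumPy sketch builds the k-nearest-neighbor sets and the local tangent coordinates of Eqs. (1)-(2). The brute-force neighbor search and the function names are illustrative choices of ours, not part of the original algorithm description.

```python
import numpy as np

def knn_indices(X, k):
    """Brute-force k nearest neighbors (step 1). X is (D, n), samples in columns."""
    sq = np.sum(X**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    order = np.argsort(dist2, axis=1)
    return order[:, 1:k + 1]                              # drop the point itself (column 0)

def tangent_coordinates(X, nbr_idx, i, d):
    """Local tangent coordinates of Eq. (2) for data point i.

    Returns the (d, k) matrix whose columns are h_j^{(i)}
    (only the leading d rows of (U_i)^T X_i H_k are kept).
    """
    Xi = X[:, nbr_idx[i]]                                  # D x k neighborhood matrix
    k = Xi.shape[1]
    Hk = np.eye(k) - np.ones((k, k)) / k                   # centering operator H_k
    U, s, Vt = np.linalg.svd(Xi @ Hk, full_matrices=False) # Eq. (1)
    return U[:, :d].T @ (Xi @ Hk)                          # Eq. (2), truncated to d rows
```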






Because $y_{i_j}^{(r)}$ and $g_i^{(r)}$ are unknown, solving Eq. (3) is an ill-posed problem. Xiang et al. [18] demonstrate that splines developed in Sobolev space meet this requirement:

$$g^{(r)}(t) = \sum_{i=1}^{l} \beta_{r,i}\, p_i(t) + \sum_{j=1}^{k} \alpha_{r,j}\, \phi_j(t), \quad r = 1, 2, \ldots, d, \qquad (4)$$

where $l = (d+s-1)!/(d!(s-1)!)$, $\{p_i(t)\}_{i=1}^{l}$ is a set of polynomials in $\mathbb{R}^d$ of total degree less than $s$, and $\phi_j(t)$ is a Green's function [18]. The coefficients $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_k]^T \in \mathbb{R}^k$ and $\beta = [\beta_1, \beta_2, \ldots, \beta_l]^T \in \mathbb{R}^l$ of the $d$ spline functions in Sobolev space can be solved via the following linear equations:

$$A_i \begin{bmatrix} \alpha^1, \ldots, \alpha^d \\ \beta^1, \ldots, \beta^d \end{bmatrix} = \begin{bmatrix} Y_i^T \\ 0 \end{bmatrix}, \qquad (5)$$

where

$$A_i = \begin{bmatrix} K & P \\ P^T & 0 \end{bmatrix} \in \mathbb{R}^{(k+l)\times(k+l)}, \qquad (6)$$

in which $K$ is a $k \times k$ symmetric matrix with elements $K_{ij} = \phi_j(t_i)$, and $P$ is a $k \times l$ matrix with elements $P_{ij} = p_j(t_i)$.

We intend to construct the $d$ spline functions $g_i^{(1)}, \ldots, g_i^{(d)}$ by minimizing the following regularized reconstruction error:

$$E(Y_i) = \sum_{r=1}^{d} \sum_{j=1}^{k} \big( y_{i_j}^{(r)} - g_i^{(r)}(h_j^{(i)}) \big)^2 + \lambda \sum_{r=1}^{d} (\alpha^r)^T K\, \alpha^r. \qquad (7)$$

Xiang et al. [18] proved that if the parameter $\lambda$ is small enough, the objective function in Eq. (7) can be converted to the following form:

$$E(Y_i) \propto \sum_{r=1}^{d} (\alpha^r)^T K\, \alpha^r = \operatorname{tr}(Y_i B_i Y_i^T), \qquad (8)$$

where $\operatorname{tr}(\cdot)$ is the trace operator and $B_i$ is the upper-left $k \times k$ subblock of $A_i^{-1}$. Summing all the reconstruction errors together, we have

$$E(Y) = \sum_{i=1}^{n} \operatorname{tr}(Y S_i B_i S_i^T Y^T) = \operatorname{tr}(Y S B S^T Y^T) = \operatorname{tr}(Y M Y^T), \qquad (9)$$

where $S_i$ is a column selection matrix such that $Y S_i = Y_i$, $S = [S_1, \ldots, S_n]$, $B = \operatorname{diag}(B_1, \ldots, B_n)$, and $M = S B S^T$. In order to avoid degenerate solutions, we add the constraint $Y Y^T = I$. Then the minimum of $E(Y)$ for the $d$-dimensional global embedding is given by the $d$ eigenvectors of the matrix $M$ corresponding to the 2nd to $(d+1)$st smallest eigenvalues of $M$.
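For illustration, the per-neighborhood system of Eqs. (5)-(8) and the accumulation of the alignment matrix M in Eq. (9) could be assembled as in the sketch below. The polyharmonic ("thin-plate") Green's function, the linear polynomial basis, and the small regularizer added to K are our own illustrative stand-ins for the splines of [18]; the function names are hypothetical.

```python
import numpy as np

def spline_alignment_block(T, lam=1e-4):
    """Build A_i (Eq. (6)) for one neighborhood and return B_i, the upper-left
    k x k block of A_i^{-1} used in Eqs. (8)-(9).

    T : (k, d) array of local tangent coordinates (rows are h_j^{(i)}).
    """
    k, d = T.shape
    # Green's function values K_{ij} = phi_j(t_i); a polyharmonic kernel is used
    # here purely for illustration.
    dist = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=-1)
    K = np.where(dist > 0, dist**2 * np.log(dist + 1e-12), 0.0)
    # polynomial basis of total degree < 2: [1, t_1, ..., t_d], so l = d + 1
    P = np.hstack([np.ones((k, 1)), T])
    l = P.shape[1]
    A = np.zeros((k + l, k + l))
    A[:k, :k] = K + lam * np.eye(k)   # small regularizer for numerical stability (our addition)
    A[:k, k:] = P
    A[k:, :k] = P.T
    return np.linalg.inv(A)[:k, :k]   # B_i: upper-left k x k subblock of A_i^{-1}

def alignment_matrix(neighborhoods, tangent_coords, n, lam=1e-4):
    """Accumulate the global spline alignment matrix M of Eq. (9)."""
    M = np.zeros((n, n))
    for idx, T in zip(neighborhoods, tangent_coords):  # idx: neighbor indices of x_i
        B = spline_alignment_block(T, lam)
        M[np.ix_(idx, idx)] += B
    return M
```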

2.2. Graph construction based on sparse representation


It is well known that the pioneering works on manifold learning, e.g., ISOMAP, LLE, LE, and HLLE, all rely on neighborhood graph construction. There exist two popular ways of constructing the graph:

(1) $\varepsilon$-neighborhoods: nodes $i$ and $j$ are connected by an edge if $\|x_i - x_j\|^2 < \varepsilon$, where the norm is the usual Euclidean norm in $\mathbb{R}^D$.

(2) $k$ nearest neighbors: nodes $i$ and $j$ are connected by an edge if $i$ is among the $k$ nearest neighbors of $j$ or $j$ is among the $k$ nearest neighbors of $i$.

Then the graph edge weights are set by various approaches, e.g., binary, Gaussian-kernel, and $\ell_2$-reconstruction weights.

However, graph construction based on $k$-nearest-neighbor or $\varepsilon$-ball involves a model parameter, i.e., the neighborhood size, which is generally difficult to set in practice. A large neighborhood might result in "short-circuit" edges, which drastically destroy the topological connectivity of the original manifold data; in contrast, a small neighborhood might separate the manifold into a large number of disconnected regions. Instead of the $k$-nearest-neighbor and $\varepsilon$-ball graph constructions, we attempt to construct a graph $G$ automatically based on sparse representation and make it preserve the discriminative information well.

In the past few years, sparse representation has attracted a great deal of attention; it was initially proposed as an extension of traditional signal processing tools such as the Fourier and wavelet transforms. Sparse representation searches for the most compact representation of a signal in terms of a linear combination of patterns in an over-complete dictionary, and it has a compact mathematical expression. Given a signal (or an image in vector form) $x \in \mathbb{R}^D$ and a matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{D \times n}$ containing the elements of an over-complete dictionary in its columns, the goal of sparse representation is to represent $x$ using as few entries of $X$ as possible. The objective function can be described as follows:

$$\min_{s_i} \|s_i\|_0 \quad \text{s.t.} \quad x_i = X s_i, \qquad (10)$$

or,

$$\min_{s_i} \|s_i\|_0 \quad \text{s.t.} \quad \|x_i - X s_i\| \le \varepsilon, \qquad (11)$$

where $s_i \in \mathbb{R}^n$ is the coefficient vector and $\|s_i\|_0$ is the pseudo-$\ell_0$ norm, which equals the number of non-zero components of $s_i$. Such an optimization problem is in general non-convex and NP-hard; this difficulty can be overcome by convex $\ell_1$ optimization [30]. After solving the $\ell_1$ minimization problem for each point, the sparse weight matrix can be expressed as $S = [s_1, \ldots, s_n]^T$, and the newly constructed graph is $G = \{X, S\}$, where $X$ is the training sample set and $S$ is the edge weight matrix. Compared with $k$-nearest-neighbor and $\varepsilon$-ball, graph construction based on sparse representation has the following characteristics:

(1) Datum-adaptive neighborhood. Graph construction based on sparse representation does not need a model parameter such as the neighborhood size $k$ of $k$-nearest-neighbor or the radius $\varepsilon$ of the $\varepsilon$-ball. In fact, the data distribution may vary greatly over different areas of the data space, which results in a distinctive neighborhood structure for each instance; that is to say, graph construction with datum-adaptive neighborhoods is desired. The ability to find dynamic locality relations automatically makes sparse representation well suited to this goal. In contrast, both $k$-nearest-neighbor and $\varepsilon$-ball based methods use one predefined parameter to determine the neighborhoods of all the data. It seems unreasonable for all data points to share the same parameter, which may not characterize the manifold structure well, especially in the undersampled case.

(2) Robustness to data noise. Data noise is inevitable, especially for visual data, so robustness is a desirable property of a satisfactory graph construction method. The graph constructed by the $k$-nearest-neighbor or $\varepsilon$-ball method is in general based on pairwise Euclidean distances, which are very sensitive to data noise; this means that the graph structure can change easily when unfavorable noise comes in. The graph based on sparse representation, however, utilizes the overall contextual information and explicitly accounts for data noise, which makes it robust to noise.
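As a concrete illustration of the $\ell_1$-graph construction described above, the sketch below computes each sample's sparse reconstruction weights with scikit-learn's Lasso, a standard convex $\ell_1$-regularized surrogate for problems (10)-(11). The function name and parameter values are our own assumptions, not prescribed by the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_graph_weights(X, alpha=0.01):
    """Approximate the l1-graph of Section 2.2: reconstruct each sample x_i
    from the remaining samples with a sparse coefficient vector s_i.

    X : (D, n) data matrix with samples as columns.
    Returns S, an (n, n) weight matrix whose ith row is s_i (zero diagonal).
    """
    D, n = X.shape
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i                 # remove x_i from the dictionary
        # Lasso solves an l1-regularized least-squares surrogate
        # for the constrained problems (10)/(11)
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(X[:, idx], X[:, i])
        S[i, idx] = model.coef_
    return S
```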




2.3. Discriminant sparse local spline embedding


In this subsection, we discuss the solution of D-SLSE. Given a set of training samples $\{x_i\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^D$, let $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{D \times n}$ be the data matrix containing all the training samples in its columns. The goal of D-SLSE is to find a linear transformation matrix which can capture the local geometry of the data samples in the reduced space and maximize the margin between different classes simultaneously. We first construct a neighborhood graph based on sparse representation. For each sample $x_i$, a sparse reconstructive weight vector $s_i$ is sought by the following modified $\ell_1$ minimization problem [29]:

$$\min_{s_i} \|s_i\|_1 \quad \text{s.t.} \quad x_i = X s_i, \;\; 1 = \mathbf{1}^T s_i, \qquad (12)$$

or,

$$\min_{s_i} \|s_i\|_1 \quad \text{s.t.} \quad \|x_i - X s_i\| \le \varepsilon, \;\; 1 = \mathbf{1}^T s_i, \qquad (13)$$

where $s_i = [s_{i,1}, \ldots, s_{i,i-1}, 0, s_{i,i+1}, \ldots, s_{i,n}]^T$ is an $n$-dimensional vector whose $i$th element is equal to zero (implying that $x_i$ is removed from $X$), the elements $s_{i,j}$ ($j \ne i$) denote the contribution of each $x_j$ to reconstructing $x_i$, and $\mathbf{1} \in \mathbb{R}^n$ is a vector of all ones. By solving this $\ell_1$ minimization problem for all the data points, we obtain a new neighborhood graph $G = \{X, S\}$ in which $S = [s_1, \ldots, s_n]^T$.

Once the sparse neighborhood graph $G$ is constructed, each sample has its own datum-adaptive neighborhood structure. For a given data point $x_i$, according to the label information, we can divide its adaptive neighbors into two groups: samples in the same class as $x_i$ and samples from different classes. Let $x_{i_1}, x_{i_2}, \ldots, x_{i_{k_i}}$ denote $x_i$'s neighbor measurements of the same class and $x_{i^1}, x_{i^2}, \ldots, x_{i^{m_i}}$ denote $x_i$'s neighbor measurements of different classes. By putting $x_{i_1}, x_{i_2}, \ldots, x_{i_{k_i}}$ and $x_{i^1}, x_{i^2}, \ldots, x_{i^{m_i}}$ together, we can build the datum-adaptive neighborhood structure for the data point $x_i$ as $X_i = [x_{i_1}, x_{i_2}, \ldots, x_{i_{k_i}}, x_{i^1}, x_{i^2}, \ldots, x_{i^{m_i}}]$.

For each datum-adaptive neighborhood structure, the corresponding output in the low-dimensional space is denoted by $Y_i = [y_{i_1}, y_{i_2}, \ldots, y_{i_{k_i}}, y_{i^1}, y_{i^2}, \ldots, y_{i^{m_i}}]$. In the low-dimensional space, we expect that the local geometry between the given data point and its same-class neighbor measurements is preserved, while the distances between the given data point and its different-class neighbor measurements are as large as possible. Fig. 2 illustrates this idea. The left part of the figure shows the $i$th datum-adaptive neighborhood structure in the original high-dimensional space; the patch consists of $x_i$, same-class neighbor measurements (i.e., $x_{i_1}$, $x_{i_2}$, and $x_{i_3}$), and different-class neighbor measurements (i.e., $x_{i^1}$ and $x_{i^2}$). The expected results on the patch in the low-dimensional space are shown in the right part of the figure: the local geometry among the low-dimensional measurements $y_{i_1}$, $y_{i_2}$, $y_{i_3}$, and $y_i$ is preserved, while the low-dimensional measurements $y_{i^1}$ and $y_{i^2}$ are as far as possible from $y_i$.

Fig. 2. Datum-adaptive neighborhood structure optimization of D-SLSE. The measurements with the same shape and color come from the same class.

For each datum-adaptive neighborhood structure in the low-dimensional subspace, we expect that the local geometry between $y_i$ and the same-class neighbor measurements is preserved by local spline embedding (LSE). So the tangent space in the neighborhood $x_{i_1}, x_{i_2}, \ldots, x_{i_{k_i}}$ of each data point is first built in our method, which represents the local geometry of the intrinsic manifold structure. According to the notion of compatible mapping in LSE, smooth splines are then constructed to align these local tangent spaces to a single set of low-dimensional global coordinates. That is to say, Eq. (9) should be met, so we have

$$\arg\min_{Y} \operatorname{tr}(Y M Y^T) \quad \text{s.t.} \quad Y Y^T = I. \qquad (14)$$

Meanwhile, we expect that the distances between $y_i$ and the neighbor measurements of different classes are as large as possible. Let us define the between-class adjacency matrix $H$, whose elements are given below:

$$H_{ij} = \begin{cases} 1, & \text{if } x_i \text{ is among the datum-adaptive neighborhood structure of } x_j \text{ and their classes differ,}\\ & \text{or } x_j \text{ is among the datum-adaptive neighborhood structure of } x_i \text{ and their classes differ,}\\ 0, & \text{otherwise.} \end{cases} \qquad (15)$$

It is obvious that the between-class adjacency matrix $H$ is symmetric. By virtue of $H$, the cost function to be maximized is as follows:

$$\arg\max_{Y} \sum_{i,j} \|y_i - y_j\|^2 H_{ij}. \qquad (16)$$

It follows from Eq. (16) that

$$\arg\max_{Y} \sum_{i,j} \|y_i - y_j\|^2 H_{ij} = \arg\max_{Y} \operatorname{tr}\Big(\sum_{i,j} H_{ij}\, y_i y_i^T + \sum_{i,j} H_{ij}\, y_j y_j^T - 2 \sum_{i,j} H_{ij}\, y_i y_j^T\Big) = 2 \arg\max_{Y} \operatorname{tr}(Y D Y^T - Y H Y^T) = 2 \arg\max_{Y} \operatorname{tr}(Y L Y^T) \propto \arg\max_{Y} \operatorname{tr}(Y L Y^T), \qquad (17)$$


415

where D is a diagonal matrix whose elements on diagonal are column (or row since H is a symmetric matrix) sum of H, i.e., P Dii ¼ nj¼1 Hij ; L ¼ D  H. In order to obtain the optimal linear discriminant embedding, we produce an explicitly linear mapping from X to Y, i.e.,

Table 1 The main procedure for the D-SLSE algorithm. Input: Training set X ¼ fxi gni¼1 Output: D  d feature matrix G extracted from X Step 1: For each data point xi , determine its datum-adaptive neighborhood structure by sparse representation. Step 2: Compute the ith local tangent space projection based on Eq. (2). Step 3: Compute matrix Ai based on Eq. (6). Step 4: Construct spline alignment matrix M by locally summing as follows: MðIi ; Ii Þ MðIi ; Ii Þ þ B i ¼ 1; 2; . . . ; n

Y ¼ V T X. This linear transformation matrix V is obtained by optimizing an objective function, which captures the discrepancy of the local geometries in the reduced space and strengthens the classification ability simultaneously. That is to say, this objective optimization satisfies Eq. (14) and Eq. (17) simultaneously. Then, the problem can be written as the following multi-object optimization problem:

with the initial M ¼ 0, where Ii ¼ fi1 ; . . . ; iki g denotes the set of indices for the xi ’s neighbor measurements of

8 T T < arg min trðV XMX VÞ

a same class and Bi is the upper left ki  ki subblock of A1 i .

Y

: arg max trðV XLX VÞ T

T

Step 7:

417

s:t: V T XX T V ¼ I:

418

We formulate this discriminator by using the linear manipulation as follows:

420

arg min trðV T XðM  LÞX T VÞ

ð19Þ

T

422

s:t: V XX V ¼ I:

423 424

To solve the above optimization problem, we use the Lagrangian multiplier:

427

o @ n T tr V XðM  LÞX T V  kðV T XX T V  IÞ ¼ 0: @V

425

428

429 431

432 433 434 435

436 438

439 440 441 442 443 444

445

447

ðXðM  LÞX T Þv ¼ kXX T v : Let the column vectors

ð20Þ

ð21Þ

v 1 ; v 2 ; . . . ; v d be the d smallest generalT

T

ized eigenvectors of XðM  LÞX and XX corresponding to the d smallest eigenvalues. The transformation matrix V which minimizes the objective function is obtained as follows:

V ¼ ½v 1 ; v 2 ; . . . ; v d :

ð22Þ

It is well known that the generalized eigenvectors obtained by solving Eq. (21) are nonorthogonal. We use the Gram-Schmidt orthogonalization to produce orthogonal basis vectors. Set g 1 ¼ v 1 , and assume that k  1 orthogonal basis vectors g 1 ; g 2 ; . . . ; g k1 have been worked out, thus g k can be computed as follows:

gk ¼ v k 

i g Ti g i i¼1

gi:

ð23Þ

449

Then G ¼ ½g 1 ; g 2 ; . . . ; g d  is the transformation matrix of D-SLSE. The main procedure for the D-SLSE algorithm is summarized in Table 1.

450

2.4. Time complexity analysis

451

In this section, we provide an analysis of the computational complexity of D-SLSE as a function of the number of samples n, the input dimension D, the intrinsic dimension d, and other related factors if necessary. In the tables, ki is the number of the xi ’s neighbor measurements of a same class, l indicates the number of polynomials in a spline function. The complexity of D-SLSE is dominated by four parts: graph construction based on sparse rep-

448

452 453 454 455 456 457 458 459 460 461 462 463

464

þOðDn2 þ D2 nÞ. It computes the matrix XLX T in OðDn2 þ D2 nÞ and

465

M

and

related

manipulation 2

Thus, we can get

k1 T X g vk

P 3 Oð ni¼1 ðki þ lÞ Þ

matrix

Y

T

Step 6:

ð18Þ

Y

419

Compute matrices XMX T and XLX T . Compute the d eigenvectors corresponding to the d smallest eigenvalues based on Eq. (21). Orthogonalize the d basis vectors based on Eq. (23) and obtain G ¼ ½g 1 ; g 2 ; . . . ; g d 

Step 5:

resentation, alignment matrix manipulation, matrix XLX T computation, and solving orthogonal eigenvectors. Here we omit the time complexity analysis of graph construction based on sparse representation, because there are a number of software packages to realize the algorithm of sparse learning and different package has different time complexity. D-SLSE performs the alignment

in 2

solves orthogonal basic vectors in OðdD þ d DÞ. The complexity of main steps in D-SLSE is listed as Table 2.

466

3. Experimental results


This section evaluates the performance of the proposed D-SLSE in comparison with five representative algorithms, i.e., MMC [31], LDA [4], SLPP [32], SLLTSA [33], and MFA [25], on two face image databases: the Yale database (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) and the Olivetti Research Laboratory (ORL) database (http://www.cam-orl.co.uk/facedatabase.html). Among these algorithms, SLPP, SLLTSA, and MFA are recently proposed manifold learning algorithms. Preprocessing was performed to crop all face images from the two databases. The size of each cropped image in all the experiments is 32 × 32 pixels, with 256 gray levels per pixel; thus, each image can be represented by a 1024-dimensional vector in image space. No further preprocessing was done. The nearest-neighbor parameter k for constructing the neighborhood graph in SLPP and SLLTSA was chosen as k = l − 1, where l denotes the number of training samples per class; the justification for this choice is that the l samples of the same class should lie in the same local geometrical structure provided that within-class samples are well clustered in the observation space. For the MFA method, the important parameters include k1 (the number of nearest in-class neighbors) and k2 (the number of closest out-of-class sample pairs). We chose the best k1 between 1 and l − 1, and similarly selected the best k2 between l and 8c, where c denotes the number of classes. Note that all algorithms involve a PCA phase; in this phase, we kept 100 percent of the image energy and selected all principal components corresponding to non-zero eigenvalues for each method. Different pattern classifiers have been applied for face recognition, including KNN, support vector machines, and Bayesian classifiers; in this study, we adopt the 1-NN classifier for its simplicity.
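The evaluation protocol described above (random splits with l training images per class, a PCA stage keeping the components with non-zero eigenvalues, and a 1-NN classifier) can be sketched as follows. The helper names are our own, and dslse_fit stands in for whatever implementation of Table 1 is available.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def evaluate_split(X, y, n_train_per_class, dim, dslse_fit, seed=0):
    """One random split: l training images per class, the rest for testing.

    X : (n_samples, n_features) image vectors; y : (n_samples,) labels.
    dslse_fit(Xtr_cols, labels, dim) is a hypothetical helper returning the
    (D_pca, dim) D-SLSE transformation matrix.
    """
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:])
    # PCA stage keeping (approximately) all components with non-zero variance
    pca = PCA(n_components=min(len(train_idx) - 1, X.shape[1]))
    Xtr = pca.fit_transform(X[train_idx])
    Xte = pca.transform(X[test_idx])
    G = dslse_fit(Xtr.T, np.asarray(y)[train_idx], dim)
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(Xtr @ G, np.asarray(y)[train_idx])
    return clf.score(Xte @ G, np.asarray(y)[test_idx])
```

Averaging evaluate_split over 20 random seeds reproduces the kind of mean ± standard deviation figures reported in Tables 3 and 4.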


3.1. Yale database


The Yale face database was constructed at the Yale Center for Computational Vision and Control. It contains 165 gray-scale images of 15 individuals (each person providing 11 different images).

The images demonstrate variations in lighting condition (left-light, center-light, right-light), facial expression (normal, happy, sad, sleepy, surprised, and wink), and with/without glasses. Fig. 3 shows sample images of one person.

Firstly, we test the impact of the dimension of the reduced subspace on the recognition rate. During the testing phase, the 1-NN classifier was used. Note that, for LDA, there are at most c − 1 nonzero generalized eigenvalues, so an upper bound on the dimension of the reduced space is c − 1. Fig. 4 illustrates the recognition rates versus the subspace dimension when 5 and 7 images per individual were randomly selected for training. In general, the performance of all these methods varies with the number of dimensions: at the beginning, the recognition rates improve as the dimension increases, but more dimensions do not lead to higher recognition rates once these methods attain their best results.

Secondly, we randomly select seven images per class as the training set and the remaining four images as the testing set. The training sets were used to learn the low-dimensional subspace with the projection matrix, and the testing sets were used to report the final recognition accuracy. Fig. 5 shows the best mean recognition rates over 20 runs. It can be seen that our proposed method outperforms the other techniques. The maximal average recognition rates are 77.83 (±5.02)%, 82.25 (±4.72)%, 82.92 (±4.71)%, 82.25 (±5.52)%, 83.42 (±4.03)%, and 86.83 (±3.42)% for MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE, respectively.

Fig. 3. Sample images from one person in the Yale database.

Fig. 5. Performance comparison of best mean recognition rates using MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE.


Thirdly, experiments are conducted to examine the effect of the number of training samples on performance. For each method, five random subsets with three, four, five, six, and seven images per individual were selected with labels for training, and the rest of the database was used for testing. Each trial was independently performed 20 times, and the average recognition results were calculated. Table 3 shows the maximal average recognition accuracy, the corresponding standard deviations (std), the reduced dimensions, and the training times for MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE. As can be seen, D-SLSE significantly outperforms the other algorithms in all cases.

Fig. 4. The recognition rates of MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE versus the dimensions on the Yale database. (a) 5 samples for training. (b) 7 samples for training.




Table 3. The maximal average recognition rates and the corresponding standard deviations (percent) with the reduced dimensions and training time (seconds) for MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE on the Yale database.

Method | 3 Train | 4 Train | 5 Train | 6 Train | 7 Train
MMC    | 61.08 ± 4.84 (14, 0.257) | 67.43 ± 5.20 (13, 0.265) | 72.11 ± 3.42 (14, 0.276) | 75.40 ± 4.29 (14, 0.309) | 77.83 ± 5.02 (14, 0.341)
LDA    | 68.25 ± 4.21 (14, 0.058) | 74.86 ± 5.51 (14, 0.063) | 77.22 ± 3.50 (14, 0.069) | 81.73 ± 5.05 (14, 0.075) | 82.25 ± 4.72 (14, 0.080)
SLPP   | 67.92 ± 4.25 (14, 0.151) | 75.14 ± 5.46 (16, 0.180) | 77.22 ± 3.50 (14, 0.204) | 81.60 ± 4.94 (14, 0.217) | 82.92 ± 4.71 (14, 0.222)
SLLTSA | 62.75 ± 4.30 (14, 0.089) | 69.71 ± 5.24 (15, 0.105) | 75.72 ± 4.62 (33, 0.139) | 80.27 ± 4.69 (32, 0.148) | 82.25 ± 5.52 (32, 0.172)
MFA    | 64.04 ± 5.63 (117, 4.450) | 71.48 ± 5.82 (115, 6.348) | 77.06 ± 4.19 (77, 9.798) | 80.80 ± 4.82 (99, 11.090) | 83.42 ± 4.03 (109, 13.679)
D-SLSE | 68.67 ± 5.21 (16, 1.368) | 77.00 ± 4.25 (15, 2.094) | 81.11 ± 4.05 (14, 2.809) | 84.00 ± 3.84 (17, 3.453) | 86.83 ± 3.42 (17, 3.855)

The first and second numbers in the parentheses are the selected subspace dimensions and training time, respectively.

Fig. 6. Sample images from one person in the ORL database.

Fig. 7. The recognition rates of MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE versus the dimensions on the ORL database. (a) 5 samples for training. (b) 7 samples for training.

Fig. 8. Performance comparison of best mean recognition rates using MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE.

3.2. ORL database


The ORL face database contains 400 face images of 40 individuals (each one has ten images). The images were captured at different times and show different variations, including expression (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses). The images were taken with a tolerance for some tilting and rotation of the face of up to 20 degrees. Ten sample images of one individual are displayed in Fig. 6. The experimental design is the same as before, and we averaged the results over 20 random splits. The recognition rates versus the dimension, with 5 and 7 images per individual randomly selected for training, are illustrated in Fig. 7. From Fig. 7, we can see that the discrimination power of these methods is enhanced as the number of projected dimensions increases, but not indefinitely: when the final dimension exceeds some threshold, the recognition rate levels off. For each individual, seven images were randomly selected for training and the rest were used for testing. The best mean recognition rates are shown in Fig. 8.



Table 4. The maximal average recognition rates and the corresponding standard deviations (percent) with the reduced dimensions and training time (seconds) for MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE on the ORL database.

Method | 3 Train | 4 Train | 5 Train | 6 Train | 7 Train
MMC    | 83.45 ± 1.90 (39, 0.332) | 88.83 ± 2.15 (38, 0.450) | 92.47 ± 2.13 (37, 0.578) | 93.88 ± 2.34 (34, 0.743) | 94.92 ± 1.77 (36, 0.988)
LDA    | 85.86 ± 1.68 (38, 0.091) | 90.33 ± 1.66 (39, 0.116) | 93.23 ± 1.91 (39, 0.152) | 94.62 ± 1.96 (39, 0.194) | 95.71 ± 1.86 (39, 0.264)
SLPP   | 85.84 ± 1.62 (39, 0.912) | 90.25 ± 1.63 (39, 1.126) | 93.28 ± 1.92 (39, 1.227) | 94.66 ± 1.91 (39, 1.257) | 95.79 ± 1.90 (39, 1.277)
SLLTSA | 81.00 ± 2.13 (81, 0.177) | 86.02 ± 2.06 (83, 0.259) | 90.30 ± 1.64 (95, 0.361) | 92.34 ± 2.6 (99, 0.491) | 94.21 ± 1.92 (111, 0.688)
MFA    | 81.38 ± 2.79 (305, 13.241) | 89.85 ± 2.26 (319, 13.160) | 93.47 ± 2.35 (215, 13.302) | 95.44 ± 1.69 (246, 13.431) | 96.46 ± 1.97 (242, 13.548)
D-SLSE | 87.75 ± 1.58 (61, 4.273) | 93.17 ± 1.51 (57, 5.356) | 95.60 ± 1.44 (55, 5.943) | 96.67 ± 1.51 (75, 6.330) | 97.21 ± 1.63 (128, 6.870)

The first and second numbers in the parentheses are the selected subspace dimensions and training time, respectively.


As can be seen, the D-SLSE algorithm significantly performs the best, while SLLTSA performs poorly; besides, LDA and SLPP achieve almost the same accuracy. The maximal average recognition accuracy is attained at 36, 39, 39, 111, 51, and 128 dimensions for MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE, respectively. The best mean recognition rates of MMC, LDA, SLPP, SLLTSA, MFA, and D-SLSE are 94.92%, 95.71%, 95.79%, 94.21%, 96.46%, and 97.21%, and the standard deviations are 1.77%, 1.86%, 1.90%, 1.92%, 1.97%, and 1.63%, respectively. The corresponding face subspaces obtained by these methods are called the optimal face subspace of each method.

Moreover, the effect of the number of training samples is also tested in the following experiment. We randomly selected 3, 4, 5, 6, and 7 training samples per individual and used the remaining samples for testing. We repeated these trials 20 times and computed the average results. The best result obtained in the optimal subspace and the corresponding standard deviations, dimensions, and training times for each method are shown in Table 4. It can be seen that our D-SLSE algorithm significantly performs the best in all cases. MMC performs better than SLLTSA and achieves performance comparable to LDA as the number of training samples increases. LDA and SLPP perform comparably to each other. The performance of MFA approaches that of our algorithm as the number of training samples increases.

Several experiments have been conducted on the two different face databases, and the proposed D-SLSE consistently achieves the best recognition rate in all the experimental cases. From these results, we make several interesting observations:

(1) The data sets used in this study are the Yale and ORL face images, in which the images of each person vary in pose, illumination, and expression. When applied to such complex nonlinear manifold data, the proposed method can extract intrinsic features efficiently, whereas the classical linear methods fail to discover the nonlinear manifold structure hidden in the image space, which inevitably impairs their recognition accuracy.

(2) SLPP, SLLTSA, and MFA use the k-nearest-neighbor or ε-ball method to construct the neighborhood graph based on pairwise Euclidean distances, which are very sensitive to data noise, while our proposed method utilizes sparse representation to construct the neighborhood graph, which makes use of the overall contextual information and explicitly considers data noise. This means that D-SLSE is robust to data noise.

(3) Compared to other manifold learning-based methods, D-SLSE encodes more discriminant information in the reduced feature subspace by incorporating the class information and applying the Gram-Schmidt orthogonalization procedure.

(4) MFA achieves recognition rates comparable to D-SLSE when the training sample size is 6 or 7 on the ORL database. This is because MFA can also effectively capture both the local geometry and the discriminant information of the data by setting k1 and k2 suitably. However, it is necessary to traverse all possible values of k1 and k2 for model selection; therefore, the computational cost of the model selection procedure in MFA grows quickly as the size of the training set increases.


4. Conclusions


In this paper, we have introduced a novel sparse subspace learning technique, called discriminant sparse local spline embedding (D-SLSE). The most prominent property for D-SLSE is the faithful preservation of the local geometrical structure hidden in the data and the improvement of the discriminant power based on the class information and orthogonalization procedure. We have applied our algorithm to face recognition. The experimental results on Yale and ORL databases show that the new method is indeed effective and efficient.


Acknowledgments


This work is supported by grants from the National Science Foundation of China, Nos. 61272333, 61273302, 61171170 & 61473237, the Anhui Provincial Natural Science Foundation under Grant Nos. 1308085QF99 & 1408085MF129, and the Foundation of the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System of China under Grant No. CEMEE2014K0103B. The authors would like to thank all the guest editors and anonymous reviewers for their constructive advice.


References


[1] D.L. Donoho, High-dimensional data analysis: the curses and blessings of dimensionality, in: Proc. AMS Math. Challenges of the 21st Century, 2000.
[2] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71-86.
[3] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, New York, 2001.
[4] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. PAMI 19 (7) (1997) 711-720.
[5] L.-F. Chen, H.-Y.M. Liao, M.-T. Ko, J.-C. Lin, G.-J. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recogn. 33 (2000) 1713-1726.
[6] P. Howland, J. Wang, H. Park, Solving the small sample size problem in face recognition using generalized discriminant analysis, Pattern Recogn. 39 (2006) 277-287.
[7] J. Ye, R. Janardan, C.H. Park, H. Park, An optimization criterion for generalized discriminant analysis on undersampled problems, IEEE Trans. PAMI 26 (8) (2004) 982-994.
[8] J. Ye, Q. Li, A two-stage linear discriminant analysis via QR-decomposition, IEEE Trans. PAMI 27 (6) (2005) 929-941.
[9] H.S. Seung, D.D. Lee, The manifold ways of perception, Science 290 (2000) 2268-2269.
[10] J. Tenenbaum, V. de Silva, J. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319-2323.
[11] S. Roweis, L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323-2326.





[12] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (6) (2003) 1373-1396.
[13] D. Donoho, C. Grimes, Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data, Proc. Nat'l. Acad. Sci. 100 (10) (2003) 5591-5596.
[14] K. Weinberger, L. Saul, Unsupervised learning of image manifolds by semidefinite programming, Proc. of CVPR 2 (2004) 988-995.
[15] M. Brand, Charting a manifold, in: Proc. of NIPS 15, 2003, pp. 961-968.
[16] Z. Zhang, H. Zha, Principal manifolds and nonlinear dimension reduction via local tangent space alignment, SIAM J. Sci. Comput. 26 (1) (2005) 313-338.
[17] T. Lin, H. Zha, Riemannian manifold learning, IEEE Trans. PAMI 30 (5) (2008) 796-809.
[18] S.M. Xiang, F.P. Nie, C.S. Zhang, Nonlinear dimensionality reduction with local spline embedding, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1285-1298.
[19] M.A. Carreira-Perpinan, The elastic embedding algorithm for dimensionality reduction, in: Proc. of ICML 2010, 2010, pp. 167-174.
[20] D.J. Luo, C. Ding, F.P. Nie, H. Huang, Cauchy graph embedding, in: Proc. of ICML 2011, 2011, pp. 553-560.
[21] Z.Y. Zhang, J. Wang, H.Y. Zha, Adaptive manifold learning, IEEE Trans. PAMI 34 (2) (2012) 253-265.
[22] H. Qiao, P. Zhang, D. Wang, B. Zhang, An explicit nonlinear mapping for manifold learning, IEEE Trans. Cybernet. 43 (1) (2013) 51-63.
[23] Y. Bengio, J.-F. Paiement, P. Vincent, O. Delalleau, N. Le Roux, M. Ouimet, Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering, in: Proc. of NIPS 16, 2003, pp. 177-184.


[24] X. He, S. Yan, Y. Hu, P. Niyogi, H.J. Zhang, Face recognition using Laplacianfaces, IEEE Trans. PAMI 27 (3) (2005) 328-340.
[25] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, S. Lin, Graph embedding and extension: a general framework for dimensionality reduction, IEEE Trans. PAMI 29 (1) (2007) 40-51.
[26] H. Zou, T. Hastie, R. Tibshirani, Sparse principal component analysis, J. Comput. Graphical Stat. 15 (2006) 265-286.
[27] R. Zass, A. Shashua, Nonnegative sparse PCA, Adv. Neural Inform. Process. Syst. 19 (2007) 1561.
[28] B. Cheng, J.C. Yang, S.C. Yan, Y. Fu, T.S. Huang, Learning with l1-graph for image analysis, IEEE Trans. Image Process. 19 (2010) 858-866.
[29] L.S. Qiao, S.C. Chen, X.Y. Tan, Sparsity preserving projections with applications to face recognition, Pattern Recogn. 43 (2010) 331-341.
[30] I. Drori, D. Donoho, Solution of L1 minimization problems by LARS/Homotopy methods, in: 31st International Conference on Acoustics, Speech and Signal Processing, 2006, pp. 636-639.
[31] H. Li, T. Jiang, K. Zhang, Efficient and robust feature extraction by maximum margin criterion, IEEE Trans. Neural Netw. 17 (1) (2006) 157-165.
[32] D. Cai, X. He, J. Han, Using graph model for face analysis, Technical Report No. 2636, Dept. of Computer Science, Univ. of Illinois at Urbana-Champaign, 2005.
[33] T.H. Zhang, J. Yang, D.L. Zhao, X.L. Ge, Linear local tangent space alignment and application to face recognition, Neurocomputing 70 (2007) 1547-1553.

