Applied Soft Computing xxx (2015) xxx–xxx
A novel adaptive cuckoo search algorithm for intrinsic discriminant analysis based face recognition
Manoj Kumar Naik a, Rutuparna Panda b,∗
a School of Electronics, Institute of Technical Education and Research, SOA University, Bhubaneswar 751030, India
b Department of Electronics and Telecommunication Engineering, Veer Surendra Sai University of Technology, Burla 768018, India
∗ Corresponding author. Tel.: +91 6632431857; fax: +91 6632430204.
Article info
Article history: Received 10 April 2015; Received in revised form 8 September 2015; Accepted 17 October 2015; Available online xxx
Keywords: Evolutionary algorithm; Cuckoo search; Principal component analysis; Intrinsic discriminant analysis; Face recognition

Abstract
This paper presents a novel adaptive cuckoo search (ACS) algorithm for optimization. The step size is made adaptive using the knowledge of the fitness function value and the current position of each solution in the search space. The other important feature of the ACS algorithm is its speed: it is faster than the standard cuckoo search (CS) algorithm. Here, an attempt is made to make the CS algorithm parameter free, without a Levy step. The proposed algorithm is validated using twenty-three standard benchmark test functions. The second part of the paper proposes an efficient face recognition algorithm using ACS, principal component analysis (PCA) and intrinsic discriminant analysis (IDA). The proposed algorithms are named PCA + IDA and ACS–IDA. Interestingly, PCA + IDA offers a perturbation free algorithm for dimension reduction, while ACS–IDA is used to find the optimal feature vectors for classification of the face images based on IDA. For the performance analysis, we use three standard face databases: YALE, ORL, and FERET. A comparison of the proposed method with the state-of-the-art methods reveals the effectiveness of our algorithm. © 2015 Published by Elsevier B.V.
1. Introduction
Over the years, a significant amount of research on machine recognition of faces has appeared in the literature. Machine recognition of faces is important for person identification. Pattern recognition problems have different constraints in terms of implementation complexity, so we face difficult yet interesting technical challenges. Researchers in pattern recognition, image processing, neural science, and computer vision have raised a number of stimulating issues linked to face recognition by machines. Likewise, different methods addressing the face recognition problem have been studied, resulting in a large body of research. Existing methods for face recognition have been verified on different sets of images of varying modalities. Face recognition (FR) involves many interesting steps; however, three important steps are extraction/normalization of faces, feature selection, and recognition. Existing methods for developing face recognition systems are based on different kinds of analysis. Out of all these FR techniques, appearance-based approaches, which rely on statistical analysis, have been found to fit face recognition well. The appearance-based
methods are applied directly to images and process each image as a two-dimensional pattern. In these methods, a two-dimensional face image of size m × n pixels is represented by a vector in an mn-dimensional space called the sample space. It is important to note that the sample space has a very large dimension, demanding considerable time for pattern classification and lowering the accuracy of class prediction. Therefore, there is a strong need for dimensionality reduction before pattern recognition. The most common linear tools used are principal component analysis (PCA) [1,2], linear discriminant analysis (LDA) [2–5], and the well known locality preserving projections (LPP) [6]. Some of the nonlinear tools used are isometric mapping [7], locally linear embedding (LLE) [8,9], and Laplacian eigenmaps (LEM) [10,11]. Note that the nonlinear methods listed require rigorous computational work to tune different parameters in order to achieve the best output [12–14] and lack the ability to handle unseen testing data [15]. Hence, linear dimension reduction tools are extensively deployed in face recognition. It is interesting to mention here that PCA and LDA are the two popular linear techniques used for dimension reduction in face recognition. In general, PCA exploits the resemblance features among the different classes of samples. It projects the data (of high dimension) onto a set of basis functions (which are mutually orthogonal) to get a compact representation of the original data. Thus, a reduction in dimensionality is achieved. In a nutshell, PCA reduces the dimension of the problem. PCA based features are
coined as eigenfaces [1] and are typically used for the development of a face recognition system. On the other hand, LDA uses the discriminatory features among the different classes of samples. LDA is basically a supervised linear dimensionality reduction process. It maps the data points of different classes far apart from each other, while the data points of the same class are mapped very close to each other. Note that LDA based features are coined as Fisherfaces [16] and are used to develop face recognition systems. Another interesting idea, the Intrinsicfaces based face recognition algorithm, is reported by Wang and Yu [17]. They proposed a fresh intrinsic face model, where an individual face model is separated into three constituents: (i) the commonness difference; (ii) the individuality difference; and (iii) the intrapersonal difference. By employing the intrinsic face model, the intrinsic discriminant analysis (IDA) method is used to maximize the individuality difference while, at the same time, the intrapersonal difference is minimized. Note that IDA based features are coined as Intrinsicfaces [17] and are also used for the development of face recognition systems. The authors handled the singularity problem through the use of a perturbation. However, introducing the perturbation increases the complexity of the algorithm. Further, their selection process for choosing the optimal eigenvectors is not accurate. This has motivated us to investigate a novel perturbation free IDA based algorithm for face recognition. The idea is to use PCA + IDA, thereby increasing the performance accuracy. In the classical face recognition technique, the general practice is to select the top M principal components for dimension reduction, because the top M principal components, associated with the respective top M eigenvalues, contain most of the information of a face image. The remaining N − M lower order principal components are discarded. But there is no guarantee (no concrete evidence) that the top M principal components provide the optimum performance in face recognition. There may be some important information, related to pattern classification, present in the lower order principal components; some lower order principal components also play a crucial role in pattern classification. Hence, it is important to select the M optimal principal components instead of the top M principal components. This leads to a search process that may be employed to find the M optimal principal components out of all N principal components obtained by principal component analysis (PCA) [18]. The face recognition approach generally uses a minimum distance classifier for the classification of facial images; therefore, there is a strong need to use optimal principal components for better classification. Such a problem can be cast as an optimization problem: we need to explore the search space to get a candidate solution for the given problem. The optimization process is used to find the optimal principal components that can be efficiently used for classification purposes. In a nutshell, the motivation behind this proposal is to integrate PCA and IDA with a new modified evolutionary technique in order to get improved results in face recognition. As per the above discussion, IDA is basically a perturbation based algorithm, which requires more computation time. To make it perturbation free, we integrate PCA to avoid the singularity problem.
Further, there is a strong need to deploy an efficient evolutionary technique (optimization algorithm) to find the optimal principal components out of all principal components obtained by principal component analysis (PCA). In this context, we are motivated to investigate a new optimization technique called the adaptive cuckoo search algorithm, which is faster and useful for finding the optimal principal components. This clarifies why the adaptive cuckoo search algorithm is integrated with PCA and IDA to form a face recognition algorithm. In the recent past, evolutionary algorithms (EAs) have demonstrated the ability to handle problems related to computer vision and pattern classification. The genetic algorithm (GA)
has been successfully implemented for eigenface based face recognition in evolutionary pursuit (EP) [19] to find the optimal basis vectors. EP has a better recognition rate compared to PCA [18] and Fisher's linear discriminant (FLD) [2]. Further, the authors in [20] combined GA and FLD to develop the GA-Fisher algorithm for face recognition. The mutation in GA only gives phenotypic changes, but never results in a physical dispersal; the offspring of GA never finish at the same locations as their parents, and this affects the search procedure. To overcome these problems, the bacterial foraging optimization (BFO) technique was proposed in [21,22]. The BFO algorithm is based on mathematical modeling of the foraging behavior of E. coli bacteria living in the human intestine. This led us to propose the BFO-Fisher algorithm [23] for face recognition using the BFO algorithm and FLD. In [23], BFO-Fisher is also compared with GA-Fisher [20], and it is shown that BFO has a better ability than GA for selecting principal components for dimension reduction. The disadvantage of the BFO algorithm is that, after reproduction, one bacterium gives rise to two offspring bacteria, and they start searching for nutrients from the same location. To overcome this problem, we proposed a novel crossover BFO algorithm [24] using the crossover mechanism of GA and the foraging behavior of BFO. In this method [24], 50% of the offspring bacteria are randomly placed in the search space. However, for both the BFO [21] and CBFO [24] algorithms, the step size is fixed. So the selection of the best step size for a given problem is a tedious job, which can only be solved manually via experimentation and observation. An attempt was made by the authors in [25] to provide an analysis of adaptive chemotaxis in BFO; however, the initial value of the step size was still decided by several experiments. As the BFO algorithm has many parameters to initialize beforehand, it is computationally expensive when considering the time required to converge to the global minimum or a near-global minimum. In the recent past, the cuckoo search (CS) [26,27] algorithm has emerged, which has fewer parameters compared to BFO [21,22]. As a result, CS is faster than BFO. This warrants us to investigate an adaptive CS algorithm, which can be applied in face recognition for finding the optimal principal components for dimension reduction. In the CS algorithm, one has to define the step size, which is generally taken from the Levy distribution. So, here we propose a novel adaptive cuckoo search (ACS) algorithm, which is free from the Levy step and adaptively decides the step size with zero parameter initialization. Further, we also introduce an efficient face recognition algorithm called ACS–IDA. To back up our statements, the ACS algorithm is compared to the CS [26,27] algorithm using standard benchmark functions. For the evaluation of our newly proposed ACS–IDA algorithm, experiments are presented using three well known face databases: Yale, ORL, and FERET. It is noteworthy to mention here that some interesting papers related to face classification/recognition have also been proposed recently. The idea of diversive curiosity to improve learning in the backpropagation algorithm for better pattern classification is proposed in [28]. Dai et al. [29] proposed a two-phased, ensemble-scheme-integrated backpropagation algorithm for avoiding the local minima problem of the standard backpropagation algorithm and successfully used it for pattern classification. Several EAs proposed recently have been successfully applied to different engineering optimization problems. In [30,31], several EAs, such as the Cuckoo Optimization Algorithm (COA), have been employed to achieve optimal performance for synchronization of bilateral teleoperation systems in the presence of time delay and modeling uncertainties. Replication and comparison of computational experiments in applied evolutionary computing were discussed in [32]. For performing statistical tests, the readers can refer to [33], which explains how to use nonparametric statistical tests as a methodology for
comparing evolutionary and swarm intelligence algorithms. The authors in [34] proposed PSO with adaptive mutation and inertia weight and its application to parameter estimation of dynamic systems. Identification of nonlinear systems using modified particle swarm optimization is presented in [35]. The authors in [36] discussed intelligent identification and control using improved fuzzy particle swarm optimization. An adaptive gradient descent-based local search in a memetic algorithm for solving engineering optimization problems is proposed in [37]. Adaptive fuzzy sliding mode control for synchronization of uncertain non-identical chaotic systems using bacterial foraging optimization was presented in [38]. The authors in [39] discussed swarm-based structure-specified controller design for bilateral transparent teleoperation systems via synthesis. In the literature, it is seen that the optimization algorithms (evolutionary algorithms) proposed earlier use many parameters, and there is no clear indication of how to choose the initial values of these parameters. This has motivated us to investigate a parameter free algorithm. In the CS algorithm, one has to define the step size, which is generally taken from the Levy distribution. Here, in contrast, we propose a novel adaptive cuckoo search (ACS) algorithm, which is free from the Levy step and adaptively decides the step size with zero parameter initialization. To be precise, our proposed ACS algorithm is a parameter free algorithm, which may be useful for face recognition. The outline of this paper is as follows. Section 2 provides the basics of the related work: PCA, IDA and the CS algorithm. In Section 3, we propose a new adaptive cuckoo search algorithm, ACS, and validate it with standard benchmark functions. In Section 4, we propose a method called ACS–IDA for dimension reduction of IDA using PCA with the help of ACS. Section 5 presents the experimental results and discussions. Finally, conclusions are given in Section 6.
2. Related work

2.1. Basics of PCA and IDA

The face recognition problem deals with the high dimensionality of the image space under consideration. So we need a method that reduces the dimension with a low computational burden. In this connection, the oldest method used is principal component analysis (PCA) [18], which projects an input image onto a lower dimensional feature vector coined as eigenfaces [1]. In intrinsic discriminant analysis (IDA), Wang and Yu [17] attempt to maximize the individuality difference and minimize the intrapersonal difference for the classification of facial images.

Let us assume that there are N image samples from k different classes c_1, c_2, ..., c_k (each class c_i has N_i samples and N = \sum_{i=1}^{k} N_i), and that each image is an m × n matrix. Each image is modeled as a D-dimensional (D = m × n) vector formed via lexicographic ordering of the 2-D pixel array. Thus, we are given a set of points x_1^1, ..., x_{N_1}^1, ..., x_1^k, ..., x_{N_k}^k in R^D, where x_j^i is the jth sample of the ith class. We assemble all these points into a single data matrix X of size D × N, with the first N_1 columns composed of the samples of the 1st class, followed by the samples of the next class, and so on. The assembled matrix of the ith class is denoted X^i = [x_1^i, x_2^i, ..., x_{N_i}^i] \in R^{D \times N_i}, and X can then be represented as X = [X^1, X^2, ..., X^k]. A linear dimension reduction method finds a transform matrix G of size D × q such that the D-dimensional data x_j^i are mapped onto a lower q-dimensional space, so that

y_j^i = G^T x_j^i.   (1)

2.1.1. Principal component analysis (PCA)

PCA finds the optimum projection axes from the total scatter of the projected samples. The total scatter matrix is represented as

S_M = \sum_{i=1}^{k} \sum_{j=1}^{N_i} (x_j^i - \mu)(x_j^i - \mu)^T,

where \mu = (1/N) \sum_{i=1}^{k} \sum_{j=1}^{N_i} x_j^i is the mean face. The objective function of PCA for this purpose is given by

G_{PCA} = \arg\max_G |G^T S_M G|.   (2)
Thus, the optimum projection axes g_1, g_2, ..., g_q of PCA are the orthonormal eigenvectors of S_M associated with the first q largest eigenvalues \lambda_1, \lambda_2, ..., \lambda_q. Interestingly, these eigenvectors are face-like, and face images can be described in terms of these first q eigenvectors. Face recognition features based on PCA are known as eigenfaces [1].

2.1.2. Intrinsic discriminant analysis (IDA)

A face model consisting of three important components to describe a person is proposed by Wang and Yu [17]. IDA classifies face image differences into three constituents: (a) the common facial difference, corresponding to the common nature of all facial images; (b) the individuality difference, which differentiates one class from another; and (c) the intrapersonal difference, corresponding to different expressions of a person, illumination, etc. Assume the matrix

H_M = [x_1^1 - \mu, x_2^1 - \mu, ..., x_{N_k}^k - \mu] \in R^{D \times N}.   (3)

The singular value decomposition (SVD) of H_M is H_M = U \Sigma V^T, where U and V are orthogonal. Partitioning U, V, and \Sigma as

U = [U_r, \tilde{U}_r], where U_r \in R^{D \times r} and \tilde{U}_r \in R^{D \times (D-r)},
V = [V_r, \tilde{V}_r], where V_r \in R^{N \times r} and \tilde{V}_r \in R^{N \times (N-r)},
\Sigma = [\Sigma_1, 0; 0, 0], where \Sigma_1 \in R^{r \times r} is nonsingular,   (4)

where r = rank(H_M), one can get a common factor of the facial images x_1^i, x_2^i, ..., x_{N_i}^i belonging to the ith class. Let us consider the following matrix:

H_M^i = [x_1^i - \mu^i, x_2^i - \mu^i, ..., x_{N_i}^i - \mu^i] \in R^{D \times N_i},   (5)

where \mu^i is the mean of the ith class. Assume that H_M^i = U^i \Sigma^i V^{iT} is the SVD of H_M^i. Then U^i can be partitioned into two different clusters given as

U^i = [U_{r^i}^i, \tilde{U}_{r^i}^i], where U_{r^i}^i \in R^{D \times r^i} and \tilde{U}_{r^i}^i \in R^{D \times (D - r^i)},   (6)

where r^i = rank(H_M^i). In IDA, the aim is to maximize the difference between separate classes of images while minimizing the difference within the same class of images. To be precise, IDA tries to maximize the ratio of the individuality difference scatter matrix (S_C) to the intrapersonal difference scatter matrix (S_I). The objective function of IDA is given as

G_{IDA} = \arg\max_G \frac{|G^T S_C G|}{|G^T S_I G|}.   (7)

Note that in Eq. (7),

S_C = X_C X_C^T and S_I = X_I X_I^T,   (8)
where

X_C = [(U_r U_r^T - U_{r^1}^1 U_{r^1}^{1T}) x_1^1, ..., (U_r U_r^T - U_{r^k}^k U_{r^k}^{kT}) x_{N_k}^k],   (9)

X_I = [U_{r^1}^1 U_{r^1}^{1T} x_1^1, ..., U_{r^1}^1 U_{r^1}^{1T} x_{N_1}^1, ..., U_{r^k}^k U_{r^k}^{kT} x_{N_k}^k].   (10)

The optimal transformation vectors g_1, ..., g_q are those associated with the q largest eigenvalues \lambda_1 \geq \lambda_2 \geq ... \geq \lambda_q. They can be obtained by solving the well-known generalized eigenvalue problem S_C g = \lambda S_I g.
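For concreteness, the construction of S_M, S_C and S_I and the two projections above can be sketched with standard linear algebra routines. The following Python/NumPy example is only an illustration on assumed toy data (random vectors standing in for face images); the sizes, variable names and the small perturbation used to invert S_I are assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
D, k, Ni = 100, 4, 10                       # pixels per image, classes, samples per class
X = rng.normal(size=(D, k * Ni))            # columns = lexicographically ordered images
labels = np.repeat(np.arange(k), Ni)

mu = X.mean(axis=1, keepdims=True)          # mean face
H_M = X - mu                                # Eq. (3)
S_M = H_M @ H_M.T                           # total scatter matrix (Section 2.1.1)

# PCA projection, Eq. (2): leading eigenvectors of S_M
evals, evecs = np.linalg.eigh(S_M)
order = np.argsort(evals)[::-1]
G_pca = evecs[:, order[:k * Ni - k]]        # keep N - k components

# Class-wise subspaces (Eqs. (5)-(6)) and the scatter matrices S_C, S_I (Eqs. (8)-(10))
U, s, _ = np.linalg.svd(H_M, full_matrices=False)
Ur = U[:, s > 1e-10 * s.max()]              # U_r of Eq. (4), r = rank(H_M)
P_all = Ur @ Ur.T
XC_cols, XI_cols = [], []
for i in range(k):
    Xi = X[:, labels == i]
    Hi = Xi - Xi.mean(axis=1, keepdims=True)          # H_M^i, Eq. (5)
    Ui, si, _ = np.linalg.svd(Hi, full_matrices=False)
    Uri = Ui[:, si > 1e-10 * si.max()]                # U_{r^i}^i, Eq. (6)
    Pi = Uri @ Uri.T
    XC_cols.append((P_all - Pi) @ Xi)                 # Eq. (9)
    XI_cols.append(Pi @ Xi)                           # Eq. (10)
XC, XI = np.hstack(XC_cols), np.hstack(XI_cols)
S_C, S_I = XC @ XC.T, XI @ XI.T                       # Eq. (8)

# IDA projection, Eq. (7): generalized eigenproblem S_C g = lambda * S_I g.
# A small perturbation keeps S_I invertible here, as in the Intrinsicfaces approach [17].
eps = 0.05
w, V = np.linalg.eig(np.linalg.solve(S_I + eps * np.eye(D), S_C))
G_ida = np.real(V[:, np.argsort(np.real(w))[::-1][:k - 1]])
print(G_pca.shape, G_ida.shape)             # (100, 36) and (100, 3)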
2.2. Cuckoo search (CS) algorithm

The cuckoo search (CS) algorithm is a nature-inspired optimization approach introduced in [26]. CS was developed by considering the brood behavior of cuckoo birds. Generally, a cuckoo lays its eggs in the nest of another host bird. The host bird may recognize the cuckoo's eggs with a probability p_a ∈ [0, 1]; it then either throws the eggs out of the nest or abandons the nest to build a new one [40]. Here, each egg in a host nest represents a solution. To form the mathematical model of the CS algorithm, three assumptions are made: (i) the number of available host nests is fixed; (ii) each cuckoo lays one egg at a time in a randomly chosen host nest; and (iii) the best nests with the highest-quality eggs carry over to subsequent generations. Let X_i(t) be the current solution of cuckoo i (i = 1, 2, ..., N) at time t, represented as X_i = (x_i^1, x_i^2, ..., x_i^d) for a d-dimensional problem. Then the new solution X_i(t+1) for the next generation t+1 can be represented as

X_i(t+1) = X_i(t) + \alpha \oplus Levy(\lambda),   (11)

where \alpha > 0 is the step size related to the problem search space (usually taken as 1), \oplus is entry-wise multiplication, and Levy(\lambda) is a random walk through the Levy flight. The Levy flight [41] draws the random walk from the Levy distribution, which can be approximated as Levy \sim u = t^{-\lambda}, with 1 < \lambda < 3. The most popular algorithm to obtain the Levy step is Mantegna's algorithm [42], by which the Levy step is calculated as

Levy = u / |v|^{1/(\lambda - 1)},   (12)

where u and v are taken from normal distributions.

The parameters, such as the number of cuckoos, the abandon probability, and the maximum number of generations a cuckoo may go through within its lifetime, are initialized by the user. Let t represent the current generation and t_MAX the predetermined maximum lifetime iteration count. For a d-dimensional problem, the cuckoos at the initial generation (t = 1) are generated as

x_i^d(t = 1) = rand × (Upper^d - Lower^d) + Lower^d,

where Lower^d and Upper^d are the lower and upper boundaries of the search space for the dth attribute, respectively. This provides control over the optimization algorithm, which must stay within the specified boundaries.

2.2.1. Pseudo code for the CS algorithm

First, identify the range and dimension (d) of the search problem, with the objective function f(X). Then define the algorithm parameters N, p_a, \alpha, \lambda, t_MAX, and set t = 1. Randomly initialize the search space X_i(t = 1) = (x_i^1, x_i^2, ..., x_i^d) for the host nests i = 1, 2, ..., N.
do {
  A. Take the Levy flight for the ith cuckoo and evaluate the fitness f_i of X_i.
  B. Choose a nest j randomly from all the nests and evaluate the fitness f_j of X_j.
  C. If (f_i > f_j), replace X_i with X_j for a minimization problem, or replace X_j with X_i for a maximization problem.
  D. Abandon the worst nests with a probability p_a and build new ones.
  E. t = t + 1.
} while (t < (t_MAX + 1)) or until the stopping criterion is reached.
Then the best solution among all the host nests is reported.
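As an illustration of Eqs. (11) and (12), the sketch below implements the CS update with Mantegna's algorithm for the Levy step. It is a minimal Python rendering under assumed settings (the sphere function f1 of Table 1, d = 5, a small iteration budget, and one common reading of step C as a greedy nest replacement for minimization); it is not the authors' code.

import numpy as np
from math import gamma, sin, pi

def mantegna_levy(shape, lam=1.5, rng=None):
    """Levy step of Eq. (12): u / |v|**(1/(lam-1)), with beta = lam - 1."""
    rng = rng or np.random.default_rng()
    beta = lam - 1.0
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, lower, upper, n_nests=25, pa=0.25, alpha=1.0, t_max=200, seed=1):
    rng = np.random.default_rng(seed)
    d = len(lower)
    nests = rng.uniform(lower, upper, (n_nests, d))   # random initialization
    fit = np.apply_along_axis(f, 1, nests)
    for _ in range(t_max):
        for i in range(n_nests):
            cand = np.clip(nests[i] + alpha * mantegna_levy(d, rng=rng), lower, upper)  # Eq. (11)
            cf = f(cand)
            j = rng.integers(n_nests)                 # compare with a randomly chosen nest
            if cf < fit[j]:                           # keep the better solution (minimization)
                nests[j], fit[j] = cand, cf
        worst = np.argsort(fit)[-int(pa * n_nests):]  # abandon a fraction pa of the worst nests
        nests[worst] = rng.uniform(lower, upper, (len(worst), d))
        fit[worst] = np.apply_along_axis(f, 1, nests[worst])
    best = np.argmin(fit)
    return nests[best], fit[best]

sphere = lambda x: float(np.sum(x ** 2))              # f1 from Table 1
print(cuckoo_search(sphere, np.full(5, -100.0), np.full(5, 100.0)))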
3. The proposed adaptive cuckoo search (ACS) algorithm
The cuckoo search (CS) algorithm is a heuristic search algorithm that normally uses the Levy step to explore the search space. The Levy step is drawn from the Levy distribution using either Mantegna's algorithm or McCulloch's algorithm. In [43], the authors suggested that generating the Levy distribution using McCulloch's algorithm is more efficient than using Mantegna's algorithm. Usually, the cuckoo search algorithm follows the Levy distribution. Here, an effort is made to make the cuckoo search adaptive, without using the Levy distribution. The standard cuckoo search algorithm has no mechanism to control the step size during the iteration process that could guide the algorithm towards the global minimum or maximum. Here, we incorporate a step size that is proportional to the fitness of the individual nest in the search space in the current generation. In the literature, \alpha has been taken as a fixed parameter; here we drop this parameter. In the adaptive cuckoo search algorithm, the step size is modeled as

step_i(t+1) = (1/t)^{(best_f(t) - f_i(t)) / (best_f(t) - worst_f(t))},   (13)

where t is the generation index of the cuckoo search, f_i(t) is the fitness value of the ith nest in generation t, best_f(t) is the best fitness value in generation t, and worst_f(t) is the worst fitness value in generation t. The step size is initially high, but it decreases as the number of generations increases; that is, when the algorithm approaches the global optimal solution, the step size becomes very small. From Eq. (13), it is clear that the step size is adaptive in nature and decides its value from the fitness values. Hence, the adaptive cuckoo search (ACS) algorithm is modeled as

X_i(t+1) = X_i(t) + randn × step_i(t+1).   (14)

Eq. (14) gives the new search space for the adaptive cuckoo search (ACS) algorithm from the current solution. Another advantage of ACS is that it does not require any initial parameter to be defined, and it is faster than the cuckoo search algorithm. Further, if we consider the step size to be proportional to the distance from the global best solution, then Eq. (14) is modeled as

X_i(t+1) = X_i(t) + randn × step_i(t+1) × (X_i(t) - X_gbest),   (15)

where X_gbest is the global best solution among all X_i (i = 1, 2, ..., N) at time t. Here, we have investigated Eqs. (13)–(15), which are useful for optimization at a faster rate.
3.1. Pseudo code for ACS
Randomly initialize the N host nests X_i(t = 1) = (x_i^1, x_i^2, ..., x_i^d) for i = 1, 2, ..., N for a d-dimensional problem, and define the objective function f(X). Initially take t = 1 and evaluate the objective function of the host nests f(X_i) for i = 1, 2, ..., N for the first time.
Do {
  A. Find best_f and worst_f of the current generation among the host nests.
  B. Calculate the step size using Eq. (13).
  C. Calculate the new positions of the cuckoo nests using Eq. (15).
  D. Evaluate the objective function of the host nests f(X_i) for i = 1, 2, ..., N.
  E. Choose a nest among the N nests (say j) randomly. If (f_i > f_j), replace X_i with X_j for a minimization problem, or replace X_j with X_i for a maximization problem.
  F. Abandon the worst nests with a probability p_a and build new ones.
  G. t = t + 1.
} while (t ≤ t_MAX) or the end criterion is not satisfied.
Then report the best solution by ranking the nests.
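The pseudo code above can be turned into a short program. The following Python sketch is one possible realization of Eqs. (13)–(15) under assumed test settings (Rastrigin's function f9 of Table 2 in 5 dimensions and a reduced iteration budget); it is meant only to illustrate the adaptive step size, not to reproduce the reported experiments.

import numpy as np

def adaptive_cuckoo_search(f, lower, upper, n_nests=25, pa=0.25, t_max=300, seed=1):
    rng = np.random.default_rng(seed)
    d = len(lower)
    nests = rng.uniform(lower, upper, (n_nests, d))
    fit = np.apply_along_axis(f, 1, nests)
    for t in range(1, t_max + 1):
        best_f, worst_f = fit.min(), fit.max()
        x_best = nests[np.argmin(fit)]
        # Eq. (13): step size decays with the generation count and with how far a
        # nest's fitness is from the current best fitness.
        denom = best_f - worst_f
        expo = (best_f - fit) / denom if denom != 0 else np.zeros_like(fit)
        step = (1.0 / t) ** expo
        # Eq. (15): Gaussian move scaled by the step size and the distance to the global best
        cand = nests + rng.standard_normal((n_nests, d)) * step[:, None] * (nests - x_best)
        cand = np.clip(cand, lower, upper)
        cand_fit = np.apply_along_axis(f, 1, cand)
        for i in range(n_nests):
            j = rng.integers(n_nests)                 # compare with a randomly chosen nest
            if cand_fit[i] < fit[j]:
                nests[j], fit[j] = cand[i], cand_fit[i]
        worst = np.argsort(fit)[-int(pa * n_nests):]  # abandon the worst nests
        nests[worst] = rng.uniform(lower, upper, (len(worst), d))
        fit[worst] = np.apply_along_axis(f, 1, nests[worst])
    i_best = np.argmin(fit)
    return nests[i_best], fit[i_best]

rastrigin = lambda x: float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10))   # f9, Table 2
print(adaptive_cuckoo_search(rastrigin, np.full(5, -5.12), np.full(5, 5.12)))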
Table 1. Description of the unimodal benchmark test functions used with d = 30.
Function | Mathematical representation | Range of search | Theoretical optima
Sphere Model | f1(X) = \sum_{i=1}^{d} x_i^2 | (−100, 100)^d | f1(0) = 0
Schwefel's Problem 2.22 | f2(X) = \sum_{i=1}^{d} |x_i| + \prod_{i=1}^{d} |x_i| | (−10, 10)^d | f2(0) = 0
Schwefel's Problem 1.2 | f3(X) = \sum_{i=1}^{d} (\sum_{j=1}^{i} x_j)^2 | (−100, 100)^d | f3(0) = 0
Schwefel's Problem 2.21 | f4(X) = max_i {|x_i|, 1 ≤ i ≤ d} | (−100, 100)^d | f4(0) = 0
Generalized Rosenbrock's Function | f5(X) = \sum_{i=1}^{d-1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2] | (−30, 30)^d | f5(1) = 0
Step Function | f6(X) = \sum_{i=1}^{d} (\lfloor x_i + 0.5 \rfloor)^2 | (−100, 100)^d | f6(0) = 0
Quartic Function with Noise | f7(X) = \sum_{i=1}^{d} i x_i^4 + random[0, 1) | (−1.28, 1.28)^d | f7(0) = 0
Table 2. Description of the multimodal benchmark test functions used with d = 30.
Function | Mathematical representation | Range of search | Theoretical optima
Generalized Schwefel's Problem 2.26 | f8(X) = \sum_{i=1}^{d} −x_i sin(\sqrt{|x_i|}) | (−500, 500)^d | f8(420.9687) = −12569.5
Generalized Rastrigin's Function | f9(X) = \sum_{i=1}^{d} [x_i^2 − 10 cos(2\pi x_i) + 10] | (−5.12, 5.12)^d | f9(0) = 0
Ackley's Function | f10(X) = −20 exp(−0.2 \sqrt{(1/d) \sum_{i=1}^{d} x_i^2}) − exp((1/d) \sum_{i=1}^{d} cos(2\pi x_i)) + 20 + e | (−32, 32)^d | f10(0) = 0
Generalized Griewank Function | f11(X) = (1/4000) \sum_{i=1}^{d} x_i^2 − \prod_{i=1}^{d} cos(x_i/\sqrt{i}) + 1 | (−600, 600)^d | f11(0) = 0
Generalized Penalized Function 1 | f12(X) = (\pi/d) {10 sin^2(\pi y_1) + \sum_{i=1}^{d-1} (y_i − 1)^2 [1 + 10 sin^2(\pi y_{i+1})] + (y_d − 1)^2} + \sum_{i=1}^{d} u(x_i, 10, 100, 4), with y_i = 1 + (x_i + 1)/4 | (−50, 50)^d | f12(−1) = 0
Generalized Penalized Function 2 | f13(X) = 0.1 {sin^2(3\pi x_1) + \sum_{i=1}^{d-1} (x_i − 1)^2 [1 + sin^2(3\pi x_{i+1})] + (x_d − 1)^2 [1 + sin^2(2\pi x_d)]} + \sum_{i=1}^{d} u(x_i, 5, 100, 4) | (−50, 50)^d | f13(1) = 0
Here u(x_i, a, k, m) = k(x_i − a)^m for x_i > a; 0 for −a ≤ x_i ≤ a; and k(−x_i − a)^m for x_i < −a.
3.2. Performance evaluation of the ACS algorithm

For the performance evaluation of the proposed algorithm, we consider the 23 benchmark test functions [44] described in Tables 1–3. Table 1 lists 7 unimodal benchmark test functions, for which the convergence rate is the important criterion for validating an optimization algorithm. Table 2 lists the next 6 multimodal benchmark test functions with many local minima, which are very difficult to optimize. Finally, Table 3 presents the remaining 10 multimodal benchmark test functions with fixed dimension. A detailed description of the benchmark test functions of Table 3 is given in Appendix A. The parameters used for evaluating the performance of the ACS and CS algorithms are given in Table 4. From Table 4, we can claim that the ACS algorithm is parameter free when compared to the CS algorithm. For the performance evaluation, we allow 100,000 function evaluations for the benchmark test functions described in Tables 1 and 2, and 10,000 function evaluations for the benchmark test functions given in Table 3. Each algorithm is run 50 times. For the comparison of performance, we report the best, mean, standard deviation, and average time over 100,000/10,000 function evaluations for the 23 standard benchmark test functions of Tables 1–3. 'Best' is the minimum (minimization problem) or maximum (maximization problem) value over the 50 independent runs of the algorithm. 'Mean' is the average of the best results, and 'Std.' is the standard deviation over the 50 independent runs. Finally, 'Atime' is the average time to complete 100,000 (for f1–f13) or 10,000 (for f14–f23) function evaluations over the 50 independent runs. For the benchmark test functions given in Tables 1–3, the lower the values of 'Best', 'Mean', 'Std.' and 'Atime', the better the algorithm. Table 5 and Fig. 1 report the performance on the unimodal benchmark test functions presented in Table 1. Table 6 and Fig. 2 present the performance on the multimodal benchmark
Table 3. Description of the multimodal benchmark test functions with fixed dimension.
Function | Mathematical representation | Range of search | Theoretical optima
Shekel's Foxholes Function | f14(X) = [1/500 + \sum_{j=1}^{25} 1/(j + \sum_{i=1}^{2} (x_i − a_{ij})^6)]^{-1} | (−65.5, 65.5)^2 | f14(−32, −32) ≈ 1
Kowalik's Function | f15(X) = \sum_{i=1}^{11} [a_i − x_1(b_i^2 + b_i x_2)/(b_i^2 + b_i x_3 + x_4)]^2 | (−5, 5)^4 | f15(0.1928, 0.1908, 0.1231, 0.1358) ≈ 0.0003075
Six-Hump Camel-Back Function | f16(X) = 4x_1^2 − 2.1x_1^4 + (1/3)x_1^6 + x_1 x_2 − 4x_2^2 + 4x_2^4 | (−5, 5)^2 | f16(0.08983, −0.7126) = f16(−0.08983, 0.7126) = −1.0316285
Branin Function | f17(X) = (x_2 − (5.1/4\pi^2)x_1^2 + (5/\pi)x_1 − 6)^2 + 10(1 − 1/(8\pi)) cos x_1 + 10 | −5 ≤ x_1 ≤ 10, 0 ≤ x_2 ≤ 15 | f17(−3.142, 12.275) = f17(3.142, 2.275) = f17(9.425, 2.425) = 0.398
Goldstein–Price Function | f18(X) = [1 + (x_1 + x_2 + 1)^2 (19 − 14x_1 + 3x_1^2 − 14x_2 + 6x_1x_2 + 3x_2^2)] × [30 + (2x_1 − 3x_2)^2 (18 − 32x_1 + 12x_1^2 + 48x_2 − 36x_1x_2 + 27x_2^2)] | (−2, 2)^2 | f18(0, −1) = 3
Hartman's Family (d = 3) | f19(X) = −\sum_{i=1}^{4} c_i exp(−\sum_{j=1}^{3} a_{ij}(x_j − p_{ij})^2) | (0, 1)^3 | f19(0.114, 0.556, 0.852) = −3.86
Hartman's Family (d = 6) | f20(X) = −\sum_{i=1}^{4} c_i exp(−\sum_{j=1}^{6} a_{ij}(x_j − p_{ij})^2) | (0, 1)^6 | f20(0.201, 0.15, 0.477, 0.275, 0.311, 0.657) = −3.32
Shekel's Family (m = 5) | f21(X) = −\sum_{i=1}^{5} [(X − a_i)(X − a_i)^T + c_i]^{-1} | (0, 10)^4 | min f21 = −10.1532
Shekel's Family (m = 7) | f22(X) = −\sum_{i=1}^{7} [(X − a_i)(X − a_i)^T + c_i]^{-1} | (0, 10)^4 | min f22 = −10.4028
Shekel's Family (m = 10) | f23(X) = −\sum_{i=1}^{10} [(X − a_i)(X − a_i)^T + c_i]^{-1} | (0, 10)^4 | min f23 = −10.5363
Fig. 1. Performance of three of the unimodal benchmark test functions (f1, f4, f7; fitness vs. iteration) using the ACS and CS algorithms.

Table 4. The parameters used for the benchmark test functions.
Algorithm | Parameters
ACS | N = 25, pa = 0.25, tMAX = 2000
CS | N = 25, pa = 0.25, α = 1, λ = 1.5, tMAX = 2000
test functions given in Table 2. The performance on the multimodal benchmark test functions with fixed dimension, listed in Table 3, is presented in Table 7 and Fig. 3. From Tables 5–7 and Figs. 1–3, it is clear that the proposed ACS algorithm converges better than the CS algorithm. So, we can claim that the proposed algorithm can be used for solving various engineering optimization problems efficiently at a faster rate. Encouraged by the performance of the ACS algorithm, we investigate its application in face recognition. In the following section, we propose the ACS–IDA algorithm for face recognition using the ACS algorithm and intrinsic discriminant analysis (IDA).
4. The proposed perturbation free IDA based face recognition

In face recognition systems, researchers are faced with the difficulty of solving the singularity problem, which arises due to the small sample size (SSS): the number of images in the training set, N, is usually much smaller than the number of pixels, D, in each facial image. It is noteworthy that the rank of S_I is at most N − k; hence one can choose a matrix G for which the intrapersonal difference scatter of the projected samples is exactly zero, which makes the criterion of Eq. (7) ill-defined. In order to overcome the complication of the singular matrix S_I, the Intrinsicfaces [17] approach uses a perturbation technique that modifies the matrix S_I to S_I + εI, where ε (the perturbation) is a very small positive number and I is the identity matrix. The required optimal projection vectors are then constructed using the well-known eigen-decomposition S_C g = λ(S_I + εI)g. To overcome the perturbation problem, we propose the dimension reduction of IDA using PCA in Section 4.1 and the ACS based IDA face recognition in Section 4.2.
Table 5. Performance evaluation of the unimodal benchmark test functions described in Table 1.
Function | Algorithm | Best | Mean | Std. | Atime
f1 | ACS | 3.1968e−023 | 4.0098e−021 | 9.1913e−021 | 4.8933
f1 | CS | 2.4525e−011 | 5.4084e−010 | 8.6399e−010 | 6.1911
f2 | ACS | 2.0102e−013 | 1.0975e−011 | 1.7537e−011 | 5.1294
f2 | CS | 1.5424e−005 | 5.1062e−005 | 2.5343e−005 | 6.4689
f3 | ACS | 2.2786 | 7.7366 | 4.1152 | 21.6408
f3 | CS | 5.3252 | 15.2403 | 7.8359 | 23.2127
f4 | ACS | 0.2631 | 1.7192 | 1.4583 | 8.3765
f4 | CS | 1.7152 | 4.8760 | 2.3787 | 10.4827
f5 | ACS | 8.8894 | 24.1858 | 17.6688 | 5.9723
f5 | CS | 14.0877 | 28.9901 | 18.0758 | 8.4696
f6 | ACS | 0 | 0 | 0 | 6.5244
f6 | CS | 0 | 0 | 0 | 6.3235
f7 | ACS | 0.0055 | 0.0169 | 0.0057 | 5.9411
f7 | CS | 0.0159 | 0.0357 | 0.0229 | 7.1627
Table 6. Performance evaluation of the multimodal benchmark test functions described in Table 2.
Function | Algorithm | Best | Mean | Std. | Atime
f8 | ACS | −1.0130e+004 | −8.6419e+003 | 597.1854 | 5.6882
f8 | CS | −9.8950e+003 | −9.3976e+003 | 296.7141 | 6.8855
f9 | ACS | 39.8083 | 63.7177 | 19.4711 | 5.5512
f9 | CS | 33.4352 | 52.8627 | 10.4973 | 6.7652
f10 | ACS | 2.7023e−011 | 9.1723e−008 | 2.5994e−007 | 5.3931
f10 | CS | 1.1034e−004 | 0.5599 | 0.7291 | 6.7084
f11 | ACS | 0 | 4.9307e−004 | 0.0019 | 6.0274
f11 | CS | 4.8847e−010 | 2.7343e−004 | 0.0013 | 7.2946
f12 | ACS | 3.1602e−020 | 0.0069 | 0.0379 | 17.8563
f12 | CS | 9.0092e−008 | 0.2804 | 0.3255 | 20.0244
f13 | ACS | 1.1801e−021 | 3.6625e−004 | 0.0020 | 16.9331
f13 | CS | 2.8804e−009 | 0.5256 | 2.1405 | 19.9875
Fig. 2. Performance of three of the multimodal benchmark test functions (f10, f12, f13; fitness vs. iteration) using the ACS and CS algorithms.
4.1. PCA + IDA: dimension reduction of IDA using PCA

Here we propose an alternative method to overcome the singularity problem faced by the IDA approach. To achieve this, we use PCA to reduce the dimension of the feature space to N − k, so that the resulting intrapersonal difference scatter matrix S_I becomes nonsingular. We then apply the IDA criterion defined in Eq. (7) to reduce the dimension further, up to l (= rank(S_I)), which is less than rank(S_M) [20]. The optimal projection G_opt of PCA + IDA is given as

G_{opt}^T = G_{mida}^T G_{PCA}^T,   (16)

where

G_{PCA} = \arg\max_G |G^T S_M G|,   (17)

G_{mida} = \arg\max_G \frac{|G^T G_{PCA}^T S_C G_{PCA} G|}{|G^T G_{PCA}^T S_I G_{PCA} G|}.   (18)

Note that G_PCA is a D × (N − k) matrix with orthogonal columns, while the dimension reduction through G_mida is performed with an (N − k) × l matrix with orthogonal columns. In computing G_PCA, we take only the largest N − k principal components.
Fig. 3. Performance of three of the multimodal benchmark test functions with fixed dimension (f14, f20, f23; fitness vs. iteration) using the ACS and CS algorithms.

Table 7. Performance evaluation of the multimodal benchmark test functions with fixed dimension described in Table 3.
Function | Algorithm | Best | Mean | Std. | Atime
f14 | ACS | 0.9980 | 0.9980 | 1.2820e−016 | 1.7573
f14 | CS | 0.9980 | 0.9980 | 8.1539e−008 | 1.8661
f15 | ACS | 4.4776e−004 | 6.8340e−004 | 1.1062e−004 | 0.5371
f15 | CS | 6.0822e−004 | 7.1978e−004 | 7.5665e−005 | 0.6187
f16 | ACS | −1.0316 | −1.0316 | 6.4539e−016 | 0.4491
f16 | CS | −1.0316 | −1.0316 | 1.4842e−010 | 0.5250
f17 | ACS | 0.3979 | 0.3979 | 8.1387e−011 | 0.4407
f17 | CS | 0.3979 | 0.3979 | 1.1037e−008 | 0.5140
f18 | ACS | 3.0000 | 3.0000 | 1.1516e−015 | 0.4407
f18 | CS | 3.0000 | 3.0000 | 1.1932e−008 | 0.5189
f19 | ACS | −3.8628 | −3.8628 | 2.6543e−015 | 0.6060
f19 | CS | −3.8628 | −3.8628 | 2.3914e−008 | 0.6681
f20 | ACS | −3.3220 | −3.3219 | 3.1508e−004 | 0.5982
f20 | CS | −3.3220 | −3.3208 | 0.0011 | 0.6804
f21 | ACS | −10.1532 | −10.1532 | 2.1701e−008 | 0.6328
f21 | CS | −10.1529 | −10.1370 | 0.0300 | 0.7143
f22 | ACS | −10.4029 | −10.4029 | 1.5336e−009 | 0.6950
f22 | CS | −10.4027 | −10.2672 | 0.4300 | 0.7750
f23 | ACS | −10.5364 | −10.5364 | 3.1719e−007 | 0.7884
f23 | CS | −10.5349 | −10.4773 | 0.1013 | 0.8677
4.1.1. PCA + IDA algorithm

For better clarity, the algorithmic steps of the proposed PCA + IDA technique are presented below; a compact numerical sketch of these steps follows the list.

(a) Lexicographic ordering: Store the available samples of the different classes compactly in a single learning data matrix X using lexicographic ordering, i.e. X = [X^1, X^2, ..., X^k], which can again be written as X = [x_1^1, ..., x_{N_1}^1, ..., x_1^k, ..., x_{N_k}^k].
(b) SVD decomposition: Compute the SVD of H_M and of H_M^i, i = 1, 2, ..., k, to obtain U_r and U_{r^i}^i, i = 1, 2, ..., k. Then obtain S_C and S_I using Eqs. (8)–(10).
(c) Projection vectors using PCA: Form the total scatter matrix S_M (Section 2.1.1) and calculate G_PCA from its N − k largest eigenvalues.
(d) Compute G_{mida} = \arg\max_G |G^T G_{PCA}^T S_C G_{PCA} G| / |G^T G_{PCA}^T S_I G_{PCA} G|.
(e) Let G_{opt}^T = G_{mida}^T G_{PCA}^T. Then G_opt is the optimal projection for PCA + IDA.
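The following self-contained Python sketch walks through steps (a)-(e) on synthetic data (random vectors standing in for face images). The scatter construction follows Eqs. (8)-(10), and the use of scipy's generalized symmetric eigensolver for step (d) is an implementation choice assumed here, not the authors' MATLAB code.

import numpy as np
from scipy.linalg import eigh

def intrinsic_scatters(X, labels):
    """S_C and S_I of Eq. (8), built from the subspace projectors of Eqs. (4)-(6)."""
    H = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(H, full_matrices=False)
    Ur = U[:, s > 1e-10 * s.max()]
    P_all = Ur @ Ur.T
    XC, XI = [], []
    for c in np.unique(labels):
        Xi = X[:, labels == c]
        Hi = Xi - Xi.mean(axis=1, keepdims=True)
        Ui, si, _ = np.linalg.svd(Hi, full_matrices=False)
        Uri = Ui[:, si > 1e-10 * si.max()]
        Pi = Uri @ Uri.T
        XC.append((P_all - Pi) @ Xi)               # Eq. (9)
        XI.append(Pi @ Xi)                         # Eq. (10)
    XC, XI = np.hstack(XC), np.hstack(XI)
    return XC @ XC.T, XI @ XI.T

def pca_ida(X, labels):
    N, k = X.shape[1], len(np.unique(labels))
    mu = X.mean(axis=1, keepdims=True)
    S_M = (X - mu) @ (X - mu).T                    # total scatter (Section 2.1.1)
    w, V = np.linalg.eigh(S_M)
    G_pca = V[:, np.argsort(w)[::-1][:N - k]]      # step (c): N - k leading eigenvectors
    Y = G_pca.T @ X                                # samples in the reduced (N - k)-space
    S_C, S_I = intrinsic_scatters(Y, labels)       # step (b) in the reduced space
    wl, Vl = eigh(S_C, S_I + 1e-8 * np.eye(S_I.shape[0]))   # step (d), Eq. (18)
    G_mida = Vl[:, np.argsort(wl)[::-1][:k - 1]]
    return G_pca @ G_mida                          # step (e): G_opt = G_PCA G_mida

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))                     # 40 'images' of dimension 200, 4 classes
labels = np.repeat(np.arange(4), 10)
print(pca_ida(X, labels).shape)                    # (200, 3)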
The block diagram of the proposed PCA + IDA based face recognition method is shown in Fig. 4.

4.2. ACS–IDA: an IDA based face recognition

From the literature, we know that appearance based methods like PCA, LDA, and IDA require an eigen-decomposition process to
Fig. 4. Face recognition using PCA + IDA.
541
find the projection vector of face samples. In fact, we do not find any concrete mathematics to find which principal components are better for face recognition process. So, most of the researcher suggested that taking q (⇐D) number of principal components for face recognition process is useful. In [20], the researchers try to find the LDA based optimal feature vector using optimization algorithm, and validate it with a different face database. They also reported that the recognition rate using optimal feature vector from the optimization domain has more than the feature vector given by the classical method. In this paper, we use a novel optimization method proposed in Section 3 known as ACS to get q (= N − k) optimal eigenvectors from the t (= rank (SM )) eigenvectors using a new fitness function described in the following section.
4.2.1. Selection of principal components using ACS–PCA

The main objective of the ACS–PCA algorithm is to select a set of eigenvectors {g_{i1}, g_{i2}, ..., g_{iq}} from the eigenvectors {g_1, g_2, ..., g_t} generated from the S_M matrix. Some discriminatory features may be lost by reducing the dimension from t eigenvectors to q eigenvectors. This motivates ACS–PCA, which chooses the q (= N − k) most effective feature vectors from the t (= rank(S_M)) largest eigenvectors of the S_M matrix, as judged by the new fitness function proposed in this section. The dimension of the search space is t (= rank(S_M)), out of which the q (= N − k) optimal principal components are to be selected by the ACS algorithm. Note that the search process is guided by the fitness function. The fitness function for dimension reduction using PCA is defined as

f(P) = f_a(P) / f_s(P),   (19)

where f_a(P) is the advantage term, f_s(P) is the class separation ("generalization") term, and P = {g_{i1}, g_{i2}, ..., g_{iq}} is the set of eigenvectors corresponding to the eigenvalues \Lambda_q = diag(\lambda_{i1}, \lambda_{i2}, ..., \lambda_{iq}). The term f_s(P) serves as a scatter measurement of the different classes: it measures the distance between the mean image of a class and the mean of the whole set of training images, and it thereby helps to select members that are useful in the recognition process. Let x be a face image. After the dimension reduction, the individual face image x becomes \bar{x} = P^T x, the mean image of class X^i becomes \bar{\mu}^i = P^T \mu^i, and the mean image of the total training sample becomes \bar{\mu} = P^T \mu. Using the Mahalanobis distance [45], the class separation term f_s(P) is represented as

f_s(P) = min_{i=1,2,...,k} (d_i),   (20)

where d_i is the distance between \bar{\mu}^i and \bar{\mu}, defined by d_i = (\bar{\mu}^i - \bar{\mu})^T \Sigma^{-1} (\bar{\mu}^i - \bar{\mu}), and \Sigma is the generalization ability term given as \Sigma = (1/N) \sum_{i=1}^{k} \sum_{\bar{x} \in \bar{X}^i} (\bar{x} - \bar{\mu})(\bar{x} - \bar{\mu})^T.

The advantage term reflects that, during the search process, the sets of eigenvectors P associated with larger eigenvalues are to be considered first. So, the advantage term f_a(P) is defined as

f_a(P) = \frac{\sum_{j=1}^{q} \lambda_{ij}}{\sum_{l=1}^{t} \lambda_l},   (21)

where the \lambda_l are the eigenvalues generated from the total scatter matrix S_M. Finally, the search for the optimal principal components by ACS–PCA leads to

P_opt = \arg\max_P f(P).   (22)
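A small Python sketch of the fitness computation follows. It evaluates Eqs. (19)-(21) for one candidate subset P of eigenvectors on toy data; the data, the chosen subset and all variable names are assumptions made for illustration, and Eq. (19) is implemented exactly as reconstructed above (advantage term divided by class-separation term).

import numpy as np

def selection_fitness(P, lam_sel, lam_all, X, labels):
    """f(P) = f_a(P) / f_s(P) for a candidate eigenvector subset P (D x q)."""
    Y = P.T @ X                                         # projected training samples
    mu = Y.mean(axis=1, keepdims=True)                  # projected global mean
    # generalization ability term: covariance of all projected samples, Eq. (20)
    Sigma = sum(((Y[:, labels == c] - mu) @ (Y[:, labels == c] - mu).T
                 for c in np.unique(labels))) / Y.shape[1]
    Sigma_inv = np.linalg.pinv(Sigma)
    d = []
    for c in np.unique(labels):
        diff = Y[:, labels == c].mean(axis=1, keepdims=True) - mu
        d.append((diff.T @ Sigma_inv @ diff).item())    # Mahalanobis distance d_i
    f_s = min(d)                                        # class separation term, Eq. (20)
    f_a = lam_sel.sum() / lam_all.sum()                 # advantage term, Eq. (21)
    return f_a / f_s                                    # Eq. (19)

# usage on toy data: pick 10 of the eigenvectors of S_M at random and score them
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 40)); labels = np.repeat(np.arange(4), 10)
Xc = X - X.mean(axis=1, keepdims=True)
lam, V = np.linalg.eigh(Xc @ Xc.T)
lam, V = lam[::-1], V[:, ::-1]                          # eigenvalues of S_M, descending
idx = np.sort(rng.choice(np.count_nonzero(lam > 1e-8), size=10, replace=False))
print(selection_fitness(V[:, idx], lam[idx], lam, X, labels))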
4.2.2. ACS–IDA algorithm

Here, we combine the perturbation free dimension reduction method PCA + IDA explained in Section 4.1, the eigenvector selection approach ACS–PCA described in Section 4.2.1, and a whitening procedure proposed in [19,20] to develop the ACS–IDA algorithm. The detailed steps of the new algorithm are as follows; a minimal sketch of steps 5–7 is given at the end of this subsection.

Step 1: Lexicographic ordering: Store the samples of the different classes compactly in the original training data matrix X using lexicographic ordering, i.e. X = [X^1, X^2, ..., X^k], which can again be written as X = [x_1^1, ..., x_{N_1}^1, ..., x_1^k, ..., x_{N_k}^k].
Step 2: SVD decomposition: Compute the SVD of H_M and of H_M^i, i = 1, 2, ..., k, to obtain U_r and U_{r^i}^i, i = 1, 2, ..., k. Then obtain S_C and S_I using Eqs. (8)–(10).
Step 3: Eigenvectors using PCA: Form the total scatter matrix S_M (Section 2.1.1) and calculate its eigenvectors with the t (= rank(S_M)) largest eigenvalues. The corresponding eigenvectors are g_1, g_2, ..., g_t, and the eigenvalues are \lambda_1, \lambda_2, ..., \lambda_t.
Step 4: ACS–PCA: Select q (= N − k) principal components from the t (= rank(S_M)) components using the ACS–PCA procedure described in Section 4.2.1. The selected eigenvectors are g_{t_1}, g_{t_2}, ..., g_{t_q} and the corresponding eigenvalues are \lambda_{t_1}, \lambda_{t_2}, ..., \lambda_{t_q}. Let G_{ACS-PCA} = [g_{t_1}, g_{t_2}, ..., g_{t_q}] and \Lambda_{ACS-PCA}^{-1/2} = diag(\lambda_{t_1}^{-1/2}, \lambda_{t_2}^{-1/2}, ..., \lambda_{t_q}^{-1/2}).
Step 5: Whitening procedure: Apply the whitening procedure to G_{ACS-PCA}, i.e. G_{ACS-PCA-w} = G_{ACS-PCA} \Lambda_{ACS-PCA}^{-1/2}.
Step 6: Compute G_{IDA} = \arg\max_G |G^T G_{ACS-PCA-w}^T S_C G_{ACS-PCA-w} G| / |G^T G_{ACS-PCA-w}^T S_I G_{ACS-PCA-w} G|.
Step 7: Optimal projection for ACS–IDA: Calculate the optimal projection for ACS–IDA as G_O^T = G_{IDA}^T G_{ACS-PCA-w}^T.

The block diagram of the proposed ACS–IDA algorithm is given in Fig. 5. Using this ACS–IDA algorithm, we develop a face recognition system.
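For clarity, steps 5-7 in isolation look as follows. G_sel, lam_sel, S_C and S_I are synthetic stand-ins (in the full system they come from ACS-PCA and from Eqs. (8)-(10)), so this is only a sketch of the whitening and of the final projection, not of the complete ACS-IDA pipeline.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
D, q, k = 200, 36, 4
A = rng.normal(size=(D, D)); S_C = A @ A.T             # stand-in scatter matrices
B = rng.normal(size=(D, D)); S_I = B @ B.T
G_sel = np.linalg.qr(rng.normal(size=(D, q)))[0]        # stand-in selected eigenvectors
lam_sel = np.sort(rng.uniform(1.0, 10.0, q))[::-1]      # their eigenvalues

G_w = G_sel * lam_sel ** -0.5                           # Step 5: column-wise whitening
wl, V = eigh(G_w.T @ S_C @ G_w, G_w.T @ S_I @ G_w)      # Step 6
G_ida = V[:, np.argsort(wl)[::-1][:k - 1]]
G_O = G_w @ G_ida                                       # Step 7: optimal projection
print(G_O.shape)                                        # (200, 3)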
Fig. 5. Face recognition using ACS + IDA algorithm.
5. Experimental results and discussions
The results presented in this section are divided into three parts. In the first and second parts, we consider the Yale and ORL face databases, respectively. In the third part, we use a smaller subset of the FERET face database. These databases are used to evaluate the proposed PCA + IDA and ACS–IDA algorithms against IDA based face recognition. For the evaluation, we use the identification rate. In all the experiments using plain IDA, ε is set to 0.05, and a simple minimum distance classifier is used for classification after applying the dimensionality reduction. Experiments are carried out on an Intel dual core 2.0 GHz processor with 4 GB RAM running under the Windows 7.0 operating system. The algorithms are implemented using MATLAB R2009.
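The minimum distance classification referred to above can be sketched as follows; the projection matrix and the gallery/probe arrays below are placeholders for illustration, not data from the experiments.

import numpy as np

def min_distance_classify(G_opt, gallery, gallery_labels, probe):
    """Assign each probe image the label of the nearest projected gallery image."""
    Yg, Yp = G_opt.T @ gallery, G_opt.T @ probe
    d = np.linalg.norm(Yp[:, :, None] - Yg[:, None, :], axis=0)   # probe x gallery distances
    return gallery_labels[np.argmin(d, axis=1)]

rng = np.random.default_rng(0)
G_opt = rng.normal(size=(200, 3))                                 # placeholder projection
gallery = rng.normal(size=(200, 20)); gallery_labels = np.repeat(np.arange(4), 5)
probe = gallery + 0.01 * rng.normal(size=gallery.shape)           # noisy copies of the gallery
print(min_distance_classify(G_opt, gallery, gallery_labels, probe))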
5.1. Face databases and their parameters for experiments
The different face databases used for the face recognition system are described as follows.
5.1.1. The Yale face database
The Yale face database (Yale University Face Database, 2002) [46] comprises 165 gray scale face images of 15 individuals under wide-ranging lighting conditions. There are 11 different facial images per individual. One can notice several facial expressions (happy, normal, sad, sleepy, wink and surprise), illumination conditions (left-light, center-light, and right-light), and facial details (with or without glasses). Note that the original size of the images in the database is 243 × 320 pixels. The images are cropped from 243 × 320 to 61 × 80 pixels to make the computation efficient. One class of original images of the Yale face database is displayed in Fig. 6.

5.1.2. The ORL face database
The ORL face database (Olivetti & Oracle Research Laboratory Face Database, 1994) [47] comprises 400 face images of 40 persons, 10 images per individual. The original size of the facial images is 112 × 92 pixels. There are some variations in facial expression (smiling or not) and in pose (center, left and right). Moreover, one can also notice variations in scale of up to about 10 percent. The images are normalized and cropped to 28 × 23 pixels. One class of original images of the ORL face database is shown in Fig. 6. For the experiments, we take an arbitrary n (n = 3, 4, or 5) training images and the remaining (10 − n) images from each class for testing.

5.1.3. The FERET database
The U.S. Army FERET face database is used in the third part of our experiments. Here, we use a smaller subset of 61 people with eight images per person. To see how the number of training images affects the recognition rate, we used different numbers of training images from a class: n training images and the remaining (8 − n) images for testing, where n can be 3, 4 or 5. For this experiment, we use frontal images with different illumination, various facial expressions, different ages, and with or without glasses [48]. These are extracted from four different sets, called Fa, Fb, Fc, and the duplicate set [48]. All images are aligned with the centers of the eyes, which are provided in the database. Finally, the images are normalized (in scale and orientation) and cropped to 63 × 75 pixels for our experiment, with the two eyes of each face image aligned at the same location. One class of cropped and aligned images of the FERET face database is shown in Fig. 6.
Table 8. Identification performance (rank values are identification rates in %; run time in seconds).
Database | Training | Algorithm | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6 | Rank 7 | Rank 8 | Rank 9 | Rank 10 | Run time
YALE | 3 Train | ACS–IDA | 78.3 | 82.5 | 87.5 | 91.6 | 93.3 | 93.3 | 94.1 | 96.6 | 97.5 | 97.5 | 9.11
YALE | 3 Train | PCA + IDA | 75.0 | 80.8 | 85.0 | 90.0 | 91.6 | 94.1 | 94.1 | 95.0 | 96.6 | 96.6 | 60.45
YALE | 3 Train | IDA | 75.0 | 75.0 | 75.0 | 88.3 | 88.3 | 90.0 | 91.6 | 92.5 | 92.5 | 93.3 | –
YALE | 4 Train | ACS–IDA | 78.1 | 81.9 | 84.8 | 89.5 | 93.3 | 94.3 | 95.2 | 96.2 | 97.1 | 97.1 | 9.06
YALE | 4 Train | PCA + IDA | 77.1 | 81.0 | 83.8 | 89.5 | 92.4 | 94.3 | 94.3 | 94.3 | 96.2 | 96.2 | 62.96
YALE | 4 Train | IDA | 75.2 | 77.1 | 79.0 | 79.0 | 89.5 | 90.5 | 90.5 | 92.4 | 93.3 | 93.3 | –
YALE | 5 Train | ACS–IDA | 88.9 | 90.0 | 91.1 | 92.2 | 93.3 | 94.4 | 94.4 | 96.7 | 96.7 | 97.8 | 9.10
YALE | 5 Train | PCA + IDA | 86.7 | 86.7 | 88.9 | 88.9 | 88.9 | 94.4 | 94.4 | 95.6 | 95.6 | 95.6 | 62.76
YALE | 5 Train | IDA | 85.6 | 85.6 | 87.8 | 88.9 | 91.1 | 92.2 | 93.3 | 93.3 | 95.6 | 95.6 | –
ORL | 3 Train | ACS–IDA | 71.4 | 74.6 | 76.4 | 77.1 | 80.4 | 82.9 | 84.3 | 85.4 | 87.1 | 87.9 | 1.34
ORL | 3 Train | PCA + IDA | 67.1 | 73.9 | 76.1 | 77.9 | 80.0 | 82.1 | 84.3 | 85.4 | 86.1 | 87.9 | 3.87
ORL | 3 Train | IDA | 66.1 | 67.9 | 70.7 | 75.4 | 75.7 | 77.1 | 79.3 | 81.4 | 83.2 | 84.3 | –
ORL | 4 Train | ACS–IDA | 75.0 | 78.3 | 80.8 | 83.3 | 85.8 | 87.1 | 87.9 | 89.6 | 89.6 | 91.3 | 1.25
ORL | 4 Train | PCA + IDA | 70.8 | 78.3 | 79.2 | 82.5 | 85.4 | 86.7 | 87.5 | 89.2 | 89.6 | 90.8 | 3.81
ORL | 4 Train | IDA | 70.8 | 72.9 | 75.4 | 77.5 | 80.8 | 82.9 | 84.2 | 85.0 | 87.1 | 87.9 | –
ORL | 5 Train | ACS–IDA | 80.5 | 82.0 | 84.0 | 85.5 | 87.5 | 89.0 | 91.5 | 93.0 | 94.0 | 95.0 | 1.22
ORL | 5 Train | PCA + IDA | 80.5 | 81.5 | 83.5 | 85.5 | 87.5 | 89.0 | 91.5 | 93.0 | 94.0 | 94.5 | 3.75
ORL | 5 Train | IDA | 78.0 | 80.5 | 82.0 | 82.5 | 83.5 | 86.5 | 87.0 | 87.5 | 88.0 | 89.5 | –
FERET | 3 Train | ACS–IDA | 79.1 | 80.4 | 81.0 | 81.3 | 82.2 | 82.6 | 84.4 | 85.4 | 86.3 | 86.6 | 8.65
FERET | 3 Train | PCA + IDA | 75.4 | 75.4 | 77.3 | 79.1 | 79.1 | 81.3 | 82.9 | 83.5 | 83.8 | 84.1 | 24.45
FERET | 3 Train | IDA | 74.1 | 75.4 | 77.3 | 80.1 | 80.7 | 82.6 | 83.2 | 83.2 | 83.5 | 83.8 | –
FERET | 4 Train | ACS–IDA | 73.8 | 76.6 | 78.0 | 79.4 | 80.4 | 80.4 | 82.7 | 83.6 | 83.6 | 83.6 | 8.69
FERET | 4 Train | PCA + IDA | 73.8 | 73.8 | 76.6 | 79.9 | 81.3 | 82.7 | 84.1 | 86.0 | 86.0 | 86.4 | 23.82
FERET | 4 Train | IDA | 70.6 | 70.6 | 73.8 | 76.6 | 77.6 | 79.0 | 79.4 | 80.4 | 82.7 | 82.7 | –
FERET | 5 Train | ACS–IDA | 84.1 | 84.1 | 87.9 | 89.7 | 89.7 | 90.7 | 92.5 | 92.5 | 92.5 | 92.5 | 9.52
FERET | 5 Train | PCA + IDA | 80.4 | 84.1 | 85.0 | 87.9 | 88.8 | 90.7 | 91.6 | 91.6 | 91.6 | 92.5 | 23.30
FERET | 5 Train | IDA | 78.5 | 81.3 | 85.0 | 86.0 | 86.9 | 87.9 | 89.7 | 90.7 | 91.6 | 92.5 | –
Fig. 6. The images of one class from each database.
Fig. 7. Identification performance of PCA + IDA vs. IDA on the YALE face database. (Three panels, 3 Train, 4 Train and 5 Train, plot identification rate (%) against rank for ACS–IDA, PCA + IDA, and IDA.)
From Table 8, it is observed that the computation time is reduced significantly by the PCA + IDA method: it is reduced by about 85% for the Yale face database (with 5 Train), and by about 67% and 59% for the ORL and FERET face databases (with 5 Train), respectively. Hence, our PCA + IDA algorithm is computationally more efficient than the IDA-based (perturbation-based) face recognition proposed earlier in [17].

Fig. 8. Identification performance of PCA + IDA vs. IDA on the ORL face database. (Three panels, 3 Train, 4 Train and 5 Train, plot identification rate (%) against rank for ACS–IDA, PCA + IDA, and IDA.)
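The run times in the last column of Table 8 refer to the computation of the projection Gopt; a simple way to obtain such wall-clock measurements is sketched below (the two projection functions are hypothetical stand-ins, not the authors' API).

import time

def timed(fn, *args, **kwargs):
    # Return the result of fn together with its wall-clock time in seconds.
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - t0

# Hypothetical usage with placeholder functions:
# G1, t_pca_ida = timed(compute_gopt_pca_ida, X_train, y_train)
# G2, t_ida     = timed(compute_gopt_ida, X_train, y_train)
# reduction = 100.0 * (t_ida - t_pca_ida) / t_ida   # e.g. the ~85% figure quoted for YALE (5 Train)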
Fig. 9. Identification performance of PCA + IDA vs. IDA on the FERET face database. (Three panels, 3 Train, 4 Train and 5 Train, plot identification rate (%) against rank for ACS–IDA, PCA + IDA, and IDA.)

Table 9
Identification rate (%) of different methods on the FERET face database at Rank 1.

Method:    ACS–IDA  PCA + IDA  IDA    Eigenfaces  Fisherface  ICA    LSDA   OLDA
Rate (%):  79.12    75.38      74.14  68.22       69.78       68.22  71.02  71.96
Here, it is claimed that our algorithm is suitable for improving face recognition, object recognition and character recognition systems. The numbers of training images used in Table 8 are 3, 4 and 5, which are enough for reliable results. In face recognition research, it is important to achieve improved recognition using a small number of training images. In Table 8, ACS–IDA shows an improvement over PCA + IDA and IDA when the number of training images is 3, 4 or 5, and this holds for all three face databases. There is also a clear improvement in the recognition rate as the number of training images increases, again for all three databases. Further, the identification performance is improved by using the optimal eigenvectors, which are obtained by deploying the newly developed adaptive cuckoo search algorithm. Interestingly, the recognition rate improves when the ACS–IDA algorithm is used, which is desirable. In this context, different numbers of training images are considered to evaluate the performance of the proposed method, and it is observed that the ACS–IDA algorithm performs well even with only 3 training images. We reiterate that 11 face images per individual are used for the YALE face database, 10 face images per individual for the ORL face database, and 8 face images per individual for the FERET face database. For example, when 3 face images are used for training, the remaining 8 face images are used for testing in the YALE face database; similarly, the remaining 7 face images are used for testing in the ORL face database, and the remaining 5 face images are used for testing in the FERET face database.
5.3. Comparison with the state-of-the-art technology

In this section, we compare our proposed algorithms ACS–IDA and PCA + IDA with IDA [17]. In the recent past, much work has
been reported on dimension reduction in face recognition. This warrants a comparison with state-of-the-art methods. For the comparison, we use the FERET face database presented in Section 5.1 with three training images per class. The comparison is made with the IDA method [17], the Fisherfaces method [2], Orthogonal LDA [49], the eigenfaces method [1], Independent Component
Analysis (ICA) based face recognition [50], and Locality Sensitive Discriminant Analysis (LSDA) [51]. Here, for simplicity, we use the Euclidean distance as the measure for the nearest neighbor classifier. Fig. 10 shows the identification rate vs. rank for these algorithms, and Table 9 shows the identification rate at rank 1. From Table 9 and Fig. 10, it is seen that the proposed method performs better than the state-of-the-art methods.

Fig. 10. Comparison of the identification performance of various state-of-the-art algorithms on the FERET face database (identification rate (%) vs. rank for ACS–IDA, PCA + IDA, IDA, Eigenfaces, Fisherface, ICA, LSDA and OLDA).
6. Conclusion
Many modifications of the cuckoo search optimization algorithm have been proposed in the recent past to improve its learning process and rate of convergence. This paper presented a novel adaptive cuckoo search (ACS) algorithm that adaptively decides the step size from the knowledge of the fitness function value and the current position in the search space. The proposed adaptive cuckoo search is a parameter-free algorithm that goes beyond the Levy step. The performance evaluation is carried out with the help of twenty three standard benchmark test functions. From Tables 5–7 and Figs. 1–3, we observe that the proposed ACS algorithm outperforms the standard CS algorithm and reaches the optimal or suboptimal solutions faster. The ACS algorithm therefore has potential and can be used to obtain optimal solutions in various applications.

The proposed ACS algorithm is used to develop a face recognition algorithm, named adaptive cuckoo search-intrinsic discriminant analysis (ACS–IDA). We also develop a faster, perturbation-free PCA + IDA algorithm for efficient dimension reduction in IDA. The ACS–IDA algorithm uses the ACS optimization algorithm along with PCA + IDA to select the optimal feature vectors. The proposed ACS–IDA and PCA + IDA algorithms are validated using three well-known face databases, i.e. Yale, ORL and FERET, and ACS–IDA outperforms the other algorithms considered. Finally, it is believed that the proposed algorithms are useful for the development of face recognition, object recognition and character recognition systems.

Acknowledgement

We sincerely thank the U.S. Army Research Laboratory for providing us the FERET database for our research work.
Appendix A. Detailed description of some of the benchmark test functions given in Table 3

A.1. Shekel's foxholes function (f14)
The 2 × 25 matrix aij is

aij = [ −32  −16    0   16   32  −32  ...    0   16   32
        −32  −32  −32  −32  −32  −16  ...   32   32   32 ]
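As an illustration of how this matrix is used, a small sketch follows, assuming the standard form of Shekel's foxholes function as given in [44]; it is not taken from the paper.

import numpy as np

# Rebuild the full 2 x 25 matrix from the pattern above: the first row cycles
# through (-32, -16, 0, 16, 32) and the second row repeats each value five times.
base = np.array([-32.0, -16.0, 0.0, 16.0, 32.0])
A = np.vstack([np.tile(base, 5), np.repeat(base, 5)])

def f14(x):
    # f14(x) = [1/500 + sum_j 1 / (j + sum_i (x_i - a_ij)^6)]^(-1), x in [-65.536, 65.536]^2
    x = np.asarray(x, float).reshape(2, 1)
    inner = np.arange(1, 26) + np.sum((x - A) ** 6, axis=0)
    return 1.0 / (1.0 / 500 + np.sum(1.0 / inner))

# f14([-32, -32]) is close to the global minimum value of about 0.998.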
A.2. Kowalik's function (f15)

i     ai       1/bi
1     0.1957   0.25
2     0.1947   0.5
3     0.1735   1
4     0.1600   2
5     0.0844   4
6     0.0627   6
7     0.0456   8
8     0.0342   10
9     0.0323   12
10    0.0235   14
11    0.0246   16
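A corresponding sketch for Kowalik's function, again assuming the standard definition from [44]; since the table lists 1/bi, the code inverts it.

import numpy as np

a = np.array([0.1957, 0.1947, 0.1735, 0.1600, 0.0844, 0.0627,
              0.0456, 0.0342, 0.0323, 0.0235, 0.0246])
b = 1.0 / np.array([0.25, 0.5, 1, 2, 4, 6, 8, 10, 12, 14, 16])

def f15(x):
    # f15(x) = sum_i [a_i - x1 (b_i^2 + b_i x2) / (b_i^2 + b_i x3 + x4)]^2
    x1, x2, x3, x4 = x
    model = x1 * (b ** 2 + b * x2) / (b ** 2 + b * x3 + x4)
    return np.sum((a - model) ** 2)

# The reported global minimum is about 3.075e-4, near x = (0.19, 0.19, 0.12, 0.14).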
A.3. Hartman's family (f19)

i    ai1    ai2   ai3   ci
1    3      10    30    1
2    0.1    10    35    1.2
3    3      10    30    3
4    0.1    10    35    3.2

i    pi1       pi2      pi3
1    0.3689    0.1170   0.2673
2    0.4699    0.4387   0.7470
3    0.1091    0.8732   0.5547
4    0.03815   0.5743   0.8828
A.4. Hartman's family (f20)

i    ai1    ai2   ai3    ai4   ai5   ai6   ci
1    10     3     17     3.5   1.7   8     1
2    0.05   10    17     0.1   8     14    1.2
3    3      3.5   1.7    10    17    8     3
4    17     8     0.05   10    0.1   14    3.2

i    pi1      pi2      pi3      pi4      pi5      pi6
1    0.1312   0.1696   0.5569   0.0124   0.8283   0.5886
2    0.2329   0.4135   0.8307   0.3736   0.1004   0.9991
3    0.2348   0.1415   0.3522   0.2883   0.3047   0.6650
4    0.4047   0.8828   0.8732   0.5743   0.1091   0.0381
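Both members of the Hartman family combine their tables in the same way. The sketch below evaluates the six-dimensional case (f20) using the tables of this subsection and assumes the standard form given in [44]; f19 is obtained by substituting the 4 × 3 tables of Appendix A.3.

import numpy as np

A6 = np.array([[10, 3, 17, 3.5, 1.7, 8],
               [0.05, 10, 17, 0.1, 8, 14],
               [3, 3.5, 1.7, 10, 17, 8],
               [17, 8, 0.05, 10, 0.1, 14]])
C = np.array([1.0, 1.2, 3.0, 3.2])
P6 = np.array([[0.1312, 0.1696, 0.5569, 0.0124, 0.8283, 0.5886],
               [0.2329, 0.4135, 0.8307, 0.3736, 0.1004, 0.9991],
               [0.2348, 0.1415, 0.3522, 0.2883, 0.3047, 0.6650],
               [0.4047, 0.8828, 0.8732, 0.5743, 0.1091, 0.0381]])

def hartman(x, A, P, c=C):
    # f(x) = -sum_i c_i * exp(-sum_j A_ij (x_j - P_ij)^2), x_j in [0, 1]
    x = np.asarray(x, float)
    return -np.sum(c * np.exp(-np.sum(A * (x - P) ** 2, axis=1)))

# f20(x) = hartman(x, A6, P6); its global minimum is about -3.32.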
A.5. Shekel function (f21 , f22 , and f23 )
i     ai1   ai2   ai3   ai4   ci
1     4     4     4     4     0.1
2     1     1     1     1     0.2
3     8     8     8     8     0.2
4     6     6     6     6     0.4
5     3     7     3     7     0.4
6     2     9     2     9     0.6
7     5     5     3     3     0.3
8     8     1     8     1     0.7
9     6     2     6     2     0.5
10    7     3.6   7     3.6   0.5
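Finally, a sketch of the Shekel family, assuming the standard form from [44]; taking the first m = 5, 7 or 10 rows of the table gives f21, f22 and f23, respectively.

import numpy as np

A = np.array([[4, 4, 4, 4], [1, 1, 1, 1], [8, 8, 8, 8], [6, 6, 6, 6],
              [3, 7, 3, 7], [2, 9, 2, 9], [5, 5, 3, 3], [8, 1, 8, 1],
              [6, 2, 6, 2], [7, 3.6, 7, 3.6]], dtype=float)
c = np.array([0.1, 0.2, 0.2, 0.4, 0.4, 0.6, 0.3, 0.7, 0.5, 0.5])

def shekel(x, m):
    # f(x) = -sum_{i=1}^{m} 1 / ((x - a_i).(x - a_i) + c_i), x in [0, 10]^4
    x = np.asarray(x, float)
    return -np.sum(1.0 / (np.sum((A[:m] - x) ** 2, axis=1) + c[:m]))

# shekel([4, 4, 4, 4], 10) is close to the global minimum of f23 (about -10.54).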
References

[1] M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, in: Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), 1991, pp. 586–591.
[2] A.M. Martinez, A.C. Kak, PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2) (2001) 228–233.
[3] R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugen. 7 (7) (1936) 179–188.
[4] H. Yu, J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognit. 34 (2001) 2067–2070.
[5] J.L. Kostantinos, N. Venetsanopoulos, Face recognition using LDA-based algorithm, IEEE Trans. Neural Networks 14 (1) (2003) 195–200.
[6] H.F. He, P. Niyogi, Locality preserving projections, in: Proceedings of the Conference on Advances in Neural Information Processing Systems, 2003.
[7] J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.
[8] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.
[9] L.K. Saul, S.T. Roweis, Think globally, fit locally: unsupervised learning of low dimensional manifolds, J. Mach. Learn. Res. 4 (2003) 119–155.
[10] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Proceedings of the Conference on Advances in Neural Information Processing Systems, 2001, p. 5.
[11] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (2003) 1373–1396.
[12] D. de Ridder, R.P.W. Duin, Locally linear embedding for classification, Technical Report PH-2002-01, Pattern Recognition Group, Department of Imaging Science and Technology, Delft University of Technology, Delft, The Netherlands, 2002.
[13] O. Kouropteva, O. Okun, M. Pietikäinen, Selection of the optimal parameter value for the locally linear embedding, in: Proceedings of the Conference on Fuzzy Systems and Knowledge Discovery, 2002.
[14] O. Samko, A.D. Marshall, P.L. Rosin, Selection of optimal parameter value for the Isomap algorithm, Pattern Recognit. Lett. 27 (2006) 968–979.
[15] Y. Bengio, J. Paiement, P. Vincent, O. Delalleau, N. Roux, M. Ouimet, Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering, Adv. Neural Inf. Process. Syst. 16 (2004).
[16] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[17] Y. Wang, Y. Wu, Face recognition using Intrinsicfaces, Pattern Recognit. 43 (2010) 3580–3590.
[18] M. Kirby, L. Sirovich, Application of the Karhunen–Loeve procedure for the characterization of human faces, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1) (1990) 103–108.
[19] C. Liu, H. Wechsler, Evolutionary pursuit and its application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 22 (6) (2000) 570–582.
[20] W.S. Zheng, J.H. Lai, P.C. Yuen, GA-Fisher: a new LDA-based face recognition algorithm with selection of principal components, IEEE Trans. Syst. Man Cybern., B: Cybern. 35 (5) (2005) 1065–1078.
[21] K.M. Passino, Biomimicry of bacteria foraging for distributed optimization and control, IEEE Control Syst. Mag. (2002) 52–67.
[22] Y. Liu, K.M. Passino, Biomimicry of social foraging bacteria for distributed optimization: models, principles and emergent behaviours, J. Optim. Theory Appl. 115 (3) (2002) 603–628.
[23] R. Panda, M.K. Naik, B.K. Panigrahi, Face recognition using bacterial foraging strategy, Swarm Evol. Comput. 1 (2011) 138–146.
[24] R. Panda, M.K. Naik, CBFOA: a crossover bacterial foraging optimization algorithm, Appl. Comput. Intell. Soft Comput. (2012) 1–7.
[25] S. Dasgupta, S. Das, A. Abraham, A. Biswas, Adaptive computational chemotaxis in bacterial foraging optimization: an analysis, IEEE Trans. Evol. Comput. 13 (4) (2009) 919–941.
[26] X.S. Yang, S. Deb, Cuckoo search via Levy flights, in: Proc. IEEE Conf. of World Congress on Nature & Biologically Inspired Computing, 2009, pp. 210–214.
[27] X.S. Yang, S. Deb, Cuckoo search: recent advances and applications, Neural Comput. Appl. 24 (1) (2013) 169–174.
[28] Q. Dai, Back-propagation with diversive curiosity: an automatic conversion from search stagnation to exploration, Appl. Soft Comput. 1 (2013) 483–495.
[29] Q. Dai, Z. Ma, Q. Xie, A two-phased and ensemble scheme integrated back-propagation algorithm, Appl. Soft Comput. 24 (2014) 1124–1135.
[30] H. Shokri-Ghaleh, A. Alfi, A comparison between optimization algorithms applied to synchronization of bilateral teleoperation systems against time delay and modeling uncertainties, Appl. Soft Comput. 24 (2014) 447–456.
[31] H. Shokri-Ghaleh, A. Alfi, Optimal synchronization of teleoperation systems via cuckoo optimization algorithm, Nonlinear Dyn. 78 (2014) 2359–2376.
[32] M. Črepinšek, S.-H. Liu, M. Mernik, Replication and comparison of computational experiments in applied evolutionary computing: common pitfalls and guidelines to avoid them, Appl. Soft Comput. 19 (2014) 161–170.
[33] J. Derrac, S. García, D. Molina, F. Herrera, A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput. 1 (1) (2011) 3–18.
[34] A. Alfi, PSO with adaptive mutation and inertia weight and its application in parameter estimation of dynamic systems, Acta Autom. 37 (5) (2011) 541–549.
[35] A. Alfi, M.M. Fateh, Identification of nonlinear systems using modified particle swarm optimization: a hydraulic suspension system, J. Veh. Syst. Dyn. 46 (6) (2011) 871–887.
[36] A. Alfi, M.M. Fateh, Intelligent identification and control using improved fuzzy particle swarm optimization, Expert Syst. Appl. 38 (2011) 12312–12317.
[37] A. Arab, A. Alfi, An adaptive gradient descent-based local search in memetic algorithm for solving engineering optimization problems, Inf. Sci. 299 (2015) 117–142.
[38] A. Alfi, A. Akbarzadeh Kalat, M.H. Khooban, Adaptive fuzzy sliding mode control for synchronization of uncertain non-identical chaotic systems using bacterial foraging optimization, J. Intell. Fuzzy Syst. 26 (2014) 2567–2576.
[39] A. Alfi, A. Khosravi, A. Lari, Swarm-based structure-specified controller design for bilateral transparent teleoperation systems via synthesis, IMA J. Math. Control Inf. 31 (2014) 111–136.
[40] S. Chakraverty, A. Kumar, Design optimization for reliable embedded system using cuckoo search, in: Proc. International Conf. on Electronics, Computer Technology, vol. 1, 2011, pp. 164–268.
[41] P. Barthelemy, J. Bertolotti, D.S. Wiersma, A Levy flight for light, Nature 453 (2008) 495–498.
[42] X.S. Yang, Nature-Inspired Metaheuristic Algorithms, second ed., Luniver Press, Bristol, U.K., 2010.
[43] H. Soneji, R.C. Sanghvi, Towards the improvement of Cuckoo search algorithm, in: Proc. World Congress on Information and Communication Technologies, 2012, pp. 878–883.
[44] X. Yao, Y. Liu, G. Lin, Evolutionary programming made faster, IEEE Trans. Evol. Comput. 3 (1999) 82–102.
[45] W.S. Yambor, B.A. Draper, J.R. Beveridge, Analyzing PCA-based face recognition algorithms: eigenvector selection and distance measures, in: Proc. Second Workshop on Empirical Evaluation in Computer Vision, 2000.
[46] Yale University Face Database, 2002, http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
[47] Olivetti & Oracle Research Laboratory Face Database of Faces, 1994, http://www.cam-orl.co.uk/facedatabase.html.
[48] P.J. Phillips, H. Moon, S.A. Rizvi, P.J. Rauss, The FERET evaluation methodology for face-recognition algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 22 (10) (2000) 1090–1104.
[49] J. Ye, Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems, J. Mach. Learn. Res. 6 (2005) 483–502.
[50] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Trans. Neural Networks 13 (2002) 1450–1464.
[51] D. Cai, X. He, K. Zhou, J. Han, H. Bao, Locality sensitive discriminant analysis, in: Proceedings of the 20th International Joint Conference on Artificial Intelligence, San Francisco, CA, USA, 2007, pp. 708–713.