Regular paper
Swarm intelligence inspired classifiers for facial recognition
Salima Nebti*, Abdallah Boukerram
Department of Computer Science, Ferhat Abbas University, Sétif, 19000, Algeria
ARTICLE INFO

Article history:
Received 9 January 2016
Received in revised form 19 May 2016
Accepted 6 July 2016

Keywords: Face recognition; BA; BPSO; FSVM; Decision tree; Classifier combination

ABSTRACT

Facial recognition is a challenging issue in pattern recognition, arising from the need for high security systems capable of overcoming the variability of the acquisition environment, such as illumination, pose or facial expression. A broad range of recognition methods have been suggested, yet most are still unable to yield optimal accuracy. More recently, new methods based on swarm intelligence or classifier combination have been devised in the field of facial recognition. Swarm intelligence based methods aim to achieve effective recognition accuracy by exploiting their global optimization capability. The combination of classifiers is a newer trend allowing cooperation between multiple classifiers. In this work, two classifiers inspired by swarm intelligence are proposed: a bees algorithm based classifier and a decision tree based binary particle swarm optimization classifier. The two are then combined with a decision tree based fuzzy support vector machine, using the majority vote, in an attempt to compensate for the weaknesses of single classifiers. Moreover, the impact of different characteristic features and space reduction methods has been examined, namely the Gabor magnitude and Gabor phase congruency features in combination with the PCA, LDA or KFA reduction space methods. The experiments were conducted on four popular databases: ORL, YALE, FERET and UMIST. The results reveal that the proposed swarm intelligence based classifiers are very effective compared to similar classifiers in terms of recognition accuracy.
© 2016 Published by Elsevier Ltd
1. Introduction

Traditional security systems based on encryption and passwords have proven vulnerable and easy to break, which has made biometric technology crucial for many domains. Biometrics aims to reinforce security based on unique human physiological or behavioural traits, such as face, gait, fingerprint, signature, voice or iris. These characteristics cannot easily be stolen or forged, and access to specific resources is only permitted in the presence of authorized users, thus ensuring secure identification or authentication [1]. Facial recognition is generally favoured over other biometric techniques as it is intuitive, simple and robust against hacking. In addition, it is suitable for several domains, such as electronic commerce, network access control, bank and airport security, passport control, the search for wanted criminals, and the automatic processing of identity cards and driver’s licenses [2], [3].

Currently, facial recognition can attain a 100% recognition rate in a controlled environment without changes in illumination, pose, expression, or occlusion by other objects. However, facial recognition remains a major challenge in a real environment that changes over time, since facial features can differ significantly from one image to another due to variations in illumination, pose, expression, or occlusion by objects such as a hat, glasses, beard or moustache. Acquiring images in such conditions decreases the system performance, as the characteristic features of the same individual's face change greatly from one image to another [3].

To overcome these difficulties, many solutions have been proposed, such as the use of thermal infrared imagery due to its invariance to illumination, expression and facial disguises [4]. This technique is based on the heat radiation of the vascular structure of an individual's face [1]. It allows face acquisition under any light condition, even in complete darkness, and provides robustness against expression variations [4]. It permits a higher level of accuracy in facial recognition compared to the visible light based technique; its shortcomings, however, include sensitivity to changes in temperature and to the subject's emotional state [1]. Another solution is to combine thermal and visible images; this multi-modal technique may achieve higher accuracy by taking advantage of their complementary information [4]. Photometric normalization algorithms can also be used to reduce the effects of illumination variation [3, 5]. To deal with pose variation, many solutions have been suggested, such as the use of invariant geometric features, the synthesis of multi-view images, or three-dimensional face representation. 3D imagery is an excellent solution to construct a facial recognition system robust against pose and illumination variations [6]; however, 3D reconstruction requires long computation times, which makes its application in real environments challenging. Another solution for pose alignment is to construct target data based on information from both source and target data; this method, called nonlinear latent sparse domain transfer (NLSDT), has proved efficient for face recognition across pose compared to other methods [7].
* Corresponding author. Tel.: +213 031789105. E-mail address: [email protected]
Face recognition systems can be categorized into two types, namely verification and identification systems. In a verification system, the user's identity is known in advance; through a one-to-one comparison, the system then verifies whether the presented person is an impostor or not. This is useful for access control and authentication in e-commerce or on mobile devices. In a face identification system, the person's identity is unknown, and the system, through one-to-many comparisons, selects from the database the face image most similar to the face being identified. Face identification systems reinforce security by preventing individuals from acquiring multiple identities, which is useful for passport security, driver's licenses, ID cards etc. [13]. Identification can also be used to verify the non-membership of persons in particular lists by examining them through one-to-many comparisons; this mode is generally used in airport security, surveillance processes and the protection of specific places [1].

Face recognition methods can be divided into three groups, namely appearance based methods, local methods and hybrid ones. Appearance based methods use the whole face area to extract features; they include Eigenfaces [14], Fisherfaces [15], [16], linear discriminant analysis (LDA) [17] and support vector machines (SVM) [18]. These methods need a large memory space and a long computational time. The local methods are based on local characteristic regions such as the eyes, mouth and nose. These methods are harder to implement, since they must compare regions more precisely than the appearance based methods. Popular local methods are local binary patterns (LBP) [19] and elastic bunch graph matching (EBGM) [52]. The combination of appearance based and local methods, known as hybrid methods, can provide higher accuracy by exploiting the complementarity of the statistical colour distribution and the form of the characteristic regions; an example of such methods is the hidden Markov model [20].

More recently, efforts at applying swarm optimization methods have given promising results in many fields, such as concentration estimation problems, where a set of swarm intelligence based methods has been used to improve the learning parameters of a multi-layer perceptron, namely the genetic algorithm (GA), inertia weight approach PSO (IWA-PSO), adaptive PSO (APSO), attractive and repulsive PSO (ARPSO), PSO based on diffusion and repulsion (DRPSO), PSO based on bacterial chemotaxis (PSOBC), and PSO based on an adaptive genetic strategy (PSOAGS) [8], [9]. Swarm optimization methods have also been used in facial recognition to improve feature selection for higher recognition accuracy and lower computing time [21–25], and for dimensionality reduction and feature selection based on an adaptive cuckoo search algorithm [26]. The PSO method can improve SVM accuracy by finding its optimal parameters [27]; it has also been found to improve the parameters of correlation filters [28] and to find the optimal weights for a weighted fusion of visible and infrared features [29].

The main idea addressed in this work is the combination of new swarm intelligence based classifiers with a decision tree based fuzzy support vector machine (DT-FSVM), using the Gabor magnitude as characteristic features, to retain a high level of recognition accuracy through the complementarity of dissimilar classifiers.
The novelty of our work lies in the use of swarm intelligence methods to construct new and effective classifiers for face recognition, as well as in the way dissimilar classifiers are combined. The proposed approach has not been previously investigated, and its performance has been evaluated on four well-known databases, namely YALE, ORL, FERET and UMIST. The remainder of this paper consists of a theoretical background of the methods employed in this work, followed by a presentation of the proposed algorithms for face recognition. Section 4 is devoted to the strategy used for the combination of classifiers. Afterwards, in Section 5, the experimental results are discussed. Finally, a conclusion and future remarks are given in Section 6.

2. Theoretical background

In this section, we present the algorithms used, which are respectively the Gabor magnitude and Gabor phase based feature extraction, the DT-FSVM classifier, binary particle swarm optimization (BPSO) and the bees algorithm (BA).
2.1 Gabor wavelets

The Gabor wavelets have been widely used for fingerprint identification [30, 31] and face recognition [32, 33], where they brought a clear improvement in recognition accuracy compared to older reduction space methods such as PCA, LDA and KFA. In addition, Gabor wavelets are a good choice for texture discrimination, since they analyse the texture of a given object at different resolutions and different angles. In a 2D space, a Gabor filter is a Gaussian kernel function modulated by a complex plane wave [34, 35]:

ψ(x, y) = (f² / (π γ ω)) exp(−((f²/γ²) x′² + (f²/ω²) y′²)) exp(j 2π f x′)   (1)

where:
x′ = x cos θ + y sin θ,  y′ = −x sin θ + y cos θ
f: the frequency of the complex sinusoid
γ: the spatial width of the wavelet along the sinusoidal plane wave
ω: the spatial width of the wavelet perpendicular to the wave
θ: the wavelet orientation
x and y: the pixel coordinates

In most works, 5 frequencies (u = 0, …, 4) and 8 orientations (v = 0, …, 7) are used to create an appropriate filter bank of 40 Gabor filters using the following parameters [34]:

f_u = f_max / (√2)^u,  f_max = 0.25,  θ_v = v π / 8,  γ = ω = √2
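As an illustration of Eq. (1) and of the parameter settings above, the following Python sketch builds one complex Gabor kernel and the bank of 40 filters. The kernel support (31 × 31 pixels) is an implementation choice not fixed by the paper, and the leading normalization constant f²/(πγω) follows the formulation of [34].

```python
import numpy as np

def gabor_kernel(f, theta, gamma=np.sqrt(2), omega=np.sqrt(2), size=31):
    """Complex 2-D Gabor kernel of Eq. (1): a Gaussian envelope
    modulated by a complex plane wave of frequency f and orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # rotated pixel coordinates x' and y'
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-((f / gamma) ** 2 * xr ** 2 + (f / omega) ** 2 * yr ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * f * xr)
    return (f ** 2 / (np.pi * gamma * omega)) * envelope * carrier

# the usual bank of 40 filters: 5 frequencies x 8 orientations
f_max = 0.25
bank = [gabor_kernel(f_max / np.sqrt(2) ** u, v * np.pi / 8)
        for u in range(5) for v in range(8)]
```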
2.1.1 Gabor magnitude features
As mentioned before, the Gabor analysis is based on the convolution of the face image I(x, y) with a set of Gabor filters ψ_{u,v}(x, y) characterized by different orientations and resolutions [35]:

G_{u,v}(x, y) = I(x, y) * ψ_{u,v}(x, y)   (2)

where G_{u,v} is the result of filtering the image with the Gabor filter of resolution u and orientation v. G_{u,v}(x, y) is a complex number that can be decomposed into real and imaginary parts:

G_{u,v}(x, y) = Re_{u,v}(x, y) + j Im_{u,v}(x, y)   (3)

Based on these values, the magnitude A_{u,v} and phase φ_{u,v} responses of a Gabor filter are computed as follows [34]:

A_{u,v}(x, y) = √( Re_{u,v}(x, y)² + Im_{u,v}(x, y)² )   (4)

φ_{u,v}(x, y) = arctan( Im_{u,v}(x, y) / Re_{u,v}(x, y) )   (5)
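A minimal sketch of Eqs. (2)–(5): each face image is convolved with a complex Gabor kernel, and the magnitude and phase responses are read off the complex result. Downsampling the magnitude maps before concatenation (the step parameter) is an assumption commonly made to keep the feature vectors tractable, not a detail fixed by the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_responses(image, kernel):
    """Eq. (2): filter the image with one complex Gabor kernel, then
    read the magnitude (Eq. (4)) and phase (Eq. (5)) of the response."""
    g = fftconvolve(image, kernel, mode='same')  # complex G_{u,v}(x, y)
    return np.abs(g), np.angle(g)                # A_{u,v}, phi_{u,v}

def gabor_magnitude_features(image, bank, step=4):
    """Concatenate the (downsampled) magnitude responses of all filters."""
    return np.concatenate([gabor_responses(image, k)[0][::step, ::step].ravel()
                           for k in bank])
```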
2.2 The support vector machines

SVMs are among the most frequently used classifiers and have proved their efficiency in face recognition. They are based on two key ideas: margin maximization and the kernel notion. An SVM is a quadratic optimization algorithm that maximizes, in linearly separable problems, the separating margin between support vectors. In high dimensional problems, an SVM uses a kernel function to transform the data space into a higher-dimensional space in which linear separating hyperplanes can be found. FSVMs are a variant of support vector machines in which a fuzzy membership function is introduced to classify uncertain data points instead of fixed values; FSVMs provide a better classification of noisy data compared to basic SVMs [36]. Below we describe the principle of basic SVMs and the FSVM classifier used for face identification.
2.2.1 Two-class SVMs

In a linearly separable space, the decision function between classes i and j is given by:

D(x) = w · x + b   (6)

where w is a d-dimensional weight vector, b is a scalar and x is a d-dimensional input point. Classes i and j have target labels y_k = +1 and y_k = −1 respectively, k = 1, …, M (M: number of training data points).

The weight vector w and the bias b are adjusted by maximizing the margin 2/‖w‖ with the constraint that the points must be classified outside the margin [38]:

min ½ ‖w‖²  subject to  y_k (w · x_k + b) ≥ 1,  k = 1, …, M   (7)

The solution of this problem is the extreme point of the primal Lagrangian [38]:

L_p = ½ ‖w‖² − ∑_k α_k [ y_k (w · x_k + b) − 1 ]   (8)

To solve this problem, we maximize the dual Lagrangian with respect to the Lagrange multipliers α_k [10], [38]:

max ∑_k α_k − ½ ∑_k ∑_l α_k α_l y_k y_l (x_k · x_l)  with  α_k ≥ 0,  ∑_k α_k y_k = 0   (9)

The decision function is thus:

D(x) = sign( ∑_{k ∈ SV} α_k y_k (x_k · x) + b )   (10)

where the x_k with non-zero α_k are the support vectors; the α_k and b are defined by resolving the previous quadratic optimization problem, Eq. (9).
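The quadratic program of Eqs. (7)–(9) is solved by any standard SVM package. The toy example below (not the authors' implementation) uses scikit-learn's SVC with a linear kernel and a large C to approximate the hard margin, and exposes the support vectors, the products α_k y_k and the bias b of Eq. (10).

```python
import numpy as np
from sklearn.svm import SVC

# two linearly separable classes with labels +1 and -1
X = np.array([[2.0, 2.0], [2.5, 3.0], [-2.0, -1.5], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])

svm = SVC(kernel='linear', C=1e3)  # large C ~ hard-margin SVM
svm.fit(X, y)

print(svm.support_vectors_)                 # the x_k with non-zero alpha_k
print(svm.dual_coef_)                       # alpha_k * y_k
print(svm.intercept_)                       # the bias b
print(svm.decision_function([[1.0, 1.0]]))  # w.x + b, before sign()
```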
2.2.2 Two-class FSVMs

In a multi-class setting based on pairwise decision functions D_ij(x) of the form of Eq. (10) (with D_ji(x) = −D_ij(x)), an input point x can be classified in an SVM or FSVM using equations (11) and (12) [39]:

D_i(x) = ∑_{j ≠ i} sign( D_ij(x) ),  i = 1, …, c   (11)

class(x) = arg max_{i = 1, …, c} D_i(x)   (12)

x is unclassifiable if Eq. (12) is satisfied for more than one class identity, that is, if D_i(x) = c − 1 for more than one i. To resolve the unclassifiable regions, we used the fuzzy membership function described in [37, 39, 40]:

m_ij(x) = 1  if D_ij(x) ≥ 1;  m_ij(x) = D_ij(x)  otherwise   (13)

The membership of x for class i is found by:

m_i(x) = min_{j ≠ i} m_ij(x)   (14)

Consequently, the class of x is:

class(x) = arg max_{i = 1, …, c} m_i(x)   (15)

The extension to nonlinearly separable problems (the case of face identification) is simple using a kernel function K; the new decision function between classes i and j is given by [11]:

D_ij(x) = ∑_k α_k^{ij} y_k K(x_k, x) + b_ij   (16)

where the α_k^{ij} are the Lagrange multipliers associated with the support vectors and b_ij is a scalar; these parameters are estimated during the training phase by solving the SVM quadratic optimization problem [43]. K(x, y) is a kernel function which can be polynomial, Gaussian or sigmoid. In our work, we used the exponential radial basis function (ERBF) defined as follows:

K(x, y) = exp( −‖x − y‖ / (2σ²) )   (17)
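A sketch of the ERBF kernel of Eq. (17) and of the fuzzy decision rule of Eqs. (13)–(15), assuming the pairwise decision values D_ij(x) have already been computed by the trained two-class FSVMs and stored in an antisymmetric c × c matrix.

```python
import numpy as np

def erbf_kernel(x, z, sigma=1.0):
    """Exponential radial basis function kernel of Eq. (17)."""
    return np.exp(-np.linalg.norm(x - z) / (2.0 * sigma ** 2))

def fuzzy_decision(D):
    """Eqs. (13)-(15): D[i, j] = D_ij(x), with D[j, i] = -D[i, j].
    Returns the class index with the largest fuzzy membership."""
    c = D.shape[0]
    m = np.ones(c)                         # memberships m_i(x)
    for i in range(c):
        for j in range(c):
            if i != j:
                m_ij = min(1.0, D[i, j])   # Eq. (13)
                m[i] = min(m[i], m_ij)     # Eq. (14): min over j != i
    return int(np.argmax(m))               # Eq. (15)
```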
2.3 Decision tree based fuzzy support vector machines

The two-class SVM (or FSVM) only permits binary classification between two classes. To deal with multi-class problems, many ideas have been suggested, such as the one-against-all, the one-against-one and the binary tree based SVM. In the one-against-all technique, N SVMs are constructed for N classes, where each SVM is trained with all the examples and its outputs are generally +1 for its corresponding class and −1 for all the other classes [41]. In the one-against-one technique, also called the pairwise strategy, class separation is performed per pair of classes; an N-class problem involves N(N − 1)/2 binary classifiers. The final decision is made by majority vote, that is, the class of an input point x is the class most voted for by the set of binary classifiers [42]. In the binary tree based technique, SVMs are organized in a binary tree structure where each node makes a binary decision by an SVM; this structure needs only (N − 1) SVMs for N-class problems [41]. For example, with the 40 classes of ORL, the pairwise strategy needs 780 binary classifiers, while the tree needs only 39. In our experiments, we used this latter strategy based on a 3-node tree, where each node gathers three different classes used by an FSVM for comparison with the requested face [39].
2.4 Binary particle swarm optimization

In recent decades, algorithms able to self-organize and complete hard tasks in a collective way have been suggested; they imitate the behaviour of social insects and animals, as in PSO, BA and the firefly algorithm (FA). The PSO algorithm, inspired by the social behaviour of bird flocking, starts with a swarm of particles randomly initialized in the search space. At each time step, each particle evaluates the quality of its position, saves its personal best visited position according to the objective function being optimized, and saves the global best position found in its neighbourhood. Particles then change their velocities according to their best found positions and the best found positions in their neighbourhood, and their positions change accordingly. Each particle thus represents a potential solution to the problem being optimized. The particle trajectory is influenced by its personal best position and the best position found in its neighbourhood, following these rules [46]:

v_i(t + 1) = w v_i(t) + c1 r1 (pbest_i − x_i(t)) + c2 r2 (gbest_i − x_i(t))   (18)

x_i(t + 1) = x_i(t) + v_i(t + 1)   (19)

where:
x_i(t): the current position of particle p in dimension i at iteration t
v_i(t): the particle velocity
pbest_i: its personal best position
gbest_i: the global best position found in its neighbourhood
c1, c2: two constants called the cognitive and social factors (set to c1 = c2 = 1.49 in this work)
r1, r2: uniformly distributed values within the range ]0, 1[
w: the inertia weight, which controls the balance between exploration and exploitation; in our study we used an inertia weight decreasing from 0.78 to 0.1.

In the binary version of PSO (BPSO), the personal best and global best solutions are updated using the same equations as PSO. The main difference is that in BPSO the particle velocities and positions are restricted to the range [0, 1]. A normalization function such as the sigmoid is generally used to map the real velocity values into the range [0, 1] [44]:

S(v_i(t + 1)) = 1 / (1 + exp(−v_i(t + 1)))   (20)

and the new position of each particle is updated using the following equation [44]:

x_i(t + 1) = 1 if r < S(v_i(t + 1)), 0 otherwise   (21)

where r is a random number within the range ]0, 1[.

The BPSO algorithm can be summarized as follows, where Gbest is the best solution of the entire swarm:

Binary Particle Swarm Optimization
  Initialize random binary positions
  Initialize random velocities
  repeat
    for each particle p do
      evaluate fitness(p)
      if fitness(p) < Pbest(p)
        save fitness(p) and the personal best position (Pbest)
      end
      if fitness(p) < Gbest
        save the Gbest fitness and the global best position (Gbest)
      end
    end % for each
    Update the particle velocities using Eq. (18)
    Normalize the particle velocities using Eq. (20)
    Update the binary positions using Eq. (21)
  until a stopping criterion is met
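One BPSO iteration, following Eqs. (18), (20) and (21), is sketched below; the velocity clamp v_max is a common safeguard that keeps the sigmoid responsive and is an assumption, not a parameter stated in the paper.

```python
import numpy as np

rng = np.random.default_rng()

def bpso_step(x, v, pbest_pos, gbest_pos, w, c1=1.49, c2=1.49, v_max=4.0):
    """x: current binary position; v: velocity; returns updated (x, v)."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = (w * v + c1 * r1 * (pbest_pos - x)
               + c2 * r2 * (gbest_pos - x))       # Eq. (18)
    v = np.clip(v, -v_max, v_max)                 # assumed velocity clamp
    s = 1.0 / (1.0 + np.exp(-v))                  # sigmoid, Eq. (20)
    x = (rng.random(x.shape) < s).astype(int)     # Eq. (21)
    return x, v
```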
2.5 The bees algorithm

The bees algorithm is a recent population based method which imitates the foraging behaviour of a bee colony during nectar harvest. A simplified version of the algorithm is summarized as follows [45]:

The Bees Algorithm
1. Initialize a population of N bees randomly
2. Evaluate the fitness of each bee
Repeat
3. Select M solutions from the N bees to intensify the search in their neighbourhoods
4. Recruit bees in the neighbourhood of the M selected sites and evaluate their fitness
5. Recruit more bees in the neighbourhood of the E best (elite) bees among the M sites and evaluate their fitness
6. Select the best bee in each neighbourhood
7. Initialize the rest of the bees randomly and evaluate their fitness
Until a stopping criterion is met

The algorithm starts with N foraging bees randomly placed in the search space. The fitness values of the visited sites are evaluated at step 2. Bees with the highest fitness are selected and the sites they visited are chosen for local search (step 3). Step 4 illustrates the local search phase, recruiting more bees in the neighbourhood of the best selected sites; bees can be selected according to their fitness values. At step 6, the best bee in each neighbourhood is selected to form the next population. The remaining (N − M) bees continue their search in a random way to locate promising new sites (step 7). These steps are repeated until a stopping criterion is satisfied.
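A compact sketch of one BA iteration under this scheme (elite sites get more recruits than ordinary sites, the remaining scouts are re-drawn at random); the fitness callable, the [0, 1] scout range and the parameter defaults other than ngh = 0.0234 (used later in this paper) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def bees_iteration(bees, fitness, n_elite=2, n_site=5,
                   n_neigh_elite=4, n_neigh_site=2, ngh=0.0234):
    """bees: list of 1-D numpy arrays; fitness: callable, lower is better."""
    bees.sort(key=fitness)                       # best bees first
    for i in range(n_site):                      # selected sites
        recruits = n_neigh_elite if i < n_elite else n_neigh_site
        for _ in range(recruits):                # local neighbourhood search
            neighbour = bees[i] + rng.uniform(-ngh, ngh, bees[i].shape)
            if fitness(neighbour) < fitness(bees[i]):
                bees[i] = neighbour              # keep the best bee per site
    for i in range(n_site, len(bees)):           # remaining scouts
        bees[i] = rng.uniform(0.0, 1.0, bees[i].shape)  # assumed search range
    return bees
```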
The random search, in combination with the intensified search in the neighbourhood of the best bees, is the key idea behind the effectiveness of the bees algorithm. In addition, this algorithm has an intrinsic parallelism: each bee is only concerned with the information available in its neighbourhood, and good quality solutions emerge indirectly in a collective way without specific programming.

3. The proposed work

A face recognition system compares the salient features of a submitted image with the features of all the images stored in a database. It comprises three essential modules: face detection, feature extraction and classification. The face detection module aims at localizing and extracting a person's face; in our work, we evaluated the system on static face images acquired by camera or extracted from well-known databases. The feature extraction module extracts the most discriminant features of each person's face; the characteristic features used are the Gabor magnitude and the Gabor phase congruency features. The classification module compares the extracted features of a submitted face image with the database images to identify or reject it. The system must also contain a database module to store the new faces with their identities [1].

In this study, two new classifiers, adapted from swarm optimization methods, are proposed to solve the face identification problem, and the combination of these two classifiers with an effective DT-FSVM classifier is studied. This work also examines the dimensionality reduction space methods (PCA, LDA and KFA). In the following sections, we explain the proposed face identification methods, namely the decision tree based binary particle swarm optimization classifier (DT-BPSO), the bees algorithm based classifier (BA), and their combination with the DT-FSVM classifier.
3.1 The decision tree BPSO based classifier

The tree is one of the most used storage structures in computer science, since it permits faster access to information and requires less memory space. Our choice is also encouraged by the fact that classifiers based on decision trees are generally faster than the well-known one-versus-all classifiers. In the employed tree, the number of training samples considered decreases when passing from the leaves to the root. The proposed work combines the strengths of decision trees with binary particle swarm optimization. The number of leaves of the employed tree is equal to the number of classes; its degree is 3, i.e. each node has at most three children. The tree was implemented using a static array of all possible classes. These classes are divided into groups of three; each group is sent to the BPSO based classifier, and the outputs of the BPSO classification are arranged in a new array, which is in turn divided into groups of three classes. The classification process continues recursively until only one class per array remains, representing the class of the face submitted to the system. This implementation was inspired by the DT-FSVM classifier [39].

In our work, the decision tree is used in combination with the BPSO based classifier, where the fuzzy rules are maintained and used as described in the DT-BPSO algorithm below. When traversing the tree, the different encountered cases are processed as follows: two or three leaves of the same level are delivered to the BPSO based classifier; the class of a single leaf is the class of its root; if the node is a root, its class is the label of the corresponding class in the array of possible classes. These operations are applied recursively until the final array contains only one value, representing the class of the requested face. Fig.1 is an illustrative example of the proposed classifier, where the class of the testing face is 8. In the first level, all the classes are candidates for classification, and each group of three classes is sent to the BPSO based classifier. The winner classes of the second level are arranged in another array to continue classification by groups of three classes until one candidate class remains; the latter represents the class of the testing face. The DT-BPSO algorithm can be summarized as follows:

DT-BPSO algorithm
1. For each test-sample do
2.   Step 1: Set class-array = [1 .. Nclass]
3.           Set result-array = zeros(1, Nclass)
4.   Step 2: Repeat
5.     indices = 1
6.     For k = 1 : 3 : Nclass do
7.       Set sub-group = [the 3 current classes]
8.       For i = 1 : 3 do
9.         For j = 1 : 3 do
10.          Classify the test-sample between subgroup(i) and subgroup(j) using the BPSO classifier
11.        End
12.      End   % the result is a 3-by-3 matrix of pairwise decisions D_ij
13.      Evaluate the fuzzy membership of the test-sample to the 3 classes of the current subgroup by Eq. (14): m_i(x) = min_{j≠i} m_ij(x)
14.      Find the most similar class by Eq. (15): found-class = arg max_i m_i(x)
15.      result-array(indices) = found-class
16.      indices = indices + 1
17.    End % for k   (the result-array contains the classes of the next level)
18.    class-array = result-array; result-array = zeros(1, Nclass)
19.  Until the result-array contains only one value   % the class of the current test-sample
20. End % for each test-sample
Fig.1. Example illustrating the DT-BPSO principle: groups of three classes are classified by BPSO at each level (level 1: classes 1–8; level 2: classes 2, 6 and 8; level 3: class 8, the class of the testing face).
Before presenting the BPSO based classifier, we describe the particle structure that allows the algorithm to decide which class the testing sample is most similar to. In the considered BPSO based classifier, each particle is initialized with three training faces (represented by their characteristic features); these three vectors are randomly selected from the training data of the two classes submitted by the DT-BPSO algorithm described above (loop at lines 8–11). Three vectors are used to allow a better distinction between the two classes using the final global best particle. If the final best particle (the Gbest position) contains only the value "1", the testing sample clearly belongs to the first class; otherwise, if the sum of the Gbest elements is < 2, the testing sample belongs to the second class.

Fig.2. The structure of particle positions and their corresponding binary particles: each particle holds randomly selected feature vectors (e.g. F1, F2), and its binary particle stores their target values (+1 for the first class, −1 for the second).
Therefore, the swarm consists of PN particles, where each particle contains three feature vectors (F1, F2, F3) randomly selected from the training faces, which are the concatenation of the training faces of the two candidate classes selected by the DT-BPSO algorithm. Furthermore, each particle has a corresponding binary particle that memorizes the target value of each selected face vector. The particle position is used for comparison against the testing sample, whereas the binary particle saves the target values, and hence the corresponding classes, in the best particle found by the BPSO algorithm.

The BPSO based classifier can be seen as a minimization problem driven by a fitness function: the sum of the Euclidean distances between the feature vectors held by the current particle and the testing sample features. For d-dimensional features, let T = (t_1, …, t_d) be the testing face and F_i (i = 1, …, 3) the training feature vectors held by particle p, drawn from the three selected classes. The fitness of a particle p is evaluated as follows:

Fitness(p) = ∑_{i=1}^{3} √( ∑_{j=1}^{d} (F_{ij} − t_j)² )   (22)

The motivation behind the use of swarm optimization methods in facial recognition is their adaptability, modularity, autonomy and parallelism [56]. It is thus easy to improve the DT-BPSO recognition accuracy using other similarity metrics, such as the bilinear function and cosine similarity metrics, the Mahalanobis distance, or the fusion of multiple metrics [12].

BPSO classification
1.  % Initialization
2.  Num-class = 3
3.  % Initialize the BPSO parameters
4.  C1 = C2 = 1.49
5.  W: an inertia weight decreasing from 0.78 to 0.1
6.  Nb-Particles = 20
7.  Max-iterations = 30
8.  Initialize the particle positions with random values from the training faces of the two candidate classes subgroup(i) and subgroup(j) specified by the DT-BPSO algorithm
9.  Initialize their corresponding binary positions (Bposition) with "1" for training faces of the first class and "−1" for training faces of the second class
10. Initialize the particle velocities randomly
11. For iteration = 1 : Max-iterations do
12.   For p = 1 : Nb-Particles do
13.     Evaluate fitness(p) using Eq. (22)
14.     If fitness(p) < Pbest(p)
15.       Pbest(p) = fitness(p)
16.       Pbest-position = current Bposition
17.     End % if
18.     If Pbest(p) < Gbest
19.       Gbest = Pbest(p)
20.       Gbest-position = current Bposition
21.   End % for p
22.   For p = 1 : Nb-Particles do
23.     Update the particle velocities using Eq. (18)
24.     Normalize the particle velocities using Eq. (20)
25.     Update the binary positions using Eq. (21)
26.   End % for p
27. End % for iteration
28. % Class decision
29. Sum = 0
30. For i = 1 : Num-class do
31.   If Gbest-position(i) = 1
32.     Sum = Sum + 1
33.   End
34. End
35. If Sum >= 2
36.   test-sample-class = 1   % first class
37. Else
38.   test-sample-class = −1  % second class
39. End
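The fitness of Eq. (22) amounts to a few lines of numpy; a sketch, assuming each particle stores its three selected training feature vectors as a (3, d) array.

```python
import numpy as np

def particle_fitness(particle, test_face):
    """Eq. (22): sum over the three stored training vectors of their
    Euclidean distances to the testing face features.
    particle: (3, d) array; test_face: (d,) array."""
    return float(np.sum(np.sqrt(np.sum((particle - test_face) ** 2, axis=1))))
```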
3.2 The bees algorithm based classifier

To solve the face recognition problem using the bees algorithm (BA), we adopted a honeybee constituted of a predefined number of face samples randomly selected from the training database (Fig.3); the components of a bee are N randomly selected characteristic feature vectors from the training faces.

Fig.3. The structure of a honeybee: N randomly selected characteristic feature vectors from the training faces.
The BA's objective is to search for the training faces most similar to the faces being recognized, based on the optimization of a fitness function. The algorithm can be seen as a minimization problem: over the iterations, the BA minimizes the sum of the Euclidean distances between the randomly selected training faces and the faces being recognized, and saves the honeybee with the lowest fitness. The best found faces are then considered as cluster centres for identification: the classes providing the minimum sum of squared errors (SSE) between the found cluster centres and the training faces are the final classification result. The BA based classifier can thus be viewed as an optimization algorithm that finds the best centres representing the classes of the testing data; the different steps of this classifier are displayed in Fig.4.
Fig.4. The bees algorithm based classifier.

The bees algorithm for face recognition can be summarized as follows:

The bees algorithm based classifier
  Initialize a swarm of bees with randomly selected feature vectors from the training faces
  Initialize the size of the neighbourhood patch: ngh = 0.0234
  Initialize the numbers of scout bees, site bees and elite bees
  For each iteration do
    For each test-face do
      For i = 1 : Nb-elite-bees do
        compute the elite fitness   % its distance to the current testing face
        For j = 1 : Nb-neigh-elite do
          create a neighbour bee of the elite bee   % e.g. eliteBee − ngh + 2*ngh*rand(1)
          compute its fitness
          If its fitness < fitness-elite
            replace the elite bee by its neighbour bee
          end % if
        end % for j
      end % for i
      For i = (Nb-elite-bees + 1) : Nb-site-bees do
        compute the site fitness
        For j = 1 : Nb-neigh-site do
          create a neighbour bee of the site bee
          compute its fitness
          If its fitness < fitness-site
            replace the site bee by its neighbour bee
          end % if
        end % for j
      end % for i
      Initialize the rest of the scout bees randomly
      Compute their fitness
      Sort the scout bees according to their new fitness values
    end % for each test-face
  end % for iteration
  The best found faces are considered as cluster centres
  The final results are the classes of the training faces providing the minimum SSE between the best found centres and the training faces
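The final labelling step of the BA classifier (assigning to each best-found centre the class that minimizes the SSE to the training faces) can be sketched as follows; the array shapes are assumptions.

```python
import numpy as np

def label_by_sse(centres, train_faces, train_labels):
    """centres: (m, d) best bees; train_faces: (n, d); train_labels: (n,).
    Each centre receives the class whose training faces give the minimum
    sum of squared errors (SSE) to that centre."""
    labels = []
    for c in centres:
        best_lbl, best_sse = None, np.inf
        for lbl in np.unique(train_labels):
            sse = np.sum((train_faces[train_labels == lbl] - c) ** 2)
            if sse < best_sse:
                best_lbl, best_sse = lbl, sse
        labels.append(best_lbl)
    return labels
```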
4. Classifiers combination

Classifier combination is an alternative that has proved its effectiveness in recognition and many other classification tasks. The key idea is the possibility of minimizing the classification error through complementary classifiers [47]. Classifier combination can be accomplished in two main ways: combination of dissimilar classifiers or combination based on complementary features. In our work, we studied both strategies: first, through the fusion of the output labels of a DT-FSVM, a DT-BPSO and a BA based classifier supplied with the Gabor magnitude features (Fig.5); second, through the fusion of the outputs of the same classifier based on the Gabor magnitude and Gabor phase congruency features. Furthermore, these two combination strategies were examined with three reduction space methods: PCA, KFA and LDA. In both cases, the majority vote was adopted as the combination rule.
Fig.5. Classifiers combination: Gabor magnitude features of the training faces are reduced by LDA, PCA or KFA to generate templates; a submitted face is compared with the templates by the DT-FSVM, DT-BPSO and BA classifiers, whose votes are fused for face identification.

4.1 The majority vote

The majority vote is a simple rule used to fuse the output labels of a number of classifiers. It considers each output as a vote (acceptance or rejection), and the most voted class is taken as the final decision; that is, if the majority of classifiers vote for an example in class 1, the example belongs to class 1, otherwise to class 2, and so on [49, 50]. Voting methods employ a threshold representing the proportion of classifiers that must vote for the same class for it to be accepted: the class which receives a number of votes higher than the predefined threshold is the final decision, otherwise the example is rejected. In the majority vote the threshold is 0.5, i.e. the class which receives more votes than half the number of classifiers is the final decision; otherwise the example is rejected [58]. The majority vote can be formulated as follows [48]:

E(x) = ω_j   if ∑_{i=1}^{k} d_{i,j} = max_{j=1,…,c} ∑_{i=1}^{k} d_{i,j} and ∑_{i=1}^{k} d_{i,j} > α·k;   rejection otherwise   (23)

where:
d_{i,j}: the j-th output of classifier i (i = 1, …, k; j = 1, …, c), equal to 1 if classifier i classifies the test sample x in the class ω_j and 0 otherwise
c: the number of possible classes
k: the number of classifiers
ω_j: a possible class
α = 0.5: the threshold
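Eq. (23) reduces to counting votes and checking them against the threshold α·k; a minimal sketch with the three classifiers of Fig.5 voting, where −1 denotes rejection.

```python
import numpy as np

def majority_vote(votes, n_classes, alpha=0.5):
    """Eq. (23): fuse the output labels of k classifiers; the winning
    class must gather more than alpha * k votes, otherwise reject (-1)."""
    counts = np.bincount(votes, minlength=n_classes)
    winner = int(np.argmax(counts))
    return winner if counts[winner] > alpha * len(votes) else -1

# e.g. DT-FSVM, DT-BPSO and BA voting for classes 8, 8 and 3:
print(majority_vote(np.array([8, 8, 3]), n_classes=40))  # -> 8
```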
5. Experiments

To show the effectiveness of the proposed algorithms, detailed comparative studies have been conducted on four popular databases, namely ORL, YALE, FERET and UMIST. The ORL database contains 400 face images of 40 different persons, with ten aspects taken for the same person, such as smiling, closed eyes, glasses and varying light [30]. The YALE database contains 165 face images of 15 persons, with 11 different expressions and other aspects taken for each person, such as happy, sad, sleepy, surprised, glasses and varying light direction [30]. FERET is a large database of 1199 individual faces, with 11 images per individual under changing pose, expression and light; it has been used for the evaluation of many commercial systems [31, 8]. The UMIST database contains 564 images of 20 persons, each shown in different poses from side view to frontal view [55]. Samples extracted from each database are respectively shown in Fig.6, Fig.7, Fig.8 and Fig.9.
Fig.6. Samples of ORL images
Fig.7. Samples of YALE images
Fig.8. Samples of FERET images
Fig.9. Samples of UMIST images
In our experiments, 15 classes of YALE were considered, 40 classes of ORL and 100 classes selected from FERET (that is, 150 faces from YALE, 400 from ORL and 700 from FERET). Half of each dataset was used for training and the other half for testing, except for the FERET dataset, where 300 face images were used for training and 400 for testing in tables 1, 2, 3 and 4, whereas 400 face images were used for training and 400 for testing in tables 5 to 10. We used the PhD toolbox implemented by Vitomir Struc [35] for the extraction of the Gabor magnitude features and the Gabor phase congruency features, and for space reduction using linear discriminant analysis (LDA), principal component analysis (PCA) and kernel Fisher analysis (KFA). The number of principal components used in PCA is equal to the number of linearly independent training data, the number of LDA basis vectors is 39, and the dimensionality of the KFA subspace is equal to the number of classes − 1 [35].

We started our experiments with a well-known effective classifier (DT-FSVM) to understand the impact of the employed characteristic features and the reduction space methods on recognition accuracy. Then, we compared and presented the recognition rates of the proposed methods using the best found space reduction algorithms and the best characteristic features. The recognition rates achieved over three independent runs are shown in the following tables.

Table 1 Combination of two DT-FSVMs using PCA for dimensionality reduction
Classifier                    ORL      YALE     FERET
Gabor Mag + PCA + DT-FSVM     97.25%   96.67%   77.14%
Gabor Pha + PCA + DT-FSVM     90.00%   98.00%   73.14%
Classifier combination        96.50%   98.67%   76.57%

Table 1 shows the recognition rates obtained with one DT-FSVM on the Gabor magnitude features reduced by PCA, a second DT-FSVM on the Gabor phase congruency features reduced by PCA, and their combination by majority vote.

Table 2 Combination of two DT-FSVM classifiers using LDA for dimensionality reduction
Classifier                    ORL      YALE     FERET
Gabor Mag + LDA + DT-FSVM     99.50%   100%     80.14%
Gabor Pha + LDA + DT-FSVM     92.50%   97.33%   69.29%
Classifier combination        99.50%   99.33%   77.71%

Table 2 shows the results of a recognition system with the same structure as the one described in Table 1, except that PCA is replaced by LDA as the reduction space algorithm. The purpose of PCA or LDA is to reduce the space dimensionality into a less redundant space with the most discriminant values [51]. Tables 1 and 2 show that the majority vote is unable to improve the recognition rate in the case of two classifiers, because the majority vote relies on the largest number of classifiers voting for the same class: when the two classifiers do not agree on the same class, the recognition rate tends to decrease, because the majority vote only accepts the faces voted for by both classifiers at the same time. Tables 1 and 2 also show that the Gabor magnitude features give better recognition rates than the Gabor phase congruency features. Table 2 proves that the LDA reduction space method retains the key informative features and hence greatly enhances recognition compared to the results based on PCA.

To further show the influence of the reduction space algorithms, we conducted other experiments based on the Gabor magnitude features reduced by the popular reduction space methods PCA, KFA and LDA, together with DT-FSVM for classification. The obtained recognition rates are listed in Table 3.

Table 3 Recognition rates using the DT-FSVM classifier with PCA, LDA and KFA
Classifier                    ORL      YALE     FERET
Gabor Mag + KFA + DT-FSVM     98.25%   96.67%   83.43%
Gabor Mag + LDA + DT-FSVM     99.50%   100%     80.14%
Gabor Mag + PCA + DT-FSVM     97.25%   96.67%   77.14%
Classifier combination        99.00%   98.67%   80.14%

Table 4 Recognition rates using the K-nearest neighbour classifier with KFA, LDA or PCA
Classifier                    ORL      YALE     FERET
KFA                           94.50%   92.67%   74.71%
LDA                           92.00%   93.33%   67.57%
PCA                           95.50%   95.33%   68.14%
Classifier combination        95.50%   94.00%   70.29%
Table 3 shows that the highest recognition accuracy is achieved when LDA is used to reduce the dimensionality of the Gabor magnitude features. Table 4 shows that omitting the Gabor wavelet features and replacing the DT-FSVM classifier with the K-nearest neighbour classifier results in low recognition accuracy. Table 4 also shows that the combination of more than two classifiers can improve accuracy. Moreover, LDA is not superior to PCA or KFA when used directly on face images without extraction of the Gabor wavelet characteristic features.

The recognition rates of the proposed algorithms on ORL, YALE and FERET, using different characteristic features and combined in different ways, are shown in tables 5 to 10.

Table 5 Recognition rates using the DT-BPSO classifier
Features                 ORL      YALE     FERET
Gabor Mag + LDA          99.25%   100%     94.25%
Gabor Pha + LDA          91.75%   96.67%   77.71%
LDA                      92.25%   93.33%   76.43%
Classifier combination   95.00%   99.33%   78.29%

Table 5 shows the results obtained using DT-BPSO applied to the Gabor magnitude and phase features, as well as to the face images reduced by LDA alone. The Gabor magnitude features are the best for face recognition, and the majority vote combination of these three classifiers is unable to improve the recognition rate. The same is observed for the bees algorithm classifier in Table 6.

Table 6 Recognition rates using the BA classifier
Characteristic features   ORL      YALE     FERET
Gabor Mag + LDA + BA      99.00%   100%     92.75%
Gabor Pha + LDA + BA      84.00%   94.67%   68.57%
LDA + BA                  84.00%   86.67%   65.00%

In Table 7 we combined the best found classifiers using the majority vote, namely the DT-FSVM, the DT-BPSO and the BA based classifier. In these three experiments, the Gabor magnitude features reduced by LDA are adopted, as they were found effective in enhancing recognition. Further experiments (tables 8, 9) were carried out to investigate the effect of PCA and KFA. The results (tables 7, 8, 9) show that the best reduction space method for the Gabor magnitude features is LDA, followed by KFA and then PCA.

Table 7 Combination of DT-FSVM, DT-BPSO and BA with Gabor magnitude features reduced by LDA
Classifier               ORL      YALE     FERET
DT-FSVM + LDA            98.50%   100%     93.50%
DT-BPSO + LDA            99.00%   100%     94.25%
BA + LDA                 98.50%   100%     92.75%
Classifier combination   98.50%   100%     93.75%

Table 7 shows that the recognition rates of the proposed DT-BPSO classifier reach at least 99% on YALE and ORL and that it performs better than the powerful DT-FSVM classifier. The BA classifier also provides competitive results. The majority vote is unable to improve accuracy; it provides approximately the average result of the three studied classifiers.

Table 8 Combination of DT-FSVM, DT-BPSO and BA with Gabor magnitude features reduced by KFA
Classifier               ORL      YALE     FERET
DT-FSVM + KFA            96.50%   93.33%   77.00%
DT-BPSO + KFA            96.50%   93.33%   77.00%
BA + KFA                 96.50%   93.33%   77.00%
Classifier combination   96.50%   93.33%   77.00%

Tables 8 and 9 show the comparable performance of DT-FSVM, DT-BPSO and BA using KFA or PCA to reduce the dimensionality of the Gabor magnitude features. The recognition rates based on KFA are identical for the three classifiers (Table 8), while those based on PCA show slightly higher recognition accuracy for the DT-FSVM classifier, followed by the BA based classifier.

Table 9 Combination of DT-FSVM, DT-BPSO and BA with Gabor magnitude features reduced by PCA
Classifier               ORL      YALE     FERET
DT-FSVM + PCA            95.00%   94.67%   89.75%
DT-BPSO + PCA            87.00%   78.67%   59.25%
BA + PCA                 95.50%   89.33%   89.25%
Classifier combination   95.00%   93.33%   87.25%
Other experiments have been conducted on the UMIST database (tables 10, 11 and 12). In these experiments, the classifier combination and the Gabor phase features were not considered, as they had been shown unable to enhance recognition accuracy. We randomly selected 180 faces from the UMIST database for training and another 180 for testing. Tables 10, 11 and 12 show the recognition rates obtained using LDA, KFA or PCA, with or without the Gabor magnitude features. Unlike the previous recognition results, which are consistent across runs since they derive from the combination of local and global search strategies, the recognition rates achieved on the UMIST database vary from one experiment to another, since the training and testing data were randomly selected and the UMIST database contains faces organized by pose. The preliminary recognition results using the first half of UMIST for training and the second half for testing were unsatisfactory. This can be justified by the fact that classifiers are
generally based on the comparison of similarities between the testing face and the faces stored in the database: comparing a face taken in a frontal pose with another taken in a side view will not lead to correct classification, since the characteristic features of a turned face change completely compared to those of a face taken in frontal view. We also used the mutual information as an objective function of the proposed classifiers without noticing an improvement; the use of mutual information made the classifiers very slow and time consuming, and recognition accuracy slightly decreased compared to classifiers based on the sum of the Euclidean distances. Furthermore, we randomly selected the training faces to allow learning of the different poses of the considered faces. The results in tables 10, 11 and 12 report the best, the mean and the standard deviation (STD) obtained over three runs of the considered classifiers.

Table 10 Recognition rates of BA, DT-BPSO and DT-FSVM on UMIST with LDA used for dimensionality reduction
Classifier   Features           best     mean     STD
DT-FSVM      Gabor Mag + LDA    96.11%   95.18%   0.86
DT-BPSO      Gabor Mag + LDA    97.22%   95.00%   2.23
BA           Gabor Mag + LDA    97.22%   96.30%   0.62
DT-FSVM      LDA                94.44%   92.22%   2.23
DT-BPSO      LDA                95.55%   94.81%   0.62
BA           LDA                96.67%   94.26%   1.61

Table 11 Recognition rates of BA, DT-BPSO and DT-FSVM on UMIST with KFA used for dimensionality reduction
Classifier   Features           best     mean     STD
DT-FSVM      Gabor Mag + KFA    97.22%   96.85%   0.49
DT-BPSO      Gabor Mag + KFA    99.44%   97.77%   1.11
BA           Gabor Mag + KFA    97.22%   95.92%   1.73
DT-FSVM      KFA                99.44%   96.67%   1.85
DT-BPSO      KFA                97.77%   97.03%   0.62
BA           KFA                100%     98.67%   0.42

Table 12 Recognition rates of BA, DT-BPSO and DT-FSVM on UMIST with PCA used for dimensionality reduction
Classifier   Features           best     mean     STD
DT-FSVM      Gabor Mag + PCA    98.33%   96.67%   1.11
DT-BPSO      Gabor Mag + PCA    69.44%   67.04%   1.61
BA           Gabor Mag + PCA    93.89%   92.22%   1.11
DT-FSVM      PCA                95.55%   94.81%   0.62
DT-BPSO      PCA                73.89%   70.93%   2.41
BA           PCA                91.67%   90.74%   1.24
Looking at the percentage of correctly classified faces, we notice that the proposed methods outperform the DT-FSVM classifier when LDA or KFA is used for dimensionality reduction. The BA and DT-BPSO classifiers give relatively similar results; however, the DT-BPSO classifier appears less effective when used with PCA for dimensionality reduction. Overall, the proposed methods are competitive with the powerful DT-FSVM classifier.

In tables 13 and 14 we compare the obtained results with those of similar works [53, 54] which use swarm intelligence methods to enhance recognition. For a convenient comparison, we decreased the size of the testing and training data. In the previous results (tables 1 to 9), we repeated the experiments for three runs and found that the recognition rates were nearly the same; as such, it is unnecessary to show the mean and standard deviation of each recognition rate. In addition, it is difficult to repeat the experiments many times, since the studied databases are large and image processing is time consuming. To better illustrate the behaviour of each algorithm and to allow comparison with earlier works [53, 54], tables 13 and 14 show the mean, best and standard deviation of the recognition rate of each algorithm on small subsets of the FERET database; for each case, the experiments are repeated ten times (a single value indicates that all runs gave the same rate).

Table 13 Rank one recognition rate on a small subset of FERET (three classes for training and three for testing)
Classifier   Features          mean     best     Std
DT-FSVM      Gabor mag + LDA   100%
DT-BPSO      Gabor mag + LDA   100%
BA           Gabor mag + LDA   99.17%   100%     1.49
DT-FSVM      LDA               83.33%
DT-BPSO      LDA               83.33%
BA           LDA               83.33%
DT-FSVM      KFA               83.33%
DT-BPSO      KFA               83.33%
BA           KFA               83.33%
DT-FSVM      PCA               83.33%
DT-BPSO      PCA               79.17%   83.33%   4.16
BA           PCA               83.33%
Fig.10. DT-BPSO classifier on YALE database; recognition rate vs PCA, LDA or KFA dimension.
Fig.11. DT-BPSO classifier on ORL database; recognition rate vs PCA, LDA or KFA dimension.
Fig.12. DT-BPSO classifier on FERET database; recognition rate vs PCA, LDA or KFA dimension.
Fig.13. DT-BPSO classifier on UMIST database; recognition rate vs PCA, LDA or KFA dimension.
Fig.14. DT-FSVM classifier on YALE database; recognition rate vs PCA, LDA or KFA dimension.
Fig.15. DT-FSVM classifier on ORL database; recognition rate vs PCA, LDA or KFA dimension.
Fig.16. DT-FSVM classifier on FERET database; recognition rate vs PCA, LDA or KFA dimension.
Fig.17. DT-FSVM classifier on UMIST database; recognition rate vs PCA, LDA or KFA dimension.
Fig.18. DT-BPSO classifier on YALE database; recognition rate vs the number of classes.
Fig.19. DT-BPSO classifier on ORL database; recognition rate vs the number of classes.
Fig.20. DT-BPSO classifier on FERET database; recognition rate vs the number of classes.
Fig.21. DT-BPSO classifier on UMIST database; recognition rate vs the number of classes.
Table 14 Rank one recognition rate on a small subset of FERET (15 classes for training and 15 for testing)
Classifier   Features          mean     best     Std
DT-FSVM      Gabor mag + LDA   98.33%
DT-BPSO      Gabor mag + LDA   97.50%   98.33%   0.83
BA           Gabor mag + LDA   98.33%
DT-FSVM      LDA               81.67%
DT-BPSO      LDA               79.59%   81.67%   1.67
BA           LDA               80.00%
DT-FSVM      KFA               76.67%
DT-BPSO      KFA               76.67%
BA           KFA               76.67%
DT-FSVM      PCA               75.00%
DT-BPSO      PCA               58.67%   61.67%   4.12
BA           PCA               71.67%
Tables 13 and 14 show a less stable behaviour of DT-BPSO compared to the DT-FSVM and BA based classifiers, due to the variation of its recognition rate over a number of runs. However, its standard deviation is small, and its best recognition rate is always equivalent to the best rate achieved by its competitor (DT-FSVM). Table 13 indicates that the recognition rates obtained on the FERET database are significantly better than the best recognition rate (72.58%) obtained with the bacteria foraging optimization (BFO) based approach [53] in the case of three training and three testing classes. Table 7 also shows that a 100% recognition rate on the YALE database is reached by all the proposed algorithms, which is considerably higher than the best rank one recognition rates on YALE faces (88.57%, 86.47%) achieved by the BFO based approach [53, 54]. However, it is difficult to make a precise comparison due to differences in characteristic features, percentages of training and testing faces, and reduction space methods. Moreover, our comparison is only based on the best rank one recognition rate, which is the percentage of correctly classified faces.

Table 15 Recognition rates of BA and QPSO using PCA (STD in parentheses)
Classifier   ORL      YALE            UMIST
BA           95.00%   89.33%          92.22% (1.11)
QPSO         39.00%   67.33% (4.66)   50.27% (5.83)

Table 16 Recognition rates of BA and QPSO using LDA (STD in parentheses)
Classifier   ORL             YALE            UMIST
BA           99.00%          100%            96.30% (0.62)
QPSO         97.00% (1.33)   98.67% (1.33)   93.61% (8.3)

Table 17 Recognition rates of BA and QPSO using KFA (STD in parentheses)
Classifier   ORL             YALE            UMIST
BA           96.00%          93.33%          95.92% (1.73)
QPSO         91.00% (1.05)   95.33% (6.33)   97.50% (2.8)
We can observe from Fig.10 to Fig.21 that LDA is the most efficient space reducer, followed by KFA and then PCA. However, KFA is superior to LDA in the case of the UMIST database. The UMIST database is different from the other studied databases (YALE, ORL and FERET): its faces are significantly turned, which results in completely dissimilar characteristic features for the same face. The UMIST data is therefore highly correlated, which is why KFA performs better than LDA on it [39]. The KFA space reducer could thus be preferable to circumvent the pose problem.
Fig.22. BA classifier on YALE database; recognition rate vs PCA, LDA or KFA dimension.
Fig.23. BA classifier on ORL database; recognition rate vs PCA, LDA or KFA dimension.
Fig.24. BA classifier on UMIST database; recognition rate vs PCA, LDA or KFA dimension.
From Fig.10 to Fig.17, we can see that a large number of principal components of PCA, LDA, or KFA can increase recognition accuracy, especially in the case of large databases such as ORL, FERET and UMIST. Therefore, a very small number of principal components lead to under-fitting resulting in low recognition accuracy. Fig.10 to Fig.17 show also that DT-BPSO and DT-FSVM give comparable results. For Fig.18 to Fig.21, the number of principal components is equal to the number of classes –1. From these figures, we can see that the recognition rates are not dependent to the number of classes. It is also observed that the DT-BPSO recognition rates change with respect to the number of classes; the highest recognition rates are obtained with number of classes for
Fig.25. Comparing BA and QPSO on YALE database: recognition rate vs. iterations, with PCA, LDA and KFA features.
Fig.26. Comparing BA and QPSO on ORL database: recognition rate vs. iterations, with PCA, LDA and KFA features.
Fig.27. Comparing BA and QPSO on UMIST database: recognition rate vs. iterations, with PCA, LDA and KFA features.
This could be explained by the fact that DT-BPSO is built from binary classifiers, which makes distinguishing between numerous classes difficult and consequently decreases the recognition rate. Fig.25, Fig.26 and Fig.27 show the recognition rate vs. the number of iterations for the BA and QPSO based classifiers, and Tables 15-17 report their corresponding rank-one recognition rates. The QPSO based approach is a quantum-behaved particle swarm optimization classifier that follows the same principle as BA: it uses the same objective function as BA and the same solution structure as the bees. The QPSO algorithm used in our experiments is described in detail in [57].
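For readers unfamiliar with QPSO, the sketch below shows the textbook quantum-behaved position update on which such classifiers are built; note that this is the standard QPSO scheme rather than the exact Gaussian variant of [57], and the test function is a simple sphere rather than the face-matching objective:

```python
import numpy as np

def qpso_step(x, pbest, gbest, beta, rng):
    """One textbook QPSO position update (the Gaussian variant of [57]
    modifies this scheme). x, pbest: (n, dim); gbest: (dim,)."""
    n, d = x.shape
    mbest = pbest.mean(axis=0)              # mean of the personal bests
    phi = rng.random((n, d))
    p = phi * pbest + (1 - phi) * gbest     # per-particle local attractor
    u = 1.0 - rng.random((n, d))            # in (0, 1], avoids log(inf)
    sign = np.where(rng.random((n, d)) < 0.5, -1.0, 1.0)
    return p + sign * beta * np.abs(mbest - x) * np.log(1.0 / u)

# Smoke test: minimise the 4-D sphere function
rng = np.random.default_rng(2)
x = rng.uniform(-5, 5, size=(20, 4))
pbest, pbest_f = x.copy(), (x**2).sum(axis=1)
for it in range(200):
    f = (x**2).sum(axis=1)
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[np.argmin(pbest_f)]
    beta = 1.0 - 0.5 * it / 200             # contraction-expansion coefficient
    x = qpso_step(x, pbest, gbest, beta, rng)
print(np.min(pbest_f))                      # typically close to 0
```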
From Tables 15-17 and Fig.25, Fig.26 and Fig.27, it is clear that the recognition rate of the BA based classifier is significantly better than that of the QPSO based classifier. Both classifiers achieve their best recognition rates when LDA is used for dimensionality reduction, and both give better accuracy with KFA than with PCA. However, the BA based classifier takes more time than QPSO or the proposed DT-BPSO, because it combines local exploitation with global exploration: in BA, the search intensifies in the neighbourhood of each potential solution, and even more so around the elite solutions found so far. From the previous experimental results, we observed that the best recognition rates are obtained when using the Gabor magnitude features reduced by LDA rather than by KFA or PCA. Thus, the characteristic features combined with a suitable reduction space method can substantially enhance recognition performance. All the proposed methods are competitive with the efficient DT-FSVM, while the DT-BPSO classifier outperforms DT-FSVM on the YALE, ORL and FERET faces (see Table 7). This is due to the ability of DT-BPSO to escape local optima by searching for a global solution, whereas DT-FSVM is influenced by the initialization of its parameters. The obtained results demonstrate the efficiency and robustness of the proposed algorithms against the facial expression and illumination variations present in the considered databases. It has been found that DT-BPSO and BA outperform DT-FSVM when used with LDA or KFA and are only less accurate in the case of PCA: DT-BPSO and BA are global search algorithms, which are more sensitive to the dimensionality of the input vectors than DT-FSVM, the latter being based on an exact optimization method, namely quadratic programming. We can say that DT-BPSO and BA are comparable in terms of accuracy, while the BA based classifier is more robust over a number of experiments, especially with the PCA reduction space method. The best recognition rates are obtained when using the Gabor magnitude as characteristic features in combination with LDA as the reduction space method. Furthermore, this study shows that the recognition accuracy of the proposed algorithms (DT-BPSO, BA) is higher than that of the approaches based on DT-FSVM [39] and on bacterial foraging optimization (BFO) [53, 54]. Face recognition accuracy depends on three related steps, namely feature extraction, dimensionality reduction and classification. The BFO based approach focuses only on feature selection, in particular on finding the optimal principal components using BFO, with classification based on LDA [53, 54], whereas the proposed work adapts recent swarm optimizers for classification. In addition, the recognition rate has been investigated using different characteristic features and three reduction space methods. It has been found that PCA is not the best space reducer even when the number of principal components is varied; the best results have been obtained when using LDA for dimensionality reduction. This is the reason why the proposed methods outperform the BFO based approach. The BFO feature selection could in turn be used to enhance the robustness of the proposed classifiers.
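The recruited neighbourhood search that makes BA slower but more robust can be summarised as follows. This is a generic sketch of the bees algorithm of Pham et al. [45] on a toy objective, with illustrative parameter names, not the classifier's actual face-matching objective:

```python
import numpy as np

def bees_algorithm(f, bounds, n=30, m=10, e=3, nep=7, nsp=3, ngh=0.1,
                   iters=100, seed=0):
    """Minimal Bees Algorithm sketch: global scout search plus local search
    around m selected sites, with more recruited bees around the e elite
    sites. This recruited neighbourhood search is what costs BA extra time."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = len(lo)
    pop = rng.uniform(lo, hi, size=(n, dim))
    for _ in range(iters):
        pop = pop[np.argsort([f(x) for x in pop])]       # best sites first
        new_pop = []
        for i in range(m):                               # selected sites
            recruits = nep if i < e else nsp             # more bees for elites
            patch = pop[i] + rng.uniform(-ngh, ngh,
                                         size=(recruits, dim)) * (hi - lo)
            patch = np.clip(patch, lo, hi)
            # keep the best bee of the patch, or the site centre itself
            best = min(np.vstack([patch, pop[i][None]]), key=f)
            new_pop.append(best)
        scouts = rng.uniform(lo, hi, size=(n - m, dim))  # global exploration
        pop = np.vstack([new_pop, scouts])
    return min(pop, key=f)

best = bees_algorithm(lambda x: float((x**2).sum()),
                      (np.full(4, -5.0), np.full(4, 5.0)))
print(best)  # near the origin
```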
The DT-BPSO recognition rate on the entire ORL dataset is 99% (see Table 7), which is significantly better than the 98.62% reported for DT-FSVM on a small subset of ORL [39]. This is because the DT-FSVM based approach employs reformative fuzzy LDA (RF-LDA) for feature extraction, whereas DT-BPSO employs the Gabor magnitude features, which are more discriminative than the RF-LDA features.
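The majority-vote rule used in this work to combine the DT-FSVM, DT-BPSO and BA decisions can be stated in a few lines; the sketch below uses hypothetical predictions purely for illustration:

```python
import numpy as np

def majority_vote(*label_arrays):
    """Combine per-classifier label predictions by simple majority vote;
    ties are broken in favour of the smallest label (np.bincount argmax)."""
    votes = np.stack(label_arrays, axis=0)   # (n_classifiers, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Hypothetical predictions of the three classifiers on five probe faces
dt_fsvm = np.array([0, 1, 2, 3, 4])
dt_bpso = np.array([0, 1, 2, 0, 4])
ba      = np.array([0, 2, 2, 3, 1])
print(majority_vote(dt_fsvm, dt_bpso, ba))   # [0 1 2 3 4]
```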
6. Conclusion
This paper addressed the main challenges encountered in facial recognition, particularly illumination, pose and facial expression variations, by using swarm intelligence based methods. Unlike recent methods that use swarm optimization techniques for feature selection or parameter optimization, this work employed bio-inspired methods to construct new effective classifiers. To this end, adaptations of a binary particle swarm optimizer (BPSO) and of the bees algorithm (BA) were proposed for face identification. The BPSO based approach solves an N-face identification problem through (N-1) BPSO based classifiers organized in a binary decision tree for lower computing time, while the BA based classifier uses the bees algorithm to search the database for the features most similar to those of the submitted face. The proposed classifiers were applied to different characteristic features combined with different reduction space methods, namely the Gabor magnitude and Gabor phase congruency features reduced by PCA, LDA or KFA. In addition, various majority-vote combinations of the DT-FSVM, DT-BPSO and BA classifiers were examined. It has been found that the Gabor magnitude features yield better accuracy than the Gabor phase congruency features; however, the PCA and KFA reduction space methods in combination with the Gabor wavelet features cannot guarantee the best recognition rate compared to LDA. The experimental results on the YALE, ORL, FERET and UMIST databases were promising: the DT-BPSO method provided a recognition rate of 99.00% on ORL, 100% on YALE, 94.25% on 700 FERET images, and a best rate of 99.44% on the UMIST database. These results were found to be superior to those of similar works. Finally, the DT-BPSO and BA based classifiers can be applied to low-quality images extracted from video streams, and they can also be used for hand gesture recognition. Moreover, they may achieve better accuracy with other classifier combination rules, or with a binary bees algorithm for feature selection, since redundant data impacts negatively on the performance of a recognition system. In addition, DT-BPSO may be improved by using the Hamming distance in the objective function, since it is more appropriate for high-dimensional data.

References
[1] J.A. Unar, W.C. Seng, A. Abbasi, A review of biometric technology along with trends and prospects, Pattern Recognition 47 (8) (2014) 2673-2688.
[2] G.C. Luh, C.Y. Lin, PCA based immune networks for human face recognition, Applied Soft Computing 11 (2011) 1743-1752.
[3] Y. Sumi, Y. Ohta, Human Face Analysis Based on Distributed Two-Dimensional Appearance Models, Systems and Computers in Japan 27 (7) (1996) 97-108.
[4] R.S. Ghiass, O. Arandjelović, A. Bendada, X. Maldague, Infrared face recognition: A comprehensive review of methodologies and databases, Pattern Recognition 47 (2014) 2807-2824.
[5] N. Amani, A. Shahbahram, M. Nahvi, A New Approach for Face Image Enhancement and Recognition, International Journal of Advanced Science and Technology 52 (2013) 1.
[6] S. Du, R. Ward, Face recognition under pose variations, Journal of the Franklin Institute 343 (2006) 596-613.
[7] L. Zhang, W. Zuo, D. Zhang, LSDT: Latent Sparse Domain Transfer Learning for Visual Adaptation, IEEE Transactions on Image Processing 25 (3) (2016) 1177-1191.
[8] L. Zhang, F. Tian, C. Kadri, G. Pei, H. Li, L. Pan, Gases concentration estimation using heuristics and bio-inspired optimization models for experimental chemical electronic nose, Sensors and Actuators B 160 (2011) 760-770.
[9] L. Zhang, F. Tian, Performance Study of Multilayer Perceptrons in a Low-Cost Electronic Nose, IEEE Transactions on Instrumentation and Measurement 63 (7) (2014) 1670-1679.
[10] L. Zhang, F. Tian, H. Nie, L. Dang, G. Li, Q. Ye, C. Kadri, Classification of multiple indoor air contaminants by an electronic nose and a hybrid support vector machine, Sensors and Actuators B 174 (2012) 114-125.
[11] H. Chen, Q. Wang, Y. Shen, Decision tree support vector machine based on genetic algorithm for multi-class classification, Journal of Systems Engineering and Electronics 22 (2) (2011) 322-326.
[12] L. Zhang, D. Zhang, MetricFusion: Generalized metric swarm learning for similarity measure, Information Fusion 30 (2016) 80-90.
[13] R. Jafri, H.R. Arabnia, A Survey of Face Recognition Techniques, Journal of Information Processing Systems 5 (2) (2009) 41-68.
[14] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991) 71-86.
[15] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711-720.
[16] C.Y. Zhang, Q. Ruan, Face Recognition Using L-Fisherfaces, Journal of Information Science and Engineering 26 (2010) 1525-1537.
[17] S. Suhas, A. Kurhe, P. Khanale, Face Recognition Using Principal Component Analysis and Linear Discriminant Analysis on Holistic Approach in Facial Images Database, IOSR Journal of Engineering 2 (12) (2012) 15-23.
[18] K. Jyotsna, N. Chaubey, K. Durga, U. Baruah, Face Recognition using Support Vector Machine, International Journal of Emerging Technology and Advanced Engineering 4 (3) (2014) 585-588.
[19] R. Garg, I.S. Rajput, Review on Local Binary Pattern for Face Recognition, International Journal of Advanced Research in Computer Science and Technology (IJARCST) 2 (2) (2014) 201-204.
[20] P. Corcoran, C. Iancu, Hidden Markov Models in Automatic Face Recognition - A Review, in: P. Corcoran (Ed.), Reviews, Refinements and New Ideas in Face Recognition, InTech, 2011. Available from: www.intechopen.com.
[21] N.L. Ajit Krisshna, V. Kadetotad Deepak, K. Manikantan, S. Ramachandran, Face recognition using transform domain feature extraction and PSO-based feature selection, Applied Soft Computing 22 (2014) 141-161.
[22] R.M. Ramadan, R.F. Abdel-Kader, Face Recognition Using Particle Swarm Optimization-Based Selected Features, International Journal of Signal Processing, Image Processing and Pattern Recognition 2 (2) (2009) 51-66.
[23] P.V. Shinde, B.L. Gunjal, R.G. Ghule, Face Recognition Using Particle Swarm Optimization, in: Emerging Trends in Computer Science and Information Technology (ETCSIT 2012), proceedings published in International Journal of Computer Applications (IJCA) 2 (2) (2012) 11-13.
[24] H. Yin, J.-Q. Qiao, P. Fu, X.-Y. Xia, Face Feature Selection with Binary Particle Swarm Optimization and Support Vector Machine, Journal of Information Hiding and Multimedia Signal Processing 5 (4) (2014) 731-739.
[25] R. Jakhar, N. Kaur, R. Singh, Face Recognition Using Bacteria Foraging Optimization-Based Selected Features, International Journal of Advanced Computer Science and Applications, Special Issue on Artificial Intelligence (2011) 106-111.
[26] M.K. Naik, R. Panda, A novel adaptive cuckoo search algorithm for intrinsic discriminant analysis based face recognition, Applied Soft Computing 38 (2015) 661-675.
[27] J. Wei, Z. Jian-qi, Z. Xiang, Face recognition method based on support vector machine and particle swarm optimization, Expert Systems with Applications 38 (2011) 4390-4393.
[28] P.K. Banerjee, A.K. Datta, A preferential digital optical correlator optimized by particle swarm technique for multi-class face recognition, Optics & Laser Technology 50 (2013) 33-42.
[29] R. Raghavendra, B. Dorizzi, A. Rao, G.H. Kumar, Particle swarm optimization based fusion of near infrared and visible images for improved face verification, Pattern Recognition 44 (2) (2011) 401-411.
[30] M. Aly, Face Recognition using SIFT Features, CNS186/Bi/EE Project report, 2006.
[31] K. Delac, M. Grgic, S. Grgic, Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set, International Journal of Imaging Systems and Technology 15 (5) (2006) 252-260.
[32] A. Bhuiyan, C.H. Liu, On Face Recognition using Gabor Filters, World Academy of Science, Engineering and Technology 22 (2007) 51-56.
[33] V. Štruc, N. Pavešić, Gabor-Based Kernel Partial-Least-Squares Discrimination Features for Face Recognition, Informatica (Vilnius) 20 (1) (2009) 115-138.
[34] V. Štruc, N. Pavešić, The Phase-Based Gabor Fisher Classifier and its Application to Face Recognition Under Varying Illumination Conditions, in: 2nd International Conference on Signal Processing and Communication Systems (ICSPCS 2008), 2008, pp. 1-6.
[35] V. Štruc, N. Pavešić, The Complete Gabor-Fisher Classifier for Robust Face Recognition, EURASIP Journal on Advances in Signal Processing 2010 (2010), 26 pages, doi:10.1155/2010/847680. Available from: http://luks.fe.uni-lj.si/sl/osebje/vitomir/face_tools/PhDface/refs.html.
[36] T.Y. Wang, H.M. Chiang, One-against-one fuzzy support vector machine classifier: An approach to text categorization, Expert Systems with Applications 36 (2009) 10030-10034.
[37] S. Abe, T. Inoue, Fuzzy Support Vector Machines for Multiclass Problems, in: Proceedings of the European Symposium on Artificial Neural Networks (ESANN 2002), Bruges, Belgium, 2002, pp. 113-118.
[38] F. Lauer, C.Y. Suen, G. Bloch, A trainable feature extractor for handwritten digit recognition, Pattern Recognition 40 (2007) 1816-1824.
[39] X. Song, Y. Zheng, X. Wu, X. Yang, J. Yang, A complete fuzzy discriminant analysis approach for face recognition, Applied Soft Computing 10 (2010) 208-214.
[40] D. Tsujinishi, S. Abe, Fuzzy least squares support vector machines for multiclass problems, Neural Networks 16 (5-6) (2003) 785-792.
[41] S. Zhai, T. Jiang, A novel sense-through-foliage target recognition system based on sparse representation and improved particle swarm optimization-based support vector machine, 46 (10) (2013) 3994-4004.
[42] J. Milgram, M. Cheriet, R. Sabourin, "One Against One" or "One Against All": Which One is Better for Handwriting Recognition with SVMs?, in: G. Lorette (Ed.), Tenth International Workshop on Frontiers in Handwriting Recognition, La Baule, France, Suvisoft, 2006.
[43] O. Toygar, A. Acan, Multiple classifier implementation of a divide-and-conquer approach using appearance-based statistical methods for face recognition, Pattern Recognition Letters 25 (2004) 1421-1430.
[44] M.A. Khanesar, M. Teshnehlab, M.A. Shoorehdeli, A Novel Binary Particle Swarm Optimization, in: Mediterranean Conference on Control & Automation (MED '07), 2007, pp. 1-6.
[45] D.T. Pham, A. Ghanbarzadeh, E. Koc, S. Otri, S. Rahim, M. Zaidi, The Bees Algorithm, A Novel Tool for Complex Optimisation Problems, in: Proceedings of the 2nd International Virtual Conference on Intelligent Production Machines and Systems (IPROMS 2006), 2006, pp. 454-459.
[46] D.P. Rini, S.M. Shamsuddin, S.S. Yuhaniz, Particle Swarm Optimization: Technique, System and Challenges, International Journal of Computer Applications 14 (1) (2011) 19-27.
[47] S. Tulyakov, S. Jaeger, V. Govindaraju, D. Doermann, Review of classifier combination methods, in: S. Marinai, H. Fujisawa (Eds.), Machine Learning in Document Analysis and Recognition, Studies in Computational Intelligence, Springer, 2008, pp. 361-386.
[48] K.H. Zouari, Contribution à l'évaluation des méthodes de combinaison parallèle de classificateurs, Ph.D. Thesis, Rouen University, 2004.
[49] L.I. Kuncheva, Clustering-and-selection model for classifier combination, in: Proceedings of Knowledge-Based Intelligent Engineering Systems and Allied Technologies, Brighton, UK, 2000, pp. 185-188.
[50] H. Zouari, L. Heutte, Y. Lecourtier, A. Alimi, Un panorama des méthodes de combinaison de classificateurs en reconnaissance de formes, in: RFIA 2002, Angers, France, 2 (2002) 499-508.
[51] K. Torkkola, Linear Discriminant Analysis in Document Classification, in: IEEE International Conference on Data Mining (ICDM 2001) Workshop on Text Mining, San Jose, USA, 2001, pp. 800-806.
[52] L. Wiskott, J.M. Fellous, N. Krüger, C. von der Malsburg, Face Recognition by Elastic Bunch Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 775-779.
[53] R. Panda, M.K. Naik, A novel adaptive crossover bacterial foraging optimization algorithm for linear discriminant analysis based face recognition, Applied Soft Computing 30 (2015) 722-736.
[54] R. Panda, M.K. Naik, B.K. Panigrahi, Face recognition using bacterial foraging strategy, Swarm and Evolutionary Computation 1 (2011) 138-146.
[55] https://www.sheffield.ac.uk/eee/research/iel/research/face
[56] E. Cuevas, M. Cienfuegos, D. Zaldívar, M. Pérez-Cisneros, A swarm optimization algorithm inspired in the behavior of the social-spider, Expert Systems with Applications 40 (16) (2013) 6374-6384.
[57] L.S. Coelho, Novel Gaussian quantum-behaved particle swarm optimizer applied to electromagnetic design, IET Science, Measurement & Technology 1 (5) (2007) 290-294.
[58] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, John Wiley & Sons, New Jersey, 2004.