Optimized Symmetric Partial Facegraphs for Face Recognition in Adverse Conditions

Badr Lahasan a, Syaheerah Lebai Lutfi a,*, Ibrahim Venkat a, Mohammed Azmi Al-Betar b, Rubén San-Segundo c

a School of Computer Sciences, Universiti Sains Malaysia, 11800 USM, Malaysia
b Department of Information Technology, Al-Huson University College, Al-Balqa Applied University, Jordan
c Grupo Tecnología del Habla, E.T.S.I. Telecomunicación (ETSIT), Universidad Politécnica de Madrid (UPM), Spain
Abstract
In this paper, we propose a memetic-based framework called Optimized Symmetric Partial Facegraphs (OSPF) to recognize faces prone to adverse conditions such as facial occlusions, expression and illumination variations. Faces are initially segmented into facial components, and optimal landmarks are automatically generated by exploiting the bilateral symmetry of human faces. The proposed approach combines an improved harmony search algorithm and an intelligent single particle optimizer to take advantage of their global and local search capabilities; this hybridization computes the optimal landmarks, which then serve as the building blocks to construct the partial facegraphs. The efficiency of the proposed approach in addressing the facial occlusion problem when only one exemplar face image per subject is available has been investigated through comprehensive experimental validations. The proposed approach yields improved recognition rates when compared to recent state-of-the-art techniques.
Keywords: Face recognition, Facial occlusion, Symmetric partial facegraphs, Harmony search algorithm, Intelligent single particle optimizer

1. INTRODUCTION
Face Recognition Systems (FRS) that aim to recognize faces captured in typical operational environments need to handle facial occlusions in addition to other challenges such as facial expressions, varying illumination conditions, scaling and so on [11, 44]. The uncertainty imposed by these real-world intricacies poses a further challenge to FRS when the registered frontal face images in the face database, used for training any proposed Face Recognition (FR) algorithm, offer only a Single Sample Per Object Class (SSPOC). Further, automated FRS aim to recognize faces without relying on ground-truth data that represent the locations of selected facial landmarks [27]. Different techniques have been proposed to deal with these challenges. Based on their methodologies for handling the occlusion problem, these techniques can be classified into two types: holistic and part-based (also known as component-based). In the holistic approach, the face image is treated as a whole entity, while in the part-based approach only local regions are considered [3]. Moreover, part-based approaches can be further classified according to the recognition process into three classes, namely the local matching approach, the reconstruction approach and the detect-discard approach [23]. In the local matching approach [8, 24], the face image is first subdivided into small patches, and each patch is then processed in isolation. In reconstruction-based approaches [43], occlusion is tackled as a reconstruction problem where the occluded probe face image is reconstructed using a linear combination of gallery images. In detect-discard approaches [35], regions detected as occluded in a given face image are discarded from the recognition process. Another class of prominent techniques that have attained promising performance in the face recognition domain are graph-based and deep learning methods.
* Corresponding author at: School of Computer Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia. E-mail address:
[email protected] (S. L. Lutfi)
Although deep learning methods achieve promising recognition rates, they are not suitable for SSPOC because they require many images per subject for the training phase [15]. Graph-based techniques, which represent the face image through graphs, are regarded as among the most effective face recognition approaches under SSPOC [42]. Such graphical models, formulated by fusing two paradigms, namely graph theory and image processing, have proven to be well suited to recognizing faces using SSPOC in 2D FR as well as to other challenges inherent in 3D FR [5, 30]. These graphical models represent each face using graphs whose nodes rely on the locations of several facial landmarks such as the eyes, eyebrows, nose, mouth, jaw and chin, and these nodes contain image features derived from regions around the landmarks. Based on the above discussion of the two potential issues inherent in current face recognition, namely occlusion and SSPOC, it can clearly be seen that part-based approaches are considered more effective in recognizing faces under adverse conditions. However, among the part-based approaches, reconstruction-based approaches have been proved to be effective in the presence of large occlusions. Detecting and discarding the occluded areas often results in the loss of vital information, which may affect the discriminative results and add bias to the overall recognition rate. Local matching approaches suffer from the difficulty of selecting suitable local facial features. However, exploiting natural properties inherent in human faces, such as bilateral symmetry [16, 36], by strategically deploying metaheuristic techniques seems to overcome such potential issues. Meta-heuristic optimization algorithms are considered efficient for handling several hard optimization problems due to their powerful and adaptable search abilities. Based on the number of solutions treated during the search, they are classified into population-based and local search algorithms [6]. Population-based algorithms begin with a set of initial candidate solutions referred to as a population, while local search algorithms treat only a single solution at a time. Memetic algorithms combine characteristics of global search and local search strategies. Population-based algorithms are efficient in scanning several search space regions simultaneously; however, they are inefficient in finding the local optimal solution entailed in each region. Conversely, local search-based algorithms are effective in fine-tuning a search space region and finding a local optimum by searching along a trajectory within a single region, without performing a wider scan over other regions. In order to design an efficient metaheuristic algorithm, the advantages of both population-based and local search-based algorithms can be strategically combined. A memetic-based algorithm has the ability to effectively integrate the wide-range exploration mechanism and the local exploitation mechanism of a search space in a single algorithmic design. The literature reveals many approaches that deploy memetic-based techniques or hybridizations of meta-heuristic techniques for face recognition [49, 21]. Memetic algorithms (MA) have been gaining momentum in the face recognition domain by selecting the best facial features.
The success of MA rests on the hybridization of a local search-based technique with a population-based optimization technique. The MA is often proposed to strike a balance between exploration and exploitation of the search space complementing the advantages of population-based and local search-based methods [2]. Intelligent Single Particle Optimizer (ISPO) presented by Ji et al. [18] is a local search technique where it treats only one particle at a time and can be considered as a special extension of the classical particle swarm optimization family [48]. This particle optimizer has the ability to increase its velocity towards the global optimum prevailed in the local search subspaces rather than being trapped in local optima. In ISPO, the velocity and position of the particle are updated via an intelligent approach. For example, particle velocity increases with improved fitness value, but decreases when the particle skips the optimum value. Also, after several iterations the fitness value does not get any improvement, the particle velocity will increase so as to jump from the local optimum. ISPO has been effectively utilized to decipher the DNA sequence compression problem [50] and performs well in unraveling some complex multimodal problems [18]. On the other hand, Harmony search algorithm (HSA) is a meta-heuristic population based algorithm, which was proposed by Geem in 2001 [14]. It derived inspirations by the music improvisation process [14] and characterized by several advantages as a result of its derivative-free characteristics [13].Therefore, HSA has been successfully applied to resolve numerous optimization problems such as timetabling [2], face recognition [25, 24], image processing [26], bio-informatics [1], and other potential problems as reported in [31]. However, similar to other meta-heuristic population-based search techniques, the performance of the harmony search can be enhanced by means of hybridizing it with other local search-based algorithms, for example ISPO. The above mentioned facts justifies that the memetic strategy which synergises the global search process with an appropriate local search mechanism can be considered as a robust approach to solve many potential problems [50, 4]. To this end, an efficient mechanism to formulate and 2
automatically select the optimal facial landmarks by deploying a component-level sub-graph is proposed in this paper. Furthermore, a memetic-based framework called Optimized Symmetric Partial Facegraphs (OSPF) is introduced in this contribution. Basically, it utilizes the exploration capabilities of the HSA to ensure wider scanning of the search regions. Importantly, it also has the exploitation ability of the ISPO to ensure that a deeper search can be carried out in each of the search space regions navigated by the HSA-based optimization mechanism. The main contribution of this paper is to introduce a memetic-based framework called OSPF to recognize faces prone to occlusion coupled with other variations such as expression and illumination using SSPOC, where the sub-goals can be summarized as follows:
1. Formulation of partial facegraphs by exploiting the bilateral symmetry of human faces. Such partial facegraphs do not rely on any specific pre-defined facial landmarks such as eyes, eyebrows, mouth and so on, and hence pave the way to the automation of face recognition techniques.
2. Intuitive formulation of automatic optimal landmark generation by deploying a memetic framework which combines the Harmony Search Algorithm (HSA) and the Intelligent Single Particle Optimizer (ISPO). Modeling and investigating the process of face recognition using the OSPF, an optimization technique, would motivate researchers from the optimization domain to contribute towards the biometric research domain.
3. Investigation of the proposed OSPF approach to recognize partially occluded frontal face images complying with the potential constraint of Single Sample Per Object Class (SSPOC).
4. Investigation of the proposed OSPF approach to recognize faces prone to different facial expressions in the presence of partial occlusions.
The performance of the proposed OSPF is investigated using four face databases, including AR [32], the large Face Recognition Grand Challenge (FRGC ver2.0) [34], Labeled Faces in the Wild (LFW) [17] and the Face and Ocular Challenge Series (FOCS) [33]. Experimental results justify that the proposed OSPF technique yields improved recognition performance when compared to other state-of-the-art face recognition techniques. The rest of the paper is organized as follows: Section 2 formally describes the problem. The formulation of the proposed OSPF methodology is presented in Section 3. Experimental evaluations and subsequent discussions are presented in Section 4. Finally, the paper is concluded in Section 5.
2. PROBLEM DESCRIPTION
Given a face image (input) acquired from a typical operating environment, the main objective of a face biometric is to find the identity of the input by comparing it with a set of face images stored in a database [39], using the physiological or behavioural attributes inherent in the input together with appropriate artificial intelligence techniques. To do so, some face recognition methods choose specific facial landmarks from face images; this requires sufficient training samples per individual to select those facial landmarks. However, many real-world applications are subject to the SSPOC (Single Sample Per Object Class) constraint [38], which results in degraded recognition performance [10]. Furthermore, the presence of occlusion coupled with expressions, in addition to the SSPOC constraint, imposes further uncertainty on the recognition process. Hence, the main goal of this contribution is to counter these potential uncertainty problems. Amidst these issues, the problem of finding optimal facial landmarks needs to be addressed. We strategically intend to exploit the bilateral symmetry of human faces to identify optimal landmarks. Initially, each face image is segmented into two vertical symmetric segments, viz. the left half (L) and the right half (R). Subsequently, each of these halves is segmented into r horizontal segments as shown in Fig. 1. Thus the total set of face segments in a given face will be k = {L1, R1, L2, R2, ..., Lr, Rr} (in this study r = 3). The ultimate idea is to represent each segment as a face sub-graph. After a given face image is segmented, a Gabor wavelet transform is extracted for each of the underlying segments. The frequencies Fr are indexed as v = 0, ..., Fr - 1 and the orientations Or are indexed as mu = 0, 1, ..., Or - 1, where Fr = 5 and Or = 8 have been set as recommended in the literature [5, 47]. Then, a total of Fr x Or complex coefficients is generated for each pixel entailed in these segments. In other words, a vector J of Fr x Or complex coefficients, referred to as a Jet, is calculated for each pixel. In summary, the Gabor wavelet coefficients for each of the face image segments are computed as follows: First, a Gabor wavelet filter with 5 frequencies and 8 orientations is generated using Equation 2. Second, the jet for each location (x, y) in the face image segment is produced
Figure 1: Image segmentation and landmarks formulation
by convolving the face image segments with the Gabor filter for each of the 5 frequencies and 8 orientations. Note that the convolutions are performed 40 times (5 x 8), which results in Gabor wavelet features (GWF) of dimensions equal to the dimensions of the face image segment multiplied by 40. For instance, if the size of the face image segment is 60 x 60, the Gabor feature size will be 60 x 60 x 40, as each pixel is represented by a vector of 40 Gabor wavelet coefficients. Thus, the Gabor wavelet coefficients at each location (x, y) of the face segments can be conveniently stored in the form of a compact vector. The detailed formulation of the Gabor wavelet computation is carried out as follows:
The Jet J describes a small patch of grey values in a segment k_i around a given input pixel x = (x, y) [42]. It is based on the wavelet transform defined in terms of a convolution operation as shown below:

$$\forall p \in k_i, \qquad \Psi_p(\vec{x}) = \int I(\vec{x}\,')\, \psi_p(\vec{x} - \vec{x}\,')\, d^2\vec{x}\,' \qquad (1)$$

with Gabor-based kernels:

$$\psi_p(\vec{x}) = \frac{\|\vec{K}_p\|^2}{\sigma^2}\, e^{-\frac{\|\vec{K}_p\|^2 \|\vec{x}\|^2}{2\sigma^2}} \left[ e^{\,i \vec{K}_p \cdot \vec{x}} - e^{-\frac{\sigma^2}{2}} \right] \qquad (2)$$

where the wave vector K_p describes the different frequencies (v) and orientations (mu) enveloped by a Gaussian function, sigma = 2*pi is the standard deviation of this Gaussian, and x = (x, y) represents a pixel in the segment k_i. The wave vector K_p is defined as:

$$\vec{K}_p = \begin{pmatrix} K_x \\ K_y \end{pmatrix} = \begin{pmatrix} K_v \cos\theta_\mu \\ K_v \sin\theta_\mu \end{pmatrix} \qquad (3)$$

where:

$$K_v = \frac{k_{max}}{f^{\,v}} \qquad (4)$$

$$\theta_\mu = \mu \frac{\pi}{8} \qquad (5)$$

In this study, f has been chosen to be sqrt(2) and k_max = pi/2, as suggested in [22]. Note that the jets for a segment k of size 60 x 50 will be of size 60 x 50 x 40, where each pixel is represented by a jet, which is basically a vector of 40 complex numbers.
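The kernel family in Eqs. (2)-(5) and the jet computation can be sketched in Python as follows; this is an illustrative implementation only (the 21 x 21 kernel window and the evaluation of the response at a single pixel are assumptions, not choices stated by the authors):

```python
import numpy as np

def gabor_kernel(v, mu, size=21, k_max=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi):
    """Discrete Gabor kernel of Eqs. (2)-(5) for frequency index v and orientation index mu."""
    k_v = k_max / (f ** v)                                   # Eq. (4)
    theta = mu * np.pi / 8                                   # Eq. (5)
    kx, ky = k_v * np.cos(theta), k_v * np.sin(theta)        # Eq. (3)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2, r2 = kx ** 2 + ky ** 2, x ** 2 + y ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * r2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)  # DC-compensated wave
    return envelope * carrier                                # Eq. (2)

def jet_at(segment, x, y, frequencies=5, orientations=8, size=21):
    """40-dimensional jet at pixel (x, y): responses of all 5 x 8 Gabor kernels.

    Assumes (x, y) lies far enough from the segment border for a full window."""
    half = size // 2
    patch = segment[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    return np.array([np.sum(patch * gabor_kernel(v, mu, size))
                     for v in range(frequencies) for mu in range(orientations)])

if __name__ == "__main__":
    seg = np.random.rand(60, 60)          # a synthetic 60 x 60 face segment
    print(jet_at(seg, 30, 30).shape)      # (40,) complex coefficients
```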
For each of the symmetric segments k_Li and k_Ri, the goal is to find the optimal vector x = (x1, x2, ..., xM) which represents landmark coordinates (x-axis and y-axis image pixel coordinates). Each xi has a specific value range [LBi, UBi], where LBi represents the lower bound and UBi the upper bound with respect to xi. Each segment k_Li and k_Ri is divided into M horizontal portions (x-axis p_x) and M vertical portions (y-axis p_y). According to the principles of permutations and combinations, the set of all possible landmarks for each segment k_i amounts to M^2, as shown in Fig. 1. The coordinates of these landmarks (LC) can be generated from the vector x as follows:
$$LC = \begin{pmatrix} (x_1, x_1) & (x_1, x_2) & \cdots & (x_1, x_M) \\ (x_2, x_1) & (x_2, x_2) & \cdots & (x_2, x_M) \\ \vdots & \vdots & \ddots & \vdots \\ (x_M, x_1) & (x_M, x_2) & \cdots & (x_M, x_M) \end{pmatrix} \qquad (6)$$
where the elements of the LC matrix represent the corresponding landmark coordinates inside a particular symmetric image segment. For example, the element (x1, x1) is the location of the first landmark, as shown in Fig. 1. To further illustrate the landmark formulation, consider the example shown in Fig. 2. Let the vector x = (x1, x2, x3, x4) represent a segment k1 of size 40 x 40 pixels. The segment k1 can be decomposed into four portions along each of the x-axis and the y-axis. Therefore the search ranges of the variables of vector x are: x1 in [LB1 = 1, UB1 = 10], x2 in [LB2 = 11, UB2 = 20], x3 in [LB3 = 21, UB3 = 30], and x4 in [LB4 = 31, UB4 = 40].
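To make the landmark formulation concrete, the following Python sketch (illustrative only; the function names are hypothetical) builds the per-variable bounds for a segment divided into M portions and expands a candidate vector x into the M^2 landmark coordinates of Eq. (6):

```python
import numpy as np

def portion_bounds(segment_size, M):
    """Split an axis of `segment_size` pixels into M equal portions.

    For 40 pixels and M = 4 this returns [(1, 10), (11, 20), (21, 30), (31, 40)],
    matching the ranges of the Fig. 2 example."""
    step = segment_size // M
    return [(i * step + 1, (i + 1) * step) for i in range(M)]

def landmarks_from_vector(x):
    """Expand x = (x_1, ..., x_M) into the M x M grid of coordinates of Eq. (6),
    where the (i, j)-th landmark is (x_i, x_j)."""
    return [(int(xi), int(xj)) for xi in x for xj in x]

if __name__ == "__main__":
    bounds = portion_bounds(40, 4)                             # a 40 x 40 segment, M = 4
    x = [np.random.randint(lb, ub + 1) for lb, ub in bounds]   # one random candidate vector
    print(len(landmarks_from_vector(x)))                       # 16 landmarks
```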
Figure 2: Illustration of the image portion ranges and how the vector x is employed to generate landmarks
2.1. Objective function
The similarity of the landmark points of two symmetric segments k_Li and k_Ri of the gallery image can be computed based on their jet vectors J and J'. Thus the main aim is to find the maximum similarity between the jet vector J, which entails the landmark points of the segment k_Li, and the jet J', which entails the landmark points of the segment k_Ri. Thus, the objective function can be defined as follows:

$$f(x) = \max S(J, J'), \qquad \forall j \in J \qquad (7)$$

where S(J, J') can be computed using Eq. (8):

$$S(J, J') = \frac{\sum_j |a_j|\,|a'_j|}{\sqrt{\sum_j |a_j|^2\, \sum_j |a'_j|^2}} \qquad (8)$$

where a_j represents the complex coefficients of vector J and a'_j represents the complex coefficients of vector J'.
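A minimal sketch of the normalized similarity of Eq. (8), and of the per-segment fitness built on it, is shown below; it assumes the jets have already been computed as arrays of complex Gabor coefficients, and that the same in-segment coordinates are used for the left and right segments (both function names are hypothetical):

```python
import numpy as np

def jet_similarity(J, J_prime):
    """Eq. (8): normalized correlation of the jet magnitudes |a_j| and |a'_j|."""
    a, b = np.abs(J), np.abs(J_prime)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

def segment_objective(jets_left, jets_right, landmarks):
    """Fitness of one candidate landmark set: the mean jet similarity between a
    left segment and its symmetric right counterpart over all candidate landmarks.

    jets_left / jets_right: arrays of shape (H, W, 40) holding one jet per pixel."""
    sims = [jet_similarity(jets_left[y, x], jets_right[y, x]) for (x, y) in landmarks]
    return float(np.mean(sims))
```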
3. THE PROPOSED OPTIMIZED SYMMETRIC PARTIAL FACE-GRAPHS (OSPF)

A novel memetic-based optimized symmetric partial face-graph (OSPF) approach is proposed in this paper. The proposed OSPF approach formulates partial facegraphs to represent facial images instead of using a single facegraph. We segment face images into horizontal components and further partition them into left and right halves by exploiting the bilateral symmetry inherent in frontal face images. These partitions on the left half and the right half of the face are mirror images of each other. The fundamental idea is that if we take a few image samples from any of the facial partitions segmented on the left half of the face, then by applying optimization algorithms we can find a matching pair for these image samples in the corresponding right-half facial partition. Such image samples do not have to be pre-defined facial features such as eyes, eyebrows and so on. With this fundamental insight, we can define partial facegraphs as component-level graphs whose nodes correspond to the landmarks of these image
Figure 3: Framework of the proposed OSPF approach. The diagram shows an instance of automatically generated face sub-graphs over a symmetric, component-level segmented typical face image.
samples which are taken from the facial partitions. Finally, the recognition mechanism of the proposed OSPF involves the computation of similarity measures between the partial facegraphs entailed by the face images. The framework of the OSPF is shown in Fig. 3 and includes six phases. In the first phase, in order to minimize the variations resulting from lighting and noise, standard preprocessing techniques (median filtering and histogram equalization) are applied to all stored face images in the database. In the second phase, all face images are segmented into six segments as illustrated in Section 3.1. Then, for each segment, the Gabor wavelet transform is computed in phase three as explained in Section 2. In the fourth phase, the proposed memetic-based framework is applied to search for the optimal landmarks which yield the maximum similarity between a segment from the right half and its symmetric segment from the left half. In the fifth phase, the optimal landmarks obtained by the proposed memetic framework are used to construct the partial face-graphs for all of the face segments. In the sixth phase, the similarity measures between the partial face-graphs formed on the test image and the partial face-graphs formed on the gallery images are computed in order to recognize the test image. Ultimately, the recognition process is performed by sorting the similarities obtained, and the face image with maximum similarity is classified as the target face. In general, the overall procedure of the proposed OSPF can be explained in the following steps: First, the gallery face images are segmented into six symmetric segments as shown in Fig. 1. Second, for each segment, the Gabor wavelet is computed using the formulation described in Section 2. Consequently, each pixel of a face image segment is represented by a vector of Gabor wavelet coefficients which is referred to as a Jet (J). By deploying the Harmony Search Algorithm, initial candidate solution vectors are generated randomly (Section 3.2.1, Step II) in the Harmony Memory HM of Equation 13. Note that the number of solution vectors is determined by the HMS parameter. Each solution vector contains four variables (elements) from which sixteen landmark positions (the LC matrix) are formulated as in Equation 6. The LC matrix contains sixteen landmark coordinates which are treated as a graph or block because they are connected together. Then, at each landmark (node), the jet J of the left face image segment k_Li of the gallery image and the jet J' of the corresponding right face segment k_Ri are selected in order to compute the fitness function (similarity). The overall fitness value, or similarity, for each solution
vector is the average similarity of the sixteen landmarks (nodes) composed by this particular vector. The solutions are then sorted, and the solution with the highest similarity is considered to be the optimal solution. Finally, Steps III and IV (as briefed in Section 3.2.1) are repeated until the maximum number of iterations is reached. In each iteration, if the new solution is better than the current optimal solution, it replaces the optimal solution. Thereafter, the optimal solution found by the HSA is passed to the ISPO in order to perform a local search on each of the dimensions separately, as detailed in Algorithm 1 (Section 3.2.2). After the optimal landmarks have been found, they are used to construct the partial face-graphs for the gallery and test images. In other words, the optimal landmarks are found by maximizing the similarity between each pair of corresponding symmetric segments of the gallery set images, which are neutral and frontal face images. Then, these optimal landmarks are used to construct the partial face-graphs for the gallery and test images. The presence of uncertainty elements such as occlusions and expressions does not affect the construction of these partial face-graphs, because the face-graph generated from the neutral face is basically used as a mask which is projected over faces that are prone to such uncertainty elements. Finally, the recognition process involves matching the face sub-graphs of the test face image with those of the gallery face images. A detailed technical description of the key phases of the proposed OSPF is given in the following subsections.
3.1. Image segmentation
As mentioned previously in Section 2, each face image is segmented into two halves in the vertical direction, where each half is considered to be symmetric to the other. Furthermore, each half is segmented into r horizontal segments. Therefore, the total set of face segments is k = {L1, R1, L2, R2, ..., Lr, Rr}. Figure 1 shows a typical segmented face image for the case r = 3. The segmentation is carried out as follows: Let I(x, y) represent a given face, where h and w respectively represent its height and width, and let I(x0, y0) represent the coordinates of its upper left-most point. Using set theory notation to model the facial components (segments), the segmentation task is accomplished as follows. The left-half components k_L1, k_L2, ..., k_Lr can be modeled as:
$$\begin{aligned}
k_{L1}(I(x, y)) &= \{(x, y) \mid x_0 \le x \le x_0 + \tfrac{w}{2},\; y_0 \le y \le y_0 + \tfrac{h}{r}\} \\
k_{L2}(I(x, y)) &= \{(x, y) \mid x_0 \le x \le x_0 + \tfrac{w}{2},\; y_0 + \tfrac{h}{r} \le y \le y_0 + \tfrac{2h}{r}\} \\
&\;\;\vdots \\
k_{Lr}(I(x, y)) &= \{(x, y) \mid x_0 \le x \le x_0 + \tfrac{w}{2},\; y_0 + \tfrac{(r-1)h}{r} \le y \le y_0 + h\}
\end{aligned} \qquad (9)$$

Similarly, the right-half facial components k_R1, k_R2, ..., k_Rr can be modeled as:

$$\begin{aligned}
k_{R1}(I(x, y)) &= \{(x, y) \mid x_0 + \tfrac{w}{2} < x \le x_0 + w,\; y_0 \le y \le y_0 + \tfrac{h}{r}\} \\
k_{R2}(I(x, y)) &= \{(x, y) \mid x_0 + \tfrac{w}{2} < x \le x_0 + w,\; y_0 + \tfrac{h}{r} < y \le y_0 + \tfrac{2h}{r}\} \\
&\;\;\vdots \\
k_{Rr}(I(x, y)) &= \{(x, y) \mid x_0 + \tfrac{w}{2} < x \le x_0 + w,\; y_0 + \tfrac{(r-1)h}{r} < y \le y_0 + h\}
\end{aligned} \qquad (10)$$
After segmenting the face images, the Gabor wavelet transform is computed as explained in Section 2.
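A small Python sketch of the segmentation of Eqs. (9)-(10) is given below, assuming the face image is a NumPy array and that the 2r segments are returned in the order {L1, R1, ..., Lr, Rr} (the function name and return convention are assumptions):

```python
import numpy as np

def segment_face(image, r=3):
    """Split a face into r horizontal bands and each band into symmetric
    left/right halves, mirroring Eqs. (9)-(10)."""
    h, w = image.shape[:2]
    half = w // 2
    segments = []
    for i in range(r):
        top, bottom = i * h // r, (i + 1) * h // r
        segments.append(image[top:bottom, :half])    # k_Li: left half of band i
        segments.append(image[top:bottom, half:])    # k_Ri: right half of band i
    return segments

if __name__ == "__main__":
    face = np.zeros((165, 120), dtype=np.uint8)      # e.g. a cropped AR face image
    print([s.shape for s in segment_face(face)])     # six symmetric segments
```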
3.2. Memetic-based framework for determination of the optimal landmarks
As demonstrated in previous studies, facial landmarks differ in terms of their discriminative features; it has been reported that some facial landmarks carry more relevant discriminatory features than others [37]. In this work as well, finding the optimal facial landmarks, which are the building blocks for constructing the partial face-graphs, forms a vital contribution. This section describes how the memetic-based framework is employed to
find the optimal landmarks. In this framework, the Harmony Search Algorithm (HSA) [29] is used to perform the population-based search, whereas the Intelligent Single Particle Optimizer (ISPO) is used to perform the local search. The optimal landmarks are determined by maximizing the similarity between the partial face-graphs of the symmetric segments k_Ri and k_Li, which in turn helps to increase the accuracy of the face recognition process when only SSPOC is available. The procedure of the memetic framework is described in the following steps:

3.2.1. Adapting the Harmony Search Algorithm (HSA) for the face recognition problem
Basically, HSA has five steps, which are described as follows:
Step I: Initialize the occluded face recognition problem and the HSA parameters: As mentioned in Section 2, the selection of optimal landmarks is modeled so as to recognize a given occluded face using the objective function f(x) formulated in Eq. (7). A candidate solution for the selection of optimal landmarks is represented as a vector x = (x1, x2, ..., xM) of landmark coordinates, where the range of values for each landmark coordinate is specified as xi in [LBi, UBi]. The HSA parameters required to handle the selection of optimal landmarks are also specified in this step. These parameters are: the Harmony Memory Size (HMS), which determines the number of facial landmark vectors in the harmony memory (HM); the Harmony Memory Considering Rate (HMCR), which determines whether the value of a landmark position is selected from the accumulated search memory or randomly from its possible range; the maximum number of iterations (NI); and PARmin and PARmax, the minimum and maximum pitch adjusting rates. The latter two parameters are used to dynamically compute the Pitch Adjustment Rate (PAR) for each generation, which controls how much the value of a new landmark position selected from memory is adjusted. It is computed as follows:

$$PAR(gn) = PAR_{min} + \frac{PAR_{max} - PAR_{min}}{NI} \times gn \qquad (11)$$
where gn is the generation number. BWmin, which represents the minimum bandwidth, and BWmax, which represents the maximum bandwidth, are used to generate the bandwidth (BW) for each generation, which is computed as follows:

$$BW(gn) = BW_{max} \times e^{\frac{\ln\!\left(\frac{BW_{min}}{BW_{max}}\right)}{NI} \times gn} \qquad (12)$$
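The dynamic schedules of Eqs. (11)-(12) translate directly into code; the sketch below uses the parameter values later listed in Table 1:

```python
import math

def par(gn, par_min=0.4, par_max=0.9, ni=1500):
    """Eq. (11): the pitch adjustment rate grows linearly with the generation number."""
    return par_min + (par_max - par_min) / ni * gn

def bw(gn, bw_min=0.0001, bw_max=1.0, ni=1500):
    """Eq. (12): the bandwidth decays exponentially from bw_max towards bw_min."""
    return bw_max * math.exp(math.log(bw_min / bw_max) / ni * gn)

print(par(0), par(1500))   # 0.4 ... 0.9
print(bw(0), bw(1500))     # 1.0 ... 0.0001
```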
Step II: Initialize the harmony memory with random feasible vectors (x) of landmark coordinates based on the HMS parameter: For each symmetric segment of the face image, landmark coordinate vectors (solutions) are randomly generated based on the HMS parameter: x_i^j = LB_i + rand(1) x (UB_i - LB_i), for all i = 1, 2, ..., M and all j = 1, 2, ..., HMS, where rand(1) generates a uniform random number between 0 and 1. These solutions are sorted in descending order according to their fitness function values and stored as shown in Eq. (13):

$$HM = \begin{pmatrix} x_1^{1} & x_2^{1} & \cdots & x_M^{1} \\ x_1^{2} & x_2^{2} & \cdots & x_M^{2} \\ \vdots & \vdots & & \vdots \\ x_1^{HMS} & x_2^{HMS} & \cdots & x_M^{HMS} \end{pmatrix} \qquad (13)$$
Step III: Improvise a new harmony vector: In this step, three operators are used to generate the new harmony vector (new landmark coordinates), represented by x' = (x'_1, x'_2, ..., x'_M). These operators are termed memory consideration, pitch adjustment, and random consideration.
Memory consideration. Here, the value of the first landmark coordinate x'_1 is randomly selected from the values of that landmark coordinate available in the current HM, i.e. from {x_1^1, x_1^2, ..., x_1^HMS}. The values of the other landmark coordinates, (x'_2, x'_3, ..., x'_M), are selected in the same way, each with a probability of HMCR, where HMCR is in (0, 1).
Pitch adjustment. Each landmark coordinate x'_i of the new harmony vector x' = (x'_1, x'_2, x'_3, ..., x'_M) that has been selected from the HM by memory consideration is pitch adjusted with a probability of PAR(gn), where PAR(gn) is in (0, 1), using the following decision rule:
$$\text{Pitch adjusting decision for } x'_i \leftarrow \begin{cases} \text{Yes} & \text{w.p. } PAR \\ \text{No} & \text{w.p. } 1 - PAR \end{cases}$$
If the pitch adjustment decision for x'_i is Yes, the value of x'_i is adjusted to a neighboring value using:

$$x'_i = x'_i \pm rand(0, 1) \times BW(gn) \qquad (14)$$
Note that the pitch adjustment rate (PAR) controls the pitch adjustment operator applied to the pixel coordinates selected by the memory consideration operator. If the PAR probability is met, the pixel coordinates are adjusted to neighbouring pixel coordinates using Equation (14).
Random consideration. Landmark positions that are not assigned values by memory consideration are randomly assigned values from their possible range by random consideration, with a probability of (1 - HMCR), as follows:

$$x'_i \leftarrow \begin{cases} x'_i \in \{x_i^{1}, x_i^{2}, \dots, x_i^{HMS}\} & \text{w.p. } HMCR \\ x'_i \in X_i \;\; \text{(random consideration)} & \text{w.p. } 1 - HMCR \end{cases}$$
Step IV: Update the harmony memory: In this step, the new harmony solution is evaluated based on the objective function. If the fitness value of the new harmony solution is better than the worst fitness value in the harmony memory, the new solution replaces the worst solution in the harmony memory.
Step V: Stopping criterion: The HS algorithm repeats Step III and Step IV until the maximum number of iterations specified by the NI parameter is reached.
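The improvisation and memory-update steps just described (Steps II-IV) can be sketched as follows; this is a hedged illustration rather than the authors' implementation, and it assumes an external `fitness` callable that evaluates the segment-similarity objective of Eq. (7):

```python
import numpy as np

def init_harmony_memory(lb, ub, hms, fitness):
    """Step II: random feasible landmark-coordinate vectors and their fitness values."""
    hm = lb + np.random.rand(hms, len(lb)) * (ub - lb)
    return hm, np.array([fitness(x) for x in hm])

def improvise_and_update(hm, fit, lb, ub, hmcr, par_gn, bw_gn, fitness):
    """Steps III-IV: build one new harmony vector and replace the worst member if better."""
    hms, m = hm.shape
    new = np.empty(m)
    for i in range(m):
        if np.random.rand() < hmcr:                                # memory consideration
            new[i] = hm[np.random.randint(hms), i]
            if np.random.rand() < par_gn:                          # pitch adjustment, Eq. (14)
                new[i] += np.random.choice([-1.0, 1.0]) * np.random.rand() * bw_gn
                new[i] = np.clip(new[i], lb[i], ub[i])
        else:                                                      # random consideration
            new[i] = lb[i] + np.random.rand() * (ub[i] - lb[i])
    f_new = fitness(new)
    worst = int(np.argmin(fit))                                    # similarity is maximized
    if f_new > fit[worst]:
        hm[worst], fit[worst] = new, f_new
    return hm, fit
```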
3.2.2. Intelligent single particle optimizer (ISPO)
In this contribution, the ISPO model [18] is adopted to perform the local search operation. The steps of the ISPO are illustrated in Algorithm 1, which follows shortly. The local search operation iterates through all dimensions of the new landmark vector Lbest_{i,d} and tunes each dimension independently. As stated in Algorithm 1, the local search operation continues for T iterations in each dimension. The velocity is computed as follows [18]:

$$VE_{i,d} = \frac{a}{t^{p}}\, rn + L_{i,d} \qquad (15)$$

where a is the acceleration factor, p is the descent parameter that controls the decay of the velocity, rn is a uniformly distributed random number in the range [-0.5, 0.5], and t is the current iteration number. The variable L_{i,d} represents the learning rate that controls the jump size. Its value is updated based on the fitness function: it is doubled if the fitness is improved, otherwise it is decreased. The update operation is performed as follows:

$$L_{i,d} = \begin{cases} 2\, VE_{i,d} & \text{if fitness gets improved} \\ \dfrac{L_{i,d}}{2} & \text{otherwise} \end{cases} \qquad (16)$$
The value of Lbest_{i,d} is updated as follows:

$$Lbest_{i,d} = \begin{cases} Lbest_{i,d} + VE_{i,d} & \text{if fitness gets improved} \\ Lbest_{i,d} & \text{otherwise} \end{cases} \qquad (17)$$
The local search operation is useful for exploiting promising regions of the search space. It is executed after the exploration operation of the HSA. The new harmony vector of each iteration is passed to the ISPO as a single particle together with the maximum and minimum bounds of each portion (dimension). The ISPO tunes each dimension d_i of the particle Lbest independently of the other dimensions. Note that if the updated value of dimension d_i of the particle Lbest exceeds either the maximum or the minimum bound, a new value is generated randomly between the lower and upper bounds of d_i.

3.3. Partial face-graphs construction
In this step, the partial face-graphs for each of the symmetric segments k_Li, k_Ri are constructed using the optimal landmarks, as shown in Fig. 4, where each optimal landmark is represented as a node in the partial face-graph. The edges between adjacent nodes are fully connected. The partial face-graph is basically a data structure that contains the coordinates of the nodes and their respective jets.
Algorithm 1 ISPO(Lbest_i, BestFitness, LB, UB)
Require: Input: best solution vector (x) found by the HSA, its fitness value, and the lower and upper bounds for each variable (element) of vector x
Ensure: Output: new solution vector (after performing the local search) and its fitness value
1: Initialize ISPO parameters
2: for d = 1 to length(Lbest) do
3:   Repeat
4:     Compute VE_{i,d} based on Eq. (15)
5:     Lbest_i(d) = Lbest_i(d) + VE_{i,d}
6:     if (Lbest_i(d) < LB(1,d)) then
7:       Lbest_i(d) = rand(1) * (UB(1,d) - LB(1,d)) + LB(1,d)
8:     end if
9:     if (Lbest_i(d) > UB(1,d)) then
10:      Lbest_i(d) = rand(1) * (UB(1,d) - LB(1,d)) + LB(1,d)
11:    end if
12:    Compute FitnessFunction(Lbest_i)
13:    fitcount = fitcount + 1
14:    Update Lbest_{i,d} based on Eq. (17)
15:    Update L_{i,d} based on Eq. (16)
16:  Until (T iterations are met)
17: end for
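For completeness, a hedged Python rendering of Algorithm 1 and Eqs. (15)-(17) is given below; the initial value of the learning factor L and the `fitness` callable (the same segment-similarity objective) are assumptions, and the parameter values follow Table 1:

```python
import numpy as np

def ispo(lbest, best_fitness, lb, ub, fitness, a=150, p=20, T=30):
    """Dimension-wise ISPO refinement of the best solution found by the HSA."""
    lbest = np.array(lbest, dtype=float)
    for d in range(len(lbest)):
        L = 1.0                                          # learning factor start value (assumed)
        for t in range(1, T + 1):
            rn = np.random.rand() - 0.5                  # uniform in [-0.5, 0.5]
            ve = a / (t ** p) * rn + L                   # Eq. (15): velocity
            trial = lbest.copy()
            trial[d] += ve
            if not (lb[d] <= trial[d] <= ub[d]):         # re-sample if out of bounds (lines 6-11)
                trial[d] = lb[d] + np.random.rand() * (ub[d] - lb[d])
            f_trial = fitness(trial)
            if f_trial > best_fitness:                   # Eqs. (16)-(17): accept and adapt
                lbest, best_fitness, L = trial, f_trial, 2.0 * ve
            else:
                L = L / 2.0
    return lbest, best_fitness
```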
Figure 4: A typical face image with partial face-graphs constructed using the computed optimal landmarks
3.4. Partial face-graphs similarity calculation
The partial face-graph of a given test image is constructed using the optimal solution vector x = (x1, x2, ..., xM), which is found by maximizing the similarity between the symmetric segments of the gallery image. Thereafter, the matching process is performed by calculating the similarity between the partial face-graphs of the test image and the partial face-graphs of the gallery images stored in the database in order to recognize the test image. The similarity measures are computed as follows. For the left half, the similarity over the nodes of the partial face-graph of segment k_Li, denoted PFG_{k_Li}, is computed using Eq. (18):

$$PFG_{k_{Li}} = \frac{1}{\Omega} \sum_{j=1}^{\Omega} S_{k_{Li}}(J_j, J'_j) \qquad (18)$$
where Omega is the number of nodes of the partial face-graph of segment k_Li, J_j represents the jet of a node of the partial face-graph of the gallery image segment, and J'_j represents the jet of the corresponding node of the partial face-graph of the test image segment. The overall similarity measure of the left half, LS, is computed as:

$$LS = \frac{1}{r} \sum_{i=1}^{r} PFG_{k_{Li}} \qquad (19)$$

This similarity computation is repeated for the right half, where:

$$PFG_{k_{Ri}} = \frac{1}{M^2} \sum_{j=1}^{M^2} S_{k_{Ri}}(J_j, J'_j) \qquad (20)$$

Ultimately, the overall similarity measure of the right half, RS, is computed as:

$$RS = \frac{1}{r} \sum_{i=1}^{r} PFG_{k_{Ri}} \qquad (21)$$

The average similarity measure of the entire image comprising the partial face-graphs, OS, is calculated using:

$$OS = \frac{1}{2}(LS + RS) \qquad (22)$$

Finally, the similarity measures between the test image and each of the gallery images are computed and then sorted and ranked, so that the image with the highest similarity measure corresponds to the rank-one match. A face image is considered to be accurately recognized if it belongs to rank one.
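Since Eqs. (18)-(22) amount to simple averaging, the matching score between a test image and one gallery image can be sketched as below, assuming the per-node jet similarities of Eq. (8) have already been computed for every segment:

```python
import numpy as np

def overall_similarity(left_node_sims, right_node_sims):
    """Combine per-node jet similarities into the overall score OS of Eq. (22).

    left_node_sims / right_node_sims: lists of r arrays, one array of node
    similarities per segment, i.e. the terms averaged in Eqs. (18) and (20)."""
    ls = float(np.mean([np.mean(s) for s in left_node_sims]))      # Eqs. (18)-(19)
    rs = float(np.mean([np.mean(s) for s in right_node_sims]))     # Eqs. (20)-(21)
    return 0.5 * (ls + rs)                                         # Eq. (22)

# Example: r = 3 segments per half, 16 node similarities per segment.
left = [np.random.rand(16) for _ in range(3)]
right = [np.random.rand(16) for _ in range(3)]
print(overall_similarity(left, right))
```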
4. RESULTS AND DISCUSSION
In this section, the performance of the proposed OSPF has been evaluated using four large databases: AR [32], the Face Recognition Grand Challenge (ver2.0) [34], Labeled Faces in the Wild (LFW) [17] and the Face and Ocular Challenge Series (FOCS) [33]. A series of experimental evaluations have been performed, which include recognizing faces under different operational conditions, for instance faces taken under controlled/ideal conditions, faces with realistic occlusions such as sunglasses and scarves, faces with different facial expressions coupled with partial occlusion, and faces with varying illumination. The parameters of the proposed memetic-based framework OSPF are initialized as specified in Table 1. However, the initial parameter settings of both the HSA and the ISPO algorithms do not strongly affect their performance, owing to the simplicity of the problem search space. Note that the proposed OSPF has been run 30 times for all the experiments conducted in the following subsections, and the performance metrics in the tables that follow refer to the average recognition rate of these replicated runs. The recognition rate has been computed based on the first-rank classification of the subjects yielded by the algorithms in all the trials.
Table 1: List of parameter settings used in the proposed memetic based framework OSPF

Parameters of HSA:  HMCR = 0.9, PARmin = 0.4, PARmax = 0.9, bwmin = 0.0001, bwmax = 1.0, NI = 1500
Parameters of ISPO: p = 20, T = 30, a = 150
Figure 5: Cropped face images of a typical subject chosen from the AR database, including (a) neutral face under controlled conditions taken in session 1; (b-d) faces with smile, anger and scream expressions taken in session 1; (e-g) faces under varying lighting conditions taken in session 1; (h, i) partially occluded faces with sunglasses and scarf taken in session 1; (j, k) partially occluded faces with sunglasses and scarf taken in session 2; (l) neutral face under controlled conditions taken in session 2.
4.1. Experimental results on the AR database
The AR database includes over 3200 frontal face images captured from 126 subjects (70 male and 56 female). Each subject has 26 face images obtained in two different sessions separated by a time interval of two weeks. Each session has 13 face images consisting of different facial expressions (smile, anger, scream), varying illumination conditions (left light on, right light on and both lights on) and partial occlusions (sunglasses and scarves). The face images of 100 subjects (50 male and 50 female) comprising all variations have been used in the experiments to avoid gender bias. The face images have been enhanced and normalized using median filtering and histogram equalization, and subsequently cropped. The cropped images have a resolution of 165 x 120 pixels, as shown in Fig. 5. The AR face database has been used to evaluate the OSPF performance under different operational conditions. The performance of the proposed OSPF has been compared with ten standard approaches, including Line Edge Map (LEM) [12], Ensemble String Matching (ESM) [8], Adaptively Weighted Patch Pseudo Zernike Moment Array (AWPPZMA) [20], partitioned Sparse Representation-based Classification (p-SRC) [43], Adaptively Weighted Sub-Gabor Array (AWSGA) [19], Psychophysically Inspired Similarity Mapping (PISMA) [40], Harmony Search Oriented EBGM (HSO-EBGM) [24], Adaptive Noise Dictionary Sparse Regression based Classifier (AND SRC) [9], Local Structure based Multi-Phase Collaborative Representation Classification (LS MPCRC) [28] and the SIFT-based Kepenekci approach (SIFT) [27].
4.1.1. The impact of the number of segments per face image on the performance of the OSPF
Several experiments have been conducted to investigate the impact of the number of segments per face on the performance of the proposed OSPF framework. The neutral face images from session 1 have been used as the gallery set, while the neutral faces from session 2 have been used as the test set. To preserve the symmetry between the segments, the first experiment segments the face image into two segments (r = 1), while the second and third experiments segment the face image into four (r = 2) and six (r = 3) segments, respectively, as illustrated in Fig. 6. The OSPF is then used to find the optimal landmarks and construct the partial face-graphs for each of the symmetric segments. The outcomes of the three experiments are shown in Fig. 7. It can be noted that the recognition rate improves with an increasing number of segments, with the best recognition rate achieved when the number of segments reaches six. However, a further increase in the number of segments demands a great deal of computational load. Therefore, this study fixes the number of segments to six for the subsequent experiments. This experiment also conveys that component-based face recognition (i.e., OSPF) significantly improves the recognition rates.
Figure 6: A typical face image segmented into two, four and six segments
Figure 7: Comparison of the effect of the number of face segments (face sub-graphs)
4.1.2. Face recognition under controlled/ideal conditions
In this section, the proposed OSPF approach has been assessed under relatively controlled conditions using both the AR and the FRGC ver2.0 databases. For the experiments on the AR database, the neutral face images obtained in the first session have been selected as the gallery set, while the neutral face images obtained in the second session have been selected as the test set. For the FRGC ver2.0 database, one controlled still image per subject has been used as the gallery for all 466 subjects, while a different, randomly chosen still image was used as the test set. The performance of OSPF was comparatively analyzed against those of ESM, p-SRC, AWPPZMA, LEM and HSO-EBGM based on
Table 2: Comparative performance: Controlled/Ideal conditions
Methods       Recognition Rate
              AR                       FRGC
p-SRC         97.43%                   81.76%
ESM           96.58%                   82.62%
AWPPZMA       92.31%                   72.75%
LEM           96.58%                   66.95%
HSO-EBGM      94.1%                    73.75%
OSPF          98.9% (STDEV=1.07)       98.2% (STDEV=1.6)
the recognition rate. The comparative analysis of all the methods is shown in Table 2, where the best recognition rate is emphasized in bold (the higher the better). It can be observed that the OSPF attained the best recognition rates on both databases, particularly outperforming AWPPZMA, LEM and HSO-EBGM. Nonetheless, the ESM and p-SRC algorithms also exhibited comparatively high recognition performance. Although there are major challenges due to the large variations among the face images of the same person in the FRGC ver2.0 database, OSPF could still yield a 98.2% recognition rate.
4.1.3. Face recognition experiments against realistic partial occlusions
The proposed OSPF approach, by exploiting the facial bilateral symmetry concept, is able to find the optimal landmarks without requiring any prior knowledge about the occluded regions or the occlusion size. In this experiment, the neutral images (i.e., Fig. 5 (a)) have been used as the gallery set, while the occluded faces with sunglasses and scarf from both sessions (i.e., Fig. 5 (h-k)) have been used for testing. The recognition rates of the proposed method (OSPF) and other standard methods, viz. ESM, p-SRC, PISMA, AWSGA, AWPPZMA and HSO-EBGM, are tabulated in Table 3. The best recognition rate is emphasized in bold (the higher the better). It is evident that the OSPF outperforms the other methods against the sunglasses and scarf occlusions in both sessions. These results validate the robustness of the proposed OSPF approach. Specifically, the OSPF has achieved an average accuracy better than all comparative methods, with an improvement of 11.31% and 14.3% over ESM and p-SRC, respectively. It can be deduced that occlusions affect the performance of all the algorithms by significantly decreasing the recognition rates. However, only a small degradation has been recorded for the OSPF, from 0.2% to 0.8% and from 0.4% to 2.6% for faces with sunglasses with respect to sessions 1 and 2. In addition, Fig. 8 (a), (b), (c) and (d) compares the recognition rates, in terms of Cumulative Match Characteristic (CMC) curves, of our proposed OSPF approach with those of PISMA, AWSGA and AWPPZMA in the presence of sunglasses and scarf occlusions of sessions 1 and 2, respectively. These figures show that the OSPF outperforms PISMA, AWSGA and AWPPZMA (by about 32.1%, 44.85% and 32.85%, respectively) for all conditions in both sessions. These results prove that the proposed method has the ability to tackle the problem of partial occlusions. Furthermore, the results prove that the OSPF approach is able to effectively select optimal landmarks, without the need to impose any prior knowledge or assumptions on occlusions, so as to successfully perform the face recognition task.
4.1.4. Face Recognition Under Varying Facial Expressions
Facial expressions form a vital part of any face recognition challenge. Face images from the AR database comprise different facial expressions (smiling, angry, and screaming), which are used to assess the impact of facial expressions on the performance of the proposed OSPF. In this experiment, face images showing neutral expressions in the first session (Fig. 5 (a)) have been used as the gallery set, while face images with three different expressions (e.g., Fig. 5 (b), (c), and (d)) taken in the same session have been used as the test set. The experimental results are tabulated in Table 4. The best recognition rate is emphasized in bold (the higher the better). It can be observed in Table 4 that the scream expression reduces the accuracy of OSPF by only 6.3%, but significantly lowers the accuracy of the other methods. In contrast, better accuracies were reported for the smiling and angry expressions. These results show that expression strength can affect the recognition rate, given that the scream expression has a more drastic effect on accuracy than soft expressions such as smile and anger. Also, it can be noted that there is an increase of 0.4% in the
Table 3: Comparative performance: Real occlusions
Method       Session-1                                  Session-2                                  Average
             Sunglass              Scarf                Sunglass              Scarf
AWSGA        38%                   84%                  20%                   70%                  53%
AWPPZMA      70%                   72%                  58%                   60%                  65%
ESM          87.18%                94.87%               76.07%                88.03%               86.54%
p-SRC        85.47%                90.6%                72.65%                85.47%               83.55%
PISMA        80%                   79%                  55%                   49%                  65.75%
HSO-EBGM     88.20%                91.8%                80%                   83.3%                85.83%
AND SRC      95.3%                 73.1%                -                     -                    -
LS MPCRC     97.8%                 89.4%                72.5%                 65.6%                81.3%
OSPF         98.7% (STDEV=0.88)    97.9% (STDEV=1.04)   98.5% (STDEV=0.82)    96.3% (STDEV=1.07)   97.85%
Table 4: Experimental results under varying facial expressions
Method       Recognition Rate
             Smiling               Anger                Scream               Average
ESM          85.47%                87.18%               26.49%               66.38%
p-SRC        93.16%                90.60%               33.33%               72.36%
AWPPZMA      96.58%                87.18%               38.46%               74.7%
LEM          79.49%                93.16%               31.62%               68.09%
AWSGA        95.72%                94.87%               33.33%               74.64%
HSO-EBGM     93.6%                 95%                  53.4%                80.7%
LS MPCRC     -                     -                    -                    96.9%
OSPF         99.3% (STDEV=0.84)    99.3% (STDEV=0.84)   92.6% (STDEV=1.6)    97.06%
recognition rate under the smiling and anger conditions compared to the recognition rate for the neutral faces in Table 2. This is attributable to the fact that the test and gallery images in this section were captured in the same session, while the gallery images in Section 4.1.2 were captured in the first session and the test images in the second session after a 15-day time interval, which tends to increase the intra-class variations.
4.1.5. Recognizing faces with different facial expressions in the presence of partial occlusions
Facial expressions and partial occlusion negatively affect the accuracy of a face recognition system, and the combination of both degrading entities in the face image further complicates the recognition task. This section evaluates the feasibility of the proposed OSPF to effectively recognize faces under different facial expressions with partial occlusion. Two experiments were performed. In the first experiment, the faces used in Section 4.1.4 are occluded with black squares along the eyes region, as illustrated in Figure 9. A similar experimental setup to that described in Section 4.1.4 has been used, where the face images shown in Figure 9 are matched against one neutral image per subject. Recognition rates of 93%, 92% and 79% were achieved for the conditions smiling with occlusion, anger with occlusion and scream with occlusion, respectively. Compared to the other methods outlined in Table 4, the proposed OSPF still outperforms all comparative methods in terms of the scream facial expression coupled with occlusion, and exhibits a similar performance in terms of the smiling and anger expressions. Note that the results of all the comparative methods were obtained against facial expressions without any occlusion (as reported in the literature). In the second experiment, face images with different expressions such as smiling, anger and scream have been occluded by a black square around the entire mouth region, as shown in Figure 10. The recognition rate of the proposed OSPF in the presence of facial expressions and partial occlusions decreased from 99.3%, 99.3% and 92.6% to 76%, 75% and 68% for the smiling, anger and scream expressions, respectively. However, it still outperforms all comparative methods, as evidenced by the results tabulated in Table 4, in terms of the scream facial expression, although the recognition rates of these methods are reported against facial expression alone in the absence of occlusion. Further, the recognition rate of the OSPF in terms of the Cumulative Match Characteristic (CMC) curve is plotted in Figures 11 and 12 for the first and second experiments, respectively.
Figure 8: Comparison of the proposed OSPF model with PISMA, AWSGA and AWPPZMA against realistic occlusions using the AR dataset in terms of Cumulative Match Characteristic (CMC) graphs: (a) Sunglasses Session 1; (b) Scarf Session 1; (c) Sunglasses Session 2; (d) Scarf Session 2.
4.1.6. Face Recognition under the illumination challenge
Varying illumination is an intrinsic challenge when capturing face images in unconstrained environments, and it can affect the performance of any face recognition system. In this experiment, the robustness of the proposed OSPF approach to illumination variations has been examined. The neutral face images in the AR database taken in the first session (e.g., Fig. 5 (a)) have been chosen as the single gallery image of the subjects. The face images under three different lighting conditions (e.g., Fig. 5 (e), (f), and (g)) taken in the same session were utilized as test images. A summary of the results is shown in Table 5. The best recognition rate is highlighted in bold (the higher the better). It can be inferred from Table 5 that the proposed method significantly outperformed ESM and p-SRC, which are recognized to be insensitive to variable illumination conditions. The higher performance yielded by the OSPF is attributable to the deployment of feature extraction using Gabor wavelet features.

4.1.7. Comparison with other hybridization methods
In this subsection, the results obtained by the proposed OSPF approach are compared against the results of different hybridization methods reported in [7] using the AR face database. Three meta-heuristic methods, namely the binary adaptive weight gravitational search algorithm (BAW-GSA), binary particle swarm optimization (BPSO) and the binary genetic algorithm (BGA), are hybridized with three popular feature extraction techniques, namely the local binary pattern (LBP), the modified census transform (MCT), and the local gradient pattern (LGP). The results of these hybridization
Figure 9: Faces with different facial expressions coupled with partial occlusions located along the eyes region.
Figure 10: Faces with different facial expressions and partial occlusion located along the mouth region
methods, which include LBP+BAW-GSA, MCT+BAW-GSA, LGP+BAW-GSA, LBP+BGA, MCT+BGA, LGP+BGA, LBP+BPSO, MCT+BPSO and LGP+BPSO, together with the results yielded by the proposed OSPF approach, are reported in Table 6. In this test, the neutral face image from session 1 has been used as the gallery, while 3 face images exposed to occlusion, expression and illumination variations from session 2 have been used for testing. Table 6 shows that the proposed OSPF outperforms all other methods considered in this comparative study. Furthermore, the OSPF has the advantage of using only one face image per subject as the gallery set, whereas 5 face images per subject have been used for training the other comparative methods.
4.2. Experimental results on the FRGC database
The FRGC ver2.0 database holds 50,000 images categorized into training and validation sets. The training set comprises 12,776 images from 222 persons, while the validation set contains images from 466 persons collected in 4,007 sessions. The number of sessions used for taking the photos ranged from 1 to 22 (410 persons were photographed in 2 or more sessions). In each session, four controlled still images, two uncontrolled still images and one 3D scan were recorded. In the entire range of experiments, the facial regions were cropped to a size of 160 x 160 pixels. FRGC has been used to investigate the sensitivity of the proposed OSPF approach to synthetic occlusions of variable sizes, in addition to the ideal condition as illustrated earlier in Section 4.1.2.
Table 5: Experimental results under varying lighting conditions

Method       Recognition Rate
             Left light            Right light          Both lights          Average
AWSGA        23.93%                5.98%                23.8%                17.9%
AWPPZMA      74.36%                64.96%               42.74%               60.69%
ESM          94.02%                94.02%               73.50%               87.18%
p-SRC        95.73%                96.58%               32.48%               74.93%
LEM          92.31%                91.45%               73.50%               85.75%
HSO-EBGM     98.6%                 98.3%                93%                  96.63%
LS MPCRC     -                     -                    -                    98.9%
OSPF         99.7% (STDEV=0.66)    99.7% (STDEV=0.66)   99.2% (STDEV=0.83)   99.5%
Figure 11: Recognition rate observed with different facial expressions coupled with partial occlusion along the eyes region.

Table 6: Recognition rate comparisons among different hybridization methods using the AR database
Methods          Recognition rate
LBP+BAW-GSA      94.1
MCT+BAW-GSA      95.2
LGP+BAW-GSA      95.5
LBP+BGA          93.6
MCT+BGA          94.7
LGP+BGA          94.8
LBP+BPSO         93.9
MCT+BPSO         94.5
LGP+BPSO         95.1
OSPF             98.7 (STDEV=0.88)
AC
CE
The performance of the proposed method has been compared with four benchmark approaches, viz., Line Edge Map (LEM) [12], Ensemble String Matching (ESM) [8], Adaptively Weighted Patch Pseudo Zernike Moment Array (AWPPZMA) [20] and partitioned Sparse Representation-based Classification (p-SRC) [43], which have achieved outstanding performances in recognizing partially occluded faces.

4.2.1. The sensitivity analysis of OSPF against varying occlusion sizes
The sensitivity of the proposed OSPF against varying occlusion sizes (the partial occlusion challenge) has been evaluated using comprehensive experimental assessments. In this experiment, we have adopted the same experimental settings as in [8]. The test images used in Section 4.1.2 were occluded by a square of s × s pixels at a random location, while the gallery images remained unoccluded (as shown in Fig. 13). The size of the occlusion blocks was varied from s = 10 (an occlusion content of 0.39% of the face image) up to s = 100 (39.06% occlusion content) with a step size of 10. The recognition rates of the proposed method and the benchmark approaches are plotted against the occlusion size in Fig. 14. As can be deduced from Fig. 14, the recognition rate of the proposed OSPF method is significantly higher than that of the other methods. It can also be observed that occlusion blocks of sizes ranging from 10 × 10 up to 40 × 40 pixels have no effect on the performance of the proposed OSPF method, and merely 10% degradation is observed with a block size of 100 × 100.
Figure 12: Recognition rate observed with different facial expressions coupled with partial occlusion along the mouth region.
Conversely, the recognition rates of the other benchmark algorithms degraded significantly, in the range of 21.76% to 65%, with a block size of 100 × 100. These results validate the effectiveness of the proposed OSPF in resolving the partial occlusion problem.
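As an illustration of this occlusion protocol, the sketch below pastes an s × s block at a uniformly random location of a probe image. The block value of zero (black) and the use of NumPy's default random generator are assumptions made for the example.

import numpy as np

def add_random_occlusion(image, s, rng=None):
    # Overlay an s-by-s block (filled with zeros here) at a random position.
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - s + 1))
    left = int(rng.integers(0, w - s + 1))
    occluded = image.copy()
    occluded[top:top + s, left:left + s] = 0
    return occluded

# Block sizes used in the experiment: 10 x 10 up to 100 x 100 in steps of 10.
occlusion_sizes = range(10, 101, 10)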
Figure 13: A typical face image occluded with random occlusion blocks of sizes 10 × 10, 20 × 20, . . . , 100 × 100, respectively.
4.3. Experimental results on the FOCS database
The key objective of the Face and Ocular Challenge Series (FOCS) database [33] is to motivate researchers in the face recognition domain to develop robust algorithms capable of recognizing persons based on still frontal face images. It comprises three groups of images, the Good, the Bad and the Ugly, hence the name GBU. The GBU classification has been made based on the difficulty level of the recognition challenge. The Good group contains pairs of face images of the same person that are easy to match, images that are moderately difficult to recognize are categorized into the Bad group, and face images that are difficult to match are classified into the Ugly group. For each group there are two sets, a target set and a test set, and each set contains 1,085 face images of 437 different persons. To the best of our knowledge, previous studies on this database have mainly focused on the verification task, i.e., deciding whether two face images belong to the same person or not. Therefore, only the result obtained by the proposed OSPF model is reported, and the experimental results reported in this study may serve as a benchmark for future identification-oriented research. In this study, the identification task (i.e., determining the identity of the person) has been performed by matching the test image with the stored gallery images. Only a single image per subject has been selected from the target set as the gallery image, and another face image of each subject from the test set was used as the test image. The recognition rate on each group is presented in Table 7. In addition, the behavior of the proposed method in terms of the Cumulative Match Characteristics (CMC) curve is plotted in Figure 15.
Figure 14: Recognition rate observed under varying sizes of random occlusion. Occlusions were gradually increased from 10 × 10, 20 × 20, . . . , 100 × 100.
It is discernible that the proposed method reaches a recognition rate of 100% within ranks 3 and 7 for the Good and Bad groups, respectively, while a recognition rate of 99% is reached within rank 23 for the Ugly group. This difference is attributable to the strong variations in the images contained in the Ugly group.
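To reproduce such a CMC analysis, the following sketch shows one common way of computing the curve from a probe-to-gallery similarity matrix. It assumes each probe identity appears exactly once in the gallery, mirroring the single-gallery-image protocol used here.

import numpy as np

def cmc_curve(scores, probe_ids, gallery_ids, max_rank=25):
    # scores[i, j]: similarity between probe i and gallery entry j (higher is better).
    gallery_ids = np.asarray(gallery_ids)
    cmc = np.zeros(max_rank)
    for i, true_id in enumerate(probe_ids):
        order = np.argsort(-scores[i])                 # gallery sorted by similarity
        rank = int(np.where(gallery_ids[order] == true_id)[0][0])
        if rank < max_rank:
            cmc[rank:] += 1.0                          # correct at this rank and all later ranks
    return cmc / len(probe_ids)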
Table 7: Identification rate on the GBU dataset

Recognition Task    | The Good | The Bad | The Ugly
Identification rate | 99.5%    | 98.6%   | 95.8%
Figure 15: CMC curves of the identification rate on the Good, the Bad and the Ugly groups.
4.4. Experimental results on the LFW database
Labeled Faces in the Wild (LFW) [17] is a database of face images designed to explore the problem of unconstrained face recognition. LFW contains more than 13,000 face images of 5,749 different individuals (subjects) obtained from the web, with each image labeled with the name of the person pictured. The data set contains two or more distinct photos for 1,680 individuals and one photo for the rest. Each face has been detected by the standard Viola–Jones algorithm [41]. A cropped version of the original LFW images is available online; in this version, nearly all the image backgrounds are absent. The extracted face area was rescaled to a size of 160 × 160 pixels. In this experiment, the cropped versions of the frontal face images of the 1,680 individuals have been used to perform the identification task, which involves determining the identity of the person by matching the test face image against the gallery face images. The results obtained using the proposed memetic OSPF model have been comparatively analyzed against a recently reported technique [27] using the same settings. Table 8 shows the comparison between the proposed OSPF approach and four methods: Gabor wavelets (Kepenekci), LBP (Ahonen), SIFT (Aly) and the SIFT-based Kepenekci method (Lenc). It can be inferred from Table 8 that the proposed OSPF approach outperformed the other methods with a recognition rate of 98.22%, owing to its ability to optimally select facial landmarks.
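A minimal sketch of this preprocessing step is given below, assuming OpenCV's bundled frontal-face Haar cascade as the Viola–Jones detector and the 160 × 160 output size mentioned above; keeping only the first detection and the detector parameters are simplifications for illustration.

import cv2

def detect_and_crop(gray, size=160):
    # Viola-Jones frontal face detector shipped with OpenCV (Haar cascade).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                # no face found in this image
    x, y, w, h = faces[0]                          # keep the first detection
    return cv2.resize(gray[y:y + h, x:x + w], (size, size))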
Table 8: Comparative performance for the LFW database

Method                             | Recognition rate
Gabor wavelets (Kepenekci)         | 95.96%
LBP (Ahonen)                       | 96.25%
SIFT (Aly)                         | 95.83%
SIFT based Kepenekci method (Lenc) | 98.04%
OSPF                               | 98.22% (STDEV=0.83)
5. Conclusion
This study has proposed a novel memetic framework for human face recognition, termed OSPF. It is able to recognize face images captured in adverse conditions such as occlusion, facial expression variations and varying illumination using only a single exemplar face image per subject class. The proposed OSPF primarily exploits the bilateral symmetry of human faces in order to efficiently construct partial face sub-graphs. The objective is to find optimal landmarks by hybridizing an enhanced harmony search algorithm with the intelligent single particle optimizer, which perform the global and local search, respectively. This optimization mechanism has enabled the full automation of the face recognition process without any human intervention, such as the manual selection of facial landmarks. The matching between faces has been performed by computing the overall similarity between their respective partial face sub-graphs. Importantly, the proposed memetic OSPF model is able to recognize faces prone to occlusion without any prior knowledge or assumption about the occlusion location or size. Furthermore, the proposed approach neither discards nor reconstructs any occluded region, thereby reducing the computational overhead.
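As an illustration of the matching idea summarized above, the sketch below scores two partial facegraphs as the average of node-wise jet similarities, using the normalized dot product familiar from elastic bunch graph matching [42]; the exact similarity measure and aggregation used by OSPF may differ.

import numpy as np

def jet_similarity(j1, j2):
    # Normalized dot product (cosine similarity) between two Gabor jets.
    return float(np.dot(j1, j2) / (np.linalg.norm(j1) * np.linalg.norm(j2) + 1e-12))

def facegraph_similarity(graph_a, graph_b):
    # graph_a, graph_b: lists of jets at corresponding landmarks of two partial facegraphs.
    return float(np.mean([jet_similarity(a, b) for a, b in zip(graph_a, graph_b)]))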
The effectiveness of the proposed OSPF has been evaluated and comparatively analyzed against recent approaches using four face databases: AR, FRGC ver2.0, LFW and FOCS. The experimental results highlight the robustness of the proposed approach, not only to partial occlusions but also when occlusions are combined with other challenges such as varying facial expressions and illumination conditions, all with the use of only one prototype face image per subject. Furthermore, the promising experimental results demonstrate the superiority of the proposed memetic model and the feasibility of using facial symmetry to improve the accuracy of face recognition under uncontrolled scenarios. However, in the case of faces with large pose variations, the proposed OSPF may not yield good recognition performance. This limitation is owing to the fact that the bilateral symmetry of a face image is diluted in proportion to the severity of the pose variation. As future avenues, a further enhancement would be possible by exploring methods that address pose variations [45, 46]. Moreover, the proposed OSPF approach could be extended to model other biometric traits such as gait. Further, integrating Gabor wavelets with other feature extraction techniques such as Histograms of Oriented Gradients (HOG) or Local Binary Patterns (LBP) may enhance recognition performance. It is also worth mentioning that the running time of OSPF is relatively high; this could be mitigated in the near future by suitably applying parallel processing techniques.

6. Acknowledgements
The authors would like to thank Universiti Sains Malaysia for partially funding this work under grant no. 304/PKOMP/6312153.

References
[1] Abual-Rub, M. S., Al-Betar, M. A., Abdullah, R., & Khader, A. T. (2012). A hybrid harmony search algorithm for ab initio protein tertiary structure prediction. Network Modeling Analysis in Health Informatics and Bioinformatics, 1, 69–85.
[2] Al-Betar, M. A., Khader, A. T., & Zaman, M. (2012). University course timetabling using a hybrid harmony search metaheuristic algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42, 664–681.
[3] Azeem, A., Sharif, M., Raza, M., & Murtaza, M. (2014). A survey: face recognition techniques under partial occlusion. Int. Arab J. Inf. Technol., 11, 1–10.
[4] Aziz, M., & Tayarani-N, M.-H. (2014). An adaptive memetic particle swarm optimization algorithm for finding large-scale latin hypercube designs. Engineering Applications of Artificial Intelligence, 36, 222–237.
[5] Bolme, D. S. (2003). Elastic bunch graph matching. Master's thesis, Colorado State University, Fort Collins.
[6] Boussaïd, I., Lepagnot, J., & Siarry, P. (2013). A survey on optimization metaheuristics. Information Sciences, 237, 82–117.
[7] Chakraborti, T., & Chatterjee, A. (2014). A novel binary adaptive weight gsa based feature selection for face recognition using local gradient patterns, modified census transform, and local binary patterns. Engineering Applications of Artificial Intelligence, 33, 80–90.
[8] Chen, W., & Gao, Y. (2013). Face recognition using ensemble string matching. Image Processing, IEEE Transactions on, 22, 4798–4808.
[9] Chen, Y., Yang, J., Luo, L., Zhang, H., Qian, J., Tai, Y., & Zhang, J. (2016). Adaptive noise dictionary construction via irrpca for face recognition. Pattern Recognition.
[10] Ding, R.-X., Du, D. K., Huang, Z.-H., Li, Z.-M., & Shang, K. (2015). Variational feature representation-based classification for face recognition with single sample per person. Journal of Visual Communication and Image Representation, 30, 35–45.
[11] Ekenel, H. K., & Stiefelhagen, R. (2009). Why is facial occlusion a challenging problem? In Advances in Biometrics (pp. 299–308). Springer.
[12] Gao, Y., & Leung, M. K. (2002). Face recognition using line edge map. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24, 764–779.
[13] Geem, Z. W. (2008). Novel derivative of harmony search algorithm for discrete design variables. Applied Mathematics and Computation, 199, 223–230.
[14] Geem, Z. W., Kim, J. H., & Loganathan, G. (2001). A new heuristic optimization algorithm: harmony search. Simulation, 76, 60–68.
[15] Haghighat, M., Abdel-Mottaleb, M., & Alhalabi, W. (2016). Fully automatic face normalization and single sample face recognition in unconstrained environments. Expert Systems with Applications, 47, 23–34.
[16] Harguess, J., & Aggarwal, J. (2011). Is there a connection between face symmetry and face recognition? In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on (pp. 66–73). IEEE.
[17] Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
[18] Ji, Z., Liao, H., Wang, Y., & Wu, Q. (2007). A novel intelligent particle optimizer for global optimization of multimodal functions. In Evolutionary Computation, 2007. CEC 2007. IEEE Congress on (pp. 3272–3275). IEEE.
[19] Kanan, H. R., & Faez, K. (2010). Recognizing faces using adaptively weighted sub-gabor array from a single sample image per enrolled subject. Image and Vision Computing, 28, 438–448.
[20] Kanan, H. R., Faez, K., & Gao, Y. (2008). Face recognition using adaptively weighted patch pzm array from a single exemplar image per person. Pattern Recognition, 41, 3799–3812.
[21] Kumar, D., Kumar, S., & Rai, C. (2009). Feature selection for face recognition: a memetic algorithmic approach. Journal of Zhejiang University-Science A, 10, 1140–1152.
[22] Lades, M., Vorbruggen, J., Buhmann, J., Lange, J., Von Der Malsburg, C., Wurtz, R., & Konen, W. (1993). Distortion invariant object recognition in the dynamic link architecture. Computers, IEEE Transactions on, 42, 300–311. doi:10.1109/12.210173.
[23] Lahasan, B., Lutfi, S. L., & San-Segundo, R. (2017). A survey on techniques to handle face recognition challenges: occlusion, single sample per subject and expression. Artificial Intelligence Review, (pp. 1–31).
[24] Lahasan, B. M., Venkat, I., Al-Betar, M. A., Lutfi, S. L., & De Wilde, P. (2016). Recognizing faces prone to occlusions and common variations using optimal face subgraphs. Applied Mathematics and Computation, 283, 316–332.
[25] Lahasan, B. M., Venkat, I., & Lutfi, S. L. (2014). Recognition of occluded faces using an enhanced ebgm algorithm. In Computer and Information Sciences (ICCOINS), 2014 International Conference on (pp. 1–5). IEEE.
[26] Lee, K., & Geem, Z. (2004). A new structural optimization method based on the harmony search algorithm. Computers & Structures, 82, 781–798. doi:10.1016/j.compstruc.2004.01.002.
[27] Lenc, L., & Král, P. (2015). Automatic face recognition system based on the sift features. Computers & Electrical Engineering.
[28] Liu, F., Tang, J., Song, Y., Bi, Y., & Yang, S. (2016). Local structure based multi-phase collaborative representation for face recognition with single sample per person. Information Sciences, 346, 198–215.
[29] Mahdavi, M., Fesanghary, M., & Damangir, E. (2007). An improved harmony search algorithm for solving optimization problems. Applied Mathematics and Computation, 188, 1567–1579.
[30] Mahoor, M. H., Ansari, A., & Abdel-Mottaleb, M. (2008). Multi-modal (2-d and 3-d) face modeling and recognition using attributed relational graph. In Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on (pp. 2760–2763). IEEE.
[31] Manjarres, D., Landa-Torres, I., Gil-Lopez, S., Del Ser, J., Bilbao, M., Salcedo-Sanz, S., & Geem, Z. (2013). A survey on applications of the harmony search algorithm. Engineering Applications of Artificial Intelligence, 26, 1818–1831.
[32] Martinez, A. M., & Benavente, R. (1998). The AR face database. Technical Report 24.
[33] Phillips, P. J., Beveridge, J. R., Draper, B. A., Givens, G., Toole, A. J., Bolme, D. S., Dunlop, J., Lui, Y. M., Sahibzada, H., & Weimer, S. (2011). An introduction to the good, the bad, & the ugly face recognition challenge problem. In Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on (pp. 346–353). IEEE.
[34] Phillips, P. J., Flynn, P. J., Scruggs, T., Bowyer, K. W., Chang, J., Hoffman, K., Marques, J., Min, J., & Worek, W. (2005). Overview of the face recognition grand challenge. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on (pp. 947–954). IEEE volume 1.
[35] Priya, G. N., & Banu, R. W. (2014). Occlusion invariant face recognition using mean based weight matrix and support vector machine. Sadhana, 39, 303–315.
[36] Singh, A. K., & Nandi, G. (2012). Face recognition using facial symmetry. In Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology (pp. 550–554). ACM.
[37] Sinha, P., Balas, B., Ostrovsky, Y., & Russell, R. (2007). Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE, 94, 1948–1962.
[38] Su, Y., Shan, S., Chen, X., & Gao, W. (2010). Adaptive generic learning for face recognition from a single sample per person. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 2699–2706). IEEE.
[39] Tan, X., Chen, S., Zhou, Z.-H., & Zhang, F. (2006). Face recognition from a single image per person: A survey. Pattern Recognition, 39, 1725–1745.
[40] Venkat, I., Khader, A. T., Subramanian, K., & Wilde, P. D. (2013). Recognizing occluded faces by exploiting psychophysically inspired similarity maps. Pattern Recognition Letters, 34, 903–911.
[41] Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on (pp. I–511). IEEE volume 1.
[42] Wiskott, L., Fellous, J.-M., Kuiger, N., & Von der Malsburg, C. (1997). Face recognition by elastic bunch graph matching. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19, 775–779. doi:10.1109/34.598235.
[43] Wright, J., Yang, A., Ganesh, A., Sastry, S., & Ma, Y. (2009). Robust face recognition via sparse representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31, 210–227. doi:10.1109/TPAMI.2008.79.
[44] Xu, Y., Li, Z., Zhang, B., Yang, J., & You, J. (2017). Sample diversity, representation effectiveness and robust dictionary learning for face recognition. Information Sciences, 375, 171–182.
[45] Xu, Y., Zhang, Z., Lu, G., & Yang, J. (2016). Approximately symmetrical face images for image preprocessing in face recognition and sparse representation based classification. Pattern Recognition, 54, 68–82.
[46] Xu, Y., Zhu, X., Li, Z., Liu, G., Lu, Y., & Liu, H. (2013). Using the original and 'symmetrical face' training samples to perform representation based two-step face recognition. Pattern Recognition, 46, 1151–1158.
[47] Yan, H., Wang, P., Chen, W., & Liu, J. (2015). Face recognition based on gabor wavelet transform and modular 2dpca.
[48] Zeng, J. (2014). A self-adaptive intelligent single-particle optimizer compression algorithm. Neural Computing and Applications, 25, 1285–1292.
[49] Zhou, J., Ji, Z., Shen, L., Zhu, Z., & Chen, S. (2011). Pso based memetic algorithm for face recognition gabor filters selection. In Memetic Computing (MC), 2011 IEEE Workshop on (pp. 1–6). IEEE.
[50] Zhu, Z., Zhou, J., Ji, Z., & Shi, Y.-H. (2011). Dna sequence compression using adaptive particle swarm optimization-based memetic algorithm. Evolutionary Computation, IEEE Transactions on, 15, 643–658.