Accepted Manuscript
Ear recognition using local binary patterns: A comparative experimental study
M. Hassaballah, Hammam A. Alshazly, Abdelmgeid A. Ali
PII: S0957-4174(18)30649-3
DOI: https://doi.org/10.1016/j.eswa.2018.10.007
Reference: ESWA 12252
To appear in:
Expert Systems With Applications
Received date: 16 April 2018
Revised date: 20 August 2018
Accepted date: 3 October 2018
Please cite this article as: M. Hassaballah, Hammam A. Alshazly, Abdelmgeid A. Ali, Ear recognition using local binary patterns: A comparative experimental study, Expert Systems With Applications (2018), doi: https://doi.org/10.1016/j.eswa.2018.10.007
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
M. Hassaballah et al. / Expert Systems with Applications 00 (2018) 1–46
Highlights
• A comparative study of ear recognition using local binary patterns variants is done
• A new texture operator is proposed and used as an ear feature descriptor
• Detailed analysis on identification and verification is conducted separately
• An approximated recognition rate of 99% is achieved by some texture descriptors
• The study has significant insights and can benefit researchers in future works
Ear recognition using local binary patterns: A comparative experimental study
M. Hassaballah (a,∗), Hammam A. Alshazly (b), Abdelmgeid A. Ali (c)
(a) Computer Science Department, Faculty of Computers and Information, South Valley University, Luxor, Egypt.
(b) Mathematics Department, Faculty of Science, South Valley University, Qena 83523, Egypt.
(c) Computer Science Department, Faculty of Computers and Information, Minia University, Al Minia 61519, Egypt.
Abstract
Identity recognition using local features extracted from ear images has recently attracted a great deal of attention in the intelligent biometric systems community. The rich and reliable information of the human ear and its stable structure over a long period of time make ear recognition technology an appealing choice for identifying individuals and verifying their identities. This paper considers the ear recognition problem using local binary patterns (LBP) features. The LBP-like features characterize the spatial structure of the image texture based on the assumption that this texture has a pattern and its strength (amplitude), two locally complementary aspects. Their high discriminative power, invariance to monotonic gray-scale changes, and computational efficiency make the LBP-like features suitable for the ear recognition problem. Thus, the performance of several recent LBP variants introduced in the literature as feature extraction techniques is investigated to determine how they can best be utilized for ear recognition. To this end, we carry out a comprehensive comparative study on the identification and verification scenarios separately. In addition, a new variant of the traditional LBP operator, named averaged local binary patterns (ALBP), is proposed, and its ability to represent the texture of ear images is compared with the other LBP variants. The ear identification and verification experiments are extensively conducted on five publicly available constrained and unconstrained benchmark ear datasets stressing various imaging conditions, namely IIT Delhi (I), IIT Delhi (II), AMI, WPUT and AWE.
The obtained results for both identification and verification indicate that the current LBP texture descriptors are successful feature extraction candidates for ear recognition systems under constrained imaging conditions, where they can achieve recognition rates of up to 99%; however, their performance degrades as the level of distortion increases. Moreover, the tested LBP variants achieve closely similar performance on ear recognition. Thus, further studies on other applications are needed to verify this similarity. We believe that the presented study offers significant insights and can benefit researchers in choosing between LBP variants, as well as acting as a connection between previous studies and future work on utilizing LBP-like features in ear recognition systems.
© 2018 Published by Elsevier Ltd.
Keywords: Intelligent biometrics systems, identity verification, ear recognition, local binary patterns, experimental evaluation.
∗ Corresponding author. Tel.: +201020968555. Email addresses: [email protected] (M. Hassaballah), [email protected] (Hammam A. Alshazly)
1. Introduction
Nowadays, the increasing demand for highly secure automated identity recognition systems has led to intensive research efforts in the interdisciplinary field of computer vision and intelligent systems. Biometrics, as an important application of surveillance, forensics, and many expert and intelligent systems (Omara et al., 2016), refers to the science of recognizing humans using their physical or behavioral traits, including face, iris, ear, fingerprint, palmprint, hand geometry, voice, and signature (Ghoualmi et al., 2016, Jain et al., 2004, Rakshit et al., 2018). For any of these traits to be used as a biometric characteristic, it must satisfy some requirements: (1) Universality: each individual should have the trait; (2) Distinctiveness: the trait is able to distinguish between different individuals; (3) Permanence: the trait is sufficiently invariant over age; (4) Performance: the trait achieves the required recognition accuracy and speed and is robust to the operational and environmental factors affecting it; (5) Collectability: the trait can be acquired and measured easily and quantitatively; and (6) Acceptability: the extent to which individuals accept the technologies based on this biometric characteristic.
Each of the aforementioned biometric modalities has its advantages and disadvantages, with no single modality being ideal for all expert systems applications (Chang et al., 2003, Hezil & Boukrouche, 2017). This paper focuses on the human ear as a promising and distinctive biometric modality which contains stable and reliable information as well as a shape structure that does not show drastic changes with age. Figure 1 shows the external structure of the ear and its various morphological components, including the helix, antihelix, tragus, antitragus, lobe, concha and other parts. While the outer structure is relatively simple, the variability between two ears is distinguishable enough even for identical twins (Nejati et al., 2012). Moreover, ear images have many advantages over other modalities: they are easily acquired from a distance without the cooperation of individuals, have a uniform color distribution over the ear surface, and are invariant to facial expressions. Therefore, the analysis of ear images for extracting such unique and distinguishable features to identify individuals and verify their identities is an active research topic and an emerging intelligent biometric application (Emeršič et al., 2017, Galdámez et al., 2017, Pflug & Busch, 2012).

Figure 1: External structure of the human ear.

The existing ear recognition techniques can be roughly classified, depending on the type of feature extraction method, into geometric (Choraś, 2005, 2008), holistic (Arbab-Zavar & Nixon, 2011, Fooprateepsiri & Kurutach, 2011, Hanmandlu & Mamta, 2013, Wang & Yuan, 2010, Yuan et al., 2006), and hybrid (Benzaoui et al., 2015, Huang et al., 2013, Kumar & Chan, 2013, Morales et al., 2015, Pflug et al., 2014) approaches. Under each category, a variety of ear recognition techniques have been proposed in the literature. In (Hurley et al., 2005), force field line feature extraction is utilized for ear biometrics. To distinguish ear features, a tunable filter bank based on a half-band polynomial of 14th order is employed for feature extraction from ear images in (Chowdhury et al., 2018). In addition, some techniques depend on features extracted from 3D ear images (Prakash & Gupta, 2012, 2014, Zhang et al., 2017, Zhou et al., 2012). However, these approaches are not effective under uncontrolled imaging conditions like rotation, scaling, lighting variations, occlusion and noise (Abaza & Bourlai, 2013, Abaza et al., 2013, Yuan et al., 2016).
On the other hand, local feature based approaches, which have proven successful in other biometric applications, have also been proposed for ear biometrics (Annapurani et al., 2015, Anwar et al., 2015, Bustard & Nixon, 2010). These approaches extract the local features of the ear by first detecting a set of interest points (keypoints) in the image and then computing an independent descriptor for each keypoint. Noticeable advantages of these approaches are their robustness to partial occlusion and various affine transformations (Hassaballah et al., 2016). However, global information about the shape structure of the ears, which is important for efficient recognition, is lost. Some local feature based approaches compute the local features densely over the entire image, considering all pixels as keypoints. Examples of such approaches are Gabor filters (Kumar & Wu, 2012, Meraoumia et al., 2015), log-Gabor filters (Arbab-Zavar & Nixon, 2008), curvelet representation (Basit & Shoaib, 2014), and 2D quadrature filters (Chan & Kumar, 2012). The approaches relying on dense descriptor computation are found to give high recognition performance and are preferred due to their computational simplicity (Mawloud & Djamel, 2016).
Since feature extraction is at the core of any expert system, several feature extraction techniques have been introduced to capture intrinsic and discriminative local structures from images. Among these techniques are texture descriptors, which encode and characterize the texture information of images. Local binary patterns (LBP) (Nanni et al., 2012, Ojala et al., 1996, 2002) is one prominent texture descriptor that has shown effective results in computer vision applications such as face recognition (Ahonen et al., 2006, Liu et al., 2016) and object detection (Pan et al., 2017, Satpathy et al., 2014, Trefný & Matas, 2010). The success of the LBP in these applications motivated us to utilize it in the ear recognition problem. In this regard, a few researchers have considered using LBP in ear recognition, either as a stand-alone image representation technique or combined with other methods. In (Boodoo-Jahangeer & Baichoo, 2013), the LBP descriptor is used to extract ear features. The performance is tested using ear images for 125 subjects from the IIT Delhi ear dataset (Kumar & Wu, 2012). The authors reported that "LBP has a high discriminative power, tolerance against global illumination changes and low computational load compared to Principal Components Analysis (PCA); and the recognition rate is about 93%". Guo and Xu (Guo & Xu, 2008) combined the LBP descriptor with local similarity binary pattern (LSBP) to include more information about the connectivity of neighboring pixels. They reported a recognition rate of approximately 93% on the USTB ear dataset. Benzaoui et al. (Benzaoui et al., 2015) used the elliptical local binary patterns (ELBP) descriptor, a modification of the basic LBP, to characterize fine details of ear images. For dimensionality reduction and to select useful information, the Discrete Wavelet Transform (DWT) is applied to each global histogram. The IITD ear dataset with 500 images for 100 subjects is used to evaluate the recognition performance, and the obtained recognition rate is about 94%. For human identification from 2D ear images, Benzaoui et al. (Benzaoui et al., 2014) proposed an approach based on local texture descriptors such as LBP, local phase quantization, and binarized statistical image features. The reported results on the IIT Delhi-I, IIT Delhi-II, and USTB ear databases confirmed in general the success of local texture features in ear identification.
Due to the increasing number of LBP variants proposed recently for specific problems,
as well as the demand for better feature description in various fields of application, a question of suitability arises. Indeed, not all LBP variants are suitable for a particular application or able to cope with all kinds of image variability encountered in real conditions. Consequently, application-specific comparative studies of the suitability of LBP variants demand more attention. In this paper, we take a significant step further and perform what is, to the best of our knowledge, the first comprehensive evaluation of several LBP variants in ear recognition. We analyze and compare the performance for two scenarios, identification (one-to-many) and verification (one-to-one), under various imaging conditions using constrained and unconstrained benchmark ear datasets. In this context, the main contributions of this work are summarized as follows:
• A clear analysis of the recently proposed LBP variants is provided.
• An extensive set of experiments is carried out on five challenging datasets to evaluate the applicability of LBP variants in ear recognition.
• The experiments are conducted separately on the identification and verification scenarios. As far as we know, this has not been studied in the literature.
• A new variant of the traditional LBP operator named averaged local binary patterns (ALBP) is proposed, its uniform version (UALBP) is computed, and their performance is compared with recently proposed counterparts.
The rest of the paper is organized as follows. Section 2 presents a succinct description and analysis of various LBP texture descriptors used as local feature extractors and encoding schemes. Section 3 describes the architecture and phases of the ear recognition evaluation framework followed in this work. Experimental results and comparisons are reported in Section 4. Section 5 gives a discussion and performance analysis of the LBP variants. Finally, conclusions and future research directions are drawn in Section 6.
2. Feature extraction and encoding schemes
Good recognition performance is generally attributed to an accurate and robust representation of image information, which is considered one of the main difficulties facing object recognition systems. The LBP and its variants have achieved impressive results in representing and analyzing image texture (Pietikäinen et al., 2011), and their occurrence histogram computed over all image pixels can be used as a powerful descriptor of local patterns. These local patterns describe micro-structures (e.g., edges, corners, flat regions), and their underlying distribution is estimated by the histogram. That is, the histogram represents local structures extracted from the entire image by combining both structural and statistical information. Figure 2 illustrates an example of obtaining the feature histogram representing the entire ear image, where the ear image is represented by concatenating a set of local LBP histograms. The basic notation, encoding methodology and characteristics of the LBP operator and its variants are discussed briefly in the following subsections.
Figure 2: The steps for obtaining the final feature histogram representation using LBP and its variants.
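The concatenation of local histograms depicted in Figure 2 can be sketched as follows. This is only an illustrative sketch: the function name `block_histograms`, the regular grid of blocks, and the toy code map are assumptions made here and are not taken from the paper, which does not fix the block layout at this point.

```python
def block_histograms(codes, n_rows, n_cols, n_bins):
    """Concatenate per-block histograms of an integer code map (list of lists).

    The regular n_rows x n_cols grid is an assumed layout for illustration.
    """
    height, width = len(codes), len(codes[0])
    feature = []
    for br in range(n_rows):
        for bc in range(n_cols):
            # Block boundaries; integer division spreads leftover pixels.
            r0, r1 = br * height // n_rows, (br + 1) * height // n_rows
            c0, c1 = bc * width // n_cols, (bc + 1) * width // n_cols
            hist = [0] * n_bins
            for r in range(r0, r1):
                for c in range(c0, c1):
                    hist[codes[r][c]] += 1
            feature.extend(hist)  # local histograms are appended in grid order
    return feature


# Toy 4 x 4 code map with codes in {0, 1, 2, 3} (hypothetical values).
codes = [[0, 1, 2, 3],
         [0, 1, 2, 3],
         [3, 3, 0, 0],
         [3, 3, 0, 0]]
feature = block_histograms(codes, n_rows=2, n_cols=2, n_bins=4)
```

The length of the resulting vector is the number of blocks times the number of bins, so it grows linearly with the grid size.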
2.1. Basic LBP
The basic LBP descriptor, first introduced in (Ojala et al., 1996), characterizes the spatial structures of local texture patterns in images using a 3 × 3 square patch around a central pixel, as illustrated in Fig. 3. The central pixel with intensity $g_c$ is compared with each of its eight neighbors $g_i$ ($i = 0, 1, \ldots, 7$), and a binary value is produced according to the following threshold function:

$$ s(g_i - g_c) = \begin{cases} 1, & \text{if } g_i \geq g_c \\ 0, & \text{if } g_i < g_c. \end{cases} \tag{1} $$

Then, the values of the thresholded pixels are weighted by a binomial factor of $2^i$ and summed to obtain an LBP number representing the texture unit:

$$ \mathrm{LBP} = \sum_{i=0}^{7} s(g_i - g_c) \times 2^i. \tag{2} $$

In this context, the basic LBP descriptor extracts local structures (features) from images that are gray-scale invariant by considering the occurrence probability of all obtained patterns. For an input image of size $M \times N$, after obtaining the LBP patterns for all image pixels, their occurrence histogram is computed to measure the distribution over the entire image and used as a feature descriptor:

$$ H(k) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f(\mathrm{LBP}(i, j), k), \quad k \in [0, K], \quad \text{with } f(x, y) = \begin{cases} 1, & \text{if } x = y \\ 0, & \text{otherwise,} \end{cases} \tag{3} $$

where $K$ represents the maximum value of the LBP code. Since the basic LBP descriptor is restricted to 3 × 3 image patches with 8 neighbors, an 8-bit binary number representing one of $2^8 = 256$ possible distinct patterns ($i = 0, 1, \ldots, 255$) is obtained. The basic LBP shows high discrimination ability, has been successful in various texture classification tasks, and works well for extracting local structures from images. However, it is sensitive to image rotation, very sensitive to noise, and produces long histograms (i.e., 256 bins for an 8-pixel neighborhood), leading to large memory storage requirements.
Figure 3: Basic LBP computation for a 3 × 3 image patch and the corresponding LBP code.
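As a minimal sketch of Eqs. (1) to (3), the code below computes the LBP code of one pixel and the occurrence histogram of an image stored as a list of lists. The neighbor ordering (clockwise from the top-left pixel), the decision to skip border pixels, and the sample patch values are assumptions made for illustration; the paper does not specify them here.

```python
# Neighbor offsets (row, col), clockwise from the top-left pixel.
# This ordering is an assumption; any fixed ordering gives a valid code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """LBP code of the pixel at (r, c), following Eqs. (1) and (2)."""
    gc = img[r][c]
    code = 0
    for i, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= gc:  # threshold s(g_i - g_c), Eq. (1)
            code += 1 << i             # weight 2^i, Eq. (2)
    return code


def lbp_histogram(img):
    """256-bin occurrence histogram, Eq. (3); border pixels are skipped."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Hypothetical 3 x 3 patch with center value 6.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch, 1, 1)   # bits 0, 4, 5, 6, 7 are set -> 241
hist = lbp_histogram(patch)
```

With this ordering, the neighbors 6, 7, 8, 9 and 7 are greater than or equal to the center, which sets bits 0, 4, 5, 6 and 7 and yields the code 241.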
2.2. Rotation Invariant LBP
An important property of a feature descriptor is robustness against image rotations, which is not provided by the basic LBP descriptor. The first attempt at rotation invariance for the basic LBP descriptor was introduced in (Pietikäinen et al., 2000). Figure 4 illustrates the steps for obtaining the rotation invariant version of the LBP operator (RILBP). First, the binary pattern of the thresholded neighbors around the center pixel is obtained as an 8-bit string read in the clockwise direction. Then, circular bit shifts are applied until the pattern matches one of the 36 distinct patterns of "1" and "0" that an 8-bit string can form under rotation. The index of the matching pattern is returned as the feature value describing the rotation invariant LBP of this particular neighborhood. This descriptor inherits some of the LBP properties, such as invariance to monotonic gray-scale variation and computational efficiency, and it produces a shorter histogram than the basic LBP operator. However, the RILBP descriptor is found to give poor discriminative ability even for rotated images. This is attributed to two main reasons. First, the angular space is quantized at restricted 45° intervals. Second, the frequencies of occurrence of the 36 RILBP patterns vary greatly (Pietikäinen et al., 2000).
Figure 4: Computing the rotation invariant version of LBP from a 3 × 3 image patch.
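The count of 36 rotation invariant patterns can be checked with a short sketch. Taking the minimum over all circular shifts as the canonical representative of each rotation class is an assumption made here for illustration, as are the helper names `rotate_right` and `rotation_invariant`.

```python
def rotate_right(code, shift, bits=8):
    """Circular right shift of a `bits`-wide integer."""
    shift %= bits
    mask = (1 << bits) - 1
    return ((code >> shift) | (code << (bits - shift))) & mask


def rotation_invariant(code, bits=8):
    """Canonical representative: the smallest value among all rotations."""
    return min(rotate_right(code, s, bits) for s in range(bits))


# Enumerate the distinct rotation classes of all 8-bit patterns.
classes = sorted({rotation_invariant(c) for c in range(256)})
# Map each canonical pattern to a compact index usable as a histogram bin.
index_of = {rep: idx for idx, rep in enumerate(classes)}
```

Here `len(classes)` evaluates to 36, so a rotation invariant histogram built from `index_of` has 36 bins instead of 256.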
2.3. Uniform LBP
Ojala et al. (Ojala et al., 2002) introduced two very useful extensions to the basic LBP descriptor. The first defines the LBP descriptor for any spatial resolution using $P$ equally spaced pixels on a circle of radius $R$ that form a circularly symmetric set of neighbors. The second defines a set of fundamental binary patterns, the so-called "uniform" patterns, that occur more frequently than others. The occurrence histogram of these patterns has proven to be a very efficient texture feature and can be used to characterize image patches containing structural information such as edges, spots and corners. A local pattern is called uniform if it has at most two 0-to-1 or 1-to-0 spatial transitions when traversed circularly. For example, 00111100 and 11100011 (2 transitions each) are uniform patterns, while 10001010 and 10101000 (6 transitions each when the wrap-around from the last bit to the first is counted) are not. The uniformity measure $U$ counts the number of bitwise transitions in a binary pattern and is defined as:

$$ U(\mathrm{LBP}_{P,R}) = \sum_{i=1}^{P} \left| s(g_{\mathrm{mod}(i,P)} - g_c) - s(g_{i-1} - g_c) \right|. \tag{4} $$

When only patterns with $U \leq 2$ are used, the resulting descriptor is referred to as uniform LBP ($\mathrm{LBP}^{u2}_{P,R}$), where the superscript $u2$ refers to using "uniform" patterns that have a $U$ value of at most 2. Generally, $\mathrm{LBP}^{u2}_{P,R}$ has $P(P-1)+3$ distinct labels and can be implemented using a lookup table of $2^P$ elements. The occurrence histogram of $\mathrm{LBP}^{u2}_{P,R}$ assigns a unique label between 0 and $P(P-1)+1$ to each uniform pattern and a single label $P(P-1)+2$ to all non-uniform patterns. As a result, the length of the feature histogram for $P$ neighbors is reduced from $2^P$ to $P(P-1)+3$ dimensions (e.g., from 256 to 59 for $P = 8$). Thus, the resulting descriptor is computationally efficient (i.e., it speeds up computations and saves the memory space required to store the feature vectors). However, using only uniform patterns to describe image texture slightly affects the discrimination power, as all non-uniform patterns are assigned to a single label; thus, some texture information is discarded.
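The uniformity measure of Eq. (4) and the lookup table mentioned above can be sketched as follows. The helper names and the choice to number the uniform patterns in increasing order of their code are assumptions made for illustration; only the label ranges follow the text.

```python
def transitions(code, bits=8):
    """Circular count of 0/1 transitions in a pattern, i.e. U in Eq. (4)."""
    count = 0
    for i in range(bits):
        a = (code >> i) & 1
        b = (code >> ((i + 1) % bits)) & 1  # wraps from the last bit to the first
        count += int(a != b)
    return count


def uniform_lookup(bits=8):
    """Lookup table with 2^P entries mapping each code to its u2 label."""
    non_uniform_label = bits * (bits - 1) + 2  # shared label P(P-1)+2
    table, next_label = [], 0
    for code in range(1 << bits):
        if transitions(code, bits) <= 2:
            table.append(next_label)  # unique label per uniform pattern
            next_label += 1
        else:
            table.append(non_uniform_label)
    return table


table = uniform_lookup(8)
n_labels = len(set(table))  # P(P-1)+3 = 59 for P = 8
```

For $P = 8$ the table has 256 entries but only 59 distinct labels: 58 for the uniform patterns and one shared by all non-uniform patterns.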
2.4. Rotation Invariant Uniform LBP
In (Ojala et al., 2002), Ojala et al. extended the RILBP descriptor to arbitrary spatial resolution on a circularly symmetric set of neighbors using $P$ equally spaced pixels on a circle of radius $R$. The resulting descriptor is referred to as rotation invariant LBP ($\mathrm{LBP}^{ri}_{P,R}$), which assigns a unique identifier to each rotation invariant binary pattern. To this end, the set of neighbors is circularly shifted in the clockwise direction as many times as needed until a maximal number of the most significant bits becomes 0, using

$$ \mathrm{LBP}^{ri}_{P,R} = \min\{\mathrm{shift}(\mathrm{LBP}_{P,R}, i) \mid i = 0, 1, \ldots, P-1\}, \tag{5} $$

where $\mathrm{shift}(\mathrm{LBP}_{P,R}, i)$ performs a circular bit-wise right shift on $\mathrm{LBP}_{P,R}$ (i.e., a $P$-bit number) $i$ times. For a circularly symmetric neighborhood of $P = 8$ sampling points equally spaced on a circle of radius $R$, $\mathrm{LBP}^{ri}_{P,R}$ has 36 unique rotation invariant binary patterns. Note that if $R$ is set to 1, $\mathrm{LBP}^{ri}_{8,1}$ becomes the rotation invariant descriptor LBPROT proposed in (Pietikäinen et al., 2000).
Another improvement over the $\mathrm{LBP}^{ri}_{P,R}$ descriptor (Ojala et al., 2002) considers only "uniform" patterns that have at most 2 spatial transitions ($U \leq 2$) in a circularly symmetric neighborhood of $P$ pixels. The resulting descriptor is called rotation invariant uniform LBP ($\mathrm{LBP}^{riu2}_{P,R}$), which takes the form

$$ \mathrm{LBP}^{riu2}_{P,R} = \begin{cases} \sum_{i=0}^{P-1} s(g_i - g_c), & \text{if } U(\mathrm{LBP}_{P,R}) \leq 2 \\ P + 1, & \text{otherwise,} \end{cases} \tag{6} $$

where $s(\cdot)$ is the threshold function of Eq. (1) and the uniformity measure $U$ is defined by Eq. (4). Generally, $(P + 1)$ rotation invariant uniform binary patterns exist in a circularly symmetric neighborhood of $P$ pixels. According to Eq. (6), each of them is assigned a unique label equal to the number of "1" bits in the pattern, and all non-uniform patterns are grouped under a single label $(P + 1)$. In practice, the mapping from $\mathrm{LBP}_{P,R}$ to $\mathrm{LBP}^{riu2}_{P,R}$, which has $(P + 2)$ distinct labels, can be implemented using a lookup table of $2^P$ elements.
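A minimal sketch of the labeling rule in Eq. (6) is given below, operating on an already computed $P$-bit LBP code rather than on pixel values. The function names are hypothetical and chosen only for this illustration.

```python
def transitions(code, bits=8):
    """Circular count of 0/1 transitions in a pattern, i.e. U in Eq. (4)."""
    return sum(
        int(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1))
        for i in range(bits)
    )


def riu2_label(code, bits=8):
    """Rotation invariant uniform label of a P-bit LBP code, Eq. (6)."""
    if transitions(code, bits) <= 2:
        return bin(code).count("1")  # number of "1" bits: label in 0..P
    return bits + 1                  # single label for non-uniform patterns


# Lookup table with 2^P entries, as suggested in the text.
riu2_table = [riu2_label(c) for c in range(256)]
labels = sorted(set(riu2_table))
```

For $P = 8$, `labels` contains the ten values 0 to 9, and any rotation of a uniform pattern receives the same label because rotating a pattern does not change its number of "1" bits.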
2.5. Center Symmetric LBP
Center symmetric LBP (CSLBP) descriptor (Heikkilä et al., 2009) was introduced as a region descriptor that combines the strengths of the well-known SIFT descriptor and the LBP. The CSLBP descriptor replaces the gradient features in the SIFT descriptor with a local binary pattern for each pixel. Further, within the CSLBP, instead of thresholding each neighbor against the central pixel as in the basic LBP, only center-symmetric pairs of pixels (horizontal, vertical, and diagonal) are compared, as depicted in Fig. 5.

Figure 5: Computing the CSLBP pattern for an 8 neighborhood of pixels.

Mathematically, the CSLBP descriptor is defined for a center pixel with coordinates $(x_c, y_c)$ and a set of $P$ equally spaced pixels on a circle of radius $R$ as follows:

$$ \mathrm{CSLBP}_{P,R,T}(x_c, y_c) = \sum_{i=0}^{P/2-1} s(g_i - g_{i+(P/2)}) \times 2^i, \quad s(n) = \begin{cases} 1, & \text{if } n > T \\ 0, & \text{otherwise,} \end{cases} \tag{7} $$

where $g_i$ and $g_{i+(P/2)}$ are the gray values of the center-symmetric pairs of pixels in the neighborhood, and the threshold value $T$ is set to 1% of the pixel value range. Generally, $2^{P/2}$ distinct patterns can be obtained for a set of $P$ neighboring pixels. For a neighborhood of $P = 8$ pixels and a radius of $R = 1$ pixel, the CSLBP obtains one of $2^4 = 16$ possible distinct patterns ($i = 0, 1, \ldots, 15$). As a result, the dimensionality of the resulting feature histogram is reduced significantly from $2^8 = 256$ to $2^4 = 16$ dimensions. The resulting features are claimed to be robust on flat image regions, tolerant to illumination variations, and computationally efficient (Heikkilä et al., 2009).
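The comparison of center-symmetric pairs in Eq. (7) can be sketched as below. The neighbor values are hypothetical, and interpreting "1% of the pixel value range" as $0.01 \times 255$ for 8-bit images is an assumption made for this example.

```python
def cslbp_code(neighbors, threshold):
    """CSLBP code from P neighbor values listed in circular order, Eq. (7)."""
    half = len(neighbors) // 2
    code = 0
    for i in range(half):
        # Compare each pixel with the one diametrically opposite to it.
        if neighbors[i] - neighbors[i + half] > threshold:
            code += 1 << i
    return code


# Hypothetical gray values g_0..g_7 around a center pixel (P = 8).
neighbors = [52, 60, 48, 70, 50, 40, 49, 90]
T = 0.01 * 255  # assumed: 1% of the 8-bit range
code = cslbp_code(neighbors, T)
```

Only the pair (60, 40) differs by more than the threshold in the required direction, so bit 1 is set and the code is 2. The center pixel itself never enters the computation.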
2.6. Median Binary Patterns
Median Binary Patterns (MBP) (Hafiane et al., 2007), another extension of the original LBP descriptor, extracts localized binary patterns by thresholding the pixels in the neighborhood around the central pixel against their median value. Then, the thresholded values are weighted by a binomial factor of $2^i$ and summed to obtain the final MBP code, as illustrated in Fig. 6. Similar to LBP, it uses binary codes to represent the texture information; however, it compares the pixels with their median value instead of the center pixel value and includes the central pixel itself in the computation of the local binary patterns. The MBP descriptor is computed for a central pixel using any square image patch as follows:

$$ \mathrm{MBP} = \sum_{i=0}^{P-1} s(g_i) \times 2^i, \quad s(g_i) = \begin{cases} 1, & \text{if } g_i \geq \mathrm{Median} \\ 0, & \text{otherwise,} \end{cases} \tag{8} $$

where $P$ is the total number of pixels in the patch (including the center pixel) and $g_i$ is the gray value of the $i$-th pixel. For a 3 × 3 image patch, Eq. (8) produces one of $2^9 = 512$ possible distinct patterns ($i = 0, 1, \ldots, 511$). Therefore, the resulting histogram has 512 bins. Generally, large values of $P$ result in long histograms, while small values produce short feature vectors. The MBP features are robust against illumination variations because the median thresholding is independent of absolute image intensities (Hafiane et al., 2007). However, the MBP descriptor produces a longer histogram (512 dimensions for a 3 × 3 square patch) to represent the local features.
Figure 6: An illustration for computing the MBP code using a 3 × 3 image patch, where the median value is 45.
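A minimal sketch of Eq. (8) for a 3 × 3 patch follows. The patch values are hypothetical (they are not the values shown in Fig. 6, although they were chosen to have a median of 45 as well), and the row-major bit ordering is an assumption made for illustration.

```python
from statistics import median


def mbp_code(patch):
    """MBP code of a square patch: threshold every pixel at the patch median."""
    pixels = [v for row in patch for v in row]  # row-major, center included
    med = median(pixels)
    return sum(1 << i for i, v in enumerate(pixels) if v >= med)


# Hypothetical 3 x 3 patch whose median is 45.
patch = [[30, 45, 60],
         [20, 50, 45],
         [70, 10, 80]]
code = mbp_code(patch)   # pixels 1, 2, 4, 5, 6 and 8 reach the median -> 374
```

Because nine pixels contribute one bit each, the code lies in the range 0 to 511, which matches the 512-bin histogram described above.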
2.7. Completed LBP
Guo et al. (Guo et al., 2010) proposed a completed modeling of the LBP descriptor (CLBP) to address some limitations and issues related to LBP-based feature representation. In the CLBP variant, local regions are represented by their central pixels and the local difference sign-magnitude transform (LDSMT). Figure 7 illustrates the different components required to encode the center pixel of a 3 × 3 image patch using the CLBP descriptor. Generally, the CLBP descriptor encodes image texture and local structures for a set of $P$ neighboring pixels equally spaced on a circle of radius $R$ around the center pixel according to the following steps.

Figure 7: Components of the CLBP operator: (a) a 3 × 3 image patch with a center pixel 60; (b) the local differences; (c) sign components and (d) magnitude components.

First, the center pixel is coded in a binary form after global thresholding defined by

$$ \mathrm{CLBP\_C}_{P,R} = T(g_c, c), \quad T(x, c) = \begin{cases} 1, & \text{if } x \geq c \\ 0, & \text{if } x < c, \end{cases} \tag{9} $$

where $g_c$ is the gray level of the center pixel and $c$ is a threshold value set as the average gray level of the entire image. By using Eq. (9) for all image pixels, we obtain a binary map which is denoted as CLBP-Center (CLBP_C). Second, the LDSMT decomposes the local structures of images into two complementary components: the signs and the magnitudes of the local differences. Thus, two operators are suggested for encoding them, namely CLBP-Sign (CLBP_S) and CLBP-Magnitude (CLBP_M):

$$ \mathrm{CLBP\_S}_{P,R} = \sum_{i=0}^{P-1} T(s_i) \times 2^i, \quad T(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x < 0, \end{cases} \tag{10} $$

where $s_i$ is the sign of the $i$-th neighbor after thresholding with the center pixel as in the original LBP. The sign components have values of "1" and "-1", where "-1" is coded as "0".

$$ \mathrm{CLBP\_M}_{P,R} = \sum_{i=0}^{P-1} T(m_i, c) \times 2^i, \quad T(x, c) = \begin{cases} 1, & \text{if } x \geq c \\ 0, & \text{if } x < c, \end{cases} \tag{11} $$

where $m_i$ is the magnitude of the local difference for the $i$-th neighbor and $c$ is an adaptive threshold set as the mean value of $m_i$ over the entire image. Finally, the three encoded maps, CLBP_C, CLBP_S and CLBP_M, are combined to obtain the final CLBP feature map for the whole image in one of two ways. The first builds a 3D joint histogram referred to as CLBP_S/M/C. The second first builds a 2D joint histogram, CLBP_S/C or CLBP_M/C, then converts it to a 1D histogram and concatenates it with CLBP_M or CLBP_S to generate a joint histogram, referred to as CLBP_M_S/C or CLBP_S_M/C. In this experimental study, the second approach is adopted in computing the final CLBP feature histogram.
2.8. Elliptical LBP
Elliptical local binary patterns (ELBP) (Nguyen & Caplier, 2012) was proposed as a feature descriptor that utilizes vertical and horizontal ellipse patterns to capture micro facial features from face images. The ELBP is computed for the center pixel by considering its neighbors that lie on an ellipse, as shown in Fig. 8. For a center pixel with spatial coordinates $(x_c, y_c)$ in a neighborhood of $P$ pixels at $(r_1, r_2)$ distances from the center, the ELBP operator can be defined as

$$ \mathrm{ELBP}_{P,r_1,r_2}(x_c, y_c) = \sum_{i=0}^{P-1} s(g_i^{P,r_1,r_2} - g_c) \times 2^i, \quad s(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0. \end{cases} \tag{12} $$

When $r_1 > r_2$, we obtain a horizontal ellipse and the resulting ELBP operator is referred to as the horizontal ELBP (HELBP). Similarly, when $r_1 < r_2$, we obtain a vertical ellipse and the resulting ELBP operator is referred to as the vertical ELBP (VELBP). If $r_1 = r_2$, the original LBP operator is obtained.

Figure 8: ELBP computation using horizontal and vertical ellipses producing HELBP and VELBP, respectively: (a) $\mathrm{ELBP}_{8,2,1}$; (b) $\mathrm{ELBP}_{8,1,2}$.
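The ELBP operator of Eq. (12) can be sketched as follows. Several details here are assumptions made for illustration and are not stated in the text: the neighbors are placed at equal angular steps on the ellipse, the first neighbor lies to the right of the center, and each sample is taken from the nearest pixel instead of being interpolated. The toy image is hypothetical.

```python
import math


def ellipse_offsets(P, r1, r2):
    """(dx, dy) offsets of P points at equal angular steps on an ellipse.

    r1 is the horizontal semi-axis and r2 the vertical one (assumed layout).
    """
    return [(r1 * math.cos(2 * math.pi * i / P),
             -r2 * math.sin(2 * math.pi * i / P)) for i in range(P)]


def elbp_code(img, xc, yc, P=8, r1=2, r2=1):
    """ELBP code of the pixel at column xc, row yc, following Eq. (12)."""
    gc = img[yc][xc]
    code = 0
    for i, (dx, dy) in enumerate(ellipse_offsets(P, r1, r2)):
        # Nearest-pixel sampling: a simplification chosen for this sketch.
        g = img[yc + round(dy)][xc + round(dx)]
        if g >= gc:
            code += 1 << i
    return code


# Hypothetical 3 x 5 image; the center pixel (value 5) is at column 2, row 1.
img = [[1, 9, 9, 9, 1],
       [9, 5, 5, 5, 0],
       [1, 0, 0, 0, 1]]
helbp = elbp_code(img, 2, 1, P=8, r1=2, r2=1)   # horizontal ellipse -> 30
```

With $r_1 = r_2 = 1$ the rounded offsets reduce to the usual eight neighbors of a 3 × 3 patch, in line with the remark that the original LBP is recovered in that case.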
The final ELBP feature histogram for a given image is obtained by applying two symmetric ELBP operators, the horizontal ELBP and the vertical ELBP, to produce two ELBP coded images. Then, each coded image is divided into $w \times h$ sub-regions and a histogram is computed for each sub-region. After that, for each of the coded images, the histograms from all the sub-regions are concatenated into a single histogram. Finally, we concatenate the two histograms from both images to obtain the complete horizontal and vertical ELBP (i.e., ELBP) feature vector representing the given image. For a neighborhood of eight pixels, the ELBP operator produces one of $2^8 = 256$ possible patterns and the length of the final feature histogram is $(2 \times w \times h \times 256)$ dimensions. However, if the uniform patterns are used, the length of the feature histogram is reduced to $(2 \times w \times h \times 59)$ dimensions. Clearly, either the horizontal or the vertical ELBP operator can be used individually to obtain the final feature vector representing the image, with a minimal decrease in performance.
2.9. Noise Tolerant LBP
AN US
Noise tolerant LBP (NTLBP) (Fathi & Naghsh-Nilchi, 2012) was introduced as a noise-resistant extension of the LBP descriptor that preserves the local structures of an image in the presence of noise. It uses the set of uniform patterns together with all regular non-uniform patterns to better describe texture and local structures in images. The NTLBP operator first eliminates the noisy bits in the obtained binary patterns using a circular majority voting filter, which increases the number of regular patterns. It then defines a new labeling scheme that assigns a unique rotation invariant label to each subset of local patterns that have the same number of bitwise transitions and the same number of "1" bits, according to the criteria

\[
LBP^{NT}_{P,R,k} =
\begin{cases}
\sum_{i=0}^{P-1} BC^{NT}_{P,R,k}(i), & \text{if } U(BC^{NT}_{P,R,k}) < 4 \\
P - 1 + \sum_{i=0}^{P-1} BC^{NT}_{P,R,k}(i), & \text{if } U(BC^{NT}_{P,R,k}) = 4 \\
2P - 6 + U(BC^{NT}_{P,R,k})/2, & \text{if } U(BC^{NT}_{P,R,k}) > 4,
\end{cases} \tag{13}
\]

with

\[
BC^{NT}_{P,R,k}(i) = CMaj^{i}_{2k+1}(BC_{P,R}), \tag{14}
\]

and $CMaj^{i}_{2k+1}(BC_{P,R})$ is computed using the floor function $\lfloor \cdot \rfloor$ as

\[
CMaj^{i}_{2k+1}(BC_{P,R}) = \left\lfloor \left[ \sum_{j=-k}^{k} BC_{P,R}\big((P + i + j) \bmod P\big) \right] \Big/ (k+1) \right\rfloor, \tag{15}
\]

where $P$ is the number of equally spaced neighboring points that lie on a circle of radius $R$, and $k$ is the number of noisy bits that should be modified by the filter in the obtained patterns. The final feature vector describing the image texture is the occurrence histogram of the different local patterns computed over the entire image. The NTLBP extension has some distinctive characteristics, such as extracting the dominant uniform and non-uniform patterns without ignoring information about the type of pattern. Further, it has lower sensitivity to noise and can be computed with low computational complexity.

2.10. Adjacent Evaluation Completed LBP
Song et al. (Song et al., 2015) proposed a texture descriptor called the adjacent evaluation local binary patterns (AELBP), which modifies the thresholding strategy of the traditional LBP operator to gain robustness against noise. The AELBP operator is computed for a neighborhood with $P$ neighbors equally spaced on a circle of radius $R$ by constructing $P$ evaluation windows of a specific size. The neighboring pixels ($p = 0, \ldots, P-1$) are set as the evaluation centers of the constructed evaluation windows. Each evaluation center is then replaced by a new value $a_p$ ($p = 0, \ldots, P-1$), computed as the average of the pixel values in the $p$-th window excluding the evaluation center. The binary codes are obtained by comparing the values $a_p$ with the central pixel of the neighborhood $g_c$ according to the following formula:

\[
AELBP_{P,R} = \sum_{p=0}^{P-1} s(a_p - g_c) \times 2^p, \qquad
s(x) = \begin{cases} 1, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0. \end{cases} \tag{16}
\]
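The adjacent evaluation idea in Eq. (16) can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions that are not stated in the text: the eight neighbors sit at integer offsets around the center (radius 1), each evaluation window is 3 × 3, and the center pixel must lie at least two pixels from the image border. The names `OFFSETS`, `window_mean_excluding_center` and `aelbp_code` are hypothetical.

```python
# (dx, dy) offsets of the 8 neighbors, counter-clockwise from the right.
# Using integer offsets at radius 1 is an assumption of this sketch.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1),
           (-1, 0), (-1, 1), (0, 1), (1, 1)]


def window_mean_excluding_center(img, x, y, half=1):
    """Mean of the (2*half+1)^2 window around (x, y), excluding (x, y)."""
    total, count = 0, 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            if dx == 0 and dy == 0:
                continue  # the evaluation center itself is left out
            total += img[y + dy][x + dx]
            count += 1
    return total / count


def aelbp_code(img, xc, yc, half=1):
    """AELBP code of the pixel at column xc, row yc (sketch of Eq. 16)."""
    gc = img[yc][xc]
    code = 0
    for p, (dx, dy) in enumerate(OFFSETS):
        a_p = window_mean_excluding_center(img, xc + dx, yc + dy, half)
        if a_p - gc >= 0:   # s(a_p - g_c) = 1
            code |= 1 << p  # weight 2^p
    return code
```

Because each neighbor is replaced by a local average, a single corrupted neighbor value does not directly flip its own bit, which is the intuition behind the noise robustness claimed for this thresholding strategy.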
The adjacent evaluation method can be integrated with many LBP variants in order to obtain more robust and discriminative features. To this end, the method is integrated with the completed local binary patterns discussed in Section 2.7, and the new descriptor is referred to as adjacent evaluation completed local binary patterns (AECLBP). Here, the two complementary components of the image local differences, the signs $s_p$ and the magnitudes $m_p$, are given by

\[
s_p = s(a_p - g_c) \tag{17}
\]

and

\[
m_p = |a_p - g_c|. \tag{18}
\]

Two operators, AECLBP_S and AECLBP_M, are then proposed to code them. Since AECLBP_S can be obtained in a similar manner using Eq. (16), only AECLBP_M is given here:

\[
AECLBP\_M_{P,R} = \sum_{p=0}^{P-1} t(m_p, c) \times 2^p, \qquad
t(x, c) = \begin{cases} 1, & \text{if } x \ge c \\ 0, & \text{if } x < c, \end{cases} \tag{19}
\]

where $c$ is set as the mean value of $m_p$ over the entire image. The center operator AECLBP_C is identical to the CLBP_C component given in Eq. (9). To obtain the final feature histogram, the three operators can be combined either in a joint or in a hybrid way. Here, we adopted the joint histogram AECLBP_S_M/C for computing the final feature histogram.

2.11. Dominant Rotated LBP
The dominant rotated local binary patterns (DRLBP) descriptor is a rotation invariant texture descriptor proposed in (Mehta & Egiazarian, 2016) as an extension of the traditional LBP operator. The DRLBP operator uses the two complementary components of signs and magnitudes to capture structural information and achieve robustness against rotation. Similar to the conventional LBP, it uses the weighted local differences between a central pixel and its neighbors to obtain the binary codes representing the neighborhood. In addition, it provides a rotation invariance mechanism based on a reference direction computed locally from the neighborhood as the index of the pixel with the maximum difference from the central pixel. The reference direction can be formally defined as:

\[
D = \arg\max_{p \in \{0, 1, \ldots, P-1\}} |g_p - g_c|. \tag{20}
\]
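A minimal sketch of the reference direction in Eq. (20), together with the rotated weighting that Eq. (21) builds on it, is given here for illustration. It operates on an already sampled list of neighbor values. Breaking ties in the arg max by taking the first index is an assumption of this sketch, and the function name `drlbp_code` is hypothetical.

```python
def drlbp_code(neighbors, gc):
    """Return (D, code) for one neighborhood (sketch of Eqs. 20 and 21).

    neighbors: gray values g_0 ... g_{P-1} in circular order.
    gc: gray value of the center pixel.
    """
    P = len(neighbors)
    diffs = [abs(g - gc) for g in neighbors]
    D = diffs.index(max(diffs))  # first maximum on ties (assumption)
    code = 0
    for p, g in enumerate(neighbors):
        if g - gc >= 0:                  # s(g_p - g_c) = 1
            code |= 1 << ((p - D) % P)   # weight 2^mod(p - D, P)
    return D, code


# Rotating the neighborhood shifts D by the same amount,
# so the resulting code is unchanged.
values = [20, 1, 2, 3, 4, 5, 6, 7]       # hypothetical gray values
rotated = values[-3:] + values[:-3]      # same neighborhood, rotated by 3
assert drlbp_code(values, 5)[1] == drlbp_code(rotated, 5)[1]
```

The invariance shown by the final assertion relies on the maximum difference being unique; when several neighbors tie for the maximum, the chosen tie-breaking rule decides the code.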
The DRLBP descriptor for $P$ neighbors placed equidistantly on a circle of radius $R$ is obtained by rotating the weights according to the dominant direction $D$ and is defined as

\[
DRLBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c) \times 2^{\operatorname{mod}(p - D, P)}, \qquad
s(g_p - g_c) = \begin{cases} 1, & \text{if } g_p \ge g_c \\ 0, & \text{if } g_p < g_c, \end{cases} \tag{21}
\]

where mod refers to the modulus operator and the weight $2^{\operatorname{mod}(p - D, P)}$ depends on $D$. The final feature vector is obtained by building a histogram of occurrence of the different local patterns computed over the entire image.

2.12. Averaged LBP
One obvious drawback of the basic LBP operator and some of its variants is their sensitivity to noise and non-monotonic intensity changes. These approaches obtain their binary words (codes) by comparing the neighborhood pixels with a single pixel (i.e., the center pixel). When the intensity of the central pixel is affected by any of these factors, its value changes and so does the threshold applied to the neighboring pixels. As a consequence, the obtained binary word changes, resulting in a different code representing the local structure unit. To overcome this drawback, we propose to replace the comparison against the center pixel with a comparison against the average value of all pixels in the neighborhood, including the central pixel. Since the average is computed over all the neighborhood pixels, changing a single pixel has less impact on the threshold value and, in turn, on the resulting binary words or codes. The obtained local binary patterns are referred to as averaged local binary patterns (ALBP). The ALBP operator encodes each pixel in the given image and is computed for a center pixel $g_c$ at location $(x_c, y_c)$ in a circularly symmetric set of $P$ neighboring pixels equally spaced on a circle of radius $R$ ($R > 0$) as follows:

\[
ALBP_{P,R} = \sum_{i=1}^{P} s(g_i - \mu) \times 2^{i-1}, \qquad
s(x) = \begin{cases} 1, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0, \end{cases} \tag{22}
\]

and

\[
\mu = \frac{1}{P+1} \left( \sum_{i=1}^{P} g_i + g_c \right), \tag{23}
\]

where $g_i$ is the intensity value of the $i$-th neighboring pixel, $g_c$ is the intensity value of the center pixel, and $\mu$ is the average intensity value of all the sampling pixels including the center pixel. Given the spatial coordinates $(x_c, y_c)$ of the center pixel, the coordinates of the $i$-th sampling pixel ($i = 1, 2, \ldots, P$) are computed by the following formulas:

\[
angle\_step = \frac{2\pi}{P}, \tag{24}
\]

\[
x_i = x_c + R \cos\big((i-1) \cdot angle\_step\big), \tag{25}
\]

\[
y_i = y_c - R \sin\big((i-1) \cdot angle\_step\big). \tag{26}
\]
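Eqs. (22) to (26) translate almost directly into code. The sketch below is given for illustration and is not the authors' implementation: `sampling_coords` follows Eqs. (24) to (26), and `albp_code` follows Eqs. (22) and (23) for a list of neighbor values that has already been sampled. Both function names are hypothetical, and the example gray values are invented so that the center is 55 and the average is 50.

```python
import math


def sampling_coords(xc, yc, P, R):
    """Coordinates of the P sampling points (Eqs. 24 to 26)."""
    angle_step = 2.0 * math.pi / P
    return [(xc + R * math.cos((i - 1) * angle_step),
             yc - R * math.sin((i - 1) * angle_step))
            for i in range(1, P + 1)]


def albp_code(neighbors, gc):
    """Return (code, mu) for one neighborhood (Eqs. 22 and 23).

    neighbors: gray values g_1 ... g_P in sampling order.
    gc: gray value of the center pixel.
    """
    P = len(neighbors)
    mu = (sum(neighbors) + gc) / (P + 1)  # average includes the center
    code = 0
    for i, g in enumerate(neighbors, start=1):
        if g - mu >= 0:                   # s(g_i - mu) = 1
            code += 2 ** (i - 1)
    return code, mu


# Hypothetical 8-neighborhood with center value 55 and average 50.
code, mu = albp_code([60, 40, 70, 30, 50, 45, 55, 45], 55)
```

Note that the threshold `mu` depends on all `P + 1` pixels, so a change in the center pixel alone moves the threshold by only a fraction of that change.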
Figure 9 shows various circular symmetric sets of neighbors for different values of P and R. The gray values of the sampling pixels that do not fall exactly in the center of pixels are estimated by interpolation.

Figure 9: Circular symmetric neighborhoods for different values of (P, R). Panels: (a) (P = 4, R = 1); (b) (P = 8, R = 1); (c) (P = 8, R = 2); (d) (P = 16, R = 2).
Figure 10 illustrates the encoding process for a center pixel (55) by the ALBP operator using a 3 × 3 discrete neighborhood of eight pixels thresholded with the average value (50).

Figure 10: ALBP encoding for a center pixel using eight neighbors and an average value 50.
The ALBP operator is applied to each pixel of the input image. The feature descriptor is obtained by computing the $2^P$-bin histogram of ALBP codes. For a given image of size $M \times N$ pixels, after obtaining the ALBP codes for all the image pixels, their occurrence histogram can be computed to measure their distribution over the entire image by

\[
H(k) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f\big(ALBP_{P,R}(i, j), k\big), \quad k \in [0, K], \qquad
f(x, y) = \begin{cases} 1, & \text{if } x = y \\ 0, & \text{otherwise}, \end{cases} \tag{27}
\]

where $K = 2^P - 1$ is the maximal value for any ALBP code. The ALBP descriptor defined for a set of $P$ pixels thus produces $2^P$ possible distinct patterns and the final feature histogram has $2^P$ dimensions. When the dimensionality of the obtained feature descriptor is an issue, we compute the uniform version of the ALBP operator by considering only uniform patterns, i.e., patterns with at most 2 spatial transitions; the resulting descriptor is referred to as the uniform ALBP (UALBP). As a result of using only uniform patterns, the dimensionality of the resulting feature histogram for a set of $P$ neighboring pixels is reduced from $2^P$ to $P(P-1) + 3$ dimensions.
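The reduction from $2^P$ to $P(P-1)+3$ bins can be checked with a short sketch. It counts circular bit transitions, gives each uniform pattern its own bin and collects all non-uniform patterns in one extra bin. Using a single shared bin for the non-uniform patterns is the usual convention for uniform LBP and is assumed here; the function names are hypothetical.

```python
def transitions(code, P):
    """Number of circular 0/1 transitions in a P-bit pattern."""
    count = 0
    for i in range(P):
        bit = (code >> i) & 1
        nxt = (code >> ((i + 1) % P)) & 1
        count += int(bit != nxt)
    return count


def uniform_label_map(P):
    """Map each uniform code to its own bin; return (map, shared_bin)."""
    labels = {}
    next_label = 0
    for code in range(2 ** P):
        if transitions(code, P) <= 2:
            labels[code] = next_label
            next_label += 1
    return labels, next_label  # next_label is the bin for non-uniform codes


def uniform_histogram(codes, P):
    """Occurrence histogram over uniform bins plus one shared bin."""
    labels, shared = uniform_label_map(P)
    hist = [0] * (shared + 1)
    for c in codes:
        hist[labels.get(c, shared)] += 1
    return hist


# For P = 8 the histogram has 8 * 7 + 3 = 59 bins instead of 256.
assert len(uniform_histogram([], 8)) == 59
```

The same function accepts codes from any of the operators above, since the uniformity test depends only on the bit pattern and not on how the bits were thresholded.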
The main characteristics of the LBP operator and its variants, in terms of pros and cons, are reported in Table 1, while their ability to encode image texture and preserve the intrinsic appearance of ear images is depicted in Fig. 11. It can be seen that small and fine details of edges, corners, lines and flat regions are captured. Moreover, the encoded images produced by the majority of the LBP variants tested in this work vividly preserve the ear shape and appearance, except for the RIULBP, CSLBP, and NTLBP variants, for which the appearance of the ear features is unclear, as shown in Fig. 11(e,f,j). This indicates that the ability to preserve the object shape and appearance depends to a large extent on the number of distinct patterns used to represent the image texture. When this number is large enough, the appearance of the encoded image is clear; as the number decreases, the encoded image features become darker and unclear. Here, RIULBP, NTLBP, and CSLBP use the lowest numbers of patterns to encode and represent the image texture, i.e., 10, 16, and 16 patterns, respectively.
Table 1: A summary for the main characteristics of the LBP variants.

| Method | Pros | Cons |
|---|---|---|
| Basic LBP | Basic idea for all other variants, invariant to monotonic gray scale changes, easy to implement, low computational complexity | Sensitive to noise and rotation, restricted to eight neighbors, long histogram, only signs of differences are used to encode texture |
| Rotation Invariant LBP | Rotation invariant version of LBP, significant reduction in feature dimensionality | Sensitive to noise, only signs of differences are used to encode texture |
| Uniform LBP | Recognizes specific texture patterns called uniform patterns, adjustable for the number of neighbors and angular space, multi-resolution approach, produces shorter histogram | Sensitive to noise and rotation, assigning all non-uniform patterns a single label in histogram building |
| Rotation Invariant uniform LBP | Multi-resolution approach, rotation invariant, uses only specific subset of uniform patterns, produces a very short description leading to significantly lower feature dimensionality | Sensitive to noise, lower performance than uniform LBP as only few patterns are used, some neighbors' values are computed using interpolation which requires more computation |
| Center Symmetric LBP | Considers gray-level differences between pairs of opposite pixels in a neighborhood, invariant to illumination variations, computationally efficient, robust on flat regions, produces very short histogram leading to significant dimensionality reduction | Sensitive to rotation and noise, requires setting an appropriate threshold value T to increase the robustness on flat image regions |
| Median Binary Patterns | Thresholds the neighboring pixels against their median value, invariant to monotonic gray-scale changes, easy to implement | Sensitive to rotation, computing median requires more computation, longer histogram, increasing feature dimensionality |
| Completed LBP | A generalization for the conventional LBP, improves performance by exploiting two complementary components signs and magnitudes, invariant to illumination changes, easy to obtain a rotation invariant version of CLBP, extendable to multi-resolution approach | Sensitive to center pixel alterations, produces double the length of the basic LBP histogram, sensitive to rotation, some neighbors' values are computed by interpolation which requires more computation |
| Elliptical LBP | Encode micro-level features in horizontal and vertical directions, invariant to illumination changes, more suitable for extracting facial features of face images, multi-resolution approach, can use vertical or horizontal variants separately or combined | Sensitive to rotation, produces double the length of the histogram generated by basic LBP but this can be overcome by using uniform patterns, computing vertical and horizontal operators requires more computation |
| Noise Tolerant LBP | Introduced a new labeling scheme for conventional LBP to describe patterns, uses circular majority voting filtering to reduce noise effect | Sensitive to rotation, requires setting a value for the parameter k, requires more computation |
| Adjacent Evaluation Completed LBP | Thresholds the neighboring pixels against specific evaluation windows around the pixels, improved discriminative power, robust under illumination variations, robust to noise, increased robustness | Sensitive to center pixel alterations, produces double the length of the basic LBP histogram, some neighbors' values are computed by interpolation which requires more computation, evaluation windows are computationally demanding |
| Dominant Rotated LBP | Allows to select most common patterns, rotation invariant, utilizes signs and magnitudes components leading to more discriminative power, invariant to illumination variations, produces short description | Requires computing the dominant orientation, sensitive to center pixel alteration and noise, some neighbors' values are computed by interpolation which requires more computation |
| Averaged LBP | Robust against center pixel alteration, minimum susceptibility to noise and other illumination variations, easily extendable to multi-resolution approach, easily computed in a rotation invariant manner, increased discriminative power, easily implemented and has low computational complexity | Requires computing average of the neighboring pixels, utilizes only the signs of differences for the neighboring pixels to describe texture, sensitive to rotation, some neighbors' values are computed by interpolation which requires more computation |
Figure 11: An illustration of an ear image and its encoded images using LBP and its variants. (a) original ear image, (b) basic LBP, (c) uniform LBP, (d) rotation invariant LBP, (e) rotation invariant uniform LBP, (f) center symmetric LBP, (g) completed LBP, (h) median binary pattern, (i) combined horizontal and vertical ELBP (ELBP), (j) noise tolerant LBP, (k) adjacent evaluation completed LBP, (l) dominant rotated LBP, (m) the proposed averaged LBP, and (n) uniform ALBP. Panel titles: (a) Original image; (b) BLBP (8,1); (c) ULBP (8,1); (d) RILBP (8,2); (e) RIULBP (8,2); (f) CSLBP (8,1); (g) CLBP (8,2); (h) MBP (8,1); (i) ELBP; (j) NTLBP (8,2); (k) AECLBP (8,2); (l) DRLBP (8,2); (m) ALBP (8,1); (n) UALBP (8,1).
3. Evaluation framework

The inherent probabilistic nature of biometric recognition systems necessitates continuous performance assessment and proper testing to achieve reliability. Researchers have conducted many experimental evaluations and comparative studies to assess the performance of ear recognition systems. However, these evaluations are subjective in nature, as the experiments are conducted on different databases, using different feature extractors, custom classifiers, and individually developed frameworks. Emeršič et al. (Emeršič et al., 2017) addressed this problem by building an objective, publicly available benchmarking framework that measures the recognition performance of feature descriptors on optionally selected ear datasets. The evaluation framework consists of four main phases: image preprocessing, feature extraction, feature matching, and results representation. Figure 12 illustrates the main steps followed in this work from the beginning to the end of the evaluation process. A brief description of each phase is given in the following subsections.

Figure 12: An illustration for steps followed in the ear recognition evaluation framework.
3.1. Images preprocessing

In reality, ear images can exist in many formats, be represented at different scales, and be acquired under different imaging conditions. To obtain feature vectors of the same length and a unified representation for ear images, the input images must be preprocessed prior to feature extraction. Therefore, in this phase, the images of the specified ear dataset are read and loaded into memory, then passed through a set of preprocessing operations that ensure all images have the same characteristics: conversion to gray-scale if they are colored, resizing to the same spatial resolution of 100 × 100 pixels, contrast enhancement of the original image using histogram equalization, and finally removal of duplicated images. The output of this phase is a set of preprocessed images suitable for extracting ear features with the candidate feature extractors.

Table 2: The implementation details and parameters settings of the LBP descriptor and its variants used in this paper. Here, the term "#bins" refers to the number of distinct patterns assigned to construct the feature histogram.

| LBP descriptors | Parameters settings |
|---|---|
| BLBP (Ojala et al., 1996) | Basic LBP, neighbors=8, radius=1, block size=32 × 32 pix., no block overlap, #bins=256. |
| RILBP (Pietikäinen et al., 2000) | Rotation invariant LBP, neighbors=8, radius=2, block size=32 × 32 pix., no block overlap, #bins=36. |
| ULBP (Ojala et al., 2002) | Uniform LBP, neighbors=8, radius=1, block size=32 × 32 pix., no block overlap, #bins=59. |
| RIULBP (Ojala et al., 2002) | Rotation invariant uniform LBP, neighbors=8, radius=2, block size=8 × 8 pix., block overlap=4 × 4 pix., #bins=10. |
| CSLBP (Heikkilä et al., 2009) | Center symmetric LBP, neighbors=8, radius=1, block size=32 × 32 pix., no block overlap, T=2.5, #bins=16. |
| MBP (Hafiane et al., 2007) | Median binary patterns, neighbors=8, radius=1, block size=32 × 32 pix., no block overlap, #bins=512. |
| CLBP (Guo et al., 2010) | Uniform CLBP, neighbors=8, radius=2, block size=32 × 32 pix., no block overlap, #bins=2 × 59. |
| HELBP (Nguyen & Caplier, 2012) | Horizontal ELBP, neighbors=8, r1=2, r2=1, block size=32 × 32 pix., no block overlap, #bins=256. |
| VELBP (Nguyen & Caplier, 2012) | Vertical ELBP, neighbors=8, r1=1, r2=2, block size=32 × 32 pix., no block overlap, #bins=256. |
| UELBP (Nguyen & Caplier, 2012) | Uniform ELBP (H+V), neighbors=8, r1=1, r2=2, block size=32 × 32 pix., no block overlap, #bins=2 × 59. |
| NTLBP (Fathi & Naghsh-Nilchi, 2012) | Noise tolerant LBP, neighbors=8, radius=2, block size=8 × 8 pix., block overlap=4 × 4 pix., K=1, #bins=16. |
| AECLBP (Song et al., 2015) | Adjacent evaluation CLBP, neighbors=8, radius=2, block size=32 × 32 pix., no block overlap, #bins=2 × 59. |
| DRLBP (Mehta & Egiazarian, 2016) | Dominant rotated LBP, neighbors=8, radius=2, block size=32 × 32 pix., no block overlap, #bins=59. |
| The proposed ALBP | Averaged LBP, neighbors=8, radius=1, block size=32 × 32, no block overlap, #bins=256. |
| The proposed UALBP | Uniform averaged LBP, neighbors=8, radius=1, block size=32 × 32, no block overlap, #bins=59. |
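The preprocessing operations described in Section 3.1 can be illustrated with a small dependency-free sketch. It is not the code of the evaluation toolbox. The luma weights used for gray-scale conversion and the nearest-neighbor resizing are assumptions chosen for brevity, since the text does not specify either, and all function names are hypothetical. Duplicate removal is omitted.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to gray values (assumed luma weights)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb]


def resize_nearest(img, out_h=100, out_w=100):
    """Nearest-neighbor resize to out_h x out_w (interpolation is an assumption)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def equalize(img, levels=256):
    """Histogram equalization of an integer gray-scale image."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in img]  # single gray level: nothing to stretch
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[v] for v in row] for row in img]


def preprocess(rgb):
    """Gray-scale conversion, resizing to 100 x 100, contrast enhancement."""
    return equalize(resize_nearest(to_gray(rgb), 100, 100))
```

Fixing the output size to 100 × 100 pixels is what guarantees that every descriptor in Table 2 yields feature vectors of the same length across images.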
3.2. Features extraction
In this phase, various local feature extraction techniques are implemented to extract local structures from ear images and their occurrence frequency over the entire image. The original LBP descriptor and the variants described in Section 2 are used to encode the texture of ear images and obtain a histogram-based feature vector representing each image. Examples of the encoded images using these texture descriptors are shown in Fig. 11. The output of this phase is a feature vector representing each image, which is evaluated through distance-based comparison in the next phase. The parameter settings and implementation details of the feature extractors used in this comparative study are given in Table 2.

3.3. Features matching

The feature matching phase in the recognition system can be realized using any distance or similarity measure such as Euclidean, cosine, correlation, Minkowski and others. However, since all the feature extraction and encoding methods used in this study are histogram-based descriptors, a more specialized dissimilarity measure is used. The most frequently used distance measure for matching histograms in computer vision tasks is the Chi-square statistic, which serves as a dissimilarity measure: the lower the value, the more similar the two histograms. In this phase, the Chi-square dissimilarity measure is used to compute the distance between the biometric feature vectors and produce the similarity scores. The Chi-square distance between two feature vectors $V_1 = [x_0, x_1, \ldots, x_n]$ and $V_2 = [y_0, y_1, \ldots, y_n]$ is defined as:

\[
dist_{chi}(V_1, V_2) = \sum_{i=0}^{n} \frac{(x_i - y_i)^2}{x_i + y_i}. \tag{28}
\]
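Eq. (28) is straightforward to implement. The sketch below is one possible reading of it; skipping bins where both histograms are empty, to avoid a division by zero, is an assumption that the text does not address. The function name `chi_square_distance` is hypothetical.

```python
def chi_square_distance(v1, v2):
    """Chi-square dissimilarity between two histograms (sketch of Eq. 28)."""
    if len(v1) != len(v2):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for x, y in zip(v1, v2):
        denom = x + y
        if denom > 0:  # bins empty in both histograms contribute nothing (assumption)
            total += (x - y) ** 2 / denom
    return total


# Identical histograms are at distance 0; disjoint ones are far apart.
same = chi_square_distance([3, 0, 5], [3, 0, 5])
disjoint = chi_square_distance([1, 0], [0, 1])
```

Because each squared difference is divided by the bin mass, differences in sparsely populated bins weigh more than the same absolute difference in heavily populated bins, which suits occurrence histograms.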
For the identification experiments, the feature vectors are compared in a probe-to-gallery manner and the resulting scores are sorted and ranked to find the rank at which a true match occurs. For the verification experiments, the feature vectors are compared against each other and genuine and impostor scores are generated. The generated scores are then handed to the result representation phase for plotting the performance curves and computing other performance metrics.
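The two matching modes just described can be sketched as follows. This is a simplified illustration under the assumption that scores are distances (lower means more similar), as with the Chi-square measure; the function names and the list-based data layout are hypothetical and do not reflect the toolbox interface.

```python
def rank_of_true_match(probe_dists, gallery_labels, probe_label):
    """Rank (1-based) at which the probe's identity first appears.

    probe_dists[i] is the distance from the probe to gallery item i.
    Returns None if the identity is absent from the gallery.
    """
    order = sorted(range(len(probe_dists)), key=lambda i: probe_dists[i])
    for rank, idx in enumerate(order, start=1):
        if gallery_labels[idx] == probe_label:
            return rank
    return None


def cumulative_match(ranks, max_rank):
    """Fraction of probes whose true match occurs within ranks 1..max_rank."""
    n = len(ranks)
    return [sum(1 for r in ranks if r is not None and r <= k) / n
            for k in range(1, max_rank + 1)]


def genuine_impostor(dist, labels):
    """Split all pairwise distances into genuine and impostor scores."""
    genuine, impostor = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] == labels[j]:
                genuine.append(dist[i][j])
            else:
                impostor.append(dist[i][j])
    return genuine, impostor
```

The first entry returned by `cumulative_match` is the rank-1 recognition rate, and the full list gives the points of a cumulative match curve.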
3.4. Results representation

This phase uses the similarity scores generated in the previous phase to compute performance metrics such as rank-1 recognition rates, false acceptance rates, true acceptance rates, and equal error rates. Furthermore, the Cumulative Match Characteristic (CMC) curves are generated to analyze the performance differences in the identification experiments, while the Receiver Operating Characteristic (ROC) curves are generated for the verification experiments. In order to plot the CMC and ROC performance curves, the following definitions of performance metrics should be considered.

Cumulative Match Characteristic (CMC) curve: a rank-based metric that measures the probability of identifying the correct identity within the top K ranks. It is estimated from the match scores through a sorting and ranking process in which the best matching identities for each probe are determined from the gallery.
False Acceptance Rate (FAR): a measure of the likelihood that a biometric security system will incorrectly accept an access attempt by an unauthorized user (i.e., an impostor). For any biometric system, FAR is defined as the ratio of the number of false acceptances to the number of identification attempts.

False Rejection Rate (FRR): a measure of the likelihood that a biometric security system will incorrectly reject an access attempt by an authorized user (i.e., a client). For any biometric system, FRR is defined as the ratio of the number of false rejections to the number of identification attempts.

Equal Error Rate (EER): the common value of FAR and FRR at the threshold where the two rates are equal. The lower the EER, the higher the accuracy of the biometric system.
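The FAR, FRR and EER definitions can be made concrete with a short sketch that sweeps a decision threshold over distance scores. Several choices here are assumptions rather than details taken from the text: scores are treated as distances (a comparison is accepted when its distance does not exceed the threshold), FAR is normalized by the number of impostor comparisons and FRR by the number of genuine comparisons, and the EER is approximated at the candidate threshold where the two rates are closest. The function names are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR at one threshold for distance scores."""
    false_accepts = sum(1 for d in impostor if d <= threshold)
    false_rejects = sum(1 for d in genuine if d > threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


def approximate_eer(genuine, impostor):
    """Return (eer, threshold) where |FAR - FRR| is smallest.

    Only observed score values are tried as thresholds, so the result
    is an approximation of the true crossing point.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]
```

Collecting the `(far, frr)` pairs returned by `far_frr` over all thresholds gives the points from which a curve of FRR against FAR can be drawn.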
Receiver Operating Characteristic (ROC) curve: a performance metric based on aggregate statistics of the match scores for all the biometric samples. The ROC curve illustrates the trade-off between FAR and FRR by plotting FRR versus FAR at multiple thresholds. ROC curves are commonly used for comparing verification systems that have close or very similar performance at different operating points. An ROC curve is obtained by varying the threshold that distinguishes between genuine and impostor similarity scores.

4. Experimental results and performance evaluation

In this section, we give a brief description of the benchmark ear datasets used in this study. Then, we report the results of the ear identification and verification experiments for all assessed LBP-based features. The experiments follow the all-to-all test protocol, which means that every image of each subject in the dataset is used in turn as a training/test image, together with a 5-fold cross-validation procedure on each dataset. The experiments in this work can be fully reproduced using the AWE toolbox presented in (Emeršič et al., 2017). All the experiments are carried out on the following publicly available ear datasets.

4.1. Ear datasets
4.1.1. IIT Delhi dataset
The Indian Institute of Technology Delhi (IITD) ear dataset (Kumar & Wu, 2012) consists of ear images collected from students and staff at IIT Delhi. All images are acquired from the same profile angle and under different indoor lighting conditions. Each subject has between 3 and 6 ear images and the subjects are aged 14-58 years. The IITD dataset has two sub-datasets, IITD-I and IITD-II. The first dataset, IITD-I, has 493 gray-scale images of 125 distinct subjects and is available in raw and preprocessed formats. Images from this dataset have a spatial resolution of 272 × 204 pixels and are available in JPEG format. The second dataset, IITD-II, has 793 normalized images of 221 distinct subjects with a spatial resolution of 50 × 180 pixels. Sample images from the IITD-I and IITD-II datasets are shown in Fig. 13 (a and b).

4.1.2. AMI dataset
The Mathematical Analysis of Images (AMI) ear dataset (Gonzalez, 2008) contains 700 ear images acquired from 100 different subjects aged 19-65 years. Each subject has 7 images: 6 images of the right ear and one image of the left. Five images of the right ear show the subject looking forward, left, right, up and down, respectively. The sixth image of the right ear shows the subject looking forward but with a different camera focal length (zoomed). The last image is of the left ear with the subject facing forward. All images have a spatial resolution of 492 × 702 pixels, are available in JPEG format, and were taken under the same lighting conditions, as shown in Fig. 13 (c).

4.1.3. WPUT dataset
The West Pomeranian University of Technology (WPUT) ear dataset (Frejlichowski & Tyszkiewicz, 2010) was introduced in 2010 as a diverse database to avoid some of the limitations of the existing ear datasets. To this end, the WPUT dataset contains challenging ear images of individuals of various ages, taken under different profile angles (rotation) and different camera focal lengths (scaling), and with major occlusion by hair and other objects. Images were acquired from 501 subjects and each subject has between 4 and 16 images taken under different lighting conditions. The version of the dataset available for download has 3348 images of 474 subjects; images for 27 subjects are missing. Figure 13 (d) shows some sample images from the WPUT dataset.

Figure 13: Sample images from the ear datasets used in the experiments. Clearly, each dataset has different characteristics and the ear images exhibit diverse levels of variability. Panels: (a) IITD-I dataset; (b) IITD-II dataset; (c) AMI dataset; (d) WPUT dataset; (e) AWE dataset.
4.1.4. AWE dataset

The Annotated Web Ear (AWE) dataset (Emeršič et al., 2017) has 1000 cropped ear images of 100 distinct subjects, with ten images per subject. Its images are collected from the web for known persons, both male and female, of different ethnicities. The images have variable sizes ranging from 15 × 29 pixels to 473 × 1022 pixels and were acquired at different viewing angles, under varying illumination conditions, and at different ages. Also, some images are occluded by hair, earrings and in some cases by larger objects. See Fig. 13 (e) for sample images from the AWE dataset.

4.2. Identification experiments

The main objective of the identification experiments is to investigate and highlight the performance characteristics of LBP-based features in the ear identification problem. To this end, experiments are conducted on the aforementioned constrained and unconstrained ear datasets. We follow the all-to-all test protocol, perform 5-fold cross-validation experiments on all datasets, and report the rank-1 recognition rate in the form of the mean and standard deviation over the 5 folds. The results of these experiments are summarized in Table 3. The top two performers among the LBP variants on each ear dataset are highlighted in bold. Further, in order to expose finer differences in recognition performance, the average cumulative match characteristic (CMC) curves over the 5 folds for each identification experiment on each dataset are illustrated in Fig. 14.
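The reporting scheme described above (a rank-1 rate per fold, summarized as mean and standard deviation) can be sketched as follows. How the toolbox assigns images to folds and whether it reports the sample or the population standard deviation are not stated in the text; the round-robin split and the sample standard deviation used here are assumptions, and the function names are hypothetical.

```python
import statistics


def k_fold_indices(n_items, k=5):
    """Partition item indices 0..n_items-1 into k folds (round-robin, assumed)."""
    folds = [[] for _ in range(k)]
    for i in range(n_items):
        folds[i % k].append(i)
    return folds


def summarize_folds(rank1_per_fold):
    """Mean and sample standard deviation of the per-fold rank-1 rates."""
    return statistics.mean(rank1_per_fold), statistics.stdev(rank1_per_fold)
```

Reporting the spread alongside the mean matters here because some of the datasets are small, so a few misclassified images in one fold can shift that fold's rate by several percentage points.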
The identification experiments on the IITD-I dataset indicate that the texture descriptors DRLBP and AECLBP perform the best, with rank-1 recognition rates of approximately 97.2% and 96.2%, respectively. The remaining descriptors perform similarly, with performance differences in the range of 2%, as can be seen from Fig. 14 (a). The RIULBP and NTLBP descriptors give the lowest rank-1 recognition rates of 92.71% and 91.5%, respectively. This low performance is attributed to the fact that RIULBP and NTLBP use fewer patterns (10 and 16, respectively) to describe local image structures and encode their texture. Consequently, discarding some patterns that encode textural structures and useful information degrades the recognition performance. In the case of the IITD-II dataset, which contains normalized images, the recognition rates for all descriptors are similar, with mean recognition differences of less than 2%, as depicted in Fig. 14 (b). Additionally, the AECLBP and CLBP descriptors are top performers with
recognition performance of approximately 98.6% and 98.2%, respectively. Here, the NTLBP descriptor achieves a good rate of 95.96%, compared with its performance on the IITD-I dataset. The proposed descriptor (ALBP) achieves an acceptable rate compared with the other descriptors, which depend on more complicated thresholding schemes. In some cases, such as IITD-I, it outperforms ULBP, RIULBP, MBP and NTLBP, while on the IITD-II dataset it achieves performance similar to ULBP and DRLBP.
Table 3: Results of the identification experiments, where the performance metric rank-1 is given in the form of mean and standard deviation over the 5 folds.
LBP variant   IITD-I         IITD-II        AMI            WPUT           AWE
BLBP          94.72 ± 2.54   96.59 ± 0.65   72.29 ± 2.53   37.34 ± 4.29   41.70 ± 2.79
RILBP         95.14 ± 2.60   93.95 ± 1.42   65.43 ± 1.84   22.64 ± 4.45   23.50 ± 1.00
ULBP          94.52 ± 3.46   96.84 ± 0.58   70.86 ± 1.77   35.20 ± 3.72   40.00 ± 2.98
RIULBP        92.71 ± 2.41   96.09 ± 1.76   66.43 ± 2.07   29.12 ± 3.86   34.80 ± 3.44
CSLBP         94.12 ± 2.85   94.96 ± 1.06   67.86 ± 2.89   31.62 ± 3.63   38.00 ± 3.36
MBP           94.13 ± 3.72   96.85 ± 0.89   71.71 ± 3.08   38.10 ± 4.93   42.80 ± 2.20
CLBP          96.14 ± 2.28   98.23 ± 0.84   73.71 ± 2.61   37.08 ± 4.44   47.50 ± 3.48
ELBP          95.33 ± 2.93   97.23 ± 0.50   72.86 ± 1.81   38.76 ± 4.53   44.80 ± 1.94
NTLBP         91.48 ± 2.10   95.96 ± 1.82   64.57 ± 3.57   28.86 ± 3.99   37.20 ± 4.63
AECLBP        96.15 ± 3.85   98.61 ± 0.73   73.57 ± 2.26   37.13 ± 4.55   49.60 ± 3.40
DRLBP         97.16 ± 1.35   96.34 ± 2.09   71.43 ± 2.56   27.44 ± 4.62   23.50 ± 1.05
ALBP          94.72 ± 3.06   96.22 ± 0.69   71.43 ± 1.86   38.71 ± 4.79   42.20 ± 2.29
UALBP         94.73 ± 2.77   96.34 ± 1.01   69.86 ± 2.45   35.60 ± 3.76   38.40 ± 3.09
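The pattern-count argument made above for RIULBP can be checked by enumeration. The sketch below assumes P = 8 neighbors and assumes that RIULBP corresponds to the rotation-invariant uniform (riu2) mapping of Ojala et al. (2002), in which a uniform pattern is labeled by its number of ones and all non-uniform patterns share one extra bin. The function names are hypothetical, and the 16 patterns quoted for NTLBP are not derived here.

```python
def circular_transitions(code, P=8):
    """Number of 0/1 transitions in the circular P-bit pattern `code`."""
    bits = [(code >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))


def histogram_sizes(P=8):
    """Return (basic, uniform, riu2) histogram sizes for P neighbors."""
    # Uniform patterns have at most two circular bit transitions.
    uniform = [c for c in range(2 ** P) if circular_transitions(c, P) <= 2]
    basic_bins = 2 ** P
    # One extra bin collects every non-uniform pattern.
    uniform_bins = len(uniform) + 1
    # riu2 label of a uniform pattern = its number of ones (0..P).
    riu2_bins = len({bin(c).count("1") for c in uniform}) + 1
    return basic_bins, uniform_bins, riu2_bins


print(histogram_sizes(8))  # (256, 59, 10)
```

Under these assumptions, the riu2 mapping keeps only 10 of the 256 possible codes as distinct labels, which is consistent with the figure of 10 patterns quoted for RIULBP.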
With respect to the AMI dataset, which has images with various poses, the descriptors work moderately well, and the obtained recognition performance ranges from approximately 64.6% for NTLBP to 73.7% for the CLBP descriptor, as shown in Fig. 14 (c). Although the AMI dataset contains good-quality images, these experiments make clear that pose variation has a major influence on the recognition performance of all the evaluated LBP variants.
When the image variability increases to reflect uncontrolled or in-the-wild imaging conditions, as in the WPUT ear dataset, we notice a significant decrease in recognition performance for all the assessed LBP feature extractors. The average rank-1 recognition rate ranges from approximately 38.8% for the ELBP variant down to 22.6% for the RILBP descriptor, as reported in Table 3 and illustrated in Fig. 14 (d). Clearly, the RILBP, DRLBP, NTLBP, RIULBP and CSLBP descriptors give the lowest recognition performance, while the remaining descriptors give recognition rates above 35%. In this case, the proposed ALBP descriptor comes close to the highest performer, with a recognition rate of 38.7%. The results of these experiments demonstrate the difficulties facing all the LBP descriptors in recognizing ear images taken under uncontrolled conditions, where the highest recognition rate, achieved by the ELBP variant, is only 38.8%. This decrease in performance is expected given the severe imaging conditions in the WPUT ear dataset.
Similarly, the same identification experiments are conducted on the recent AWE dataset, which contains ear images with large variability. A noticeable decrease in recognition performance is again observed, with only some descriptors exceeding 42%. Considering the entire CMC curves depicted in Fig. 14 (e) together with Table 3, the best performance is achieved by the AECLBP descriptor, with an average rank-1 recognition rate of approximately 49.6%. The MBP and the proposed ALBP descriptors perform similarly at about 42%, while DRLBP and RILBP share the lowest performance of approximately 23.5%. The obtained results indicate another drastic decrease in recognition performance on the AWE dataset.
Figure 14: The CMC curves generated in the identification experiments for LBP variants on the benchmark ear datasets: (a) IITD-I, (b) IITD-II, (c) AMI, (d) WPUT, (e) AWE.
4.3. Verification experiments
The main objective of the verification experiments is to ascertain the comparative performance of LBP-based features in the ear verification problem. When conducting the verification experiments, we follow the same test protocol discussed in Section 4.2, report the average results over the 5 folds using the equal error rate (EER), and generate the average ROC plots over the 5 folds. The results of these experiments are reported in Table 4 and summarized in Fig. 15. For each ear dataset, the LBP variant that achieves the lowest EER is highlighted in bold.
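To make the verification metric concrete, the following Python sketch estimates the EER from two sets of comparison scores: genuine (same ear) and impostor (different ears). It assumes distance-like scores where smaller means more similar, sweeps the observed scores as thresholds, and reports the point where the false accept and false reject rates are closest. The function name and this simple threshold sweep are assumptions for illustration, not the procedure used in this study.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate EER for distance scores (smaller = more similar).

    Returns (eer, threshold), where eer is the average of the false accept
    rate (FAR) and false reject rate (FRR) at the threshold where they are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # A pair is accepted when its distance is at or below the threshold.
    far = np.array([(impostor <= t).mean() for t in thresholds])
    frr = np.array([(genuine > t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]
```

Plotting `1 - frr` against `far` over the same thresholds gives an ROC curve of the kind shown in Fig. 15; averaging the per-fold EER values gives numbers in the form reported in Table 4.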
Again, for the IITD-I dataset, the best performance is achieved by the DRLBP descriptor with an EER of 4.67. Considering the entire ROC curves shown in Fig. 15 (a), one can observe that the uniform-pattern-based variants ULBP, CLBP, AECLBP, and UALBP have nearly similar performance, while the RIULBP and NTLBP descriptors give the lowest performance with EERs of 7.65 and 7.59, respectively. The other descriptors perform similarly. With respect to the IITD-II dataset, the verification results depicted in Fig. 15 (b) show that the differences in performance are somewhat smaller than on the IITD-I dataset, with AECLBP performing the best and CSLBP giving the lowest performance. The verification experiments on the AMI dataset, summarized in Fig. 15 (c), indicate that the best performance is again achieved by the AECLBP descriptor. Here, all the other descriptors achieve very similar performance except NTLBP, which gives the lowest performance with an EER of 24.99.
Table 4: Results of the verification experiments, where the performance metric EER is given in the form of mean and standard deviation over the 5 folds.
LBP variant   IITD-I        IITD-II       AMI            WPUT           AWE
BLBP          6.16 ± 3.11   4.67 ± 1.21   21.17 ± 2.20   24.60 ± 3.70   29.78 ± 2.28
RILBP         6.02 ± 2.98   5.56 ± 1.85   22.58 ± 3.31   33.18 ± 3.13   37.46 ± 3.39
ULBP          5.50 ± 2.31   5.08 ± 1.14   21.42 ± 3.42   28.27 ± 4.03   30.12 ± 3.05
RIULBP        7.65 ± 2.40   5.04 ± 0.95   24.71 ± 4.86   33.49 ± 3.01   35.58 ± 1.91
CSLBP         6.03 ± 2.31   6.05 ± 1.71   21.73 ± 3.58   29.29 ± 3.74   30.28 ± 2.58
MBP           6.03 ± 2.31   4.70 ± 1.49   20.98 ± 1.86   24.05 ± 3.26   31.05 ± 2.07
CLBP          5.11 ± 2.01   4.26 ± 1.38   20.58 ± 3.06   25.07 ± 3.58   27.58 ± 2.07
ELBP          6.09 ± 3.25   3.97 ± 1.02   21.04 ± 2.28   25.11 ± 3.44   28.94 ± 2.32
NTLBP         7.59 ± 2.48   5.04 ± 1.22   24.99 ± 4.06   32.63 ± 2.41   35.56 ± 2.41
AECLBP        5.08 ± 2.21   3.78 ± 1.10   20.41 ± 2.88   26.32 ± 3.22   27.30 ± 2.24
DRLBP         4.67 ± 1.96   5.17 ± 1.49   21.39 ± 2.63   31.65 ± 3.27   38.30 ± 3.69
ALBP          6.28 ± 2.53   4.92 ± 1.44   22.11 ± 2.60   23.47 ± 2.77   31.36 ± 1.66
UALBP         5.71 ± 2.34   5.17 ± 1.68   22.14 ± 3.32   28.91 ± 3.51   31.47 ± 2.97
In the case of uncontrolled imaging conditions, as in the WPUT dataset, the ROC curves for the verification experiments illustrated in Fig. 15 (d) show that the ALBP descriptor performs the best, with an EER of around 23%, whereas all other variants perform at more or less the same level. The ROC curves shown in Fig. 15 (e) for the verification experiments on the AWE dataset indicate that the performance of all assessed descriptors is close, with the AECLBP descriptor performing the best. Table 4 shows that the best-performing descriptor on the AWE ear dataset has an EER of approximately 27%. The obtained results emphasize the difficulty of the WPUT and AWE datasets and call for further research and improvements.
Figure 15: The ROC curves generated in the verification experiments for the LBP variants on the benchmark ear datasets: (a) IITD-I, (b) IITD-II, (c) AMI, (d) WPUT, (e) AWE.
5. Discussion
In this study, we have examined and compared the performance of several LBP variants as local feature extractors in ear recognition. More than six thousand images have been used from five available ear datasets, ranging from normalized images through gradually increasing levels of distortion up to almost fully uncontrolled imaging conditions. In most of the experiments, the best overall performance is obtained by the AECLBP texture descriptor. This illustrates the usefulness of including the various image components in encoding texture and local image structures. However, the descriptor uses two uniform components, signs and magnitudes, making its feature vector double the length of that extracted by the traditional ULBP descriptor. If the high dimensionality of the extracted feature vectors is an issue, the best low-dimensional LBP variants are those based on the uniform patterns, namely ULBP and UALBP. The MBP descriptor performs moderately well, on par with the other descriptors. Unfortunately, it produces high-dimensional feature vectors, as it uses 512-bin histograms, leading to longer matching times and higher memory requirements for storing the extracted features.
The performance of most tested LBP variants is unstable in the case of uncontrolled imaging conditions. This can be attributed to the wide variations in lighting, scale, pose, and occlusion, as well as to using only a few patterns to represent the image texture, thereby discarding important information about local image structures and their occurrence.
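The dimensionality trade-off discussed above can be put into numbers with a small back-of-the-envelope sketch. The 512 bins for MBP and the doubling for AECLBP relative to ULBP come from the text; the 59 bins for ULBP (the usual uniform-pattern count for 8 neighbors), the 5 × 5 window grid, the 1000-image gallery and the 4-byte storage per histogram value are assumptions chosen only for illustration.

```python
# Histogram bins per window for three of the descriptors discussed above.
BINS_PER_WINDOW = {
    "ULBP": 59,        # assumed: standard uniform-pattern count for P = 8
    "AECLBP": 2 * 59,  # sign and magnitude components, i.e. double ULBP
    "MBP": 512,        # stated in the text
}


def feature_length(bins, n_windows):
    """Length of the concatenated per-window histogram."""
    return bins * n_windows


def storage_bytes(bins, n_windows, n_images, bytes_per_value=4):
    """Memory needed to store the features of a whole gallery."""
    return feature_length(bins, n_windows) * n_images * bytes_per_value


N_WINDOWS = 25   # hypothetical 5 x 5 grid of windows
N_IMAGES = 1000  # hypothetical gallery size

for name, bins in BINS_PER_WINDOW.items():
    length = feature_length(bins, N_WINDOWS)
    megabytes = storage_bytes(bins, N_WINDOWS, N_IMAGES) / 1e6
    print(f"{name}: {length} values per image, {megabytes:.1f} MB per {N_IMAGES} images")
```

Under these assumptions an MBP feature vector is almost nine times longer than a ULBP one, which also scales the cost of every histogram comparison during matching.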
On the other hand, it is important to note that some LBP variants have different alternatives; that is, their final feature vectors can be obtained using different mechanisms. For instance, the feature vector of the ELBP descriptor (Nguyen & Caplier, 2012) can be built from the horizontal, vertical, or uniform patterns, or from a combination of them. Therefore, to investigate the impact of each mechanism on the discriminative power of the obtained feature vectors, similar identification and verification experiments are conducted for each alternative of the ELBP descriptor on the same ear datasets, using the same protocol as in the previous experiments. The obtained results for identification and verification are given in Table 5 and Table 6, respectively. In addition, the representation of the features extracted by each ELBP alternative is shown in Fig. 16. The motivation is that the shape of the human ear is close to a vertical ellipse; therefore, the vertical ELBP descriptor is expected to be more suitable for extracting ear features. Indeed, we found that VELBP obtains the highest recognition rates on two and four datasets in the identification and verification experiments, as given in Table 5 and Table 6, respectively. However, the rank-1 recognition rate differences between all variants of ELBP are small and their performance is almost similar. Moreover, the average CMC and ROC plots for the identification and verification experiments were generated but, for the sake of brevity, are not included here.
Table 5: Results of the identification experiments using the ELBP and its alternatives.
ELBP alternatives: ELBP, VELBP, HELBP, UELBP, UVELBP, UHELBP
Datasets: IITD-I, IITD-II, AMI, WPUT, AWE
95.33 ± 2.93
97.23 ± 0.50
72.86 ± 1.81
96.35 ± 0.72
97.22 ± 1.43
94.11 ± 3.26
97.86 ± 0.75
95.33 ± 2.86
96.72 ± 0.74
95.74 ± 3.13
94.12 ± 2.36 95.94 ± 2.74
38.76 ± 4.53
44.80 ± 1.94
72.86 ± 2.35
36.78 ± 4.17
42.70 ± 3.03
70.57 ± 2.94
36.47 ± 4.09
96.34 ± 0.84
72.57 ± 2.15
72.00 ± 2.61
73.00 ± 1.88
39.32 ± 4.11
35.91 ± 3.68 34.27 ± 3.79
43.90 ± 2.18
41.20 ± 1.96 42.80 ± 1.12
39.10 ± 2.89
Table 6: Results of the verification experiments for the ELBP and its alternatives.
ELBP alternatives: ELBP, VELBP, HELBP, UELBP, UVELBP, UHELBP
Datasets: IITD-I, IITD-II, AMI, WPUT, AWE
6.09 ± 3.25
3.97 ± 1.02
21.04 ± 2.28
25.11 ± 3.44
28.94 ± 2.32
4.79 ± 1.19
21.64 ± 2.45
26.28 ± 3.34
29.51 ± 2.05
6.89 ± 3.18
3.68 ± 1.16
5.47 ± 2.44
4.78 ± 1.40
5.50 ± 2.39
5.34 ± 1.40
5.89 ± 3.10 6.28 ± 2.29
4.51 ± 1.39
20.94 ± 2.54 21.00 ± 3.05 21.67 ± 3.40 21.55 ± 2.86
24.74 ± 3.82 29.50 ± 4.38 29.79 ± 4.35 29.79 ± 4.36
29.13 ± 2.26 30.36 ± 2.78 30.37 ± 2.58 31.16 ± 2.78
Figure 16: Representation of ear image features coded using the ELBP and its alternatives: (a) original ear image, (b) ELBP, (c) uniform ELBP (UELBP), (d) vertical ELBP (VELBP) (8,1,2), (e) uniform version of VELBP (UVELBP) (8,1,2), (f) horizontal ELBP (HELBP) (8,2,1), (g) uniform version of HELBP (UHELBP) (8,2,1).
Also, several experiments are conducted to estimate the performance of the proposed descriptor with its simple thresholding scheme (i.e., ALBP) using variable window sizes, with and without overlap. It is noticed that overlapping neighboring windows increases the recognition performance by about 2%. However, we decided to use the same window size without overlap for ALBP and all other LBP-based descriptors in order to compare their performance fairly under similar experimental conditions. The recognition rate obtained by the ALBP descriptor is in the range obtained by the other LBP variants and exceeds it in some cases, as can be seen from Table 3. Despite the simplicity of its thresholding scheme, the obtained results show the applicability and success of the proposed descriptor in representing and extracting image information in ear recognition systems under both controlled and uncontrolled imaging conditions, with a slight improvement in performance compared with other LBP variants.
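The windowing choice discussed above can be sketched as follows. The Python function below splits an image of LBP codes into square windows, with or without overlap, and concatenates the normalized per-window histograms into one feature vector. The function name, the 20-pixel window, the 10-pixel step used for the overlapping case and the 256-bin codes are hypothetical values chosen for illustration; only the 100 × 100 image size is taken from this study.

```python
import numpy as np


def window_histograms(code_image, win=20, step=None, n_bins=256):
    """Concatenate normalized histograms of LBP codes over square windows.

    `code_image` holds integer codes in [0, n_bins). With step == win the
    windows do not overlap; a smaller step makes neighboring windows overlap.
    """
    step = step or win
    height, width = code_image.shape
    features = []
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            block = code_image[y:y + win, x:x + win]
            hist = np.bincount(block.ravel(), minlength=n_bins).astype(float)
            features.append(hist / hist.sum())
    return np.concatenate(features)


# Hypothetical 100 x 100 image of 8-bit LBP codes.
codes = np.random.default_rng(0).integers(0, 256, size=(100, 100))
print(window_histograms(codes, win=20).shape)           # (6400,)  25 windows
print(window_histograms(codes, win=20, step=10).shape)  # (20736,) 81 windows
```

With these example settings, a 50% overlap more than triples the feature length, so any gain from overlapping windows comes at the price of larger feature vectors and slower matching.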
Overall, the main aim of this study is to address the question of whether the success achieved by LBP-based texture descriptors in texture classification carries over to ear recognition. The obtained results indicate that the LBP-based texture descriptors perform well in ear recognition under controlled imaging conditions and are capable of achieving high recognition performance, reaching up to 99% using small ear images of 100 × 100 pixels. The best overall performance is obtained by AECLBP (Song et al., 2015), followed by the CLBP (Guo et al., 2010) texture descriptor. Another advantage of the LBP-based feature descriptors is their low computational complexity and ease of implementation, which are important factors in designing biometric recognition systems for real-world applications. Also, our findings suggest that most of the tested LBP variants are invariant to linear scaling and multiplication of pixel gray levels, and some of them, such as DRLBP (Mehta & Egiazarian, 2016), allow the most common patterns to be selected, while a few of the LBP variants are more resistant to geometrical image displacement (e.g., rotation). On the other hand, some variants, including the basic LBP operator, produce longer histograms (i.e., 2^P bins for P neighboring pixels), which lead to large storage requirements, and lose some textural information as they utilize only the signs of the differences between neighboring pixels. Thus, when they were utilized in ear recognition under severe imaging conditions (e.g., the WPUT dataset), their performance deteriorated markedly, reaching 37% on average. Moreover, the results presented in this comparative experimental study are not exhaustive and can be complemented by examining how sensitive the methods are to different measures, such as Euclidean, cosine, correlation, Minkowski or fuzzy-based similarity (Hassaballah & Ghareeb, 2017), for producing the similarity scores, in order to determine which one works better for such datasets or LBP variants. Finally, we believe that considering the LBP-based operators as a feature extraction mechanism is a very exciting line of research, and future advances in the field can have an important impact not only on ear recognition but also on several related applications of computer vision and intelligent systems.
6. Conclusion
This paper presents a comprehensive comparative experimental study for evaluating the performance of several LBP-based features in ear recognition under controlled and unconstrained
imaging conditions. Using a very simple thresholding scheme, another LBP variant termed averaged local binary patterns (ALBP) is introduced. The experiments are carried out on five available and widely used ear datasets. The obtained results show that LBP-based descriptors are strong candidates for ear feature extraction under controlled imaging conditions due to their simplicity and efficient computation. They can achieve an average recognition rate of up to 99% under controlled imaging conditions. However, the recognition performance falls drastically when moving to unconstrained or in-the-wild imaging conditions and when the size of the dataset increases, as in the case of the WPUT and AWE ear datasets. It is noticed from the results of all experiments that recognition performance on the WPUT and AWE datasets is much lower than on the other datasets. This performance degradation can be attributed to the various distortions, such as rotation, scaling, occlusion, age variations, different ethnicities, and the large variation in lighting conditions, that exist in both the WPUT and AWE datasets. Further, the results obtained by all LBP variants confirm that the WPUT dataset is more challenging than the more recent AWE dataset.
Our experimental evaluation reveals avenues for further research on using LBP-based features in ear recognition. Extracting the most discriminant features and further improvements are necessary under severe imaging conditions to avoid the drawbacks of the current LBP variants and to enhance their discriminative power. For instance, as indicated by the results on the AMI dataset, pose variations have a major impact on the recognition performance of all assessed methods. Thus, both in-plane and out-of-plane rotations are still unsolved challenges. These variations can be addressed by suitable alignment techniques that reduce the interference from non-ear parts and represent ear features better. Another viable way to improve recognition performance is to use feature fusion techniques, where a set of features is extracted from the same ear image using multiple feature descriptors. After that, a specific technique such as Discriminant Correlation Analysis (DCA) can be utilized for feature fusion and dimensionality reduction. An important new research direction is to explore different deep learning architectures for feature extraction, either in conventional feature-classifier pipelines or in fully end-to-end systems. Also, large-scale ear recognition is still an open issue, and further improvements of effective feature representation operators are needed. Besides, as a direct extension to the current work, it would be very interesting to see how
sensitive the examined methods are when several measures are considered for producing the similarity scores, and to examine which one works better for such datasets and methods.
7. Acknowledgments
The authors would like to thank the Editor-in-Chief, Prof. Binshan Lin, and the anonymous editor and reviewers for their valuable comments and constructive suggestions, which considerably improved the quality of the paper. The owners of all ear datasets used in the experiments in this study are also gratefully acknowledged.
References
Abaza, A., & Bourlai, T. (2013). On ear-based human identification in the mid-wave infrared spectrum. Image and Vision Computing, 31, 640–648.
Abaza, A., Ross, A., Hebert, C., Harrison, M. A. F., & Nixon, M. S. (2013). A survey on ear biometrics. ACM Computing Surveys, 45, 1–35.
Ahonen, T., Hadid, A., & Pietikäinen, M. (2006). Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 2037–2041.
Annapurani, K., Sadiq, M. A. K., & Malathy, C. (2015). Fusion of shape of the ear and tragus–a unique feature extraction method for ear authentication system. Expert Systems with Applications, 42, 649–656.
Anwar, A. S., Ghany, K. K. A., & ElMahdy, H. (2015). Human ear recognition using SIFT features. In Third World Conference on Complex Systems (pp. 1–6).
Arbab-Zavar, B., & Nixon, M. S. (2008). Robust log-Gabor filter for ear biometrics. In International Conference on Pattern Recognition (pp. 1–4).
Arbab-Zavar, B., & Nixon, M. S. (2011). On guided model-based analysis for ear biometrics. Computer Vision and Image Understanding, 115, 487–502.
Basit, A., & Shoaib, M. (2014). A human ear recognition method using nonlinear curvelet feature subspace. International Journal of Computer Mathematics, 91, 616–624.
Benzaoui, A., Hadid, A., & Boukrouche, A. (2014). Ear biometric recognition using local texture descriptors. Journal of Electronic Imaging, 23, 053008.
Benzaoui, A., Kheider, A., & Boukrouche, A. (2015). Ear description and recognition using ELBP and wavelets. In International Conference on Applied Research in Computer Science and Engineering (pp. 1–6).
Boodoo-Jahangeer, N. B., & Baichoo, S. (2013). LBP-based ear recognition. In IEEE International Conference on Bioinformatics and Bioengineering (pp. 1–4).
Bustard, J. D., & Nixon, M. S. (2010). Toward unconstrained ear recognition from two-dimensional images. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40, 486–494.
Chan, T.-S., & Kumar, A. (2012). Reliable ear identification using 2-D quadrature filters. Pattern Recognition Letters, 33, 1870–1881.
Chang, K., Bowyer, K. W., Sarkar, S., & Victor, B. (2003). Comparison and combination of ear and face images in appearance-based biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 1160–1165.
Choraś, M. (2005). Ear biometrics based on geometrical feature extraction. Electronic Letters on Computer Vision and Image Analysis, 5, 84–95.
Choraś, M. (2008). Perspective methods of human identification: Ear biometrics. Opto-Electronics Review, 16, 85–96.
Chowdhury, D. P., Bakshi, S., Guo, G., & Sa, P. K. (2018). On applicability of tunable filter bank based feature for ear biometrics: A study from constrained to unconstrained. Journal of Medical Systems, 42, 11.
Emeršič, Ž., Štruc, V., & Peer, P. (2017). Ear recognition: More than a survey. Neurocomputing, 255, 26–39.
Fathi, A., & Naghsh-Nilchi, A. R. (2012). Noise tolerant local binary pattern operator for efficient texture analysis. Pattern Recognition Letters, 33, 1093–1100.
Fooprateepsiri, R., & Kurutach, W. (2011). Ear based personal identification approach forensic science tasks. Chiang Mai Journal of Science, 38, 166–175.
Frejlichowski, D., & Tyszkiewicz, N. (2010). The West Pomeranian University of Technology Ear Database - A Tool for Testing Biometric Algorithms. In International Conference on Image Analysis and Recognition (pp. 227–234).
Galdámez, P. L., Raveane, W., & Arrieta, A. G. (2017). A brief review of the ear recognition process using deep neural networks. Journal of Applied Logic, 24, 62–72.
Ghoualmi, L., Draa, A., & Chikhi, S. (2016). An ear biometric system based on artificial bees and the scale invariant feature transform. Expert Systems with Applications, 57, 49–61.
Gonzalez, E. (2008). AMI Ear Dataset. (last visit, May 2018). URL: http://www.ctim.es/research_works/ami_ear_database.
Guo, Y., & Xu, Z. (2008). Ear recognition using a new local matching approach. In IEEE International Conference on Image Processing (pp. 289–292).
Guo, Z., Zhang, L., & Zhang, D. (2010). A completed modeling of local binary pattern operator for texture classification. IEEE Transactions on Image Processing, 19, 1657–1663.
Hafiane, A., Seetharaman, G., & Zavidovique, B. (2007). Median binary pattern for textures classification. In International Conference on Image Analysis and Recognition (pp. 387–398).
Hanmandlu, M., & Mamta (2013). Robust ear based authentication using local principal independent components. Expert Systems with Applications, 40, 6478–6490.
Hassaballah, M., Abdelmgeid, A. A., & Alshazly, H. A. (2016). Image features detection, description and matching. In Image Feature Detectors and Descriptors: Foundations and Applications (pp. 11–45). Springer.
Hassaballah, M., & Ghareeb, A. (2017). A framework for objective image quality measures based on intuitionistic fuzzy sets. Applied Soft Computing, 57, 48–59.
Heikkilä, M., Pietikäinen, M., & Schmid, C. (2009). Description of interest regions with local binary patterns. Pattern Recognition, 42, 425–436.
Hezil, N., & Boukrouche, A. (2017). Multimodal biometric recognition using human ear and palmprint. IET Biometrics, 6, 351–359.
Huang, Z., Liu, Y., Li, C., Yang, M., & Chen, L. (2013). A robust face and ear based multimodal biometric system using sparse representation. Pattern Recognition, 46, 2156–2168.
Hurley, D. J., Nixon, M. S., & Carter, J. N. (2005). Force field feature extraction for ear biometrics. Computer Vision and Image Understanding, 98, 491–512.
Jain, A. K., Ross, A., & Prabhakar, S. (2004). An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14, 4–20.
Kumar, A., & Chan, T.-S. T. (2013). Robust ear identification using sparse representation of local texture descriptors. Pattern Recognition, 46, 73–85.
Kumar, A., & Wu, C. (2012). Automated human identification using ear imaging. Pattern Recognition, 45, 956–968.
Liu, L., Fieguth, P., Zhao, G., Pietikäinen, M., & Hu, D. (2016). Extended local binary patterns for face recognition. Information Sciences, 358, 56–72.
Mawloud, G., & Djamel, M. (2016). Weighted sparse representation for human ear recognition based on local descriptor. Journal of Electronic Imaging, 25, 013036.
Mehta, R., & Egiazarian, K. (2016). Dominant rotated local binary patterns (DRLBP) for texture classification. Pattern Recognition Letters, 71, 16–22.
Meraoumia, A., Chitroub, S., & Bouridane, A. (2015). An automated ear identification system using Gabor filter responses. In 13th IEEE International Conference on New Circuits and Systems (pp. 1–4).
Morales, A., Diaz, M., Llinas-Sanchez, G., & Ferrer, M. A. (2015). Earprint recognition based on an ensemble of global and local features. In International Carnahan Conference on Security Technology (pp. 253–258).
Nanni, L., Lumini, A., & Brahnam, S. (2012). Survey on LBP based texture descriptors for image classification. Expert Systems with Applications, 39, 3634–3641.
Nejati, H., Zhang, L., Sim, T., Martinez-Marroquin, E., & Dong, G. (2012). Wonder ears: Identification of identical twins from ear images. In International Conference on Pattern Recognition (pp. 1201–1204).
Nguyen, H.-T., & Caplier, A. (2012). Elliptical local binary patterns for face recognition. In Asian Conference on Computer Vision (pp. 85–96).
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29, 51–59.
Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 971–987.
Omara, I., Li, F., Zhang, H., & Zuo, W. (2016). A novel geometric feature extraction method for ear recognition. Expert Systems with Applications, 65, 127–135.
Pan, Z., Li, Z., Fan, H., & Wu, X. (2017). Feature based local binary pattern for rotation invariant texture classification. Expert Systems with Applications, 88, 238–248.
Pflug, A., & Busch, C. (2012). Ear biometrics: A survey of detection, feature extraction and recognition methods. IET Biometrics, 1, 114–129.
Pflug, A., Paul, P. N., & Busch, C. (2014). A comparative study on texture and surface descriptors for ear biometrics. In International Carnahan Conference on Security Technology (pp. 1–6).
Pietikäinen, M., Hadid, A., Zhao, G., & Ahonen, T. (2011). Computer vision using local binary patterns. Springer Science & Business Media.
Pietikäinen, M., Ojala, T., & Xu, Z. (2000). Rotation-invariant texture classification using feature distributions. Pattern Recognition, 33, 43–52.
Prakash, S., & Gupta, P. (2012). A rotation and scale invariant technique for ear detection in 3D. Pattern Recognition Letters, 33, 1924–1931.
Prakash, S., & Gupta, P. (2014). Human recognition using 3D ear images. Neurocomputing, 140, 317–325.
Rakshit, R. D., Nath, S. C., & Kisku, D. R. (2018). Face identification using some novel local descriptors under the influence of facial complexities. Expert Systems with Applications, 92, 82–94.
Satpathy, A., Jiang, X., & Eng, H.-L. (2014). LBP-based edge-texture features for object recognition. IEEE Transactions on Image Processing, 23, 1953–1964.
Song, K., Yan, Y., Zhao, Y., & Liu, C. (2015). Adjacent evaluation of local binary pattern for texture classification. Journal of Visual Communication and Image Representation, 33, 323–339.
Trefný, J., & Matas, J. (2010). Extended set of local binary patterns for rapid object detection. In Computer Vision Winter Workshop (pp. 1–7).
Wang, X., & Yuan, W. (2010). Gabor wavelets and general discriminant analysis for ear recognition. In 8th World Congress on Intelligent Control and Automation (pp. 6305–6308).
Yuan, L., Liu, W., & Li, Y. (2016). Non-negative dictionary based sparse representation classification for ear recognition with occlusion. Neurocomputing, 171, 540–550.
Yuan, L., Mu, Z.-C., Zhang, Y., & Liu, K. (2006). Ear recognition using improved non-negative matrix factorization. In International Conference on Pattern Recognition (pp. 501–504).
Zhang, Y., Mu, Z., Yuan, L., Zeng, H., & Chen, L. (2017). 3D ear normalization and recognition based on local surface variation. Applied Sciences, 7, 104.
Zhou, J., Cadavid, S., & Abdel-Mottaleb, M. (2012). An efficient 3-D ear recognition system employing local and holistic features. IEEE Transactions on Information Forensics and Security, 7, 978–991.