Computers in Industry 100 (2018) 129–136
Robust feature point detectors for car make recognition

Somaya Al-Maadeed a,⁎, Rayana Boubezari b, Suchithra Kunhoth a, Ahmed Bouridane b

a Department of Computer Science and Engineering, Qatar University, Doha, Qatar
b Department of Computer Science and Digital Technologies, Northumbria University, UK
A R T I C L E  I N F O

Keywords: Vehicle make and model recognition, Automatic number plate recognition, Feature extraction, SIFT, Harris descriptors

A B S T R A C T
An Automatic Vehicle Make and Model Recognition (AVMMR) system can be a useful add-on to Automatic Number Plate Recognition (ANPR) to address potential car cloning, including intelligence collection by the police to outline past and recent car movements and travel patterns. Although several AVMMR systems have been proposed, existing approaches perform sub-optimally under various environmental conditions, including occlusion and/or poor lighting. This paper studies the effectiveness of deploying robust local feature points to address these limitations. The proposed methods utilize modifications of two-dimensional feature point detectors such as SIFT, SURF, etc. and their combinations. When SIFT is combined with the multiscale Harris or multiscale Hessian methods, it can outperform existing approaches. Experimental evaluations using four different benchmark datasets demonstrate the robustness of the proposed techniques and their ability to detect and identify car makes and models under various environmental conditions. SIFT-DoG, SIFT-multiscale Hessian and SIFT-multiscale Harris are shown to yield the best results on our datasets, with higher recognition rates than those achieved by other existing methods in the literature. It can therefore be concluded that the combination of certain covariant feature detectors and descriptors can outperform other methods.
1. Introduction

Intelligent transportation systems have contributed mainly to fields such as traffic monitoring and vehicle theft control, which ultimately aim to minimize human intervention. In addition to the license plate information, identifying the exact make and model of a car provides many additional cues in certain applications. Therefore, similar to an automatic number plate recognition system, a computer vision system that can automatically detect and identify the make and model of vehicles is advantageous, especially when combined with an ANPR system to accurately identify a vehicle. Additionally, the method can act as an efficient tool against car cloning.

Currently, vehicle recognition systems are solely based on ANPR, and the technology is deployed in many applications ranging from cars parked in public and restricted areas to the detection of vehicles on police/security "watch lists". However, vehicle identity theft (the practice of replacing a vehicle's registration plates with those from an identical vehicle) has become an easy way for criminals to clone a vehicle, enabling them to commit crimes ranging from petty theft to organized crime. For example, the scale of cloning in Ireland and the UK is currently deemed to be immeasurable. When combined with ANPR, an AVMMR system can provide an extra level of security to
⁎ Corresponding author. E-mail address: [email protected] (S. Al-Maadeed).

https://doi.org/10.1016/j.compind.2018.04.014
Received 30 August 2017; Received in revised form 16 April 2018; Accepted 17 April 2018
0166-3615/© 2018 Elsevier B.V. All rights reserved.
fight car cloning, since it is difficult for a criminal to steal and use a car registration number when the make, model and colour of the cloned car are unknown. Reports by police and security organizations have recently indicated that vehicle cloning is becoming more prominent worldwide, leading to security breaches and increased costs. Vendors have commented that the information collected from ANPR has been used by police to discover where a plate has been in the past, to identify whether a vehicle was at the scene of a crime, to identify the travel patterns of a vehicle, and even to discover vehicles that may be associated with each other [1]. These scenarios can be addressed more efficiently by combining ANPR with automatic vehicle make and model recognition so that the make, model and colour of the vehicle are used to enhance recognition reliability. Moreover, this intelligence can be shared with other agencies. Accurate vehicle make and model recognition can be very useful for police camera control, including traffic offenses, car theft, cloning, automation and terrorist activities, especially when combined with an ANPR system.

Recently, a number of vehicle monitoring and security systems have been developed; these systems are based on ANPR or utilize ANPR to detect the regions of interest. Since it is relatively easy to forge number plates, ANPR alone may not be the most reliable solution. To increase the level of security, ANPR can be related to the car's
make and model (and optionally the colour), thus resulting in improved surveillance and tracking of vehicles from the video streams of cameras deployed on roads and buildings. Another benefit of AVMMR is that the amount of footage to be screened can be significantly reduced once an incident is flagged, which makes it very useful in forensic investigations.

Many techniques have been proposed for the classification of car makes and models. Various physical vehicle structures in images can be useful for recognizing cars, including identifying elements of vehicles such as the shapes of different parts, logos, etc. The symmetric structure of a car can be captured with the aid of features such as the symmetrical SURF of Ref. [2]. Grille and headlight patterns also serve to distinguish between different vehicle makes and models. The desired characteristic should be encapsulated in the region of interest (ROI) fed to the feature extraction stage and subsequent processing.

This paper aims to develop an Automatic Vehicle Make and Model Recognition system. It is envisaged that such a system will be combined with ANPR technology to fight car cloning. In this work, the deployment of a combination of modified feature point detectors and their use in an AVMMR are examined. The aim was to determine the best approach to maximize recognition performance under illumination, occlusion and noise artefacts, especially when images are extracted from video cameras installed on motorways, roads and public places. To achieve this, various approaches were adopted by modifying the implementation of the detectors through their combinations using a multi-scale decomposition methodology. To validate the results, four datasets available in the literature were used.

The remainder of the paper is organized as follows. Section 2 reviews the existing works related to ANPR and vehicle make and model recognition. Section 3 presents the proposed system, with a detailed description of feature extraction and matching in Section 4. Section 5 provides the experimental aspects and results of our work. The results are further analysed in Section 6, followed by the conclusions in Section 7.

2. Related works

Several algorithms have been proposed to extract vehicle number plates. An ANPR system mainly consists of plate region extraction and character recognition tasks. The two-step approach for license plate recognition described in Ref. [3] follows two major steps: candidate license plate region extraction using a line density filter and license plate verification using a cascaded classifier trained on colour saliency features. Addressing complex scenes that involve reflective glare on license plates still remains a major issue. Panahi and Gholampour presented an online ANPR system to address unclear license plates and weather, lighting and traffic variations [4]. Number plate segmentation and detection involve several stages, including thresholding, connected component labelling, RANSAC application and character detection using a Support Vector Machine (SVM). Prior to the recognition stage, the obtained plates are classified into three classes (clean, medium and dirty), and adaptive thresholding is used in the next stage to exploit this information. This is followed by scale invariant feature extraction and an SVM based character classifier. The plate localization stage proposed in Ref. [5] is based on a strong Convolutional Neural Network (CNN) architecture that helps identify inherent localization failures. A segmentation-free optical character recognition (OCR) stage in this method uses Hidden Markov Models (HMMs) and Viterbi decoding to complete the process. Another CNN based license plate recognition system is deployed for Chinese plates in Ref. [6]. A detailed review of the major techniques for ANPR can be found in Ref. [7].

Many research works have addressed car model recognition. For example, the authors of Ref. [8] make use of a template matching strategy to find the similarity of a query car image to a known model in the database. Pre-processing is applied for noise removal, greyscale conversion and histogram equalization. Then, object detection is achieved by subtracting the background image, with the presence of an object indicated by a colour difference between the two fields. The last step consists of feature extraction using a Gabor filterbank. The similarity measure between the Gabor jets of the test and template images is used to recognize the exact match for the test image; the number of vehicle classes is limited to three in this work. The authors of Ref. [9] used contour features for car recognition; the technique starts by detecting the regions of interest, after which a Canny operator extracts the image's edges and generates an image pyramid for the edge contour of the car. Features such as the round rate, Fourier descriptors, direction ratio, and circumference and area ratio of the car wheel are then computed. The paper reports results for only 4 classes with no reference to the dataset used, and the performance is not very high, mainly because of the poor quality of the contour extraction. In Ref. [10], a car image is first segmented by applying a background subtraction technique. From the resulting binary image, which represents the rear-view shape of the car, a number of features can be extracted. The characterization of the car is then derived from the shape features, back light features and the colour. Finally, a similarity measure computed from these features allows the car model at hand to be determined from a list of models stored and registered in a database. A discussion of the features suitable for vehicle model detection in aerial videos is provided in Ref. [11]. Scale and rotation invariant descriptors are computed from the region moments of the car image. By detecting small image structures, the model of the vehicle can be determined using a suitable classification method; the advantages of the region moments over region covariance descriptors were also reported in the paper. In Ref. [12], a two-dimensional Linear Discriminant Analysis (2DLDA) based algorithm is proposed and implemented for real-time vehicle model recognition, where robust features are obtained by applying 2DLDA to the gradients of the regions of interest extracted relative to the location of the license plate. However, the algorithm is shown to be sensitive to colour variations and light distortions, resulting in a recognition accuracy of 94.7%. A comparative study of different approaches for car make and model recognition can be found in Ref. [13]; it includes Canny edges, Harris corners, square mapped gradients, recursive partitioning and local normalization, with kNN and Naive Bayes classifiers for the matching step. These approaches were also investigated and evaluated alongside a new approach based on the strength of Harris corners. To achieve this, the algorithms are applied over the region of interest extracted according to the height and width of the located license plate. Testing and evaluation were carried out using a realistic dataset of 262 frontal car images. The algorithm discussed in Ref. [14] relies on global and local descriptors of the car image. The global shape descriptors are calculated for selected edge points in the edge map; since objects from the same class (cars in this case) have similar shapes, a local shape descriptor is also computed using the edge points. In addition, appearance features and their descriptors are extracted from manually segmented regions. The experiments involved only rear-view car images. A 2DLDA-based approach is compared against a Principal Component Analysis (PCA) counterpart in Ref. [15]; the results showed that the former outperformed the latter, with recognition performances of 91% versus 85%, respectively. Testing was conducted using a database of 200 training images of 25 car make groups with 8 samples each, under varying illumination and occlusion conditions. Another technique, presented in Ref. [16], utilizes a Speeded-Up Robust Feature (SURF) descriptor-based algorithm; the solution was tested on three databases of toy car images, on which an accuracy of more than 90% was obtained.
Table 1
Summary of existing methods for car make and model recognition.

Ref. | Feature Method | Classifier | Training Dataset | Testing Dataset
[17] | SIFT, ORB, SURF, ASIFT and SFOP | Euclidean and Hamming distance | 180 images, 5 models (36 images each) | 75 images, 5 models (15 images each)
[8] | Gabor filter for feature extraction | Template matching | 3 types, 12 images (4 images each) | 44 unknown objects
[9] | Round rate, Fourier descriptor, direction ratio and circumference ratio | Similarity matching | 40 images | 18 images
[10] | Car shape and back light features | Similarity matching | Hundreds of images | 8 car types for the query
[13] | Locally normalized Harris and others | kNN and Naïve Bayes | 262 frontal car images (available) | 71 test images
[15] | Two-Dimensional Statistical Linear Discriminant Analysis | Euclidean distance | 200 images, 25 make groups, 8 models each | –
[16] | SURF | Bag-of-words | 3 databases. A: 20 models with 16 views each; B: 160 images with 20 models; C: 8 models with 16 views each | –
[18] | SIFT | Euclidean distance | 1225 images | –
[20] | SURF | SVM | 1360 images (17 car makes and models) | 2499 images
[21] | Gabor wavelet transform, Pyramid HOG | kNN, MLP, RF, and SVM | 600 images (21 car makes) | –
[22] | CNN | Support vector machine | 31148 images (281 vehicle makes and models) | 13333 images (281 vehicle makes and models)
[23] | HOG | Local tiled CNN | 29 vehicle images each from 107 vehicle models | 1 vehicle image each from 107 vehicle models
A study was conducted in Ref. [17] to evaluate the performance of different algorithms, including the Scale Invariant Feature Transform (SIFT), SURF, and Oriented FAST and Rotated BRIEF (ORB); it also includes the Affine-SIFT (ASIFT) and Binary Robust Invariant Scalable Keypoints (BRISK), variants of SIFT and ORB, for car make detection. The best performance according to this study was provided by ASIFT. The car make and model recognition system discussed in Ref. [18] was aimed at low contrast images and utilizes the SIFT method. A series of pre-processing steps, including license plate detection and car brand region of interest (ROI) detection, is carried out prior to the feature extraction stage. Based on Euclidean distance matching, an accuracy of 75% was achieved in identifying 30 different car makes. The Bag of SURF (BoSURF) features were tested in Ref. [19] for real-time vehicle make and model recognition. The technique employs an offline SURF feature extraction method applied to the training samples, where the dominant features are retained in a bag (dictionary) so that vehicle images can be represented as BoSURF histograms. These BoSURF representations of various makes and models are then fed to an SVM classifier to perform the classification. The efficacy of the methodology is demonstrated in terms of improved accuracy and operational speed. Two specific approaches for car make and model recognition were proposed in Ref. [20]: one uses a combination of SIFT and SURF detectors, and the other is a visual content scheme that utilizes edge histogram descriptors. Recognition accuracies of 91.7% and 97.2%, respectively, were obtained for the recognition of 17 different classes. Chen et al. presented a vehicle detection and make and model identification approach in Ref. [2]. The desired vehicle ROI is identified by symmetric matching of SURF pairs. The frontal vehicle ROI is then divided into different grids, from which Histogram of Oriented Gradients (HOG) descriptors are extracted, and a sparse representation scheme is adopted with the HOG features to build a classifier using a Hamming distance approach. A Gabor pyramid HOG feature extraction method forms the basis of the vehicle type classification technique proposed in Ref. [21], where a cascaded classifier ensemble is deployed to avoid misclassifications. Each type of feature was combined with four different classifiers, including the K Nearest Neighbour (kNN), Multilayer Perceptron (MLP), Random Forest (RF) and SVM, with a majority voting technique to enhance the accept/reject decisions based on the ability of correct classification. Rejected patterns that may be more difficult to classify are fed to a second stage comprised of MLPs; the confidence of the prediction is then guaranteed by this ensemble classification scheme. Fang et al. proposed a CNN framework [22] for locating the most discriminative parts of the entire car image. Features are extracted from CNN models trained on the finest parts and on the whole vehicle image, and the local features from the finest parts and the global regions are fed together to an SVM classifier. Another deep learning approach is discussed in Ref. [23]; it uses the frontal view of the vehicle images to generate Histogram of Oriented Gradients (HOG) features. A local tiled CNN (LTCNN) is proposed as a variant of the tiled CNN, capable of learning different linear functions for the different local maps. The extracted HOG features are fed to the LTCNN, and the results show a classification accuracy of 98%. An unsupervised CNN technique with a layer-skipping strategy, using both global and local features and a softmax regression classifier, was proposed in Ref. [24]. A reasonably large dataset suitable for validating car make and model detection algorithms is provided in Ref. [25]. It includes images corresponding to 163 different makes and 1716 models, further categorized into 5 different viewpoints and 8 different car parts. The authors also demonstrated a few applications of the dataset, including CNN-based car model classification and verification. Among the various viewpoint images analysed, the rear side view yields the highest classification accuracy of 77.7%; among the car part images, the tail light exhibits the highest classification accuracy at 68.4%. Another diverse dataset for vehicle make and model recognition was contributed by Tafazzoli et al. in Ref. [26]. It provides 9170 different classes comprising 291752 images and covers models manufactured from 1950 to 2016; experimental results for deep CNN methods applied to the dataset are also presented.

A detailed summary of the various techniques investigated in the field of automatic VMMR is given in Table 1. The table reveals that a number of different feature types have been exploited for car make and model recognition. Even though the SIFT method has already been used for feature extraction in Ref. [17] and the Harris method in Ref. [13], this work implements a combination of SIFT with a range of feature detectors, including Harris. Moreover, Ref. [17] showed that the Affine SIFT (ASIFT) achieves the best accuracy compared to other methods such as SURF, SFOP and ORB; SIFT can thus be an ideal feature descriptor and is expected to provide better results when combined with different detectors. Our experiments involve the datasets used in Refs. [13], [15] and [25]; the comparatively large dataset of Ref. [25] enables a significant comparison of the system's performance.
3. System overview

This paper proposes a robust system to identify car makes and models. The recognition performance is investigated using the concept of covariant feature detectors and corresponding descriptors. The method combines six different feature detectors with the SIFT descriptor: the DoG, Hessian, Harris-Laplace, Hessian-Laplace, multiscale Harris and multiscale Hessian. The detector identifies a number of frames (keypoints) in the image in a way that is consistent with certain variations of illumination, viewpoint, etc.; the descriptor then associates a specific signature to each of these frames, which allows their appearance to be identified in a compact and robust manner. Fig. 1 shows the block diagram of the proposed car make recognition system. The keypoint detection and feature descriptor computations are followed by a matching procedure based on the Euclidean distance. This matching stage attempts to find the similarity between the query image and the known make-model images based on the number of closest descriptors found in both; ranking this number of valid keypoints highlights the image with the maximum similarity. A pre-processing stage is included in the system to identify the regions of interest (ROI) for images that also contain background [13]. These ROIs should contain visually discriminating information that enables the extraction of distinguishing features of the various makes and models. The methodology described in Ref. [13] is followed to detect the ROI (Fig. 2): the license plate is located first, and the box containing the width of the vehicle, lights and grille is captured according to the measurements stated in that work.

Fig. 1. System block diagram.

Fig. 2. License plate localization and ROI extraction.
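As an illustration of this pre-processing step, the following sketch crops a frontal ROI around a located plate. The box factors here are hypothetical placeholders (the actual measurements are those specified in Ref. [13]), and the 128 × 128 output size matches the resizing used in our experiments on Databases 3 and 4.

```python
import cv2

# Hypothetical box factors for illustration only; the actual ROI
# measurements relative to the plate are those given in Ref. [13].
ROI_WIDTH_FACTOR = 4.0   # ROI width as a multiple of the plate width
ROI_HEIGHT_FACTOR = 2.5  # ROI height as a multiple of the plate height

def extract_roi(img, plate_box):
    """Crop a frontal ROI (grille and lights) around a located plate.

    plate_box is (x, y, w, h) as returned by a plate localizer.
    """
    x, y, w, h = plate_box
    cx = x + w // 2
    roi_w = int(w * ROI_WIDTH_FACTOR)
    roi_h = int(h * ROI_HEIGHT_FACTOR)
    x0 = max(0, cx - roi_w // 2)
    y0 = max(0, y + h - roi_h)          # the box extends upwards from the plate
    roi = img[y0:y0 + roi_h, x0:x0 + roi_w]
    return cv2.resize(roi, (128, 128))  # size used for Databases 3 and 4
```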
4. Feature extraction and matching

This section briefly describes the feature detectors and descriptors considered in the analysis. The standard SIFT detector/descriptor incorporates the DoG keypoint extraction algorithm (detector) along with the SIFT descriptor. In this work, we also examine five other feature detectors, namely the Hessian, Harris-Laplace, Hessian-Laplace, multiscale Harris and multiscale Hessian, with the SIFT descriptor.

4.1. Feature detectors

Lowe proposed the Scale Invariant Feature Transform, one of the most popular object detection algorithms [27], and later improved the method as discussed in Ref. [28]. It allows a user to match different images and determine the similarities between them, and it is based on a robust selection of keypoints that describe the patterns of the image. SIFT is introduced as a combination of the DoG detector and a corresponding SIFT feature descriptor. These are described as covariant feature detectors in the sense that the extracted features are independent of illumination, scale, noise and rotation variations. The various keypoint detection methods included in our study are described in the following subsections.

4.1.1. DoG (Difference of Gaussians)

The method commences with the generation of the scale space of the image by convolving the image with Gaussian kernels of increasing size. The difference of Gaussians is then obtained by subtracting the image blurred with one kernel size from the image blurred with the next. The finally selected keypoints are the local extrema of these differences of Gaussians (DoG).
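For illustration, a minimal sketch of such a detector in Python with OpenCV is given below. The parameter values are our own illustrative choices, not those of the implementation used in the experiments, and extrema are checked per DoG level rather than across the full scale neighbourhood as in the complete SIFT detector.

```python
import cv2
import numpy as np

def dog_keypoints(gray, sigma=1.6, n_levels=5, contrast_thresh=0.03):
    """Detect keypoints as local extrema of a difference-of-Gaussians stack."""
    img = gray.astype(np.float32) / 255.0
    k = np.sqrt(2.0)
    # Scale space: the image blurred with progressively larger Gaussians
    blurred = [cv2.GaussianBlur(img, (0, 0), sigma * k ** i)
               for i in range(n_levels)]
    keypoints = []
    for i in range(n_levels - 1):
        dog = blurred[i + 1] - blurred[i]
        # Dilation/erosion give the max/min over each 3 x 3 neighbourhood,
        # so equality with them marks a local extremum
        is_max = (dog == cv2.dilate(dog, None)) & (dog > contrast_thresh)
        is_min = (dog == cv2.erode(dog, None)) & (dog < -contrast_thresh)
        for y, x in np.argwhere(is_max | is_min):
            keypoints.append(cv2.KeyPoint(float(x), float(y),
                                          2.0 * sigma * k ** i))
    return keypoints
```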
4.1.2. Hessian detector

The Hessian detector uses the local extrema of the determinant of the Hessian operator. In other words, it searches for image locations that exhibit strong changes in the gradient. It uses the Hessian matrix, which is composed of the second order derivatives of the image intensity.

4.1.3. Harris-Laplace detector

The Harris-Laplace detector combines the Harris operator for corner-like structures with a scale selection procedure. The invariance of the candidate points to scale, rotation, illumination and camera noise variations makes them powerful [29]. The main drawback is that this method returns only a small number of points compared to the DoG detector.

4.1.4. Hessian-Laplace detector

The Hessian-Laplace detector uses the extrema of a multi-scale determinant of the Hessian operator and a multi-scale Laplacian operator to ensure localization in space and scale, respectively. The scales found using the Laplacian operator are called the characteristic scales, whose keypoints convey the maximum information [30]. In this case, the Laplacian is the trace of the Hessian matrix.

4.1.5. Multiscale Harris

In this method, features are detected spatially at multiple scales using the Harris operator, computed from the determinant and the trace of the scale-adapted second moment matrix [29]; the characteristic scale itself is not estimated by the corner measure. At each level, interest points are extracted by identifying the local maxima in the 8-neighbourhood of a point. The Laplacian-of-Gaussian is then used to find the maxima over scale, and only the points for which the Laplacian attains a maximum are retained.
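The sketch below pairs a multiscale Harris detector with the SIFT descriptor, the kind of detector/descriptor combination evaluated in this paper. It is a minimal rendition of our own, assuming OpenCV ≥ 4.4; the threshold, pyramid depth and keypoint sizes are arbitrary illustrative choices.

```python
import cv2
import numpy as np

def multiscale_harris_sift(gray, n_levels=4, rel_thresh=0.01):
    """Harris corners detected on a Gaussian pyramid, described with SIFT."""
    sift = cv2.SIFT_create()                  # OpenCV >= 4.4
    keypoints, level, scale = [], gray, 1
    for _ in range(n_levels):
        resp = cv2.cornerHarris(np.float32(level), blockSize=2, ksize=3, k=0.04)
        ys, xs = np.nonzero(resp > rel_thresh * resp.max())
        for x, y in zip(xs, ys):
            # Map coordinates back to the full-resolution image; the keypoint
            # size grows with the pyramid level so that the SIFT descriptor
            # is computed over a correspondingly larger support region
            keypoints.append(cv2.KeyPoint(float(x * scale), float(y * scale),
                                          4.0 * scale))
        level, scale = cv2.pyrDown(level), scale * 2
    return sift.compute(gray, keypoints)      # -> (keypoints, descriptors)
```

The same pattern applies to the multiscale Hessian described next, with the Harris response replaced by the determinant of the Hessian at each level.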
4.1.6. Multiscale Hessian

The multiscale Hessian is an approach similar to the multiscale Harris, where the multiscale determinant of the Hessian operator is utilized instead.

4.2. Feature descriptor: SIFT

To achieve rotational invariance, a constant orientation is assigned to each keypoint based on the local image properties. An orientation histogram is created from the gradient orientations within a region around the extracted keypoint. The samples are added to the histogram after being weighted by a Gaussian circular window and by their gradient magnitudes. The highest peak
will correspond to the keypoint orientation. In addition, every orientation whose histogram peak is at least 80% of the main orientation's is used to create another keypoint, which differs from the first one by its orientation only. Finally, each descriptor is calculated according to the scale at which the keypoint was selected.

4.3. Combination and matching

The objective was to combine different detectors with the SIFT descriptor to determine the model with the best performance. Image matching is accomplished here with the Euclidean distance between descriptors. For any given keypoint from a test image, the Euclidean distance is computed between its descriptor and all descriptors of the training images, and the best match is the one with the smallest distance. Another condition to be considered, as introduced by Lowe, is a threshold on the ratio between the best match and the second-best match: if the ratio is not greater than the threshold, the match is not accepted.
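A minimal sketch of this matching rule is given below, using OpenCV's brute-force matcher. The acceptance test reflects one reading of the rule above, in which the second-best distance must exceed the best distance by the threshold factor; the per-database threshold values are reported in Section 5.

```python
import cv2

def good_match_count(des_query, des_train, thresh=1.9):
    """Count query keypoints whose best match passes the acceptance rule.

    A match is kept only if the second-best Euclidean distance is more
    than `thresh` times the best one.
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = 0
    for pair in matcher.knnMatch(des_query, des_train, k=2):
        if len(pair) == 2 and pair[1].distance > thresh * pair[0].distance:
            good += 1
    return good
```

Ranking the training images by this count then yields the candidate list for a query image, with the highest count ranked first.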
5. Experiments and results

To evaluate the proposed method, we used the following four databases.

Database 1 [15]: This database consists of 200 images comprising 25 model groups. Each group contains 8 images of different cars belonging to the same make-model. All images are in greyscale with a resolution of 140 × 70 pixels, which includes a region around each car. Training was conducted on 120 images and testing on 72 images.

Database 2 [15]: The second database is similar to Database 1, but the images were taken under varying conditions, including illumination and blurring, at a higher resolution of 150 × 66. The database consists of 8 makes and 17 models, with some models differing by production year. Training comprises 154 images and testing was performed on 96 images.

Database 3 [13]: This database contains 262 frontal car images and 74 car model classes. There are 21 common vehicle classes that have five or more images, with a total of 177 images; another 53 uncommon vehicle classes, which typically have one or two samples, make up the remaining 85 images. The images are in colour and were captured at a high resolution of 2592 × 1944 pixels. The training partition includes 85 images and the testing partition comprises 177 images. We resized the extracted ROI of each image to 128 × 128 prior to the training and testing phases.

Database 4 [25]: This database contains images of 163 different car makes and 1716 different models. Among these, 136726 images show the whole car with background and 27618 images represent one of 8 different car parts, such as the headlight, tail light, etc. For our algorithm's validation, we selected the car part images corresponding to the 'Air Intake' feature, which includes 3407 images. Training and testing involve 1374 and 1146 images, respectively, as used in the experiments of the concerned work. All images were resized to 128 × 128 prior to running the training and testing algorithms.

Fig. 3 illustrates car makes and models taken from Databases 1, 2, 3 and 4. The matching thresholds used in the Euclidean classifier stage are set to 1.9, 1.7, 2.0 and 2.0 for Databases 1, 2, 3 and 4, respectively.

The Cumulative Match Curve (CMC) was used to measure the algorithm's performance. This performance metric is commonly used in biometric systems that return a ranked list of candidates, such as car makes [31]. To identify an unknown car make-model, the CMC lists m car makes and models from the dataset ranked from the best match to the worst (m << M, where M is the size of the database); a forensic scientist can then authenticate the results. In the testing phase, each test image (of a specific car make) was matched against all images of the training set of the corresponding database, and the rank of the matching image was determined. The CMC for each case was built by counting the number of times during the test that a true match was found within the top m matches (m = 1, 2, …, M).
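The CMC entries reported in Tables 2–5 can be computed from these ranked candidate lists as sketched below (an illustrative pure-Python helper; `ranked_labels` holds one best-to-worst list of make-model labels per test image).

```python
def cmc_curve(ranked_labels, true_labels, max_rank=10):
    """Fraction of test images whose true make-model appears within the
    top-m ranked candidates, for m = 1 .. max_rank."""
    hits = [0] * max_rank
    for ranked, truth in zip(ranked_labels, true_labels):
        if truth in ranked[:max_rank]:
            first = ranked.index(truth)   # 0-based rank of the true match
            for m in range(first, max_rank):
                hits[m] += 1
    return [h / len(true_labels) for h in hits]
```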
The results obtained are shown in Tables 2–5 for Databases 1, 2, 3 and 4, respectively. The ranks considered here are m = 1, 2, 3, 5 and 10. As seen from the tables, the SIFT descriptor produced the best results when combined with a multiscale approach, such as the multiscale Harris or multiscale Hessian; for Database 1, however, SIFT-DoG exhibited the highest classification accuracy. The results of over 96% for Ranks 1, 2 and 3, and even 100% for Rank 10, obtained on Database 2 indicate that the correct classification is achieved within a short list of possible makes and models, even though the images are affected by blurring and varying illumination conditions. The multiscale Hessian and multiscale Harris outperform the others in a majority of instances, which suggests that a multiscale approach captures most of the distinctive local structures of the cars. It also indicates that a blob-based approach using a multiscale decomposition is able to withstand distortions, including scale changes, blurring and noise. These observations can be explained as follows: the keypoints of cars that are identical in make-model but captured under different environments, such as varying illumination, scale or occlusion, are well captured by the SIFT technique using a multiscale approach.

An important aspect of the investigated approaches relates to computational complexity. A high operational and search speed is desired to facilitate human responses in practical security situations. The computational complexity of each algorithm is given in Table 6 in terms of its execution time in seconds, calculated as the average time over the processing of 5 random images from each database. SIFT + Harris-Laplace is the fastest algorithm. SIFT + multiscale Hessian is the most computationally intensive, with a time of approximately 2.5 s, except for Databases 3 and 4, which require about 10 s and 32 s, respectively, due to a greater number of comparison images. The best performing algorithms are the DoG, multiscale Hessian and multiscale Harris for Databases 1, 2 and 3, respectively; these take approximately 1.20, 2.36 and 1.67 s on their respective databases, which makes them reasonable in terms of computational complexity.
6. Discussion

Car make and model recognition is a beneficial tool to address car cloning, reduce car theft and aid car monitoring. This paper investigated various local feature detectors that are invariant to geometric and other environmental distortions, including illumination, occlusions and noise, to recognize car makes and models. A number of techniques were evaluated using four real databases to compare and contrast their performance. The experimental results suggest that the SIFT descriptor, when combined with the DoG or with a multiscale decomposition strategy using either the Hessian or Harris blob-based methods, gives the best results and achieves a recognition accuracy of over 91% (Rank 1) and over 95% (Ranks 2 and above) for all datasets except Database 4. The high variability of the make-models in that dataset resulted in a lower recognition rate of 50.87%; however, this still represents an improvement of approximately 2.5% over the CNN based approach presented in Ref. [25]. Overall, the experimental results show that robust local feature point methods can outperform the other methods in the literature.

Table 7 shows the results in comparison with existing algorithms. The method comprising SIFT and DoG delivered the best result of 91.67% for Database 1, compared to Ref. [15]; the SIFT and multiscale Hessian method also scored well, with only a marginal drop in accuracy. Meanwhile, in Ref. [15], the classification accuracy reached approximately 91% only after removing the 100 least significant eigenvectors in its 2D-LDA classification.
Fig. 3. Samples of the car makes and models used in the analysis.

Table 2
Performance matching using the CMC metric for Database 1.

Method | 1st Rank (%) | 2nd Rank (%) | 3rd Rank (%) | 5th Rank (%) | 10th Rank (%)
SIFT + DoG | 91.67 | 95.83 | 97.22 | 97.22 | 97.22
SIFT + Hessian | 81.94 | 86.11 | 90.27 | 93.05 | 95.83
SIFT + Harris-Laplace | 72.22 | 83.33 | 87.50 | 88.89 | 94.44
SIFT + Hessian-Laplace | 86.11 | 87.50 | 90.27 | 95.83 | 97.22
SIFT + Multiscale Harris | 84.72 | 91.67 | 93.05 | 95.83 | 98.61
SIFT + Multiscale Hessian | 90.28 | 91.67 | 94.40 | 97.22 | 97.22

Table 3
Performance matching using the CMC metric for Database 2.

Method | 1st Rank (%) | 2nd Rank (%) | 3rd Rank (%) | 5th Rank (%) | 10th Rank (%)
SIFT + DoG | 87.50 | 92.71 | 93.75 | 96.87 | 98.96
SIFT + Hessian | 86.46 | 89.58 | 91.67 | 92.71 | 94.79
SIFT + Harris-Laplace | 84.38 | 86.46 | 90.62 | 92.71 | 97.92
SIFT + Hessian-Laplace | 90.63 | 95.83 | 96.87 | 96.87 | 98.96
SIFT + Multiscale Harris | 88.54 | 96.87 | 96.87 | 97.92 | 98.96
SIFT + Multiscale Hessian | 96.88 | 98.96 | 98.96 | 100 | 100

Table 4
Performance matching using the CMC metric for Database 3.

Method | 1st Rank (%) | 2nd Rank (%) | 3rd Rank (%) | 5th Rank (%) | 10th Rank (%)
SIFT + DoG | 89.83 | 91.52 | 94.35 | 96.61 | 98.87
SIFT + Hessian | 81.35 | 87.00 | 90.39 | 93.22 | 97.17
SIFT + Harris-Laplace | 82.48 | 86.44 | 90.39 | 93.78 | 96.61
SIFT + Hessian-Laplace | 85.87 | 93.22 | 94.35 | 97.17 | 98.30
SIFT + Multiscale Harris | 93.78 | 95.48 | 96.04 | 97.17 | 98.30
SIFT + Multiscale Hessian | 88.13 | 90.39 | 93.22 | 96.61 | 97.74

Table 5
Performance matching using the CMC metric for Database 4.

Method | 1st Rank (%) | 2nd Rank (%) | 3rd Rank (%) | 5th Rank (%) | 10th Rank (%)
SIFT + DoG | 33.51 | 43.37 | 48.60 | 54.80 | 64.75
SIFT + Hessian | 40.84 | 50.96 | 57.24 | 63.70 | 71.20
SIFT + Harris-Laplace | 38.22 | 45.37 | 50.35 | 56.28 | 63.44
SIFT + Hessian-Laplace | 45.11 | 54.89 | 60.47 | 66.14 | 72.95
SIFT + Multiscale Harris | 49.48 | 57.77 | 61.95 | 67.45 | 73.65
SIFT + Multiscale Hessian | 50.87 | 58.81 | 63.26 | 70.16 | 77.05
The elimination of these eigenvectors corresponds to ignoring the effects of illumination variations and occlusions. A comparatively high classification rate of 96.88% was achieved for Database 2: the SIFT method, when combined with the multiscale Hessian, outperformed the 2D-LDA method and delivered a drastic increase in accuracy. Dropping the 100 least significant eigenvectors [15] yields a major accuracy improvement for that method, yet the result remains 10% less accurate than that of the proposed method. The results for Database 3 were compared with Ref. [13], which reports 10 different classification systems obtained by varying the combination of feature extraction and classification strategies.
Table 6
Comparison of the computational complexity of each method (average execution time in seconds).

Method | Database 1 | Database 2 | Database 3 | Database 4
SIFT + DoG | 1.20 | 0.22 | 0.96 | 1.15
SIFT + Hessian | 0.65 | 0.40 | 1.72 | 3.63
SIFT + Harris-Laplace | 0.25 | 0.20 | 0.43 | 1.23
SIFT + Hessian-Laplace | 0.90 | 0.72 | 2.53 | 10.71
SIFT + Multiscale Harris | 0.85 | 0.55 | 1.67 | 4.71
SIFT + Multiscale Hessian | 2.55 | 2.36 | 9.54 | 32.22
Table 7
Summary of performance matching (Rank 1 accuracy, %) using the four different datasets.

Method | Database 1 | Database 2 | Database 3 | Database 4
[13] Locally normalized Harris corners + Naïve Bayes classifier | – | – | 96 | –
[13] Pixel level Harris corners + kNN | – | – | 78 | –
[15] Two-Dimensional Statistical Linear Discriminant Analysis | 87 (91 after dropping the 100 least significant eigenvectors) | 70 (87 after dropping the 100 least significant eigenvectors) | – | –
[25] CNN | – | – | – | 48.40
SIFT + DoG | 91.67 | 87.5 | 89.83 | 33.51
SIFT + Multiscale Hessian | 90.28 | 96.88 | 88.13 | 50.87
SIFT + Multiscale Harris | 84.72 | 88.54 | 93.78 | 49.48
The exploitation of Harris corner features with the nearest neighbour classifier resulted in only 78% recognition accuracy, whereas the Naive Bayes classifier delivered the best result of 96%. Our result of 93.78% on the subsampled version of the images demonstrates the method's ability to perform well at low image resolutions. In the experiments of the concerned work, position/scale normalization was applied to the database prior to the frontal ROI extraction; this involves manually marking the number plate corners to map them to canonical positions, thereby normalizing the position, scale, rotation and skew of the plane containing the license plate. In our experiments, the license plate is located automatically, and the ROI is extracted according to the box measurements given in Ref. [13]. The recognition rate for the part-based car images of Database 4 also exhibits a reasonable improvement over the existing work based on the CNN. It is also worth noting that a subsampled version of the original images was used in our experiments. This highlights the fact that the SIFT-multiscale Hessian with a basic Euclidean matching method can even outperform the deep learning approach.

The discrimination capability of SIFT was exploited with various feature detectors for the realization of car make and model recognition. Different combinations proved to deliver the best results for each database; the Rank 5 results for Database 2 reached 100% with the SIFT and multiscale Hessian approach. The algorithmic complexity is also reasonable, with less than 2 s average computation time across the four databases. Overall, the combination of SIFT with a detector such as the DoG, multiscale Hessian or multiscale Harris produces the best classification results.
7. Conclusion and future work

This paper proposed an automatic vehicle make and model recognition system based on the SIFT descriptor combined with six different feature detectors to address various external conditions such as occlusions, poor lighting, etc. The proposed approach takes into account the peculiarities of the car model, and the combined feature points are deployed using a multiscale decomposition approach, thus also ensuring invariance against scale and rotation distortions. Extensive experimentation was carried out using four datasets consisting of images with high variability. Future work aims to investigate the algorithmic performance on real-time videos comprising video surveillance data captured on a local campus; such video may include numerous car images under varying illumination conditions. Other classification methods can also be investigated with our feature detector-descriptor combinations. Further optimization of the existing algorithm is also expected to meet real-time video processing requirements.

Acknowledgements

This publication was supported by Qatar University Internal Grant No. QUUG-CENG-CSE-14/15-7. The findings achieved herein are solely the responsibility of the authors.
References

[1] Electronic Frontier Foundation, Street-level Surveillance, (2018). https://www.eff.org/pages/automated-license-plate-readers-alpr (last accessed: January 2018).
[2] L. Chen, J. Hsieh, Y. Yan, D. Chen, Vehicle make and model recognition using sparse representation and symmetrical SURFs, Pattern Recognit. 48 (2015) 1979–1998.
[3] Y. Yuan, W. Zou, Y. Zhao, X. Wang, X. Hu, N. Komodakis, A robust and efficient approach to license plate detection, IEEE Trans. Image Process. 26 (2017) 1102–1114.
[4] R. Panahi, I. Gholampour, Accurate detection and recognition of dirty vehicle plate numbers for high-speed applications, IEEE Trans. Intell. Transp. Syst. 18 (2017) 767–779.
[5] O. Bulan, V. Kozitsky, P. Ramesh, M. Shreve, Segmentation- and annotation-free license plate recognition with deep localization and failure identification, IEEE Trans. Intell. Transp. Syst. 18 (9) (2017) 2351–2363.
[6] Y. Liu, H. Huang, J. Cao, T. Huang, Convolutional neural networks-based intelligent recognition of Chinese license plates, Soft Comput. (2017) 1–17.
[7] S. Du, M. Ibrahim, M. Shehata, W. Badawy, Automatic license plate recognition (ALPR): a state-of-the-art review, IEEE Trans. Circuits Syst. Video Technol. 23 (2013) 311–325.
[8] T.R. Lim, A.T. Guntoro, Car recognition using Gabor filter feature extraction, 2 (2002) 451–455.
[9] B. Cai, X. Zhang, D. Zhang, Y. Lu, Fuzzy clustering based car recognition algorithm, International Conference on Information Engineering and Computer Science, Wuhan, 2009, pp. 1–4.
[10] D. Santos, P.L. Correia, Car recognition based on back lights and rear view features, (2009) 137–140.
[11] G. Doretto, Y. Yao, Region moments: fast invariant descriptors for detecting small image structures, (2010) 3019–3026.
[12] H. Huang, Q. Zhao, Y. Jia, S. Tang, A 2DLDA based algorithm for real time vehicle type recognition, (2008) 298–303.
[13] G. Pearce, N. Pears, Automatic make and model recognition from frontal images of cars, (2011) 373–378.
[14] M. AbdelMaseeh, I. Badreldin, M.F. Abdelkader, M. El Saban, Car make and model recognition combining global and local cues, (2012) 910–913.
[15] I. Zafar, E.A. Edirisinghe, B.S. Acar, H.E. Bez, Two dimensional statistical linear discriminant analysis for real-time robust vehicle type recognition, (2007).
[16] D.M. Jang, M. Turk, Car-Rec: a real time car recognition system, (2011) 599–605.
[17] H. Sperker, A. Henrich, Feature-based object recognition—a case study for car model detection, (2013) 127–130.
[18] P. Badura, M. Skotnicka, Automatic car make recognition in low-quality images, Inf. Technol. Biomed. 3 (2014) 235–246.
[19] A.J. Siddiqui, A. Mammeri, A. Boukerche, Real-time vehicle make and model recognition based on a bag of SURF features, IEEE Trans. Intell. Transp. Syst. 17 (2016) 3205–3219.
[20] R. Baran, A. Glowacz, A. Matiolanski, The efficient real- and non-real-time make and model recognition of cars, Multimedia Tools Appl. 74 (2015) 4269–4288.
[21] B. Zhang, Reliable classification of vehicle types based on cascade classifier ensembles, IEEE Trans. Intell. Transp. Syst. 14 (2013) 322–332.
[22] J. Fang, Y. Zhou, Y. Yu, S. Du, Fine-grained vehicle model recognition using a coarse-to-fine convolutional neural network architecture, IEEE Trans. Intell. Transp. Syst. 18 (2017) 1782–1792.
[23] Y. Gao, H.J. Lee, Local tiled deep networks for recognition of vehicle make and model, Sensors 16 (2016) 226.
[24] Z. Dong, M. Pei, Y. He, T. Liu, Y. Dong, Y. Jia, Vehicle type classification using unsupervised convolutional neural network, (2014) 172–177.
[25] L. Yang, P. Luo, C. Change Loy, X. Tang, A large-scale car dataset for fine-grained categorization and verification, (2015) 3973–3981.
[26] F. Tafazzoli, H. Frigui, K. Nishiyama, A large and diverse dataset for improved vehicle make and model recognition, (2017) 874–881.
[27] D.G. Lowe, Object recognition from local scale-invariant features, 2 (1999) 1150–1157.
[28] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision 60 (2004) 91–110.
[29] M. Hassaballah, A.A. Abdelmgeid, H.A. Alshazly, Image features detection, description and matching, Image Feature Detectors and Descriptors, Springer, 2016, pp. 11–45.
[30] A. Bhatia, Hessian-Laplace feature detector and Haar descriptor for image matching, (2007).
[31] S. Almaadeed, A. Bouridane, D. Crookes, O. Nibouche, Partial shoeprint retrieval using multiple point-of-interest detectors and SIFT descriptors, Integr. Comput.-Aided Eng. 22 (2015) 41–58.