Procedia Technology 16 (2014) 1215–1227. doi:10.1016/j.protcy.2014.10.137
CENTERIS 2014 - Conference on ENTERprise Information Systems / ProjMAN 2014 International Conference on Project MANagement / HCIST 2014 - International Conference on Health and Social Care Information Systems and Technologies
Image-based descriptors for snail classification by species

M. Belbut a,*, N. Martins-Ferreira a,b, N. Alves a,b

a CDRSP, Marinha Grande, Leiria 2410, Portugal
b ESTG, IPL, Leiria 2410, Portugal
Abstract

We present a method for extracting a set of descriptors from snail images that allows for efficient statistical classification of four different species.

© 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). Peer-review under responsibility of the Organizing Committee of CENTERIS 2014.

Keywords: Classification; Quadratic Discriminant Analysis; Image Descriptors
1. Introduction

We present the development and implementation of a system for separating snails by means of image analysis, for an industrial application. This task was previously done manually. However, due to yearly fluctuations of the workload, workers had to be hired each year and reinforced during peaks in demand or supply, and the work is unattractive, which makes it difficult to find people to do it. Expansion of the business required a fast response to peak demand, as well as a production process that is certifiable from start to finish. These constraints and limitations led to the need for a fully automated solution. The workflow of the overall process is as follows: the factory receives four different species of snails, Otala Lactea (OL), Helix Aspersa Maxima (HA), Helix Cepaea (HC) and Theba Pisana (TP), which have to be separated by species and also by caliber
* Corresponding author. Tel.: +0-000-000-0000; fax: +0-000-000-0000. E-mail address:
[email protected]
(maximal dimension), in prescribed size ranges for each species. After separation, the different species are packed and properly accommodated, and are then ready to be commercialized.

Nomenclature
OL   Otala Lactea, coded as number 1
HA   Helix Aspersa Maxima, coded as number 2
HC   Helix Cepaea, coded as number 3
TP   Theba Pisana, coded as number 4
We will first give an overall view of the whole process and then, in the remaining sections, give the necessary details of the image processing and recognition that make the process work with the expected results. The first step was to characterize the population of snails in terms of dimensions. This confirmed the existence of overlap between species, precluding a purely mechanical separation. Analysis of that data led to establishing the proper size ranges for each species, as presented below. A mechanical system was designed and built for washing the snails and separating three broad size ranges. The first mechanical modules feed the washed and partly dried snails sequentially to the computer vision module, which, after inspection and classification, sends information to a Programmable Logic Controller (PLC) controlling a set of pneumatic valves that perform the physical separation of the snails using air jets. The three main phases of the overall process that were automated are displayed in Figure 1.
Figure 1: The three main phases of processing
Initial research available in the literature on the identification traits of these snail species showed that small, specific features were used for precise species identification. Visual inspection of a sample population of each species revealed subtle differences in pattern and shape between species, large variation of characteristics within each species, and several stable equilibrium poses for each specimen. A sample population was washed and photographed under controlled flash illumination with a Canon digital camera. Photographs were taken using different colored matte backgrounds and in different stable equilibrium poses. In order to allow a consistent dimensional comparison, the geometrical configuration of the image acquisition was kept constant during the whole process, i.e. a fixed distance to the subject and a constant focal length. Between 12 and 20 specimens of each species were photographed. As we will see, the most convenient way to model the problem is to identify a collection of descriptors that characterize the images in the following way: if two snails from different species are pictured in two different images, then the descriptors extracted from each image lead the classifier to assign different classes. This is a very important aspect of the overall project, and the solution presented here is the result of many trials.
Table 1. Examples of each species and respective caliber for commercialization (example photographs not reproduced here).

Species          Caliber
HA Caracoleta    Small: 26–31 mm; Med.: 31–37 mm; Large: > 37 mm
HC Riscado       Single size: > 25 mm
TP Amarelo       Single size: > 20 mm
OL Branco        Small: 17–20 mm; Large: > 20 mm
Formally, we are interested in classifying a set of images, each containing the picture of a snail, according to its species. This means that we have to build a procedure which accepts as input a collection of images and associates to each of them a number 0, 1, 2, 3 or 4: it returns 0 if the snail pictured in the image is not recognized as belonging to any of the four species presented in Table 1, and returns 1, 2, 3 or 4 if it is recognized as being of species OL, HA, HC or TP, respectively. As shown in the flow diagram of Fig. 2, the procedure produces a four-tuple [p1, p2, p3, p4] of probabilities, with p1 + p2 + p3 + p4 = 1, where pi represents the probability (or likelihood) of the snail pictured in the input file being of type i = 1, 2, 3, 4.

Alternative approaches were also considered. Three approaches were initially examined: neural networks, rule-based systems, and statistical classification. The neural network approach was dismissed due to the difficulty of assessing the confidence of the resulting classification. A rule-based system built on fuzzy logic would require a more thorough understanding of the characteristics of each species, which was not our goal. In the end, the statistical approach was selected.

While solving the overall problem we also had to take into account a considerable number of practical issues, namely the major constraints the system should respond to, the overall workflow of the processing, the initial dataset and testing, the details of the image processing phase, the details of the classification phase, the quantitative results, etc. We chose not to discuss those details here and instead restrict ourselves to the formalization of the problem, the description of the method used to solve it, the chosen set of descriptors to characterize the acquired snail images, some results, and further developments and optimizations which are still possible but were not implemented.

2. Formalization of the problem and the selected method for its solution

In this section we formalize the problem from a mathematical point of view and present the method selected for the implementation of its solution.

2.1. Formalization of the problem

The problem consists in classifying the picture of a snail, obtained from an input image file provided either manually by the user or automatically from a camera acquiring images in real time. In each case the output must be a number from 1 to 4, if the picture is recognized and classified as one of the species from Table 1, or the number 0 if such a classification is not possible. The first difficulty in this work was to identify the characteristics that could be extracted from a picture in order to best characterize it for our purposes. Many other approaches were possible at this point; for example, we could have chosen to extract complex features from an image, such as a list of vectors or even a list of matrices. For the sake of simplicity, we decided to extract only characteristics suitable to be described by real numbers.
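Before turning to the classification method itself, the following minimal MATLAB-style sketch makes the required input-output behaviour concrete (the probabilities and the acceptance threshold are illustrative values, not outputs of the actual system; how the probabilities are obtained is the subject of Sections 2 and 3):

p   = [0.05 0.85 0.07 0.03];    % [p1 p2 p3 p4], posterior probability of each species
thr = 0.5;                      % minimum probability required to accept a classification
[pmax, label] = max(p);         % most likely species, 1..4
if pmax < thr
    label = 0;                  % 0 = not recognized as any of the four species
end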
After deciding that only real-number descriptors would be considered, we looked for possible classification methods. After some analysis we arrived at the conclusion that Quadratic Discriminant Analysis, a variant of the discriminant analysis originally introduced by Fisher [1], was the most appropriate for our context.
2.2. Overview of the Discriminant Classification method

Discriminant Analysis is a method for supervised statistical classification, whereby the probability of a class (the dependent variable) is predicted based on a number of continuous independent variables (descriptors or features) for each sample. Being supervised means that the possible classes are predetermined and that a training set with known categories is used to construct the model. This is done by fitting a set of internal parameters so as to maximize the variance of the descriptors between different categories while minimizing the variance within the same category. Linear Discriminant Analysis uses a linear combination of the descriptors and assumes that all classes share a common Gaussian covariance. A more flexible variant is Quadratic Discriminant Analysis, which we used in this case.
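As an illustration of how such a classifier can be trained and applied, the following minimal MATLAB sketch assumes the Statistics Toolbox and hypothetical variables X, y and v (these names are ours, not code from the paper):

% X: N-by-5 matrix of descriptors, y: N-by-1 vector of class labels (1..4)
mdl = fitcdiscr(X, y, 'DiscrimType', 'quadratic');   % quadratic discriminant classifier

% Classify a new descriptor vector v (1-by-5); 'posterior' gives the
% probability of each of the four classes for this sample
[label, posterior] = predict(mdl, v);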
2.3. Descriptor selection and dimensionality reduction

Dimensionality reduction is the process of reducing the number of independent variables input to the classifier, either by dropping some variables or by computing new derived ones. This may be required because the raw data has a large number of variables, not all of them with predictive value, which can hurt not only the computational performance but also the efficacy of the classifier. Reduction can be achieved by several means: using prior knowledge, by statistical analysis, or by trial and error.

Table 2. Examples of each species and the respective mask used in the process of extracting the descriptors: HA Caracoleta, HC Riscado, TP Amarelo, OL Branco. (Example images and masks not reproduced here.)
In selecting a set of descriptors, the computational cost of each has to be considered. The selected descriptors represent a compromise between the efficacy of the resulting classifier and the speed of processing. Although in some computer vision applications the whole raw pixel collection forming an image is used as the set of independent variables, we could not use the raw pixels as descriptors, since that would be far too many variables, and down-sampling the images would mean losing potentially important information. Instead, we extracted a few aggregated descriptors containing both colorimetric and geometric information, normalized in such a way that neither the colorimetric nor the geometric information depends explicitly on the size. The descriptors used are the average values of the red, green and blue channels, the total area, and the perimeter-to-area ratio. These are very easy to compute and proved to be effective. We initially attempted to use average hue, saturation and value (HSV color model) instead of RGB, but found no significant benefit, most likely because they are derived from the same RGB values and add little independent information.
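As a rough, minimal sketch of how such a feature vector could be assembled (assuming MATLAB with the Image Processing Toolbox; Mask is the binary segmentation mask and Img the cropped RGB image, both hypothetical variables here; bwperim is used only as a stand-in for the edge-based perimeter described in Section 3):

Mask = logical(Mask);                     % binary segmentation mask (assumed given)
Area = sum(Mask(:));                      % total number of snail pixels

% Mean of each colour channel over the masked region
meanR = sum(sum(double(Img(:,:,1)) .* Mask)) / Area;
meanG = sum(sum(double(Img(:,:,2)) .* Mask)) / Area;
meanB = sum(sum(double(Img(:,:,3)) .* Mask)) / Area;

% Perimeter approximated from the mask outline (the paper uses a LoG edge filter)
Perimeter = sum(sum(bwperim(Mask)));

% Descriptor vector, ordered as in Table 3
v = [Area, Perimeter/Area, meanR, meanG, meanB];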
2.4. Selected method for solving the problem

We start with a collection of images for which the respective class is known, called the training set. With this labelled information we train the classifier, which provides the Mahalanobis distance to each class. That measure is then used to determine the class of new, unclassified images. The outline of the procedure is presented in Figure 2.
Figure 2. Outline of the overall procedure.
Further details on the scheme above are presented in the next subsection, where we explain how the classifiers are obtained and for what purpose. The method selected to solve this problem was Discriminant Analysis. This method is suitable for problems with a discrete output and real-valued inputs. Overall, the method works by dividing the (n-dimensional) input space into regions associated with each class. More specifically, a boundary is determined that separates each pair of classes. After the initial training and construction of the classifier, the actual classification is computationally efficient. In theory, this method requires a multivariate Gaussian distribution of the input. Two variants of the method exist: the original Linear Discriminant Analysis (Fisher) assumes a common covariance matrix for all classes and results in linear boundaries between each pair of classes. If that assumption fails, the Quadratic Discriminant Analysis (QDA) variant can be used, which employs a separate covariance matrix for each class, the corresponding boundaries being quadratic. One advantage of this method is that it is relatively immune to over-fitting, as compared to Support Vector Machines (SVM) or Nearest-Neighbor classification, since it does not explicitly use any particular input data point [2].

2.5. Descriptors analysis and specification

The descriptors are obtained after segmentation of the image, which yields a mask. Based on the tests done, and according to the criteria described in Section 2.3, the following descriptors were selected:
- mean R value;
- mean G value;
- mean B value;
- Area;
- Perimeter/Area ratio.

By using the mean value of each color channel we allow the model to take into account the different characteristic colors of each species, while remaining invariant to the dimensions of each particular specimen. The total area of a specimen's mask makes the model aware of the different dimensions typical of each species. The perimeter-to-area ratio takes into account the differences in pose of the pictured specimen. All of these descriptors are invariant under rotation in the image plane. Table 3 illustrates the values of the selected descriptors for some representative snails of each class. In Appendix A we present a similar table for some misclassified specimens.

Table 3. Examples of descriptors for representative specimens of each species (masks and cropped images not reproduced here).

Area (px²)   Perim/Area (px⁻¹)   Mean R   Mean G   Mean B
60098        0.043762            107.92   84.38    47.228
19933        0.066673            143.05   121.17   50.831
25795        0.038612            153.69   134.54   94.236
10364        0.064743            150.97   131.01   84.763
Some details on the image processing and segmentation are as follows. The segmentation tests revealed that a mild blue background was the most convenient for segmentation purposes, followed by green. This was expected, since the overall color of the snails is dark brown and yellow. Obtaining only an approximate location of the snail in a frame could arguably be sufficient for the color-based classification, since it uses average color values. Nevertheless, several other considerations make it necessary to obtain an accurate contour for each specimen: first, there is a requirement for identifying damaged snails; secondly, discriminating between some species might require inspection of specific morphological features; and finally, different specimen poses could present distinct color characteristics. In order to speed up processing, masking is done at half the resolution, since the conversion to HSV was one of the most time-consuming steps. The low-resolution image is converted to HSV color space, and a mask is obtained by thresholding on the hue channel. The resulting rectangular ROI is then up-scaled and used to crop the original image.
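A minimal sketch of this segmentation step, assuming MATLAB with the Image Processing Toolbox; the file name, hue limits and minimum blob size are illustrative placeholders, not the values used in the actual system:

Img   = imread('frame.png');            % acquired frame (hypothetical file name)
small = imresize(Img, 0.5);             % half resolution to speed up the HSV conversion
hsv   = rgb2hsv(small);
hue   = hsv(:,:,1);

% The blue background falls in a known hue band; the snail is everything else
backgroundHue = hue > 0.50 & hue < 0.75;   % illustrative limits for a blue background
Mask = ~backgroundHue;
Mask = bwareaopen(Mask, 200);              % remove small spurious blobs

% Bounding box of the mask, scaled back to full resolution, is used to crop the frame
stats = regionprops(Mask, 'BoundingBox');  % sketch assumes one snail per frame
roi   = 2 * stats(1).BoundingBox;
crop  = imcrop(Img, roi);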
The balance between processing speed and accuracy led to the selection of a mid-resolution progressive-scan CCD color camera (Basler Scout scA640-70fc). This camera acquires up to 70 frames per second at a resolution of 659 by 494 pixels. Image acquisition is performed at a fast shutter speed to minimize motion blur due to the moving conveyor. Two cool-white, high-intensity LED strips with good color rendering are used to provide adequate light intensity and consistent color values. The geometry of the light sources was adjusted to avoid specular reflections and shading across a length of the conveyor. This allows for some slack in the timing of the acquisition and the position of the specimen, and for taking more than one exposure if needed. LED illumination was also selected to allow for stroboscopic lighting (for reduced power consumption and to accommodate faster transport speeds).

The descriptors are then computed from the mask and the cropped image. The area is obtained by summing the values of the 1-bit mask:

Area=sum(Mask(:))

The three color means are computed by multiplying the ROI image by the mask, summing each channel, and dividing by the area:

RGB_mean=sum(sum(Img.*Mask))/Area

The perimeter is obtained by applying an edge detection filter, namely the Laplacian of Gaussian (LoG) algorithm; this algorithm was chosen because it ensures a closed contour:

Perimeter=sum(sum(edge(Mask,'log',0)))
PerAreaRatio=Perimeter/Area

3. Implementation of the solution, analysis, and examples of application

In this section we give further details on the formulas and procedures that were implemented, which are based on [2].

3.1. Implementation of the solution and results

Creating a QDA classifier begins by computing the 'position' (centroid, or mean value of each descriptor, c_k) and 'shape' (covariance matrix, S_k) of each class' distribution. The distance of each point v in the sample space to each class k is the Mahalanobis distance d(v, k), defined as follows:

d(v, k) = (v - c_k) * inv(S_k) * (v - c_k)'     (1)
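As a minimal MATLAB sketch (the variable names X, y, k and v are ours, standing for the training descriptors, their labels, a class index and a new sample), the class parameters and the distance in (1) could be computed as:

c_k = mean(X(y == k, :));                  % class centroid
S_k = cov(X(y == k, :));                   % class covariance matrix

% Mahalanobis distance of descriptor vector v (1-by-5) to class k, equation (1)
d = (v - c_k) * inv(S_k) * (v - c_k)';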
For the classification of an unknown sample, the Mahalanobis distances to each class are compared. Naturally, if the distance is the same to classes A and B, there is an equal probability of the sample belonging to either of those classes. So, by solving the condition

d(v, A) = d(v, B)     (2)

for v, we obtain a quadric surface that separates classes A and B.
The classifier assigns an output class by minimizing the cost of misclassification, given the posterior probabilities of sample v belonging to each class k. The posterior probabilities are computed from a multivariate normal distribution as

P(v|k) = a * exp(-d(v, k) / 2)     (3)

where d(v, k) is the Mahalanobis distance and a is a constant normalization factor for each class. The aforementioned minimization is

ŷ = argmin over y = 1,...,K of  Σ (k = 1..K) P(v|k) C(y|k)     (4)
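A small MATLAB sketch of the decision rule in (3) and (4), with illustrative values for the per-class distances, normalization constants and cost matrix (all hypothetical):

K    = 4;                              % number of classes
dist = [12.3 2.1 8.7 25.0];            % Mahalanobis distances d(v,k) from equation (1)
a    = [1 1 1 1];                      % per-class normalization constants
C    = ones(K) - eye(K);               % unit cost for wrong classes, zero for the correct one

P = a .* exp(-dist / 2);               % equation (3), up to a common factor
P = P / sum(P);                        % normalize so the posteriors sum to one

expectedCost = (C * P')';              % expectedCost(y) = sum over k of P(v|k) * C(y|k)
[~, yhat] = min(expectedCost);         % equation (4): class with minimal expected cost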
If we assume the cost of misclassification C(y, k) to be equal to unity for all wrong classes and zero for the correct class, this is the same as maximizing the posterior probability.

3.2. Analysis of the results

Using a dataset consisting of 295 images, we obtain the following resubstitution confusion matrix:

Table 4. Confusion matrix.

Real \ Predicted   Class 1   Class 2   Class 3   Class 4
Class 1            70        0         2         0
Class 2            0         117       0         0
Class 3            3         1         36        0
Class 4            0         0         0         66
where each row corresponds to a species and each column to a classification output. We can see that our training set contains 72 images of class 1, 117 of class 2, 40 of class 3 and 66 of class 4. An analysis of the six misclassified cases is presented in Appendix A.

3.3. Further analysis of the results

A graphical visualization of the five-dimensional surfaces is not possible, but in some cases a two-dimensional cut can illustrate the mechanism behind this classification method. For illustrative purposes, we present in Figure 3 a misclassified point projected onto a plane parallel to two of the axes (namely the first and second descriptors, Area and Perimeter-to-Area ratio). The white point at the bottom center corresponds to sample number three, which belongs to species 1 but is classified as species 3.
Fig. 3. Cut of the 5-dimensional space of descriptors. Hue is mapped to the output classification (ŷ), saturation to 'certainty' P(v, ŷ), and value to the Mahalanobis distance.
4. Conclusion

An application was developed that reads a set of images and prompts the user for their classification. The result is a trained classifier for that set of images. When the user presents a new image, the system outputs the probability of it belonging to each of the previously learned species. Moreover, the user can specify a minimal desired classification rate, that is, (TP+FP)/(NC+FN), and an adjustable specificity TP/(TP+FP), with different weights for each. The system then determines an appropriate probability threshold for positive classification. Samples whose most probable class has a probability below the threshold are considered unclassifiable (NC). The method was subjected to cross-validation by randomly selecting 80% of the available samples for training and applying the resulting classifier to the remaining 20% of the population. This procedure was repeated several times, giving consistently high ratios of true positives, of about 95% on average, for a non-classifiable rate below 10%. It should be noted, however, that the relatively high ratio of non-classifiable specimens is in part due to a fortuitous, less favorable pose of the specimen in the picture, which can be corrected by feeding the non-classified specimens to the system again.
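A minimal sketch of such a validation run, assuming the MATLAB Statistics Toolbox and the same hypothetical variables X (descriptors) and y (labels) as before; the acceptance threshold and number of repetitions are illustrative:

thr = 0.8;                                        % illustrative probability threshold
acc = zeros(1, 20);
for rep = 1:20                                    % repeated random 80/20 splits
    cv  = cvpartition(y, 'HoldOut', 0.2);
    mdl = fitcdiscr(X(training(cv), :), y(training(cv)), 'DiscrimType', 'quadratic');
    [pred, post] = predict(mdl, X(test(cv), :));
    pred(max(post, [], 2) < thr) = 0;             % below threshold: non-classifiable (NC)
    acc(rep) = mean(pred == y(test(cv)));         % fraction classified correctly
end
meanAccuracy = mean(acc);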
Acknowledgements

Research supported by CDRSP from the Polytechnic Institute of Leiria.

References

[1] Fisher, R. A. The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, Vol. 7, 1936, pp. 179-188.
[2] Dixon, S. J., Brereton, R. G. Comparison of performance of five common classifiers represented as boundary methods: Euclidean Distance to Centroids, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Learning Vector Quantization and Support Vector Machines, as dependent on data structure. Chemometrics and Intelligent Laboratory Systems, Vol. 95, Issue 1, 2009, pp. 1-17.
Appendix A

The following table is analogous to Table 3 and illustrates the values of the selected descriptors for the six misclassified specimens (masks and cropped images not reproduced here).

Table 5. Examples of descriptors for the misclassified specimens.

N   Area (px²)   Perim/Area (px⁻¹)   Mean R   Mean G   Mean B
1   31132        0.027399            112.89   89.885   52.019
2   36188        0.064469            97.214   74.759   43.252
3   26549        0.036951            98.384   74.798   41.027
4   12471        0.18154             91.716   73.59    45.957
5   32625        0.042238            98.833   77.943   45.721
6   34931        0.042713            108.96   84.923   51.072
As in Section 3.3, for illustrative purposes we present the corresponding figures for the misclassified points above.
Table 6. Two-dimensional cuts of the 5-dimensional space of descriptors for each of the six misclassified specimens (N = 1 to 6). Color is mapped to the output classification (ŷ). (Figures not reproduced here.)