Thresholding of noisy shoeprint images based on pixel context


Pattern Recognition Letters 28 (2007) 301–307 www.elsevier.com/locate/patrec

Hongjiang Su *, Danny Crookes, Ahmed Bouridane
Queen’s University Belfast, Image and Vision Group, School of EECS, Northern Ireland Science Park, Queen’s Road, Queen’s Island, Belfast BT3 9DT, United Kingdom
Received 20 January 2006; received in revised form 26 July 2006; available online 18 September 2006
Communicated by Y.J. Zhang
* Corresponding author. Tel.: +44 289 0971884; fax: +44 289 0971802. E-mail address: [email protected] (H. Su).
doi:10.1016/j.patrec.2006.08.008

Abstract

In a typical shoeprint classification and retrieval system, the first step is to segment meaningful basic shapes and patterns in a noisy shoeprint image. This step has significant influence on shape descriptors and shoeprint indexing in the later stages. In this paper, we extend a recently developed denoising technique proposed by Buades, called non-local mean filtering, to give a more general model. In this model, the expected result of an operation on a pixel can be estimated by performing the same operation on all of its reference pixels in the same image. A working pixel’s reference pixels are those pixels whose neighbourhoods are similar to the working pixel’s neighbourhood. Similarity is based on the correlation between the local neighbourhoods of the working pixel and the reference pixel. We incorporate a special instance of this general case into thresholding a very noisy shoeprint image. Visual and quantitative comparisons with two benchmarking techniques, by Otsu and Kittler, are conducted in the last section, giving evidence of the effectiveness of our method for thresholding noisy shoeprint images.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Thresholding; Pixel context; Reference pixels; Shoeprint image

1. Background

Shoeprints are often found at crime scenes and provide valuable forensic evidence. It has been estimated that more than 30% of all burglaries provide usable shoeprints that can be recovered from the crime scene (Alexandre, 1996). Because of the pattern of repeated offences, rapid classification of such shoeprints would enable investigating officers not only to link different crimes, but also to identify potential suspects while the crime is still ‘hot’. Usually, to classify a shoeprint image, an operator is needed to segment and code basic shapes, such as wavy patterns, concentric circles, logos, etc., which are then used to match the print against a database of shoeprints which have been processed similarly.


Problems with this method are that (i) no attempt has been made to code the spatial information of the patterns, (ii) modern shoes tend to have more intricate and varied patterns which are difficult to describe with a fixed shape set, and (iii) it depends on subjective judgments, leading to inconsistent classification.

Geradts and Keijzer (1996) and Geradts (2002) have proposed an approach for automatic classification of shoe soles. The algorithm first segments a shoeprint image into basic shapes by traditional thresholding, erosion and dilation operations. Then, Fourier descriptors and invariance moments are calculated for each basic shape. The best features are selected and then classified with a neural network. Another approach, by Alexander et al. (2000), was based on using fractal encoding methods as part of a matching algorithm. Recently, Chazal et al. (2005) have presented a system for automatically sorting a database of shoeprints based on the outsole patterns in the Fourier domain in response to a reference shoeprint. The algorithm utilizes the power spectral density (PSD) function of a shoeprint image as the descriptor.


However, one of the problems in real-world automatic shoeprint classification is that a real shoeprint is often of poor quality, with a lot of noise. This has been reported to cause considerable trouble when extracting the basic shapes and, later, when classifying the database of shoeprint images automatically. Fig. 1 gives examples of thresholding a very noisy shoeprint image. The added noise is Gaussian noise with σ = 80 (PSNR = 10.04), and the thresholding methods used are Otsu’s (1979) and Kittler and Illingworth’s (1986). We see from the figure that the thresholded images can hardly be used for extracting meaningful basic shapes or patterns. In this paper, we aim to incorporate denoising techniques into the thresholding process, in order to provide a reasonable solution for thresholding a very noisy shoeprint image. The proposed method can also be applied to other image datasets without modification.

Thresholding has been the subject of intense research interest for several decades. In current techniques, thresholding is usually performed either globally or locally. Global methods use one calculated threshold value to divide the image pixels into object and background classes (some also use several values in different stages of thresholding). Local schemes, on the other hand, potentially use a different value for each pixel, selected according to the local neighbourhood (adaptive thresholding). Hybrid approaches utilize both global and local information to decide the thresholds. An overview of the traditional thresholding algorithms is given in (Sezgin and Sankur, 2004), which divides them into six categories: histogram shape-based, clustering-based, entropy-based, object attribute-based, spatial and local methods. It also compares 40 representative algorithms in terms of an integrated performance criterion.

The work in this paper is motivated by a recently developed non-local mean filtering technique (Buades et al., 2005b), which has been verified and shown to be effective in removing different kinds of noise (Buades et al., 2004, 2005a,b; Kindermann et al., 2004; Mahmoudi and Sapiro, 2005), and has been shown to perform better than a range of traditional denoising approaches. In this paper, we extend this approach to give a more general model, and then incorporate it into the thresholding process. Roughly speaking, for each pixel, the thresholding depends not only on a global threshold, but also on a decision rule, which refers to the pixel’s reference pixels and coefficients, called its pixel context (PC). Here, the global thresholding can use any one of the traditional global thresholding techniques, and the decision rule is based on a sum of reference coefficients, which are obtained from the reference pixels of the current working pixel. Fig. 2 gives an outline of the PC-based method (an illustrative sketch follows the figure caption). Our technique is therefore an enhancement that can be layered on top of essentially any traditional thresholding technique: it adds a second stage in which the initial thresholding decisions are reviewed in the light of pixel context information.

In Section 2, we detail our method, including the introduction of the global thresholding algorithms, the definition of the pixel context model, the details of the implementation, and the evaluation of different thresholding techniques. Section 3 introduces the database of shoeprint images and compares our approach with the two most referenced thresholding algorithms in terms of the evaluation criterion of Section 2. Finally, we summarize the work in Section 4.

Fig. 1. An example of thresholding a very noisy shoeprint image (a) (σ = 80, PSNR = 10.04) using Otsu’s (b) and Kittler’s (c) thresholds.

[Fig. 2 block diagram: a shoeprint image feeds both the global thresholding stage and the pixel-context computation; their outputs are merged by a combined decision rule to produce the thresholded shoeprint image.]

Fig. 2. An outline of our pixel context (PC) thresholding method.
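For illustration only, the two-stage procedure of Fig. 2 can be sketched as follows in Python. The helper names (threshold_global, compute_pixel_context, combined_decision) are ours, standing for the stages described above and detailed in Section 2; this is a sketch, not the authors' implementation.

import numpy as np

def pc_threshold(image, threshold_global, compute_pixel_context, combined_decision):
    # Stage 1: pre-classification with any traditional global method (e.g. Otsu).
    pre_labels = threshold_global(image)
    # Pixel context: reference pixels and reference coefficients for every pixel.
    context = compute_pixel_context(image)
    # Stage 2: review every pre-classification with the context-based decision rule.
    return combined_decision(pre_labels, context)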


2. Thresholding based on pixel context (PC)

An overview of the method is shown in Fig. 2. Given a shoeprint image, a traditional global thresholding is performed to obtain a pre-classification into object (shoeprint) and background. Meanwhile, the system computes the pixel context for each pixel in the image. Then, the class to which a pixel is attributed is decided by referring to its context. When all pixels have been processed, the system generates the thresholded shoeprint image. We now go into more detail on these steps, and also give a quantitative evaluation criterion for thresholding algorithms.

2.1. Global thresholding

Global thresholding, as a separate stage in our method, can use any one of the traditional global thresholding methods. We utilize the thresholding algorithm proposed by Otsu in our method. This approach has been one of the most referenced methods in thresholding because of its good performance. Otsu (1979) suggested minimizing the weighted sum of the within-class variances of the foreground and background pixels to establish an optimum threshold. Another technique we would like to introduce here is that proposed by Kittler and Illingworth (1986), which was ranked as the best thresholding approach among the 40 methods compared across different application cases in (Sezgin and Sankur, 2004). For this reason, it will be one of the benchmarking techniques in our comparisons. Given a conditional probability density function p(v|i) and a priori probability P_i, there exists a grey level s for which the grey levels v satisfy the following criterion:

P_1\,p(v\mid 1) - P_2\,p(v\mid 2)
\begin{cases} > 0, & v \le s, \\ < 0, & v > s, \end{cases} \qquad (1)

where s is the Bayes minimum error threshold by which the image should be binarized. In Kittler’s approach, the minimum error threshold is calculated in a simple way without estimating the distributional parameters.
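As a minimal sketch of Otsu's criterion (assuming an 8-bit greyscale image held in a NumPy array; this is not the code used in the paper), the threshold can be found by maximizing the between-class variance, which is equivalent to minimizing the weighted within-class variance:

import numpy as np

def otsu_threshold(image):
    # Normalised 256-bin histogram of an 8-bit greyscale image.
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_between = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()          # class probabilities
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0      # class means
        mu1 = (levels[t:] * p[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2       # between-class variance
        if between > best_between:
            best_t, best_between = t, between
    return best_t                                  # grey levels >= best_t form one class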

2.2. Pixel context model

Recently, Buades et al. (2005b) have proposed a new method – non-local mean filtering – to reduce image noise, which takes advantage of the high degree of redundancy in an image (in particular, the observation that each small region has many similar regions in the same image). They define the non-local neighbourhood of a pixel i as the set of pixels j whose local windows ‘look like’ the local window around i. Then all pixels in that neighbourhood can be used for predicting the grey value at i. This algorithm has been shown to be consistent under the assumption that the image is what they call a fairly general stationary random process.

In this paper, we extend their concept to give a more general case: the expected result of an operation on a pixel i can be estimated by performing the same operation on all of its reference pixels in the same image. Let V denote the random value at pixel i, and R denote the result of an operation F on V, i.e.,

R = F(V). \qquad (2)

Here, R is a random value too. Now let us take as samples of V the grey values v_j at all pixels in the image. To make this assumption reasonable, we assign a credit value, the reference coefficient r_j, to each sample v_j, which is computed by correlating the two samples v_i and v_j. This quantity can carry a variety of physical meanings, such as a weight, a reference coefficient, and so on. Intuitively, if the sample v_j is exactly the same as v_i (r_j = 1), we say that a sample of the random value V appears more than once. If the sample v_j is completely unrelated to v_i (r_j = 0), it means v_j cannot be used as a sample of V. Then we can view r_j^n, the reference coefficient normalized by \sum_j r_j, as an approximation to the discrete probability density function of V, and r_j^n satisfies the usual conditions 0 \le r_j^n \le 1 and \sum_j r_j^n = 1. It is well known that if X is a continuous random variable with probability density function p(x), and y = y(x) is a non-random relationship between the variables x and y, then the mean value of the random variable Y can be evaluated as

\hat{y} = \int y(x)\, p(x)\, dx. \qquad (3)

In our discrete case, Eq. (3) becomes

\widehat{F}(V) = \sum_j r_j^n \, F(v_j), \qquad (4)

where \widehat{F}(V) denotes the expected result of the operation F on the random value V at pixel i. If F(V) = V, then Eq. (4) reduces to the neighbourhood smoothing filter

\hat{v}(i) = \sum_j w_j \, v(j), \qquad (5)

where w_j has a similar meaning to r_j^n. In Eq. (4), F can be a linear or a non-linear operation. If F is a linear operation on V, it is not hard to obtain the identity \widehat{F}(V) = F(\widehat{V}) from Eqs. (4) and (5). This means that the expected result of a linear operation on a pixel can be estimated by performing the neighbourhood smoothing first and then applying the same operation to the smoothed image. For a non-linear operation, however, we can only use Eq. (4) to estimate the expectation of the operation F. So far, we have obtained a combined set \{(r_j^n, v_j) \mid j \in J\} for each pixel i, where J is the set of its reference pixels. We call this combined set the pixel context of the current pixel.
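A minimal sketch of Eq. (4) as reconstructed above (not the authors' code): given the grey values of a working pixel's reference pixels and their reference coefficients, the expected result of an operation F is the coefficient-weighted average of F applied to each sample. With F the identity, this reduces to the smoothing filter of Eq. (5).

import numpy as np

def context_expectation(ref_values, ref_coeffs, op=lambda v: v):
    # ref_values: grey values v_j of the reference pixels of one working pixel.
    # ref_coeffs: their (unnormalised) reference coefficients r_j.
    # op must accept a NumPy array (i.e., be vectorised).
    v = np.asarray(ref_values, dtype=float)
    r = np.asarray(ref_coeffs, dtype=float)
    r_n = r / r.sum()                  # normalised coefficients: 0 <= r_n <= 1, sum = 1
    return float(np.sum(r_n * op(v)))  # Eq. (4); with op = identity this is Eq. (5)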

Now let us consider the computation of a value r_j in a pixel context. The simplest way to compute r_j is to calculate the correlation between the grey values v_i and v_j, i.e., r_j = 2 v_i v_j / (v_i^2 + v_j^2). Then Eq. (5) takes a similar form to the neighbourhood smoothing in (Yaroslavsky, 1985), which, however, has been shown not to be effective in removing image noise. If we instead apply the Gaussian-weighted correlation between the local windows around i and j, as in Eq. (6), then Eq. (5) becomes similar to the non-local mean filtering in (Buades et al., 2005b), which has been shown to be effective in removing noise while still preserving image detail:

r_j = \exp\!\left( -\frac{C_{ij}}{h^2 \sqrt{C_{ii} C_{jj}}} \right), \qquad (6)

where

C_{kn} = \sum_{l=1}^{W} \sum_{m=1}^{W} g_{l,m}\,\bigl(v_k(l,m) - \bar{v}_k\bigr)\bigl(v_n(l,m) - \bar{v}_n\bigr), \quad k, n \in \{i, j\}, \qquad (7)

and

\bar{v}_k = \sum_{l=1}^{W} \sum_{m=1}^{W} g_{l,m}\, v_k(l,m), \quad k \in \{i, j\}, \qquad (8)

where g_{l,m} are the Gaussian weights, W is the local window size around pixels i and j, and h controls the decay of the exponential function. The normalized coefficient r_j^n is obtained by normalizing r_j with the sum Z_i = \sum_j r_j. We must note that directly computing Eqs. (6) and (4) can be computationally prohibitive. A commonly used way to solve this problem is to limit the reference range to a neighbourhood of the pixel i (|i - j| < s).

Fig. 3 shows the reference coefficients of three different ‘working pixels’ in a clean shoeprint image and in a noisy image. Pixel (b) is a background pixel in an object region; pixel (c) is an object pixel in an object region; and pixel (d) is an edge pixel. The reference coefficients are computed by Eq. (6). From the figure, we can see that not all reference pixels have significant coefficients (i.e., close to 1.0, or white in the images).
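The following sketch computes a single reference coefficient r_j following Eqs. (6)-(8) as reconstructed above, i.e., a Gaussian-weighted correlation between the W x W windows centred on pixels i and j. It is a sketch under our own assumptions, not the authors' implementation: both windows are assumed to lie inside the image, the parameter names and defaults (W, sigma, h) are ours and only loosely follow the ranges reported in Section 3.2, and in practice r_j would be computed for every j in the search neighbourhood |i - j| < s and normalised by Z_i = sum_j r_j.

import numpy as np

def reference_coefficient(image, i, j, W=7, sigma=4.0, h=25.0):
    half = W // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()                                         # Gaussian weights g_{l,m}

    def window(p):                                       # W x W patch around p (assumed inside the image)
        r, c = p
        return image[r - half:r + half + 1, c - half:c + half + 1].astype(float)

    wi, wj = window(i), window(j)
    mi, mj = (g * wi).sum(), (g * wj).sum()              # Eq. (8): weighted window means
    Cii = (g * (wi - mi) ** 2).sum()                     # Eq. (7): weighted (co)variances
    Cjj = (g * (wj - mj) ** 2).sum()
    Cij = (g * (wi - mi) * (wj - mj)).sum()
    # Eq. (6) as reconstructed; the small constant guards against flat windows.
    return np.exp(-Cij / (h ** 2 * np.sqrt(Cii * Cjj) + 1e-12))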

2.3. Thresholding with the pixel context model

In this paper, the operation we perform on a pixel i is a classification, which is a non-linear operation, and the reference coefficient r_j of a pixel j is defined as the probability p_j of the pixel i having the same category as the pixel j. If there are categories C_1, ..., C_N for all pixels in an image, the probability P_i^n of a pixel i being classified into the category C_n can be calculated as the sum \sum_{j \in n} p_j. With this definition, we can reclassify all the pixels in an image into the N categories. For thresholding, N is 2, i.e., 1 – object and 2 – background. Then the decision rule is as follows:

i \in \begin{cases} 1, & P_i^1 \ge P_i^2, \\ 2, & P_i^1 < P_i^2. \end{cases} \qquad (9)

Eq. (9) means that, for a pixel i, if the sum of the probabilities p_j over reference pixels j that have been classified as object is larger than the sum of the probabilities p_j over those classified as background, then i is attributed to the object; otherwise i is attributed to the background.
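A minimal sketch of the decision rule in Eq. (9) for a single working pixel (not the authors' code): ref_labels holds the pre-classification of the pixel's reference pixels produced by the global threshold (1 = object, 2 = background), and ref_coeffs holds the corresponding probabilities p_j.

import numpy as np

def pc_decision(ref_labels, ref_coeffs):
    labels = np.asarray(ref_labels)
    p = np.asarray(ref_coeffs, dtype=float)
    p1 = p[labels == 1].sum()        # P_i^1: summed support for the object class
    p2 = p[labels == 2].sum()        # P_i^2: summed support for the background class
    return 1 if p1 >= p2 else 2      # Eq. (9)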

2.4. Quantitative evaluation

In order to quantitatively evaluate the performance of our method, we apply the misclassification error (ME) as the thresholding measurement, which has been widely used to evaluate the performance of a classifier. ME is defined as the percentage of object (background) pixels wrongly assigned to the background (object). It can be simply expressed as

\mathrm{ME} = 1 - \frac{|O_g \cap O_t| + |B_g \cap B_t|}{|O_g| + |B_g|}, \qquad (10)

where O_g and B_g denote the sets of the object and background pixels in the ground-truth binary image (Section 3.2 will introduce how to obtain the ground-truth binary image), O_t and B_t denote the sets of the object and background pixels in the thresholded image, and |·| denotes the cardinality of a set. It is not hard to deduce that ME varies from 0 to 1.
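A minimal sketch of Eq. (10) on Boolean masks, where True marks object pixels and False marks background pixels (a convention of this sketch, not of the paper):

import numpy as np

def misclassification_error(ground_truth, thresholded):
    gt = np.asarray(ground_truth, dtype=bool)
    th = np.asarray(thresholded, dtype=bool)
    # |O_g ∩ O_t| + |B_g ∩ B_t|, divided by |O_g| + |B_g| (the total pixel count).
    correct = np.logical_and(gt, th).sum() + np.logical_and(~gt, ~th).sum()
    return 1.0 - correct / gt.size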

Fig. 3. Some examples of the reference coefficients computed by Eq. (6). In each row, the leftmost image is the test shoeprint image (clean in the first row, noisy in the second); the circles (b), (c) and (d) denote the current working pixels, and the corresponding reference coefficients are displayed in order in the other three images.


3. Experiments

3.1. Shoeprint database

A forensic shoeprint database is usually purchased from a commercial company such as Foster and Freeman, who supply the SOLEMATE database. However, for the purposes of this research, and to eliminate differences due to different capture methods, the investigations described here were mainly carried out on a set of 31 shoeprint images that were collected using the ‘Perfect Shoemark Scan’ technique. Perfect shoe scan equipment, which consists of a chemical-drenched sponge and paper that is reactive to that chemical, is a simple and cheap technique for capturing shoemarks directly from a shoe sole. The images were scanned to greyscale and stored without compression at a resolution of 256 × 256 pixels. Fig. 4 shows examples of six shoeprint images captured by the system and used in the investigation.

Fig. 4. Examples of six shoeprint images captured by ‘Perfect Shoemark Scan’.

3.2. Comparisons

Our proposed thresholding algorithm was tested on the above shoeprint image dataset against two well-known binarization techniques, Otsu’s and Kittler’s, which were presented in Section 2.1. Both of them are based on clustering analysis without any ad hoc parameters. In our method, however, three parameters are introduced in order to compute the probability p_j for each reference pixel j: the sigma a of the Gaussian weighting function, the control factor h, and the local window size W. Empirical experiments show that a window of size 5 × 5 or 7 × 7, and a sigma a in the range [3, 5] for the Gaussian weighting function, are robust enough for a variety of noise levels. The control factor h should be taken to be of the order of the noise level (σ). However, in our experiments, the value 25 for h proved perfectly adequate for all the tested noise levels (σ from 0 to 100).

Fig. 6 presents examples showing the visual performance of the proposed algorithm and the benchmarking techniques on shoeprint images with a noise level of σ = 60. Ground-truth binary images are shown on the second row. Because ‘Perfect Shoemark Scan’ introduces very little noise on the background of the shoeprint images, the ground-truth binary image can be obtained with either Otsu’s or Kittler’s algorithm at the noise level of σ = 0. In our experiments, we chose Otsu’s algorithm, since it is visually more reliable (see Fig. 5). From Fig. 6, where the noise level is 60, we can see that the Otsu thresholds are very often smaller than expected, so a lot of noise is classified as object. The Kittler thresholds, on the other hand, are quite often larger than expected: many smooth regions in the object are broken up by pixels misclassified as background, while some noise in the background area still remains in the object class. The images thresholded with both of these well-established traditional algorithms can therefore hardly be used for extracting the meaningful shapes and patterns needed in the later stages of shoeprint image indexing. Our proposed pixel context (PC) method, by contrast, achieves thresholded shoeprint images which are visually very similar to the ground-truth binary images.

Fig. 5. (a) Ground-truth binary image obtained with Otsu’s (b) and Kittler’s (c) algorithms at the noise level of σ = 0.


Fig. 6. Examples of the thresholded image with our proposed pixel context (PC) algorithm compared with the Otsu and Kittler algorithms. (a) Noisy shoeprint images with the noise level of σ = 60. (b) Ground-truth binary images. (c) Thresholded images by Otsu’s algorithm. (d) Thresholded images by Kittler’s algorithm. (e) Thresholded images by the proposed PC method.

The second experiment is on several typical shoeprint images with different noise levels, where σ varies from 0 to 100 with an interval of 10. Fig. 7 shows the average misclassification errors on the tested shoeprint images for Otsu’s, Kittler’s and the proposed pixel context algorithms. The figure shows that the proposed PC method performs much better than Otsu’s approach in terms of misclassification error when the noise level σ is larger than 30, and it is also better than Kittler’s algorithm, which has recently been shown to perform better than 40 other approaches published in the literature. We also notice that, when the noise level is near zero, there is a large difference between Otsu’s and Kittler’s approaches. That is because we use Otsu’s method to obtain the ground-truth binary images; without this factor, the three compared methods should have a similar performance at low noise levels.

In the last experiment, a scene image, Fig. 8(a), from Foster and Freeman Ltd. has been used to test the thresholding method proposed in this paper. In addition to the thresholding, a few other standard operations (such as enhancement, border cleaning, and morphological reconstruction) are also performed to obtain the final result shown in Fig. 8(b).
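For illustration, an experiment of this kind could be reproduced along the following lines (a sketch under our own assumptions, not the authors' exact protocol): Gaussian noise of increasing sigma is added to each clean image, the noisy image is thresholded, and the misclassification error against the ground truth is averaged over the dataset. Here threshold_fn is any function returning a Boolean object mask, and misclassification_error is the sketch given after Eq. (10).

import numpy as np

def average_me_vs_noise(images, ground_truths, threshold_fn,
                        sigmas=range(0, 101, 10), seed=0):
    rng = np.random.default_rng(seed)
    results = {}
    for sigma in sigmas:
        errors = []
        for img, gt in zip(images, ground_truths):
            noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)
            errors.append(misclassification_error(gt, threshold_fn(noisy)))
        results[sigma] = float(np.mean(errors))   # one point of the ME-versus-sigma curve
    return results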

[Fig. 7 plot: average ME (vertical axis, 0–0.35) against sigma of noise (horizontal axis, 0–100) for the Otsu, Kittler and PC methods.]

Fig. 7. The average misclassification errors on the tested shoeprint images with Otsu’s, Kittler’s and the PC algorithms.

Fig. 8. An example of thresholding a scene image. (a) A scene image; (b) the thresholded result.

4. Summary

In this paper, we have extended non-local mean filtering to give a more general case, called the pixel context based model, for pixel-based image processing. We have then combined a special case of this model into the thresholding process. For each pixel, the thresholding depends not only on a global threshold, but also on a decision rule, which refers to the pixel’s reference pixels and coefficients. Here, the global thresholding method can be any one of the traditional global thresholding methods.


Generally speaking, the global threshold should be close to the expected threshold, since it then provides good reference information for the algorithm to make a decision later. The decision rule is based on a sum of reference coefficients, or probabilities, which are obtained from the reference pixels of the current working pixel. We have compared the proposed method visually and quantitatively with two existing techniques, Otsu’s and Kittler’s. The experimental results suggest that the proposed method can provide a more reliable segmentation of an object (shoeprint) from the background of a noisy shoeprint image than the other two techniques. We also note that although we have applied our technique to noisy shoeprint images, the proposed method should be applicable to other image datasets without modification, such as document thresholding and medical image foreground extraction.

References

Alexander, A.G., Bouradine, A., Crookes, D., 2000. Automatic classification and recognition of shoeprints. Special Issue of the Information Bulletin for Shoeprint/Toolmark Examiners 6 (1), 91–104.
Alexandre, G., 1996. Computerized classification of the shoeprints of burglars’ soles. Forensic Sci. Int. 82, 59–65.
Buades, A., Coll, B., Morel, J.M., 2004. On image denoising methods. Preprint CMLA 2004-15, Spain.
Buades, A., Coll, B., Morel, J.M., 2005a. Neighbourhood filters and PDE’s. Preprint CMLA 2005-18, Spain.
Buades, A., Coll, B., Morel, J.M., 2005b. Image denoising by non-local averaging. IEEE ICASSP, PA, USA, March 18–23.
Chazal, P.D., Flynn, J., Reilly, R.B., 2005. Automated processing of shoeprint images based on the Fourier transform for use in forensic science. IEEE Trans. Pattern Anal. Mach. Intell. 27 (3), 341–350.
Geradts, Z., 2002. Content-based information retrieval from forensic databases. Ph.D. Thesis, The Netherlands Forensic Institute of the Ministry of Justice in Rijswijk, The Netherlands, June.
Geradts, Z., Keijzer, J., 1996. The image data REBEZO for shoeprint with developments for automatic classification of shoe outsole designs. Forensic Sci. Int. 82, 21–31.
Kindermann, S., Osher, S., Jones, P.W., 2004. Deblurring and denoising of images by non-local functionals. UCLA Computational and Applied Mathematics Reports, 04-75.
Kittler, J., Illingworth, J., 1986. Minimum error thresholding. Pattern Recognit. 19 (1), 41–47.
Mahmoudi, M., Sapiro, G., 2005. Fast image and video denoising via nonlocal means of similar neighbourhoods. IEEE Signal Process. Lett. 12 (12).
Otsu, N., 1979. A threshold selection method from grey level histograms. IEEE Trans. Systems Man Cybernet. 9 (1), 62–66.
Sezgin, M., Sankur, B., 2004. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imag. 13 (1), 146–165.
Yaroslavsky, L.P., 1985. Digital Picture Processing: An Introduction. Springer-Verlag.