Unsupervised multiscale segmentation of color images


Pattern Recognition Letters 28 (2007) 523–533 www.elsevier.com/locate/patrec

Cláudio Rosito Jung *

UNISINOS – Universidade do Vale do Rio dos Sinos, PIPCA – Graduate School of Applied Computing, Av. UNISINOS, 950, São Leopoldo 93022-000, RS, Brazil

Received 30 September 2005; received in revised form 17 July 2006; available online 27 November 2006

Communicated by Y.J. Zhang

Abstract

This paper proposes a new multiresolution technique for color image representation and segmentation, particularly suited for noisy images. A decimated wavelet transform is initially applied to each color channel of the image, and a multiresolution representation is built up to a selected scale 2^J. Color gradient magnitudes are computed at the coarsest scale 2^J, and an adaptive threshold is used to remove spurious responses. An initial segmentation is then computed by applying the watershed transform to the thresholded magnitudes, and this initial segmentation is projected to finer resolutions using inverse wavelet transforms and contour refinements, until the full resolution 2^0 is achieved. Finally, a region merging technique is applied to combine adjacent regions with similar colors. Experimental results show that the proposed technique produces results comparable to other state-of-the-art algorithms for natural images, and performs better for noisy images.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Segmentation; Watersheds; Wavelets; Multiresolution; Color images; Region merging

1. Introduction

Image segmentation consists of partitioning an image into isolated regions, such that each region shares common properties and represents a different object. Such a task is typically the first step in more advanced vision systems, in which object representation and recognition are needed. Although isolating different objects in a scene may be easy for humans, it is still surprisingly difficult for computers. An additional problem arises when dealing with color images, due to the variety of representations (color spaces) that can be used to characterize color similarity. Several authors have tackled the problem of color image segmentation using a variety of approaches, such as active contours, clustering, wavelets and watersheds, among others. Some of these techniques are reviewed next.

* Tel.: +55 51 3591 1122x1626; fax: +55 51 3590 8162. E-mail address: [email protected]

0167-8655/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2006.10.001

Sapiro (1997) proposed a framework for object segmentation in vector-valued images, called color snakes. As in the original snakes formulation for monochromatic images, color snakes present the nice property of smooth contours, but require a manual initialization and may face convergence problems. Comaniciu and Meer (2002) proposed a unified approach for color image denoising and segmentation based on the mean shift. A kernel in the joint spatial-range domain is used to filter image pixels in the CIELUV color space, and the filtered pixels are clustered to obtain segmented objects. Although this technique presents good results, it requires a manual selection of the spatial (h_s) and color (h_r) bandwidths, and optionally a minimum area parameter (M) for region merging.

Liapis et al. (2004) proposed a wavelet-based algorithm for image segmentation based on color and texture properties. A multichannel scale/orientation decomposition using wavelet frame analysis is performed for texture feature selection, and histograms in the CIELAB color space are


used for color feature extraction. Two labelling algorithms are proposed to obtain the final segmentation results based on either or both features. This technique also achieves nice segmentation results for natural complex images, but the number of different color–texture classes must be selected by the user, which may not be easy to define in practical applications.

Ma and Manjunath (2000) and Deng and Manjunath (2001) also proposed approaches for image segmentation based on color and texture. In (Ma and Manjunath, 2000), a predictive coding model was created to identify the direction of change in color and texture at each image location at a given scale, and object boundaries are detected where propagated "edge flows" meet. In JSEG (Deng and Manjunath, 2001), a color quantization scheme is initially applied to simplify the image. Then, local windows are used to compute J-images, which return high values near object boundaries and low values in their interior. Finally, a multiscale region growing procedure is applied to obtain the final segmentation. It is important to notice that JSEG is intended to be an unsupervised segmentation method, meaning that it is free of user-defined parameters.

Nock and Nielsen (2003, 2004, 2005) proposed fast segmentation techniques based on statistical properties of color images. These approaches take into account expected homogeneity and separability properties of image objects to obtain the final segmentation through region merging. In particular, the techniques described in (Nock and Nielsen, 2003, 2004) are unsupervised and well suited for noisy images, while the method presented in (Nock and Nielsen, 2005) requires some user assistance.

Other authors have combined watershed segmentation with multiresolution image representations. Scheunders and Sijbers (2002) used a non-decimated wavelet transform to build a multiscale color edge map, which was filtered by a multivalued anisotropic diffusion. The watershed transform is then applied at each scale, and a hierarchical region merging procedure is applied to connect segmented regions at different scales. Despite the denoising power of both the wavelet transform and anisotropic diffusion, the experimental results presented in their paper indicate a considerably large number of segmented regions. Vanhamel et al. (2003) also explored multiscale image representations and watersheds for color image segmentation. In their approach, the scale-space is based on a vector-valued diffusion scheme, and color gradients in the YUV color space are computed at each scale. After applying the watershed transform, the dynamics of contours in scale-space are used to validate detected contours. Results presented in the paper are visually pleasant, but noisy images were not tested. Kazanov (2004) proposed a multiscale watershed-based approach for detecting both small and large objects, focused on scanned pages of color magazines. For detecting small objects, a small-support edge detector is used. For larger objects, a multiscale version of the gradient estimator is computed. One potential problem of this technique is its high sensitivity to noise/texture, due to the use of small-support edge detectors.

Although this bibliographical review was mostly focused on techniques involving wavelets, watersheds or multiresolution analysis, there are several other recent competitive approaches for color image segmentation, such as Makrogiannis et al. (2003), Chen et al. (2004), Nikolaev and Nikolayev (2004), Marfil et al. (2004) and Navon et al. (2005). It can also be noticed that most authors have not considered the problem of noisy color image segmentation, especially for large amounts of noise contamination.

This work extends the procedure proposed in (Jung, 2003) for multiscale segmentation of color images with several improvements, such as the formulation of a statistical model for gradient magnitudes of color images using joint information of the color channels, the automatic thresholding of color gradient magnitudes based on a posteriori probabilities, and the inclusion of a similarity metric in the CIELAB color space for region merging, based on the perceived contrast between colors. It should be noticed that the approach in (Jung, 2003) can only be applied to monochromatic images.

In the proposed approach, a decimated wavelet transform (WT) is initially applied to each color channel, producing a multiresolution image representation up to a selected scale 2^J. A color gradient magnitude image is computed at the coarsest resolution 2^J, and an adaptive threshold is used to remove spurious responses. The watershed transform is applied to the thresholded magnitudes, obtaining an initial segmentation. The inverse wavelet transform (IWT) is then used to project this initial segmentation to finer scales, until the full resolution image is achieved. Finally, a region merging procedure based on CIELAB color distances is applied to obtain the final segmentation. As will be discussed throughout this manuscript, the denoising power of the WT combined with the probabilistic edge estimator effectively reduces the well-known oversegmentation problem of the watershed transform, even for images with significant noise contamination. Also, the same set of default parameters produces good results for most images, indicating that the proposed technique can be used for unsupervised image segmentation.

The remainder of this paper is organized as follows. Section 2 provides a very brief review of wavelets and watersheds. The proposed method is described in Section 3, and experimental results are provided in Section 4. Finally, conclusions are drawn in the final section.

2. The wavelet transform and the watershed transform

2.1. Decimated wavelet transforms

In a few words, the WT of an intensity image up to the scale 2^J is a set of detail subimages W^h_{2^j}, W^v_{2^j}, W^d_{2^j}, for j = 1, ..., J, and a smoothed and downsampled version A_{2^J} of the original image, called in this work the approximation image. According to Mallat's pyramid algorithm (Mallat, 1989), such subimages can be obtained by a combination of convolutions with low-pass and high-pass


filters associated with a mother wavelet, followed by downsamplings. It is important to notice that the original image can be reconstructed from the images {W^h_{2^j}, W^v_{2^j}, W^d_{2^j}}_{j=1,2,...,J} and A_{2^J}. For color images, such a multiscale representation can be obtained for each color channel (the RGB color space was used in this work). Although there is a great variety of mother wavelets to choose from, the well-known Haar wavelet (Strang and Nguyen, 1996) was selected in this work, mainly due to its small support (hence providing good localization in space). Also, computing the wavelet decomposition with the Haar wavelet has low computational complexity (linear with respect to the size of the input image). The importance of using a small-support wavelet basis will become more evident in Section 3.2.

2.2. The watershed transform

A powerful tool for image processing based on mathematical morphology is the watershed transform (Meyer and Beucher, 1990). In particular, watersheds are useful to obtain image partitions, and they are attractive solutions for image segmentation when applied to edge maps (grayscale or color). Such edge maps may be regarded as topographical reliefs, which are flooded starting at local minima. When water coming from different regions meets, a "dam" is built. At the end of the flooding, the watershed transform produces a set of connected regions (segmented objects) separated by dams (object contours).

A well-known problem associated with watersheds is oversegmentation. Images typically contain texture and/or noise, which produce spurious gradients (and also local minima), generating a large number of segmented objects. There are several approaches to reduce the oversegmentation problem of watersheds, such as markers (Vincent and Soille, 1991), statistical methods (Gies and Bernard, 2004) and fuzzy techniques (Patino, 2005), just to name a few. In this work, oversegmentation is reduced by removing small gradient magnitudes according to an adaptive threshold.

3. The proposed algorithm

The proposed algorithm can be summarized into the following steps: (1) select a desired scale 2^J, and compute the WT of each color channel up to this scale; (2) compute and threshold the color gradient magnitude of the approximation image at the coarsest resolution; (3) apply the watershed transform to the thresholded magnitudes, and obtain an image representation at scale 2^J; (4) project the image representation to the full resolution 2^0, using the IWT; (5) apply a contrast-based region merging technique. These steps are detailed next.
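To make step (1) concrete, the sketch below builds the Haar multiresolution representation of each RGB channel. PyWavelets is used as one possible implementation (the paper's reference implementation is in MATLAB, as mentioned in Section 4), so the library choice and the function name are assumptions of this sketch.

```python
import numpy as np
import pywt  # PyWavelets; one possible DWT implementation, not the paper's own

def haar_decompose(image, J):
    """Step (1): decimated Haar WT of each color channel up to scale 2^J.
    Returns the approximation image A_{2^J} (stacked over channels) and,
    per channel, the detail images {W^h, W^v, W^d} for scales 2^J ... 2^1."""
    approx, details = [], []
    for c in range(3):
        # coeffs = [A_{2^J}, (W^h_{2^J}, W^v_{2^J}, W^d_{2^J}), ..., (details at 2^1)]
        coeffs = pywt.wavedec2(image[..., c].astype(float), 'haar', level=J)
        approx.append(coeffs[0])
        details.append(coeffs[1:])
    return np.stack(approx, axis=-1), details
```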


3.1. Thresholded color magnitudes

Let \{\vec{W}^h_{2^j}, \vec{W}^v_{2^j}, \vec{W}^d_{2^j}\}_{j=1,2,\dots,J} and \vec{A}_{2^J} denote the multiresolution wavelet decomposition of the original color image \vec{I} into J scales, where

\vec{A}_{2^J} = (A^R_{2^J}, A^G_{2^J}, A^B_{2^J}), \quad \vec{W}^s_{2^j} = (W^{s,R}_{2^j}, W^{s,G}_{2^j}, W^{s,B}_{2^j}), \quad s \in \{h, v, d\}    (1)

represent the approximation and detail images corresponding to each individual color channel R, G or B. If D^c_v and D^c_h denote the vertical and horizontal differences of the Prewitt operator applied to images A^c_{2^J}, then a color gradient magnitude at the coarsest scale 2^J is obtained through

M_{2^J} = \sqrt{\sum_{c \in \{R,G,B\}} (D^c_v)^2 + (D^c_h)^2}.    (2)
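Eq. (2) translates directly into code; a minimal sketch follows, using scipy.ndimage's Prewitt operator for the differences (an implementation choice not prescribed by the paper). Applied to the approximation image \vec{A}_{2^J} from the decomposition sketch above, it yields M_{2^J}.

```python
import numpy as np
from scipy import ndimage  # Prewitt differences; an implementation choice

def color_gradient_magnitude(A):
    """Eq. (2): color gradient magnitude of the approximation image
    A (H x W x 3) at the coarsest scale, summing the squared Prewitt
    differences over the three color channels."""
    total = np.zeros(A.shape[:2])
    for c in range(3):
        Dv = ndimage.prewitt(A[..., c], axis=0)  # vertical differences D^c_v
        Dh = ndimage.prewitt(A[..., c], axis=1)  # horizontal differences D^c_h
        total += Dv**2 + Dh**2
    return np.sqrt(total)
```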

Although images A^c_{2^J} were denoised by the low-pass filtering involved in the WT, they still contain residual noise. This residual noise typically produces small magnitude values in M_{2^J} that are responsible for oversegmentation during the watershed transform. To remove such small gradients, we apply an adaptive threshold based on the expected distribution of noise-related and edge-related magnitudes, similarly to the procedure adopted in (Henstock and Chelberg, 1996). If p_n(r) and p_e(r) are the distributions of gradient magnitudes associated with noise and edges, respectively, then

p(r) = w_n p_n(r) + (1 - w_n) p_e(r)    (3)

provides an adequate fit for the histogram of magnitudes (here, w_n represents the a priori probability of distribution p_n). According to Bayes' rule, a certain pixel with gradient magnitude r has the following posterior probability of being an edge:

p(\mathrm{edge} \mid r) = \frac{(1 - w_n) p_e(r)}{p(r)}.    (4)

Given a certain probability threshold 0 ≤ P ≤ 1, such a pixel is qualified as an edge if p(edge | r) ≥ P. For grayscale images, Henstock and Chelberg (1996) assumed Gaussian distributions for the horizontal and vertical differences of both edge and noise pixels, used gamma probability density functions to model both p_n(r) and p_e(r), and chose P = 0.5 as the optimal threshold. The Gaussian assumption is widely accepted to model local differences of noise-related pixels. On the other hand, local intensity differences of natural images are not normally distributed, mostly due to a sharp peak near the origin related to homogeneous regions (Srivastava et al., 2003). However, Scharcanski et al. (2002) showed that horizontal and vertical intensity differences related exclusively to edges (disregarding homogeneous regions) can be approximated through a normal distribution. Hence, we assume that local differences for both noise and edges are normally distributed in each color channel.


Also, it is known that, if X_1, X_2, ..., X_m are m independent identically distributed (i.i.d.) random variables with normal distribution, then Y = \sqrt{X_1^2 + X_2^2 + \cdots + X_m^2} is a random variable with a chi distribution. It should be noticed that the intensity differences D^c_v and D^c_h for edge-related coefficients present some correlation in most natural images, but we assume the simplification of statistical independence to obtain a simple, closed-form expression for p_n(r) and p_e(r) based on chi distributions:¹

p_n(r) = \frac{1}{8\sigma_n^6} r^5 e^{-r^2/(2\sigma_n^2)}, \quad p_e(r) = \frac{1}{8\sigma_e^6} r^5 e^{-r^2/(2\sigma_e^2)},    (5)

where \sigma_n^2 is the variance of noise-related coefficients and \sigma_e^2 is the variance of coefficients related to edges. Parameters \sigma_n, \sigma_e and w_n required in Eq. (3) are estimated by maximum likelihood (ML) (Casella and Berger, 1990), and the thresholded magnitude image M^t_{2^J} is given by

M^t_{2^J}(x, y) = \begin{cases} 0, & \text{if } p(\mathrm{edge} \mid M_{2^J}(x, y)) < P, \\ M_{2^J}(x, y), & \text{if } p(\mathrm{edge} \mid M_{2^J}(x, y)) \geq P. \end{cases}    (6)

The watershed transform of M^t_{2^J} is then computed, and the initial segmentation at the coarsest resolution 2^J is obtained. Larger values of P result in fewer spurious edges, but may also remove relevant low-contrast edges, producing contour gaps (and leakage during the watershed transform). On the other hand, smaller values of P result in less noise removal and fewer contour gaps (increasing oversegmentation). In this work, we selected P = 0.25, aiming to keep low-contrast edges. Although there is a slight increase in the number of segmented regions compared to the more intuitive choice P = 0.5, this problem can be solved using a post-processing algorithm based on region merging, as discussed in Section 3.3.

To illustrate our thresholding procedure, let us consider the 256 × 256 house image and its noisy version (PSNR = 16.90 dB)² shown in Fig. 1. Fig. 2(a) and (b) illustrate color gradient magnitudes for the original house image at scale 2^1 before and after thresholding, and Fig. 2(c) and (d) show magnitudes at scale 2^2. Fig. 2(e)–(h) illustrate analogous results for the noisy version. As can be observed, our thresholding procedure did not destroy any relevant edge in the "clean" image, and it was able to reduce spurious responses in the noisy version. It can also be noticed that noise is smoothed out as the scale 2^J gets larger, due to the low-pass filtering involved in the wavelet decomposition.

Fig. 3 illustrates how oversegmentation is attenuated by the proposed thresholding technique. Watershed lines corresponding to the noisy magnitude images shown in Fig. 2(e)–(h) are illustrated in Fig. 3(a)–(d).
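The paper only states that \sigma_n, \sigma_e and w_n are obtained by maximum likelihood; the sketch below uses an EM-style iteration as one concrete way to compute such estimates (the initialization heuristic and the fixed iteration count are assumptions of this sketch), and then applies Eqs. (4) and (6):

```python
import numpy as np

def chi6_pdf(r, sigma):
    """Chi pdf with 6 degrees of freedom (Eq. (5)): two difference
    directions times three color channels, assumed i.i.d. Gaussian."""
    return r**5 * np.exp(-r**2 / (2.0 * sigma**2)) / (8.0 * sigma**6)

def threshold_magnitudes(M, P=0.25, n_iter=50):
    """Fit the mixture of Eq. (3) and zero out magnitudes whose posterior
    edge probability (Eq. (4)) falls below P, as in Eq. (6)."""
    r = M.ravel() + 1e-12                      # avoid r = 0 in the pdf
    # heuristic start: noise component narrower than the edge component
    w_n = 0.9
    s_n = np.sqrt(np.mean(r**2) / 12.0)
    s_e = np.sqrt(np.mean(r**2) / 3.0)
    for _ in range(n_iter):                    # EM iterations
        pn, pe = chi6_pdf(r, s_n), chi6_pdf(r, s_e)
        g = w_n * pn / (w_n * pn + (1 - w_n) * pe + 1e-300)   # P(noise | r)
        w_n = g.mean()
        s_n = np.sqrt(np.sum(g * r**2) / (6.0 * np.sum(g)))
        s_e = np.sqrt(np.sum((1 - g) * r**2) / (6.0 * np.sum(1 - g)))
    pn, pe = chi6_pdf(r, s_n), chi6_pdf(r, s_e)
    p_edge = ((1 - w_n) * pe / (w_n * pn + (1 - w_n) * pe + 1e-300)).reshape(M.shape)
    return np.where(p_edge >= P, M, 0.0)       # Eq. (6)
```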

¹ It should be noticed that more generic distributions to model edge-related magnitudes (such as the gamma) were tested, with no significant differences compared to the chi distribution.
² The original image can be found at the USC-SIPI Image Database (http://sipi.usc.edu/database).

Fig. 1. (a) Original house image. (b) Noisy house image.

It can be noticed that applying the watershed transform to raw color magnitudes results in several small segmented regions, as expected. On the other hand, the proposed thresholding procedure significantly reduces the number of segmented regions at scales 2^1 and 2^2.

3.2. Initial segmentation and projection to finer resolutions

The output of the previous step is an initial segmentation at scale 2^J, consisting of a set of labelled connected components separated by watershed lines. It should be noticed that each dimension of the segmented image is about 2^{-J} times the corresponding dimension of the original image, due to the downsampling used in the WT. Now, let us consider a representation \vec{R}_{2^J} of the approximation image \vec{A}_{2^J}, obtained by replacing each labelled region of the watershed transform with the color average of the corresponding region in \vec{A}_{2^J}, and setting \vec{R}_{2^J} = \vec{0} at watershed lines. As noticed by Kim and Kim (2003), a simple pixel duplication procedure to restore the full resolution image produces an unpleasing blocking effect, and edge definition is lost. However, such a problem can be overcome if we compute inverse wavelet transforms, similarly to the approach adopted in (Jung, 2003). In order to project \vec{R}_{2^J} into the adjacent scale 2^{J-1}, detail coefficients W^{h,c}_{2^J}, W^{v,c}_{2^J} and W^{d,c}_{2^J} are needed, for c ∈ {R, G, B}. To keep a piecewise constant representation of the approximation image \vec{A}_{2^J}, we use only detail coefficients related to region borders. In fact, due to the good localization property of the WT (particularly, the very small support of Haar wavelets), we can determine exactly the region of influence of each wavelet coefficient during the IWT. Let us consider the updated wavelet coefficients

UA^c_{2^J}(x, y) = \begin{cases} A^c_{2^J}(x, y), & \text{if } (x, y) \text{ belongs to a region border}, \\ R^c_{2^J}(x, y), & \text{otherwise}, \end{cases}    (7)

UW^{s,c}_{2^J}(x, y) = \begin{cases} W^{s,c}_{2^J}(x, y), & \text{if } (x, y) \text{ belongs to a region border}, \\ 0, & \text{otherwise}, \end{cases}    (8)

for s ∈ {h, v, d}, c ∈ {R, G, B}. The IWT is applied to UA^c_{2^J} and UW^{s,c}_{2^J} for each color channel, generating a new set of higher resolution images S^c_{2^{J-1}}. To illustrate this procedure, let us consider the watershed segmentation of the noisy house image produced at scale 2^2, shown in Fig. 3(d).
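For one color channel, a projection step then amounts to masking the coefficients with the region-border map and inverting one wavelet level. A minimal sketch follows; border_mask, a boolean image marking watershed/border pixels at scale 2^J, is an assumed input.

```python
import numpy as np
import pywt

def project_one_level(R, A, details, border_mask):
    """Eqs. (7)-(8) for one color channel c: keep A and the detail
    coefficients only at region borders (elsewhere use the piecewise
    constant R and zeros), then apply one inverse Haar step to obtain
    the higher resolution image S at scale 2^(J-1)."""
    UA = np.where(border_mask, A, R)          # Eq. (7)
    Wh, Wv, Wd = details                      # W^{h,c}, W^{v,c}, W^{d,c}
    UWh = np.where(border_mask, Wh, 0.0)      # Eq. (8)
    UWv = np.where(border_mask, Wv, 0.0)
    UWd = np.where(border_mask, Wd, 0.0)
    return pywt.idwt2((UA, (UWh, UWv, UWd)), 'haar')
```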


Fig. 2. Color gradient magnitude images before and after thresholding for the house image at different scales. (a)–(d) Original image. (e)–(h) Noisy version.

Fig. 3. Watershed lines corresponding to the noisy magnitude images shown in Fig. 2(e)–(h), before and after thresholding.

The corresponding piecewise constant representation image \vec{R}_{2^J} (with J = 2) is shown in Fig. 4(a), and the updated approximation image \vec{UA}_{2^J} is illustrated in Fig. 4(b). The result of the IWT is shown in Fig. 4(c), and it can be observed that image \vec{S}_{2^{J-1}} is almost piecewise constant: each plateau of \vec{R}_{2^J} was mapped to a plateau in \vec{S}_{2^{J-1}}, but the updated detail coefficients generate pixel fluctuations between adjacent plateaus when the IWT is computed. Such pixels are responsible for reconstructing the contour definition of segmented regions, and are called "fuzzy watershed pixels".

The following step is to assign these "fuzzy pixels" to a neighboring homogeneous region. Let us consider a fuzzy pixel (x, y) and its 8-connected neighborhood. Some of these neighboring pixels belong to existing homogeneous regions, while others may also be fuzzy pixels. Let N_h(x, y) ≤ 8 denote the number of pixels adjacent to (x, y) that effectively belong to existing homogeneous regions, and let (x_k, y_k) denote the positions of such pixels, for k = 1, ..., N_h(x, y). The color dissimilarity d_k between fuzzy pixel (x, y) and its neighbor (x_k, y_k) is computed through

d_k = D(\vec{R}_{2^{J-1}}(x, y), \vec{R}_{2^{J-1}}(x_k, y_k)), \quad k = 1, \dots, N_h(x, y),    (9)

where D(·,·) represents the distance between two colors according to a given vector norm. Then, (x, y) is assigned to the region for which the value d_k is the smallest. In other words, the value of \vec{R}_{2^{J-1}} at position (x, y) is redefined as

\vec{R}_{2^{J-1}}(x, y) = \vec{R}_{2^{J-1}}(x_l, y_l), \quad \text{where } l = \arg\min_k d_k.    (10)

When computing d_k in Eq. (9), there are several choices for the distance measure D(·,·), as well as several color space representations. In particular, the CIELAB color space arises as a natural candidate, since it is roughly uniform and color dissimilarity can be approximated using the L2 vector norm (Euclidean distance), as noted in (Ohta et al., 1980). However, the L2 norm in CIELAB was tested, and produced inferior results compared to the L2 norm in RGB (object contours using CIELAB presented more jaggedness than contours obtained with RGB). A possible explanation for this behavior is that noise contamination can significantly affect the chromaticity component in the RGB → CIELAB transformation, leading to erroneous color matching. An additional advantage of using RGB is that no color space transformations are performed, reducing the computational burden. Yet another possibility would be to employ the L1 norm (sum of absolute differences) in RGB, since it is faster to compute than the L2 norm. The L1 norm in RGB was also tested, but the computational gain was not significant in the proposed approach, and quantitative results (in terms of PSNR of segmented images) using the L1 norm were slightly inferior to results obtained with the L2 norm. Hence, the L2 vector norm in RGB coordinates was chosen to compute color dissimilarity, as in the segmentation procedure described in (Fuh et al., 2000).

By the end of this step, a piecewise constant representation \vec{R}_{2^{J-1}} at scale 2^{J-1} is obtained, as shown in Fig. 4(d). Since \vec{R}_{2^{J-1}} is formed by homogeneous regions, it is trivial to find region borders and obtain updated wavelet coefficients \vec{UA}_{2^{J-1}} and \vec{UW}^s_{2^{J-1}}, according to Eqs. (7) and (8). The IWT is computed, fuzzy pixels are resolved, and this process is repeated until full resolution is restored (i.e., until scale 2^0 is reached). The final segmentation result for the noisy house image (i.e., image \vec{R}_{2^0}) obtained using 2^2 as the initial scale is depicted in Fig. 5(a). It can be observed that all relevant structures were preserved, and only 64 regions were segmented.
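A minimal sketch of the fuzzy-pixel assignment of Eqs. (9) and (10), using the L2 norm in RGB as justified above; it makes a single sequential pass, so pixels whose 8-neighbors are all still fuzzy are skipped and would be handled by repeating the pass (the scan order and this repetition are assumptions of the sketch):

```python
import numpy as np

def assign_fuzzy_pixels(R, fuzzy_mask):
    """Assign each fuzzy watershed pixel to the 8-connected neighboring
    region with the most similar color, Eqs. (9)-(10), using the L2 norm
    in RGB. R: (H, W, 3) float image; fuzzy_mask: boolean, True at fuzzy
    pixels (modified in place as pixels are resolved)."""
    H, W, _ = R.shape
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]
    for y, x in zip(*np.nonzero(fuzzy_mask)):
        best, best_d = None, np.inf
        for dy, dx in neighbors:
            ny, nx = y + dy, x + dx
            # consider only neighbors that already belong to a homogeneous region
            if 0 <= ny < H and 0 <= nx < W and not fuzzy_mask[ny, nx]:
                d = np.linalg.norm(R[y, x] - R[ny, nx])   # d_k, Eq. (9)
                if d < best_d:
                    best, best_d = (ny, nx), d
        if best is not None:
            R[y, x] = R[best]         # Eq. (10): copy the winning region color
            fuzzy_mask[y, x] = False  # the pixel now belongs to that region
    return R
```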

Fig. 4. (a) Initial segmentation result of the noisy house image at scale 2^2. (b) Updated approximation image \vec{UA}_{2^2}. (c) Projection of \vec{UA}_{2^2} to scale 2^1. (d) Piecewise constant representation image \vec{R}_{2^1} at scale 2^1.

3.3. Region merging

The algorithm described so far produces a partition of the original image into connected regions with uniform colors. Although the adaptive magnitude thresholding procedure indeed reduces oversegmentation, adjacent regions


Fig. 5. Segmentation result for the noisy house image starting at scale 2^2. (a) Without post-processing (64 regions). (b) With post-processing (28 regions).

with similar colors may persist, and a region merging algorithm can be applied. We adopt an approach similar to Cheng and Sun's (2000), as described next. Let N_s denote the number of segmented regions obtained by the algorithm described in the previous section. For each pair of adjacent regions B_i and B_j with CIELAB color vectors \vec{c}_i and \vec{c}_j, the color difference measure d(i, j) is computed using the L2 norm

d(i, j) = \Delta E_{ab} = \| \vec{c}_i - \vec{c}_j \|_2.    (11)

It should be noticed that the representative color of each segmented region was obtained by averaging colors in corresponding regions of image \vec{A}_{2^J}, as described in Section 3.2. This averaging process significantly reduces noise influence, and the undesired chromaticity distortion observed when assigning fuzzy pixels in CIELAB does not happen at this stage. Hence, it is convenient to use the CIELAB color space for region merging instead of the RGB color space. Furthermore, only N_s color transformations are required, resulting in practically no increase in the computational burden of the proposed method.

The first step of the region merging algorithm is to locate the pair of adjacent regions having the smallest color difference. If this difference is smaller than a certain color threshold T_c, the two regions are merged. The color of the new region is computed again, and the mean value of the colors is assigned to the pixels within this region. The new global minimum of d(i, j) is then computed, and the procedure is repeated until all adjacent regions reach the minimum contrast T_c.

In (Cheng and Sun, 2000), the color threshold is the difference of the mean and standard deviation of all distances d(i, j), i.e., T_c = \mu - \sigma. Hence, the threshold is highly dependent on the distribution of colors obtained in the segmentation procedure. In particular, perceptually different adjacent regions may be erroneously merged if the original image contains a large variety of different colors, since the adaptive threshold T_c may be relatively large. For example, the adaptive threshold T_c = \mu - \sigma is approximately 31 for the noisy synthetic image shown in the bottom row of Fig. 8, which leads to erroneous region merging.
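The merging loop just described can be sketched as follows; the label/color/adjacency bookkeeping is a representation assumed for illustration, and the naive global pair search keeps the sketch close to the text rather than efficient:

```python
import numpy as np

def merge_regions(colors, sizes, adjacency, Tc=10.0):
    """Greedy merging of Section 3.3 (a simplified sketch). colors maps a
    region label to its mean CIELAB color (length-3 array), sizes maps a
    label to its pixel count, adjacency is a set of frozenset({i, j}) pairs
    of adjacent labels. Merging stops once the smallest d(i, j) >= Tc."""
    def dist(pair):
        i, j = tuple(pair)
        return np.linalg.norm(colors[i] - colors[j])  # Eq. (11)

    while adjacency:
        best = min(adjacency, key=dist)
        if dist(best) >= Tc:
            break
        i, j = tuple(best)
        # merge region j into region i, updating the mean color
        colors[i] = (sizes[i] * colors[i] + sizes[j] * colors[j]) / (sizes[i] + sizes[j])
        sizes[i] += sizes[j]
        adjacency.discard(best)
        # rewire every neighbor of j to be a neighbor of i
        for pair in [p for p in adjacency if j in p]:
            adjacency.discard(pair)
            other = next(iter(pair - {j}))
            if other != i:
                adjacency.add(frozenset((i, other)))
        del colors[j], sizes[j]
    return colors, sizes, adjacency
```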

In contrast to such data-dependent thresholds, it is known that the CIELAB color space is roughly uniform, and a difference value d(i, j) around 2.3 corresponds to a JND³ (just noticeable difference) (Mahy et al., 1994). Also, color differences around 5 units represent an accuracy acceptable for high-quality color reproduction (Berns, 2001). In this work, the selected default threshold is twice the acceptable value for high-quality color reproduction, i.e., T_c = 10. It should be noticed that smaller values of T_c lead to less region merging (hence, more segmented regions), while larger values of T_c lead to the opposite result. The result of the region merging procedure applied to Fig. 5(a) is shown in Fig. 5(b). Low-contrast regions were merged, and the number of segmented regions dropped from 64 to 28 after the post-processing.

3.4. Selection of the scale 2^J

In the procedure described so far, an initial scale 2^J must be provided for the initial segmentation using watersheds. The selection of 2^J depends on the detail resolution desired by the user (which is usually application-dependent), the amount of noise contamination, and the dimensions of the original image. When a small value of J is selected, the proposed technique tends to retrieve a larger number of segmented regions, being able to capture small image details. As J increases, the segmentation procedure tends to produce fewer (and larger) regions, providing a coarser image representation. The influence of the initial scale 2^J for the original and noisy house images shown in Fig. 1 is illustrated in Fig. 6. Segmentation results starting at scales 2^1, 2^2 and 2^3 for the original house image are shown in the first row, and analogous results for the noisy version are illustrated in the second row. As can be observed, finer resolutions produce good results for the clean image, and even small details and texture changes are detected. However, a coarser initial scale 2^J is required to reduce noise influence and avoid broken contours in the noisy version. Visual inspection indicates that scale 2^2 provides nice results for both the original and noisy versions, keeping relevant structures.

Although the selection of J is application- and image-dependent, it is important to provide a default value to achieve unsupervised segmentation. A subjective evaluation indicated that a good compromise between noise removal and detail preservation was obtained using J = 1 for 128 × 128 (or smaller) images, J = 2 for 256 × 256 images, and J = 3 for 512 × 512 images (i.e., the initial scale appears to be proportional to the logarithm of the image dimensions). Hence, for a color image with dimensions n × m, the default value for parameter J is

J = \max\left\{ 1, \; 1 + \mathrm{rnd}\left( \log_2 \frac{\min\{n, m\}}{128} \right) \right\},    (12)

where rnd(x) rounds x to the nearest integer.

³ One JND is an estimate of the minimum distance that leads to a perceptual color difference.
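In code, Eq. (12) is a one-liner; floor(x + 0.5) is used for rnd, an assumption since the paper does not specify how halves are rounded:

```python
import math

def default_J(n, m):
    """Eq. (12): default number of dyadic scales for an n x m color image."""
    return max(1, 1 + math.floor(math.log2(min(n, m) / 128.0) + 0.5))
```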


Fig. 6. Segmentation results for the original house image and its noisy version, using different initial scales 2^J.

In this work, images up to 512 × 512 pixels were tested, leading to a maximum wavelet scale J = 3. However, there is no image size limitation for the proposed segmentation method.

4. Experimental results

This section presents results of the proposed technique (called Waveseg) applied to several images containing both natural and artificial noise. We also compared Waveseg with three state-of-the-art segmentation methods, namely Mean-shift (Comaniciu and Meer, 2002), JSEG (Deng and Manjunath, 2001) and SRM (Nock and Nielsen, 2004). For Mean-shift, we manually selected the required parameters h_s, h_r and M to obtain good visual results, following the guidelines provided in (Comaniciu and Meer, 2002). In particular, Mean-shift seemed to be an attractive technique for segmenting noisy images, since the method can also be used for image denoising. Segmentation using JSEG involves three parameters: a threshold for color quantization q_J, the number of scales l_J, and a threshold for region merging m_J. We used the default values described in (Deng and Manjunath, 2001) for all images, unless stated otherwise. For SRM, we used the default parameters indicated in (Nock and Nielsen, 2004). We ran Waveseg in unsupervised mode, meaning that J was selected using Eq. (12), and the region merging threshold was fixed at T_c = 10.

In the first experiment, we applied JSEG, Mean-shift and SRM to the original and noisy versions of the house image, and results are shown in Fig. 7. As can be observed, JSEG is efficient for segmenting the original image (although the window was missed). However, the default set of parameters for JSEG returned no segmented region at all for the noisy version, and the region merging

parameter m_J had to be fine-tuned to 0.1 to obtain the corresponding result displayed in Fig. 7. It can also be noticed that Mean-shift produced an excellent result for the original image (using h_s = 8, h_r = 12 and M = 100). However, the same set of parameters resulted in a very bad segmentation for the noisy version, and they had to be re-adjusted (h_s = 12, h_r = 21 and M = 300) in order to obtain a barely acceptable segmentation, as illustrated in Fig. 7 (center-bottom). SRM also missed part of the window in the "clean" version, and segmented all relevant objects in the noisy version (however, the lower rooftop was blended with the bricks). A visual comparison of these results with the proposed approach using default values (J = 2, central image in Fig. 6) indicates that Waveseg produces a better segmentation for the noisy image. The Waveseg result for the original image seems superior to the segmentations with JSEG and SRM, and perhaps slightly inferior to Mean-shift.

For a quantitative analysis of noisy images, we computed the PSNR⁴ of segmentation results and the number of segmented regions. Other examples of noisy image segmentation are illustrated in Fig. 8. The first column illustrates a noisy version (PSNR = 13.31 dB) of the 512 × 512 peppers image and a 330 × 330 noisy synthetic image containing 144 squares based on the MacBeth color chart (PSNR = 15.65 dB). The following columns show segmentation results using Waveseg, JSEG, Mean-shift and SRM, respectively. For the peppers image, JSEG's default parameters returned no segmented region at all (as happened with the noisy house image), and we had to set m_J = 0.05 to obtain the

⁴ Region contours were ignored when computing the PSNR of segmented images.


Fig. 7. From left to right: segmentation results using JSEG, Mean-shift and SRM. Top row: original house image. Bottom row: noisy house image.

Fig. 8. Segmentation results for noisy images using Waveseg (second column), JSEG (third column), Mean-shift (fourth column) and SRM (last column).

result shown in Fig. 8(c). For Mean-shift, we used the set of parameters (h_s, h_r, M) = (16, 27, 10,000) to obtain the segmented image. Visual inspection indicates that JSEG produced a very coarse image representation, and Mean-shift returned very noisy contours. The proposed technique had a performance similar to SRM, presenting smooth contours and more details than JSEG and Mean-shift. However, a contour gap at the bottom-left portion of the central green pepper resulted in leakage during the watershed transform, leading to an erroneous contour. For the MacBeth image, JSEG was applied with m_J = 0.05, since the default values resulted in just a few segmented squares. For Mean-shift, we used (h_s, h_r, M) = (20, 22, 150). It can be observed that all four methods failed to detect the lighter grey squares, due to low-contrast differences. However, Waveseg was able to segment most of the squares (126 out of 144), returned

the largest PSNR, produced better defined contours (especially for the smallest squares) than its competitors, and did not produce spurious regions.

The MacBeth synthetic image shown in the bottom row of Fig. 8 is particularly useful for quantitative comparisons, since it is easy to obtain ground truth segmentation results (as opposed to most natural images). Several noisy versions of this synthetic image were produced with increasing amounts of additive Gaussian noise, and the segmentation results obtained with Waveseg, JSEG and SRM were compared (Mean-shift was not included, since it requires too many parameter adjustments). A quantitative analysis of these techniques in terms of PSNR and number of segmented regions is summarized in Table 1. As noise contamination increases, the number of segmented regions for all methods decreases, as well as the


Table 1
PSNR and number of segmented regions for noisy versions of the MacBeth image, using Waveseg, JSEG and SRM

| Noisy input | ∞ dB | 28.92 dB | 23.07 dB | 19.75 dB | 17.45 dB | 15.73 dB | 14.37 dB | 13.28 dB | 12.36 dB |
|---|---|---|---|---|---|---|---|---|---|
| Waveseg | 31.14 dB / 133 reg. | 31.10 dB / 133 reg. | 30.17 dB / 133 reg. | 28.92 dB / 133 reg. | 27.58 dB / 128 reg. | 26.03 dB / 127 reg. | 24.21 dB / 123 reg. | 22.72 dB / 121 reg. | 20.71 dB / 119 reg. |
| JSEG | 24.14 dB / 114 reg. | 28.74 dB / 123 reg. | 28.82 dB / 127 reg. | 26.27 dB / 114 reg. | 15.32 dB / 47 reg. | 11.23 dB / 11 reg. | 10.28 dB / 5 reg. | 10.39 dB / 3 reg. | 10.23 dB / 3 reg. |
| SRM | 24.49 dB / 89 reg. | 26.25 dB / 94 reg. | 26.39 dB / 95 reg. | 25.72 dB / 94 reg. | 23.92 dB / 90 reg. | 22.55 dB / 84 reg. | 22.06 dB / 83 reg. | 20.89 dB / 87 reg. | 20.07 dB / 80 reg. |

resulting PSNR. However, Waveseg presented the closest number of segmented regions to the ground truth (144 regions), and also produced the highest PSNR for all levels of noise corruption. We also applied our technique to several 321 × 481 natural images contained in the Berkeley dataset (Martin et al., 2001), and compared results with JSEG, Mean-shift, SRM and the manual delineations provided with the database (darker contours were marked by more human subjects).

An overall visual inspection indicates that all algorithms achieve a similar performance, being able to detect roughly the same relevant objects marked by human subjects, as shown in Fig. 9. They also present similar problems, such as the misdetection of the giraffes’ legs in the third image. It is also interesting to notice that no manual delineation includes clouds in the first and third images, but clouds are strongly marked as relevant objects in the fourth image.

Fig. 9. Segmentation of several natural images. Top row: original images. Second row: segmentation using the proposed method. Third row: segmentation using JSEG. Fourth row: segmentation using Mean-shift. Fifth row: segmentation using SRM. Last row: manual contour delineation.


Table 2
Running times (in seconds) for Waveseg, Waveseg without region merging, JSEG, Mean-shift and SRM

| Method | House | Noisy house | Peppers | MacBeth | Horses | Landscape | Giraffes | Pyramids |
|---|---|---|---|---|---|---|---|---|
| Waveseg | 6.0 | 5.4 | 31.6 | 13.9 | 12.2 | 13.9 | 16.2 | 20.1 |
| Waveseg (w/o merging) | 3.8 | 3.4 | 16.4 | 7.4 | 6.9 | 7.9 | 8.4 | 10.5 |
| JSEG | 4.35 | 8.3 | 46.1 | 20.1 | 20.7 | 16.5 | 9.6 | 16.3 |
| Mean-shift | 1.7 | 16.7 | 130.3 | 85.6 | 20.7 | 27.5 | 18.9 | 18.3 |
| SRM | <1 | <1 | <2 | <1 | <1 | <1 | <1 | <1 |

All results regarding Waveseg were obtained through a MATLAB implementation of the proposed algorithm (available for download at http://www.inf.unisinos.br/~crjung/research.htm), while results of JSEG, Mean-shift and SRM were obtained using the C/C++ implementations provided by their authors. Running times on a portable PC with a Centrino 1.7 GHz processor are provided in Table 2. Although MATLAB implementations tend to be slower than C/C++, it can be observed that the running times for Waveseg are similar to (or smaller than) those of JSEG and Mean-shift, but larger than those of SRM. In particular, the most time consuming parts of our MATLAB implementation are the assignment of fuzzy pixels (since a loop is used, and loops are known to generate bottlenecks in MATLAB programming) and the region merging procedure (due to the non-optimized data structure used to find and merge adjacent regions). For the sake of comparison, running times for Waveseg without the region merging procedure were also included in Table 2. It is expected that an optimized implementation of Waveseg in a compiled language would significantly reduce the execution time.

5. Discussion and conclusions

In this work, a multiresolution color segmentation algorithm was presented. The WT is applied up to a selected scale 2^J, and color gradient magnitudes are computed at the coarsest resolution. An adaptive threshold based on statistical properties of gradient magnitudes is estimated, and watersheds are applied to obtain an initial segmentation at the coarsest resolution 2^J. The initial segmentation is then projected to finer resolutions using the IWT, until the final segmentation at scale 2^0 is obtained. Finally, a region-merging post-processing step is applied to blend regions with similar colors.

The proposed method involves the selection of two parameters: the number of dyadic scales J and the threshold for color region merging T_c. However, default values were used in all comparisons with competitive techniques, since we computed J automatically according to Eq. (12), and we fixed T_c = 10. It is clear that even better results can be achieved by fine-tuning J and T_c for each individual image, but our intent was to show that very good results can be obtained without user intervention. Indeed, experimental results indicated that Waveseg produced results comparable to other state-of-the-art techniques for natural images, and superior results for noisy images

(in particular, the unsupervised version of JSEG failed to segment most noisy images, needing manual parameter adjustments). Furthermore, it should be noticed that the fine-tuning of J and T_c is not only image-dependent, but also context-dependent (since the ultimate evaluation of a segmentation technique depends on the desired application).

Although the procedure described in this paper was intended for color image segmentation, the extension to multivalued images (such as some types of medical images and multispectral satellite images) is straightforward. The only required changes are (i) to increase the number of Gaussian random variables used to deduce Eq. (5); and (ii) to adapt the similarity metric of the region merging algorithm. Future work will concentrate on combining segmentation results using different initial scales 2^J, and on improving the region merging procedure.

Acknowledgement

This work was developed in collaboration with HP Brazil R&D and the Brazilian research agency CNPq. The author would like to thank the anonymous reviewers for their valuable contributions.

References

Berns, R.S., 2001. The science of digitizing paintings for color-accurate image archives: A review. J. Imaging Sci. Technol. 45 (4), 305–325.
Casella, G., Berger, R.L., 1990. Statistical Inference. Wadsworth and Brooks/Cole, Pacific Grove.
Chen, J., Pappas, T., Mojsilovic, A., Rogowitz, B., 2004. Perceptually-tuned multiscale color–texture segmentation. In: Internat. Conf. Image Process., pp. II: 921–924.
Cheng, H., Sun, Y., 2000. A hierarchical approach to color image segmentation using homogeneity. IEEE Trans. Image Process. 9 (12), 2071–2082.
Comaniciu, D., Meer, P., 2002. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Machine Intell. 24 (5), 603–619.
Deng, Y., Manjunath, B.S., 2001. Unsupervised segmentation of color–texture regions in images and video. IEEE Trans. Pattern Anal. Machine Intell. 23 (8), 800–810.
Fuh, C., Cho, S., Essig, K., 2000. Hierarchical color image region segmentation for content-based image retrieval system. IEEE Trans. Image Process. 9 (1), 156–162.
Gies, V., Bernard, T., 2004. Statistical solution to watershed oversegmentation. In: Internat. Conf. Image Process., pp. III: 1863–1866.

Henstock, P., Chelberg, D.M., 1996. Automatic gradient threshold determination for edge detection. IEEE Trans. Image Process. 5 (5), 784–787.
Jung, C.R., 2003. Multiscale image segmentation using wavelets and watersheds. In: Proc. SIBGRAPI. IEEE Press, São Carlos, SP, pp. 278–284.
Kazanov, M., 2004. A new color image segmentation algorithm based on watershed transformation. In: Internat. Conf. Pattern Recognition, pp. II: 590–593.
Kim, J.B., Kim, H.J., 2003. Multiresolution-based watersheds for efficient image segmentation. Pattern Recognition Lett. 24, 473–488.
Liapis, S., Sifakis, E., Tziritas, G., 2004. Colour and texture segmentation using wavelet frame analysis, deterministic relaxation, and fast marching algorithms. J. Visual Comm. Image Represent. 15 (1), 1–26.
Ma, W.Y., Manjunath, B.S., 2000. Edge flow: A technique for boundary detection and image segmentation. IEEE Trans. Image Process. 9 (8), 1375–1388.
Mahy, M., Van Eycken, L., Oosterlinck, A., 1994. Evaluation of uniform color spaces developed after the adoption of CIELAB and CIELUV. Color Res. Appl. 19 (2), 105–121.
Makrogiannis, S., Theoharatos, C., Economou, G., Fotopoulos, S., 2003. Color image segmentation using multiscale fuzzy c-means and graph theoretic merging. In: Internat. Conf. Image Process., pp. I: 985–988.
Mallat, S.G., 1989. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Machine Intell. 11 (7), 674–693.
Marfil, R., Rodriguez, J., Bandera, A., Sandoval, F., 2004. Bounded irregular pyramid: A new structure for color image segmentation. Pattern Recognition 37 (3), 623–626.
Martin, D., Fowlkes, C., Tal, D., Malik, J., 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proc. 8th Internat. Conf. Comput. Vision, vol. 2, pp. 416–423.


Meyer, F., Beucher, S., 1990. Morphological segmentation. J. Visual Comm. Image Represent. 1, 21–46.
Navon, E., Miller, O., Averbuch, A., 2005. Color image segmentation based on adaptive local thresholds. Image Vision Comput. 23 (1), 69–85.
Nikolaev, D., Nikolayev, P., 2004. Linear color segmentation and its implementation. Computer Vision and Image Understanding 94 (1–3), 115–139.
Nock, R., Nielsen, F., 2003. On region merging: The statistical soundness of fast sorting, with applications. In: Conf. Comput. Vision Pattern Recognition, pp. II: 19–26.
Nock, R., Nielsen, F., 2004. Statistical region merging. IEEE Trans. Pattern Anal. Machine Intell. 26 (11), 1452–1458.
Nock, R., Nielsen, F., 2005. Semi-supervised statistical region refinement for color image segmentation. Pattern Recognition 38 (6), 835–846.
Ohta, Y., Kanade, T., Sakai, T., 1980. Color information for region segmentation. Comput. Graph. Image Process. 13 (3), 222–241.
Patino, L., 2005. Fuzzy relations applied to minimize over segmentation in watershed algorithms. Pattern Recognition Lett. 26 (6), 819–828.
Sapiro, G., 1997. Color snakes. Computer Vision and Image Understanding 68 (2), 247–253.
Scharcanski, J., Jung, C.R., Clarke, R.T., 2002. Adaptive image denoising using scale and space consistency. IEEE Trans. Image Process. 11 (9), 1092–1101.
Scheunders, P., Sijbers, J., 2002. Multiscale watershed segmentation of multivalued images. In: Internat. Conf. Pattern Recognition, pp. III: 855–858.
Srivastava, A., Lee, A.B., Simoncelli, E.P., Zhu, S.C., 2003. On advances in statistical modeling of natural images. J. Math. Imaging Vision 18 (1), 17–33.
Strang, G., Nguyen, T., 1996. Wavelets and Filter Banks. Wellesley-Cambridge Press.
Vanhamel, I., Pratikakis, I., Sahli, H., 2003. Multiscale gradient watersheds of color images. IEEE Trans. Image Process. 12 (6), 617–626.
Vincent, L., Soille, P., 1991. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Machine Intell. 13 (6), 583–598.