Gradient-based polyhedral segmentation for range images



Pattern Recognition Letters 24 (2003) 2069–2077 www.elsevier.com/locate/patrec

Songtao Li, Dongming Zhao *

Electrical and Computer Engineering, University of Michigan-Dearborn, 4901 Evergreen Road, Dearborn, MI 48128-1491, USA Received 15 April 2002; received in revised form 22 October 2002

Abstract

A novel method is developed for robust polyhedral segmentation of 3-D range images. The method consists of three operations: (1) a two-dimensional gradient histogram space is generated based on gradients along the directions of the x- and y-coordinates; (2) a four-neighborhood iterative expanding algorithm is developed for region grouping according to a gradient feature space; (3) for noise and regions with geometrical distortion, a merge process is applied to the first-round segmentation results. The experiments show that the proposed algorithm generates good results for understanding polyhedral objects in range images. © 2003 Elsevier Science B.V. All rights reserved.

Keywords: Range image; Segmentation; Region grouping; Surface normal

1. Introduction

Segmentation is an assignment of pixels into one of many disjoint sets such that the pixels in each set share a common property (Arman and Aggarwal, 1993). Currently, many machine vision techniques use range images to obtain useful descriptions of 3-D scenes. Segmentation is widely used in range image analysis and recognition because it helps partition an image array into low-level entities (Besl and Jain, 1988). In general, segmentation methods for range images are classified into two categories: one is based on region features, and the other is based on edge information.

* Corresponding author. Tel.: +1-313-593-5527; fax: +1-313-593-9967. E-mail address: [email protected] (D. Zhao).

In region-based segmentation methods (e.g. Yang and Kak, 1986; Kang and Ikeuchi, 1993; Liu and Wang, 1999), pixels having similar properties are grouped together. Surface fitting and feature clustering are two techniques commonly applied to analyze region properties for range image segmentation. Region growing techniques first select a set of seed regions and then grow them by iteratively merging adjacent regions with similar properties. Besl and Jain (1988) reported Gaussian curvature and mean curvature sign labeling to realize an approximate segmentation, where an iterative region growing process based on variable-order surface fitting is employed to obtain an accurate surface approximation. This method requires many empirically determined, sensitive thresholds at each stage of the procedure. Lee et al. (1998) proposed a segmentation technique using an adaptive

0167-8655/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0167-8655(03)00044-8


least kth order squares estimator, where the planar surface affected by noise is obtained by minimizing the kth order statistics of the squared residuals. One important development in surface segmentation for range images is credited to Hoover et al. (1996). The algorithms in Hoover's paper are based on planar fitting, in which a planar fit is computed for each pixel and region growing then proceeds over pixels with similar plane equations. Both approaches (Hoover et al., 1996; Lee et al., 1998) show good experimental results; however, they demand exceptionally long computing time.

For most techniques based on clustering, the true number of regions (clusters) has to be known a priori. Clustering methods that do not require the cluster number typically suffer from inflexibility and higher computational complexity due to statistical parameter estimation and heuristic criteria. Jolion et al. (1991) proposed a clustering algorithm based on a minimum volume ellipsoid (MVE) estimator and employed the Kolmogorov–Smirnov test as an iteration criterion to achieve the best fit without knowing the number of clusters.

In a survey paper by Hoover et al. (1996), the authors introduced their separately developed segmentation methods. The segmentation algorithm by Hoover works by computing a planar fit for each pixel and then growing regions whose pixels have similar plane equations. In the algorithm by Flynn, the segmentation is obtained by an initial clustering and a further merging process; the surface normals are estimated by a principal component fit. In the algorithm by Eggert, the segmentation and merging process is based on an evaluation of two second-order surface features, the Gaussian (K) and mean (H) curvatures. The parameters of the methods, ranging in number from 4 to 12, are trained using the same range data, and the experimental results are compared.
In the other category of range image segmentation techniques, regions are separated by the edges between them (Zhao and Zhang, 1997; Bellon and Direne, 1999). In edge-based methods, the discontinuities are extracted first, and the segmentation is then guided by the obtained contours. Inokuchi et al. (1982) presented an edge-region segmentation ring operator for depth maps that identifies different types of edges (step edge, convex roof edge, concave roof edge) and planar surface regions. A method by Fan et al. (1987) detects edges using zero crossings and extrema of curvature along one of four given directions. The boundaries are then classified as jump boundaries, folds, and ridge lines. The jump boundaries and folds are used to merge the surfaces into surface patches. A hybrid method for range image segmentation by Lim et al. (1994) combines edge detection with region growing to obtain accurate edges for segmentation. The original edge detection result is used to steer the region growing process towards accurate border partitioning. The region growing process eliminates internal micro-edges and fills in missing edges. In a high-level range image segmentation method based on scan line approximation by Jiang et al. (2000), a quadratic curve function is first fitted along a scan line. The edge points within this scan line are obtained through a splitting process of the scan line. Scan lines in various directions are applied to extract the edges within a range image. A final segmentation result is obtained according to the extracted boundary edges. Edge-based segmentation techniques are sensitive to noise, and linking edge pixels is usually difficult when edges are broken.

In more recent studies, Stamos and Allen (2000) proposed a planar segmentation method for range images. First, locally planar points are extracted based on the fitted plane equations and the eigenvectors of the deviation matrix around the points. Then the locally planar points are merged into regions according to their distance and angle differences. Finally, a plane is fitted to all merged locally planar points. This approach generates good experimental results but requires extensive computing time. In a paper by Koester and Spann (2000), an unsupervised region growing method is proposed based on a two-level hierarchy image structure.
At first, the lower primitive components are extracted using least-median-of-squares estimation. Then the extracted primitive components are iteratively merged into high-level primitive regions according to the mutual inlier ratio (MIR), which is obtained using robust regression techniques.

In this study, a segmentation approach based on a two-dimensional gradient histogram is proposed for images taken from a range sensor. The gradients along the directions of the x- and y-coordinates are applied in the segmentation method. According to the distribution of the gradient histogram, a four-neighborhood region growing algorithm is developed to achieve a preliminary planar patch segmentation based on a proposed grouping criterion. The gradient features are derived from the first-order derivatives. Compared with the second-order derivatives typically used in some methods, the first-order derivatives are less sensitive to noise and easy to compute. Since some pixels may not be grouped into the correct regions due to noise, a merge process is applied to the first-round segmentation results. The selection of the segmentation parameters is also discussed in this study. The tests show that this algorithm can be used to segment polyhedral objects in range images, and that it is less sensitive to noise and computationally efficient.
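As a rough illustration of the feature described above, the following sketch builds a two-dimensional gradient histogram from a range image stored as a list of rows. It is not the authors' implementation: central differences stand in for the mask operators used in the paper, and the function name `gradient_histogram` and the quantization step `bin_width` are assumptions introduced here.

```python
def gradient_histogram(z, bin_width=0.5):
    """Count interior pixels per quantized (fx, fy) gradient bin.

    z         -- range image as a list of equally long rows of numbers
    bin_width -- hypothetical quantization step for the gradient axes
    Central differences are an assumption; the paper uses mask operators.
    """
    rows, cols = len(z), len(z[0])
    hist = {}
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            fx = (z[i][j + 1] - z[i][j - 1]) / 2.0  # gradient along x (columns)
            fy = (z[i + 1][j] - z[i - 1][j]) / 2.0  # gradient along y (rows)
            key = (round(fx / bin_width), round(fy / bin_width))
            hist[key] = hist.get(key, 0) + 1
    return hist
```

For a noise-free plane, every interior pixel falls into a single bin, which is the idealized form of one histogram peak; noise spreads the counts over neighboring bins.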

2. Gradient-based region growing

It is well known that high-order derivative features are more sensitive to noise than low-order derivative features. For real range images, results usually fall short of theoretical expectations because of noise effects and quantization errors. Therefore, first-order derivatives are used for the segmentation in this study. The segmentation features are one-dimensional slice gradients in the two coordinate directions. For a given surface $\vec{r} = (x, y, f(x, y))$, $f(x, y): (x, y) \in Z \times Z \to R$, according to the theory of differential geometry, the unit surface normal $\vec{n}$ of a point on a patch is given by

$$\vec{n} = \frac{1}{\sqrt{r_x^2 + r_y^2 + 1}}\,(r_x, r_y, 1), \qquad (1)$$

where $r_x$ and $r_y$ are the partial derivatives of $\vec{r}$ with respect to $x$ and $y$. The surface normal is uniquely determined by the two gradients along the two perpendicular axes. In the discrete domain, the first-order partial derivatives are approximated using mask operators proposed by Besl and Jain (1985). A planar patch on which the pixels have the same surface normal is partitioned by clustering the range gradients in the parameter space.

In this gradient-based method, a gradient histogram is defined in a 3-D parameter space for the polyhedral segmentation. As shown in Fig. 1, the X-axis represents $f_x$, the Y-axis represents $f_y$, and the Z-axis represents the number of pixels with the same $f_x$ and $f_y$. For a given planar surface patch of a range image, it is assumed that the noise is uniform and Gaussian. Thus, the distribution of pixel numbers with respect to the two range gradients $f_x$ and $f_y$ is Gaussian. In the feature space, each peak corresponds to a group of regions with the same surface normal, and the top of each peak corresponds to the mean gradient values of each region. A segmentation of planar surfaces is obtained by partitioning peaks in the gradient histogram space.

Fig. 1. The gradient histogram space.

In the proposed region growing algorithm, two neighboring pixels are grouped into the same patch if their surface properties, $f_x$ and $f_y$, belong to the same Gaussian distribution in the gradient histogram space. A peak point searching algorithm is developed that iteratively searches for the local maxima in the gradient histogram space. For two neighboring points $P_1$ and $P_2$, with gradients $(f_{x1}, f_{y1})$ and $(f_{x2}, f_{y2})$, their corresponding histogram points, $H(f_{x1}, f_{y1})$ and $H(f_{x2}, f_{y2})$, are shown in Fig. 2. A grouping criterion is used: if $H(f_{x1}, f_{y1})$ and $H(f_{x2}, f_{y2})$ share the same peak area in the feature space, $P_1$ and $P_2$ are grouped into the same region; otherwise $P_1$ and $P_2$ belong to different regions. A coarse segmentation result is obtained by applying a four-neighborhood expanding grouping algorithm based on the grouping criterion. The


four-neighborhood expanding algorithm is an iterative grouping method that begins from a seed pixel. The seed pixels are the ungrouped pixels encountered as the range image is scanned pixel by pixel. For each region grouping operation, the algorithm expands in four directions from a seed pixel of the region. The ungrouped pixel in each direction is compared with the seed pixel according to the grouping criterion. Newly grouped pixels are added to the region and saved as new seed pixels for subsequent grouping operations. The iteration terminates when no new seed pixel is grouped into the region.

Fig. 2. A paradigm for grouping criterion.

During the grouping process, some points may be grouped into incorrect regions due to noise or other geometrical distortion. In this region growing approach, if the total pixel number of a grouped region is smaller than a threshold $R_{Num}$, the region is regarded as dominated by noise or geometrical distortion. For such incorrect coarse segmentation results, a merge process between a noise region and its neighboring regions is developed. For two adjacent regions $R_1$ and $R_2$, where the pixel number of $R_1$ is smaller than $R_{Num}$ and the pixel number of $R_2$ is larger than $R_{Num}$, a merge criterion is proposed: if $|\bar{f}_{x1} - \bar{f}_{x2}| + |\bar{f}_{y1} - \bar{f}_{y2}| < D$, $R_1$ and $R_2$ are merged into one region; otherwise they are not merged. The variables $\bar{f}_{x1}$, $\bar{f}_{x2}$, $\bar{f}_{y1}$ and $\bar{f}_{y2}$ are the mean gradients of each region, and $D$ is a threshold on the gradient difference. The selection of thresholds $R_{Num}$ and $D$ is discussed in Section 3.

Prior to calculating the surface properties, a smoothing process is applied to real range images, which include detector noise, quantization and calibration errors. A Gaussian smoothing method by Brady et al. (1985) is applied in this segmentation method. In Gaussian smoothing, two-dimensional image data are convolved with the rotation-invariant Gaussian filter

$$g(x, y) = e^{-(x^2 + y^2)/2\sigma^2}, \qquad (2)$$

where $(x, y)$ is the coordinate of each point in the image, and $\sigma^2$ is the variance. Alternatively, Gaussian smoothing is approximated through a convolution with a 3 × 3 operator

$$\begin{bmatrix} 1 & 2 & 1 \\ 2 & 12 & 2 \\ 1 & 2 & 1 \end{bmatrix}. \qquad (3)$$

This Gaussian operator is applied in addition to the mask by Besl. The introduced Gaussian smoothing further reduces noise in the range image samples. The gradients of each pixel are computed, and the histogram is then obtained from the gradients. A coarse segmentation result is obtained using a four-neighborhood iterative expanding algorithm according to the grouping criterion. For those grouped regions dominated by noise or geometrical distortion, a merge process is applied based on the merge criterion. In this region growing approach, the number of cluster regions is not required a priori. The procedure is described by the following steps:

1. The gradients of each pixel in the smoothed range image are computed, and the gradient histogram is obtained.
2. The segmentation result map $Q$ is initialized by setting each point to the non-processed value −1, setting the current region label to 0, and setting the current scanning position in $Q$ to $i = 0$ and $j = 0$.
3. The map pixels $\{(i, j)\}$ are scanned and checked one by one.
4. If $Q(i, j) \neq -1$, go to step 6.
5. The pixel $(i, j)$ is taken as the seed of a new region, and the current region label is incremented by 1. According to the gradients of this point in the range image, the seed grows along its four neighboring directions based on the grouping criterion. The grouped points are taken as new growing point candidates. This four-neighbor expansion is iterated until no new point is grouped into the region. At each grouping step, the value of the grouped pixel in the segmentation map $Q$ is set to the current region label. The attributes of the region, such as the region pixel number and the mean gradient $\bar{f}$, are also updated.
6. If the scanning process is not over, go to step 3.
7. The regions whose pixel number is smaller than threshold $R_{Num}$ are processed according to the merge criterion.

3. Threshold selection

There are two thresholds, $R_{Num}$ and $D$, used in this gradient-based region growing method. Fig. 3 shows a paradigm for selecting threshold $R_{Num}$. In this gradient histogram, the pixels whose gradient histogram values are larger than $R_{Num}$ are taken as effective region points; otherwise they are taken as noise points. When $R_{Num}$ is selected to be smaller than a representative value for target regions, some pixels due to noise and some small blocks due to distortion are taken as valid regions, causing under-thresholding. When $R_{Num}$ is larger than necessary, some small but valid regions are treated as noise, causing over-thresholding. In this study, the area ratio of the noise points to all points in a range image is applied to decide the threshold $R_{Num}$. Normally the area ratio value depends on areas such as noise areas, geometrical distortion areas and jump edges. For a candidate threshold $R$, the area ratio value $f(R)$ is given by

$$f(R) = \frac{1}{Total} \sum_{f_x} \sum_{f_y} H(f_x, f_y), \quad H(f_x, f_y) < R, \qquad (4)$$


where $R$ is a candidate threshold value in the gradient feature space, and $Total$ is the total pixel number of the processed range image. For a predefined area ratio value $p$, the threshold $R_{Num}$ applied in this segmentation method is derived from

$$f(R_{Num}) = p. \qquad (5)$$
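Eqs. (4) and (5) can be sketched in a few lines. This is only an illustration under stated assumptions: the histogram is a dictionary from gradient bins to pixel counts, $Total$ is taken as the sum of those counts, and, because the discrete ratio rarely equals $p$ exactly, the search returns the first candidate whose ratio reaches $p$. The names `area_ratio` and `select_r_num` are introduced here and do not come from the paper.

```python
def area_ratio(hist, r, total):
    """Eq. (4): fraction of pixels lying in bins whose count is below r."""
    return sum(count for count in hist.values() if count < r) / total


def select_r_num(hist, p=0.05):
    """Increase the candidate threshold from zero until f(R) reaches p.

    Using f(R) >= p instead of strict equality is an assumption made
    here, since the discrete ratio may step over p.
    """
    total = sum(hist.values())  # assumes every pixel was binned
    upper = max(hist.values()) + 1  # at this value every bin counts as noise
    for r in range(upper + 1):
        if area_ratio(hist, r, total) >= p:
            return r
    return upper
```

With a histogram such as `{(0, 0): 90, (1, 0): 6, (3, 3): 3, (5, 1): 1}` and `p = 0.05`, the ratio stays at 0.04 until the bin holding 6 pixels is also counted as noise, so the search stops at a candidate of 7.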

The threshold $R_{Num}$ is obtained by applying an iterative searching algorithm. The candidate threshold $R$ is incremented by one, starting from zero. For each candidate threshold $R$, the area ratio value is computed according to Eq. (4). If Eq. (5) is satisfied, the candidate $R$ is taken as the threshold $R_{Num}$. The area ratio value $p$ in Eq. (5) is a predefined parameter whose selection depends on the contents of the range images. If the range images contain a lot of noise, distortion and edge areas, the noise ratio is large, and parameter $p$ is therefore selected as a large value. Otherwise, parameter $p$ is selected as a small value. For normal range images, parameter $p$ is selected as five percent for the noise area inside the images.

Threshold $D$ is the other parameter, used in the merge process. After the coarse segmentation, a merge process is applied between the small regions, whose pixel number is smaller than threshold $R_{Num}$, and their adjacent larger regions, whose pixel number is larger than threshold $R_{Num}$. As defined in the merge criterion, two adjacent regions are merged if the difference between their mean gradients is smaller than threshold $D$. The merge process is executed region by region. For two adjacent regions, the value of threshold $D$ is decided by the gradient variation of the larger region within the pair of merging regions. Parameter $\sigma_f$ is defined as the mean difference of gradients of the large region, and is computed by

$$\sigma_f = \frac{1}{N} \sum_{i=0}^{N} \left[\, |f_{xi} - \bar{f}_x| + |f_{yi} - \bar{f}_y| \,\right], \qquad (6)$$

where $\bar{f}_x$ and $\bar{f}_y$ are the mean gradient values of the region, $N$ is the pixel number of the region, and $f_{xi}$ and $f_{yi}$ are the gradients along the x-axis and y-axis, respectively, of pixel $i$. Based on the gradient difference parameter $\sigma_f$, threshold $D$ is given by

$$D = 2\sigma_f. \qquad (7)$$

Fig. 3. A paradigm for selecting threshold $R_{Num}$.
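The merge test built from Eqs. (6) and (7) can be illustrated with a short sketch. It assumes each region is available as a list of per-pixel `(fx, fy)` gradient pairs; the function names `mean_gradient`, `merge_threshold` and `should_merge` are hypothetical and introduced only for this example.

```python
def mean_gradient(grads):
    """Mean (fx, fy) over a region given as a list of gradient pairs."""
    n = len(grads)
    return (sum(g[0] for g in grads) / n, sum(g[1] for g in grads) / n)


def merge_threshold(large_grads):
    """Eqs. (6)-(7): D = 2 * sigma_f, computed over the larger region."""
    mx, my = mean_gradient(large_grads)
    sigma_f = sum(abs(fx - mx) + abs(fy - my)
                  for fx, fy in large_grads) / len(large_grads)
    return 2.0 * sigma_f


def should_merge(small_grads, large_grads):
    """Merge criterion: L1 distance between mean gradients is below D."""
    sx, sy = mean_gradient(small_grads)
    lx, ly = mean_gradient(large_grads)
    return abs(sx - lx) + abs(sy - ly) < merge_threshold(large_grads)
```

Because $D$ is derived from the larger region alone, a small patch is absorbed only when its mean gradient lies within roughly twice the spread that the large region already shows.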


4. Experiments and analysis

The range images used in this study were acquired using a laser range imager and contain noise. A Gaussian smoothing process, defined in Eqs. (2) and (3), is applied to obtain accurate gradient properties. Therefore, prior to step 1 of the algorithm in Section 2, the original range image is smoothed using the Gaussian mask shown in Eq. (3). As discussed in Section 3, the value of the area ratio parameter $p$ depends on areas such as geometrical distortion areas and jump edges. In these experiments, parameter $p$ is selected as 5%. Based on the ground-truth rule, the extracted segmentation results are matched against the original range images. The experiments show that this gradient-based segmentation method generates good results for understanding 3-D polyhedral objects. Fig. 4(a) is an original intensity image of a wall and the ground. Its size is 166 × 143 pixels. Fig. 4(b) is the corresponding range image. Image noise is apparent in the 3-D display mode shown in Fig. 4(c). Fig. 4(d) shows the gradient histogram of the smoothed range image. There are two major peaks corresponding to the directions of the wall and the

ground regions. Some areas with small histogram values represent noise effects and geometrical distortion. Through our region grouping approach, the neighboring pixels within the vicinity of the same gradient peak are grouped together. The region grouping result without the merge process is shown in Fig. 4(e). Most of the pixels are grouped correctly. However, three small patches are grouped incorrectly because of noise or geometrical distortion. Fig. 4(f) shows the final result after applying the merge process. Fig. 5 shows another segmentation process. Fig. 5(a) and (b) show the original intensity and range images of a box. The image size is 298 × 187 pixels. A 3-D display mode shows the range image in Fig. 5(c), where image noise can be seen clearly. Fig. 5(d) shows the gradient histogram. The main peak areas and some randomly distributed areas due to image noise and jump edges can be seen. The initial region grouping result is shown in Fig. 5(e). Some small patches and jump edges are grouped incorrectly. These regions are merged into their neighboring regions using the merge process. Fig. 5(f) shows the final segmentation result. Fig. 6(a) shows an original intensity image of a packaging box. The image size is 364 × 248 pixels.

Fig. 4. A segmentation sample of wall and ground. (a) Original intensity image; (b) correspondent range image of (a); (c) 3-D display for showing image noise; (d) gradient feature space; (e) the result of initial region grouping; and (f) the final segmentation result of applying the merge process.


Fig. 5. A segmentation sample of a box. (a) Original intensity image; (b) correspondent range image of (a); (c) 3-D display for showing image noise; (d) gradient feature space; (e) the result of initial region grouping; and (f) the final segmentation result of applying the merge process.

Fig. 6(b) shows the corresponding range image. A 3-D display mode is shown in Fig. 6(c). Fig. 6(d) shows the feature space of the range image. Three major peaks correspond to the three directions of the planar regions. The region grouping result of the coarse segmentation is shown in Fig. 6(e). Fig. 6(f) shows the final segmentation result after applying the merge process. The noise area and jump edges in range images have been correctly grouped through the proposed gradient-based region grouping method. In this study, two thresholds used in the merge process are derived from Eqs. (4)–(7). The experimental results show that the thresholds have been selected properly. Fig. 7 shows some segmentation results of the object in Fig. 6(b), with different threshold values. It is known that if RNum is set too small, some small noise regions are taken as real separate regions and then lead to an incorrect segmentation as shown in Fig. 7(a). If RNum is set

too large, some valid regions whose pixel number is smaller than $R_{Num}$ are taken as noise regions. If these small regions and their adjacent large regions satisfy the merge criterion, the valid regions are merged incorrectly into the adjacent large regions. As shown in Fig. 7(b), two planes of the hole in the lower-left corner of the box surface are merged into one region. The value of threshold $D$ also affects the final segmentation result. If $D$ is selected too large, some genuinely separate neighboring regions with a small gradient difference will be merged together. There is no such instance in this image. If $D$ is selected too small, some small regions cannot be grouped correctly into their neighboring regions because of their larger gradient difference. As shown in Fig. 7(c), compared with the correct segmentation result shown in Fig. 6(f), there is an incorrectly merged region on the top surface of the box.
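To experiment with the coarse grouping stage whose outputs appear in Figs. 4(e), 5(e) and 6(e), the four-neighborhood expansion of Section 2 can be sketched as follows. This is a minimal illustration, not the authors' code: each pixel is assumed to carry an already quantized gradient bin, and "sharing the same peak area" is interpreted here as reaching the same local maximum by steepest ascent over the histogram bins, which is an assumption. The names `climb_to_peak` and `grow_regions` are hypothetical.

```python
from collections import deque


def climb_to_peak(hist, start):
    """Steepest ascent over 8-connected histogram bins to a local maximum."""
    cur = start
    while True:
        best = cur
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nb = (cur[0] + dx, cur[1] + dy)
                if hist.get(nb, 0) > hist.get(best, 0):
                    best = nb
        if best == cur:  # no neighboring bin is strictly higher
            return cur
        cur = best


def grow_regions(grad_bins):
    """Label pixels by four-neighborhood expansion from scanned seeds.

    grad_bins -- 2-D list holding one (fx_bin, fy_bin) tuple per pixel.
    Returns a label map with region labels starting at 1.
    """
    rows, cols = len(grad_bins), len(grad_bins[0])
    hist = {}
    for row in grad_bins:
        for b in row:
            hist[b] = hist.get(b, 0) + 1
    peak = {b: climb_to_peak(hist, b) for b in hist}

    labels = [[-1] * cols for _ in range(rows)]  # -1 marks unprocessed pixels
    label = 0
    for i in range(rows):
        for j in range(cols):
            if labels[i][j] != -1:
                continue
            label += 1  # the scanned pixel seeds a new region
            labels[i][j] = label
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for da, db in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    na, nb = a + da, b + db
                    if (0 <= na < rows and 0 <= nb < cols
                            and labels[na][nb] == -1
                            and peak[grad_bins[na][nb]] == peak[grad_bins[a][b]]):
                        labels[na][nb] = label
                        queue.append((na, nb))  # grouped pixel becomes a new seed
    return labels
```

Small regions in the returned label map would then be candidates for the merge process, and varying the thresholds on such a map is one way to reproduce the kinds of failures shown in Fig. 7.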


Fig. 6. A segmentation sample of a part. (a) Original intensity image; (b) correspondent range image of (a); (c) 3-D display for showing image noise; (d) gradient feature space; (e) the result of initial region grouping; and (f) the final segmentation result of applying the merge process.

Fig. 7. Some incorrect segmentation results with different threshold selections: (a) the segmentation result with a smaller RNum ; (b) the segmentation result with a larger RNum ; and (c) the segmentation result with a smaller D.

5. Conclusion

A gradient-based polyhedral segmentation approach for range images is proposed in this paper. Compared with other popular segmentation methods, this approach is much simpler and more robust because only first-order features are used. The gradients along the x- and y-directions and their histogram are applied as the segmentation features. The problem of surface segmentation becomes a problem of point clustering in a feature space. A grouping criterion is proposed for


the region growing based on the gradient histogram. A four-neighborhood iterative expanding algorithm is developed for region grouping. Regions of arbitrary shape can be grown in one expanding process. For certain types of regions in range images, such as noise, geometrical distortion areas and jump edges, a merge process is applied to the initial region growing results according to a proposed merge criterion. The thresholds used in this segmentation approach are also discussed. The experiments show that the proposed algorithm generates good results for understanding polyhedral objects in range images.

References

Arman, F., Aggarwal, J.K., 1993. Model-based object recognition in dense-range image: a review. ACM Comput. Surveys 25 (1), 5–43.
Bellon, O., Direne, A., 1999. Edge detection to guide range image segmentation by clustering techniques. In: 1999 International Conference on Image Processing 2, pp. 725–729.
Besl, P., Jain, R., 1985. Three-dimensional object recognition. Comput. Survey 17 (1), 75–145.
Besl, P., Jain, R., 1988. Segmentation through variable-order surface fitting. IEEE Trans. Pattern Anal. Machine Intell. 10 (2), 167–192.
Brady, M., Ponce, J., Yuille, A., Asada, H., 1985. Describing surfaces. Computer Vision Graphics Image Process. 32, 1–28.
Fan, T., Medioni, G., Nevatia, R., 1987. Segmented description of 3-D surfaces. IEEE J. Robot. Automat. RA 3 (6), 527–538.
Hoover, A., Jean, G., et al., 1996. An experimental comparison of range image segmentation algorithms. IEEE Trans. Pattern Anal. Machine Intell. 18 (7), 673–689.
Inokuchi, S., Nita, T., Matsuday, F., Sakurai, Y., 1982. A three-dimensional edge-region operator for range pictures. In: Proceedings of 6th International Conference on Pattern Recognition, Munich, West Germany, pp. 918–920.
Jiang, X., Bunke, H., Meier, U., 2000. High-level feature based range image segmentation. Image Vision Comput. 18, 817–822.
Jolion, J., Meer, P., et al., 1991. Robust clustering with applications in computer vision. IEEE Trans. Pattern Anal. Machine Intell. 13 (8), 791–802.
Kang, S., Ikeuchi, K., 1993. The complex EGI: A new representation for 3-D pose determination. IEEE Trans. Pattern Anal. Machine Intell. 15 (7), 707–721.
Koester, K., Spann, M., 2000. MIR: An approach to robust clustering-application to range image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 22 (5), 430–444.
Lee, K. et al., 1998. Robust adaptive segmentation of range images. IEEE Trans. Pattern Anal. Machine Intell. 20 (2), 200–205.
Lim, A., Teoh, E., Mital, D., 1994. A hybrid method for range image segmentation. J. Math. Imaging Vision 4 (1), 69–80.
Liu, X., Wang, D., 1999. Range image segmentation using a relaxation oscillator network. IEEE Trans. Neural Networks 10 (5), 564–573.
Stamos, I., Allen, P.K., 2000. 3-D model construction using range and image data. CVPR2000, June 13–15, 2000.
Yang, S., Kak, A., 1986. Determination of the identity, position, and orientation of the top most object in a pile. Comput. Vision Graphics Image Process. 36, 229–255.
Zhao, D., Zhang, X., 1997. Range-data-based object surface segmentation via edges and critical points. IEEE Trans. Image Process. 6 (6).