Hough-transform detection of lines in 3-D space

Pattern Recognition Letters 21 (2000) 843–849

www.elsevier.nl/locate/patrec

Hough-transform detection of lines in 3-D space

Prabir Bhattacharya a, Haiying Liu a, Azriel Rosenfeld b,*, Scott Thompson b

a Department of Computer Science and Engineering, University of Nebraska, Lincoln, NE 68588-0115, USA
b Center for Automation Research, University of Maryland, College Park, MD 20742-3275, USA

Received 14 September 1999

Abstract

Detecting straight lines in 3-D space using a Hough transform approach involves a 4-D parameter space, which is cumbersome. In this paper, we show how to detect families of parallel lines in 3-D space at a moderate computational cost by using a (2 + 2)-D Hough space. We first find peaks in the 2-D slope parameter space; for each of these peaks, we then find peaks in the intercept parameter space. Our experimental results on range images of boxes and blocks indicate that the method works quite well. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Range images; Straight line detection; 3-D Hough transform

1. Introduction

The straight lines in the plane constitute a two-parameter family; for example, a line is uniquely determined by specifying its slope and its (signed) perpendicular distance from the origin. If edge or line fragments detected in the plane are mapped into (slope, distance) parameter space, collinear families of fragments will give rise to peaks in the parameter space. The mapping from the plane to the parameter space is called a Hough transform. Many variants on this idea have been used to detect lines, or other types of curves, in the plane. For reviews of the literature on Hough transforms see (Picton, 1987; Illingworth and Kittler, 1988; Leavers, 1993).
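The planar Hough transform summarized above can be sketched in a few lines. The accumulator resolution and the normal-form (theta, rho) parameterization used here are illustrative choices, not taken from the paper.

```python
import numpy as np

def hough_lines_2d(points, n_theta=180, n_rho=100):
    """Vote edge points into a (theta, rho) accumulator.

    Each point (x, y) votes for every line through it:
    rho = x*cos(theta) + y*sin(theta).  Peaks in the accumulator
    correspond to collinear families of points.
    """
    pts = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = np.abs(pts).sum(axis=1).max() + 1e-9   # loose bound on |rho|
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in pts:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        bins = ((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).round().astype(int)
        for t, b in enumerate(bins):
            acc[t, b] += 1
    return acc, thetas, rho_max

# Usage: ten collinear points on the line y = x produce one dominant peak.
pts = [(i, i) for i in range(10)]
acc, thetas, rho_max = hough_lines_2d(pts)
t, r = np.unravel_index(acc.argmax(), acc.shape)
print(acc.max(), np.degrees(thetas[t]))  # peak vote count and its angle
```

The peak's angle is near 135 degrees, the normal direction of the line y = x.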

* Corresponding author. Tel.: +1-301-405-4526; fax: +1-301-314-9115. E-mail address: [email protected] (A. Rosenfeld).

In principle, the Hough transform concept can be extended to linear subspaces of n-dimensional Euclidean space, which can be regarded as intersections of hyperplanes (Alagar and Thiel, 1988). In particular, lines in 3-D space can be regarded as intersections of planes (Tanaka and Ballard, 1985). (On this and other representations of lines in space see (Zhang and Faugeras, 1992).) The planes in 3-D space have a simple 3-D parameterization; for example, a plane is determined by two direction cosines of its normal (two "slope" parameters) and by its (signed) perpendicular distance from the origin (Muller and Mohr, 1984). The Hough transform does not seem to have been used to detect lines in 3-D space, perhaps because such lines constitute a 4-D parameter space. For example (Roberts, 1988), a line is determined by two of its direction cosines and by the Cartesian coordinates of its intersection with a plane through the origin perpendicular to



it. This representation uses two "slope" parameters and two "intercept" parameters; the analogous parameterization of a line in the plane would use its slope and the coordinate of its intersection with a line through the origin perpendicular to it. Using this 4-D Hough space to detect arbitrary collinear sets of edge or line fragments in 3-D space would be cumbersome.

Man-made environments, however, often contain many straight edges or lines that belong to a small number of parallel families. (This is true, for example, for the edges defined by the walls, floor, ceiling, doors and windows in a room.) These families can be detected as peaks in the slope parameter space, which is only 2-D (the direction of a 3-D line is determined by two direction cosines). For each such peak, the individual lines can then be detected as peaks in the intercept parameter space, which is also only 2-D. Thus, if the straight edges or lines in a scene are all oriented in only a few directions, this method can be used to detect them at moderate computational cost. In this note we illustrate the use of the (2 + 2)-D Hough space to detect straight-line edges in range images of scenes containing boxes or blocks.

2. Examples

The Hough transform approach defined above cannot be applied to detect straight edges or lines in ordinary (intensity) images of a scene, because such images do not provide full 3-D information about the lines. (If the lines were known to lie in a specific plane in the scene, their Hough parameters in the image plane would determine their parameters in 3-D space (Grimson, 1990; Murino and Foresti, 1997); but we do not assume here that a scene plane containing the lines is known.) Full 3-D information is available in 3-D images, e.g., obtained by reconstruction from projections; but most of the commonly available 3-D images are images of the human body, which does not contain planar surfaces or straight edges.
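The (2 + 2)-D parameterization just described can be made concrete as follows. The particular choice of slope angles and of the in-plane basis (u, v) is an illustrative assumption of ours; any parameterization of the unit direction and of the perpendicular plane would serve.

```python
import numpy as np

def line_params_3d(p, d):
    """Split a 3-D line (point p, direction d) into 2 slope + 2 intercept parameters.

    Slope: the unit direction, reported as two spherical angles.
    Intercept: coordinates of the line's intersection with the plane
    through the origin perpendicular to d, expressed in an orthonormal
    basis (u, v) of that plane.
    """
    p, d = np.asarray(p, float), np.asarray(d, float)
    d = d / np.linalg.norm(d)
    # two slope parameters (azimuth, elevation) of the unit direction
    theta = np.arctan2(d[1], d[0])
    phi = np.arcsin(np.clip(d[2], -1.0, 1.0))
    # foot of the perpendicular from the origin onto the line
    foot = p - np.dot(p, d) * d
    # orthonormal basis of the plane perpendicular to d
    a = np.array([0.0, 0.0, 1.0]) if abs(d[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    u = np.cross(d, a); u /= np.linalg.norm(u)
    v = np.cross(d, u)
    return (theta, phi), (np.dot(foot, u), np.dot(foot, v))

# Two parallel lines share slope parameters but differ in intercepts.
s1, i1 = line_params_3d([0, 0, 0], [1, 1, 0])
s2, i2 = line_params_3d([0, 5, 0], [1, 1, 0])
print(s1 == s2, i1, i2)
```

Parallel families thus collapse to a single point in the 2-D slope space, while the individual lines of a family remain separated in the 2-D intercept space.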
A better source of 3-D information about straight edges or lines is range imagery of man-made objects (Jain and Jain, 1990).¹ At an occluding straight edge in such an image there is an abrupt change in range; at a dihedral angle (a straight edge where two planes meet), there is an abrupt change in the rate of change of range. Thus these types of edges give rise to high values in the first or second derivative of range in the direction across the edge. Standard range sensors provide information about the ranges of visible points in the scene; thus they can be used to detect the points in the scene where the derivative of the range is high, and to determine the 3-D locations of these points.

To illustrate how straight edges can be detected in a range image, Fig. 1(a) shows a synthetic digital range image of a cube viewed from an oblique direction. In this image, the value of a point at range z is proportional to c − z, where c is a constant. For display purposes, the brightness of the closest point is shown as 255, and the background points (at "infinite" distance) are shown as 0 (black); thus if the closest point is at distance z₀, the displayed grey level of a point at distance z is 255 z₀ / z. We see that there are six occluding edges, where the faces of the cube occlude the background, and three dihedral edges, where the faces of the cube meet. At the dihedral edges, as mentioned above, there is an abrupt change in the rate of change of range; thus to detect these edges it is necessary to use a second-difference operator which is insensitive to direction. A simple operator of this type is the digital Laplacian, defined as the sum of the second differences of the z values in the x and y directions. (Specifically: let z = f(x, y); then the sum of second differences at (x, y) is f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4 f(x, y); we convert the sum into an average by dividing it by 4.)
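A minimal sketch of this averaged digital Laplacian, applied to a synthetic roof-shaped ridge (a 90-degree dihedral with both faces slanted at 45 degrees, the case analyzed in the text):

```python
import numpy as np

def laplacian(z):
    """Average of second differences of z in the x and y directions.

    (f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4 f(x,y)) / 4,
    computed on the interior of the range image z.
    """
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    out[1:-1, 1:-1] = (z[2:, 1:-1] + z[:-2, 1:-1] +
                       z[1:-1, 2:] + z[1:-1, :-2] -
                       4.0 * z[1:-1, 1:-1]) / 4.0
    return out

# A 90-degree dihedral with both faces slanted at 45 degrees:
# z rises then falls by one unit per pixel, i.e. a roof-shaped ridge.
x = np.arange(9)
ridge = np.minimum(x, 8 - x)        # 1-D roof profile
z = np.tile(ridge, (9, 1))          # extend it along y
lap = laplacian(z)
print(lap[4, 4])                    # -0.5: magnitude 1/2, as in the 45-degree analysis
```

On the flat faces the response is zero; only along the ridge does the operator respond, with magnitude 1/2.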
This operator also detects the occluding edges, because the rate of change across such an edge is very high, while the rate of change along the edge is lower on the closer side and is zero on

¹ One of the papers in this book, "3-D vision techniques for autonomous vehicles", by M. Hebert, T. Kanade, and I. Kweon, contains (p. 330) an example of a range image of stairs and sidewalks in which there are many parallel edges.


Fig. 1. Synthetic cube example: (a) synthetic range image of a cube; (b) edge points detected; (c) 3-D line segments fitted to the edge points; (d) slope clusters, represented by squares, circles, and stars (the axes show the values of the three direction cosines); (e)–(g) projection onto the plane perpendicular to the mean slope of each cluster.

the background side. Note that the operator responds to an edge in two positions, located just on opposite sides of the edge, but these responses have opposite signs; thus we can get single ("thin") responses, located at one side of the edge, by ignoring the responses that have positive sign (say). (We use the negative-sign responses because, as we

saw in the last paragraph, the value of a point decreases as its range increases.) The negative Laplacian values for the cube, thresholded at −0.6, are shown in Fig. 1(b). (If the slant of a surface is 45°, z increases in the slant direction at the same rate that (x, y)-position changes. Thus at a 90° dihedral angle whose faces are slanted at 45°, say


in the x direction, the y differences are 0 and the x differences are both 1, so the average of the differences is 1/2; we used a threshold slightly larger than this in Fig. 1(b). If we rotate the dihedral angle so that one face becomes more strongly slanted and the other becomes more frontal to the sensor, the frontal difference decreases to 0 but the strongly slanted difference becomes infinite, so the average of the differences is larger; thus a threshold that detects the edge in the 45° case will certainly detect it in other cases, and dihedrals sharper than 90° will also be detected.) (Concave dihedrals will be discussed later.)

To apply our Hough transform approach to the detected edges, we must first estimate their directions in 3-D space. The Laplacian operator provides some information about these directions, because it is based on measuring the rates of change of z in the x and y directions; but this information is quite inaccurate, because it is based on range differences at neighboring pixels. To obtain more accurate direction information we locally fit straight line segments (in 3-space) to the edge points detected by the Laplacian. Specifically, for each detected edge point p, let Ep be the set of edge points that lie within distance d of p in the image (in our experiments we used d = 10 in pixel units) and at which the direction of maximum rate of change of range is close to that at p; such edge points should lie on the same edge as p. We fit a straight line segment to Ep, regarded as a set of 3-D points (x, y, z). The direction of this line segment in 3-space is an estimate of the 3-D direction of the edge on which the points of Ep lie. Fig. 1(c) shows the results of fitting 3-D line segments to the edge points shown in Fig. 1(b).

Since Fig. 1(a) is a cube, we expect its edges to lie in three mutually perpendicular directions. Due to digitization effects, there is some variation in the directions of the line segments fitted to the Ep's in Fig. 1(c); but these directions all belong to three relatively compact clusters, as shown in Fig. 1(d). (The points belonging to the clusters are plotted as squares, circles, and stars, respectively.) These clusters can easily be detected using the k-means clustering algorithm, in which we take k = 3 because we expect there to be three clusters.
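The local line fitting and the k-means step can be sketched as follows, assuming the edge-point neighborhoods Ep are already available as small 3-D point arrays. The SVD-based fit and the deterministic farthest-point initialization are implementation choices of ours, not specified in the paper; the paper also does not spell out how the d/−d sign ambiguity of a line's direction is handled, so we fold antipodal directions together by fixing a sign.

```python
import numpy as np

def fit_direction(points):
    """Principal direction of a cloud of 3-D points (total least-squares line fit)."""
    pts = np.asarray(points, float)
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    d = vt[0]                        # right singular vector of the largest singular value
    if d[np.argmax(np.abs(d))] < 0:  # fold d and -d together (a line is unsigned)
        d = -d
    return d

def kmeans_directions(dirs, k=3, iters=50):
    """Plain k-means on unit direction vectors, farthest-point initialization."""
    dirs = np.asarray(dirs, float)
    centers = [dirs[0]]
    for _ in range(k - 1):
        d2 = np.min([((dirs - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(dirs[int(np.argmax(d2))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((dirs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                m = dirs[labels == j].mean(axis=0)
                centers[j] = m / np.linalg.norm(m)
    return labels, centers

# Noisy synthetic segments along the three coordinate axes, 20 per axis.
rng = np.random.default_rng(1)
dirs = []
for ax in np.eye(3):
    for _ in range(20):
        t = np.linspace(0, 1, 15)
        seg = np.outer(t, ax) + rng.normal(0, 0.01, (15, 3))
        dirs.append(fit_direction(seg))
labels, centers = kmeans_directions(np.array(dirs), k=3)
print(np.bincount(labels), np.round(np.abs(centers), 2))
```

For the cube the three recovered centers approximate the three mutually perpendicular edge directions.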

The three visible dihedral edges of the cube are mutually perpendicular, and two of the occluding edges are parallel to each dihedral edge; thus each of the slope clusters contains edge segments that belong to three parallel edges. Thus, if we project the edges in the direction corresponding to the mean of each direction cluster, the projection contains three intercept clusters corresponding to the three parallel edges. The three projections are shown in Fig. 1(e)–(g).

The cube in Fig. 1(a) is synthetic, but we obtain similar results when we use real range images. (The images were obtained using a Cyberware 3030 3-D scanner.) Fig. 2(a) shows a high-resolution frontal range image of a box; in this digital image, a difference of one unit in x, y, or z represents a distance of 300 µm. The two visible faces of the box form a vertical convex dihedral edge about 1/3 of the way between the two vertical occluding edges. The horizontal edge at the top of the box actually consists of two edges in a horizontal plane, both receding from the sensor; hence there are three direction clusters, one representing the three vertical edges and the other two representing the two horizontal edges. The edges are shown in Fig. 2(b) and the results of line fitting are shown in Fig. 2(c). The line fitting process breaks up some of the edges into pieces, but these pieces are nearly collinear and can easily be grouped. The three clusters in direction space are shown in Fig. 2(d). (Fig. 2(e) shows the grouped line segments viewed from an oblique direction.) If we project the edges in the direction parallel to each of the two horizontal direction clusters, we obtain a single intercept; but if we project them parallel to the vertical direction, we obtain three intercepts, corresponding to the two occluding edges and the dihedral edge (Fig. 2(f)).

Fig. 3(a) shows a (lower-resolution) range image of a stack of blocks; there are many visible short vertical edges, some convex dihedral and some occluding, and many visible convex, concave, or occluding edges in horizontal planes, some receding from the sensor and some approaching it. Fig. 3(b) shows the edges and Fig. 3(c) shows the fitted line segments. (Since some of the dihedral edges in this image are concave, we had to use both the negative and the positive values of the


Fig. 2. Box example: (a) range image of a box; (b) edge points detected; (c) 3-D line segments fitted to the edge points; (d) slope clusters, represented by squares, circles, and stars (the axes show the values of the three direction cosines); (e) oblique view of the grouped line segments; (f) projection onto the plane perpendicular to the mean slope of the cluster.

Laplacian, thresholded at −0.6 and +0.6, respectively.) The close parallel lines in Fig. 3(c) represent convex and concave edges that are close together in this frontal view. Fig. 3(d) shows the clusters in direction space. As the edges are all in three principal directions, there are still three clusters which the k-means algorithm distinguishes fairly well; but because this range image was not taken at high resolution, the clusters are less compact and the transitions between the circle and star clusters are not abrupt; however, there are still just two areas where the circles and stars are dense. In this example, there are many edges in each cluster because the scene contains many objects. To obtain the individual edges, we group the nearly collinear line segments; an oblique view of the grouped segments is shown in Fig. 3(e). Note that in this view, the close parallels have moved apart. The intercepts belonging to each cluster are plotted in Fig. 3(f)–(h).
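The projection used to obtain these intercept plots can be sketched as follows; the particular construction of the in-plane basis (u, v) is our own assumption.

```python
import numpy as np

def project_to_intercepts(points, d):
    """Project 3-D edge points onto the plane through the origin
    perpendicular to direction d.  Points on parallel lines with
    slope d collapse to compact 2-D intercept clusters."""
    pts = np.asarray(points, float)
    d = np.asarray(d, float); d = d / np.linalg.norm(d)
    a = np.array([0.0, 0.0, 1.0]) if abs(d[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    u = np.cross(d, a); u /= np.linalg.norm(u)
    v = np.cross(d, u)
    return np.stack([pts @ u, pts @ v], axis=1)

# Points sampled from three parallel lines with direction (0, 0, 1).
d = np.array([0.0, 0.0, 1.0])
offsets = [(0, 0), (4, 0), (4, 3)]
pts = np.concatenate([
    [(ox, oy, t) for t in np.linspace(0, 5, 30)] for ox, oy in offsets
])
uv = project_to_intercepts(pts, d)
# Each line collapses to a single point in the (u, v) intercept plane.
print(np.unique(np.round(uv, 6), axis=0).shape[0])  # 3 distinct intercepts
```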

3. Concluding remarks

Detecting straight lines in 3-D space is a significant problem in computer vision because many man-made environments contain lines, and often families of parallel lines. Detecting 3-D lines using the Hough transform approach involves a 4-D space, which is cumbersome. In this paper, we have shown how to detect families of parallel lines in 3-D space at moderate computational cost by using a (2 + 2)-D Hough space. We applied our approach to range images of scenes containing boxes or blocks. We first found peaks in the 2-D slope parameter space; for each of these peaks, we then found peaks in the intercept parameter space. To estimate the directions in 3-D space accurately, we fit straight line segments locally to edge points detected by a Laplacian operator. Our experimental results indicate that the method works quite well.
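The two-stage procedure can be tied together in a compact sketch, assuming the fitted segments are given as (point, unit direction) pairs. Greedy thresholded grouping stands in here for peak finding in the two 2-D parameter spaces, and the tolerance values are illustrative, not from the paper.

```python
import numpy as np

def two_stage_hough(segments, slope_tol=0.2, icept_tol=0.5):
    """Group 3-D line segments first by slope, then by intercept.

    segments: list of (point, direction) pairs, direction a unit vector.
    Returns a list of parallel families, each a list of intercept
    clusters (the individual lines).
    """
    families = []  # each: {"d": representative direction, "segs": [...]}
    for p, d in segments:
        p, d = np.asarray(p, float), np.asarray(d, float)
        if d[np.argmax(np.abs(d))] < 0:   # fold d and -d together
            d = -d
        for fam in families:
            if np.linalg.norm(fam["d"] - d) < slope_tol:
                fam["segs"].append((p, d)); break
        else:
            families.append({"d": d, "segs": [(p, d)]})
    result = []
    for fam in families:
        dm = fam["d"] / np.linalg.norm(fam["d"])
        lines = []                         # intercept clusters of this family
        for p, _ in fam["segs"]:
            foot = p - np.dot(p, dm) * dm  # intercept, embedded in 3-D
            for ln in lines:
                if np.linalg.norm(ln["foot"] - foot) < icept_tol:
                    ln["count"] += 1; break
            else:
                lines.append({"foot": foot, "count": 1})
        result.append(lines)
    return result

# Four segments: three parallel to the z axis (two on the same line)
# and one parallel to the x axis.
segs = [((0, 0, 0), (0, 0, 1)), ((0, 0, 2), (0, 0, 1)),
        ((3, 0, 0), (0, 0, 1)), ((0, 5, 0), (1, 0, 0))]
fams = two_stage_hough(segs)
print([len(lines) for lines in fams])  # lines found per parallel family
```

The z-axis family yields two distinct lines (the two collinear segments merge), and the x-axis family yields one.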


Fig. 3. Stack of blocks example: (a) range image of a stack of blocks; (b) edge points detected; (c) 3-D line segments fitted to the edge points; (d) slope clusters, represented by squares, circles, and stars (the axes show the values of the three direction cosines); (e) oblique view of the grouped line segments; (f)–(h) projection onto the plane perpendicular to the mean slope of each cluster.

Parameter space decomposition is often used to reduce the complexity of high-dimensional Hough transforms (see, e.g., Section 3.4 of Leavers, 1993). Since this approach is based on initially detecting clusters in a subspace, it is potentially susceptible to false alarms. In scenes that contain lines in many directions, this could be a serious problem. But in many man-made scenes, the lines lie primarily in three principal directions; this was true in all our examples, including the relatively complex Fig. 3. Our method worked well for these scenes because we knew that there would be only three clusters in the slope space.

Acknowledgements

The authors thank Profs. Kevin Bowyer of the University of South Florida, Tampa, FL, and Ehud Rivlin of the Technion, Haifa, Israel, for


providing the range data. The first two authors were supported by Grant F49620-98-1-0413 from the US Ballistic Missile Defense Organization, and through the Nebraska DEPSCOR Program with matching funds from the University of Nebraska-Lincoln.

References

Alagar, V.S., Thiel, L.H., 1988. Algorithms for detecting m-dimensional objects in n-dimensional spaces. IEEE Trans. Pattern Anal. Machine Intell. 3, 245–256.
Grimson, W.E.L., 1990. Object Recognition by Computer. MIT Press, Cambridge, MA, p. 314.
Illingworth, J., Kittler, J., 1988. A survey of the Hough transform. Comput. Vision, Graphics Image Processing 44, 87–116.


Jain, R.C., Jain, A.K. (Eds.), 1990. Analysis and Interpretation of Range Images. Springer, New York.
Leavers, V.F., 1993. Which Hough transform? Comput. Vision, Graphics Image Processing 58, 250–264.
Muller, Y., Mohr, R., 1984. Planes and quadrics detection using Hough transform. In: Proc. Seventh Internat. Conf. Pattern Recognition, pp. 1101–1103.
Murino, V., Foresti, G.L., 1997. 2D into 3D Hough space mapping for planar object pose estimation. Image Vision Comput. 15, 435–444.
Picton, P.D., 1987. Hough transform references. Int. J. Pattern Recognition Artificial Intell. 1, 413–425.
Roberts, K.S., 1988. A new representation for a line. In: Proc. IEEE Conf. Comput. Vision Pattern Recognition, pp. 635–640.
Tanaka, H.T., Ballard, D.H., 1985. Parallel polyhedral shape recognition. In: Proc. IEEE Conf. Comput. Vision Pattern Recognition, pp. 491–496.
Zhang, Z., Faugeras, O., 1992. 3D Dynamic Scene Analysis: A Stereo Based Approach. Springer, Berlin (Chapter 4).