Computer Vision and Image Understanding, Vol. 76, No. 1, October 1999, pp. 54–69. Article ID cviu.1999.0787, available online at http://www.idealibrary.com

Grouping ., -, →, [formula] into Regions, Curves, and Junctions

Mi-Suen Lee* and Gérard Medioni

Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, California 90089-0273
E-mail: [email protected], [email protected]

Received October 1, 1998; accepted June 14, 1999

ence of multiple structures, and the interaction between them, in noisy, irregularly clustered data sets. In this paper, we present a unified methodology for the robust inference of multiple salient structures such as junctions, curves, regions, and surfaces from any combination of point, curve element, and surface patch element inputs in 2D and 3D.

The inference of curves, regions, and surfaces is traditionally studied in the fields of perceptual grouping in 2D [1, 5, 8, 10, 12] and surface reconstruction in 3D [2, 3, 4, 6, 21, 28]. This inference problem is often cast as a constrained functional optimization problem, which usually makes use of the uniqueness and continuity constraints proposed by Marr [19]. Optimization techniques such as relaxation, dynamic programming, and stochastic methods are widely used. This formulation results in iterative, initialization- and parameter-dependent solutions, which often fail to handle discontinuities. The difficulty in formulating the salient structure inference problem in an optimization framework comes from the fact that each point in 2D or 3D can assume only one of three roles: it lies on a smooth continuous structure such as a curve, region, or surface; it lies on a discontinuity; or it is an outlier. Since these three states have very different properties, it is hard to capture them with a single continuous function and to recover the role of each point by a binary decision.

Inspired by Westin and Knutsson's [9] tensorial framework for signal processing, and by Guy and Medioni's [12] work on nonlinear voting, we propose to use the representational capability of tensors to encode the three possible states, and the robustness of nonlinear voting to establish the role of each point in the domain space. We develop a technique called tensor voting to efficiently collect information in a large neighborhood containing any combination of points, curve elements, or surface patch elements.
By analyzing the composition of the neighborhood information, we are able to handle the tasks of interpolation, discontinuity detection, and outlier identification simultaneously. We first describe previous work in Section 2, then present, in Section 3, our unifying tensorial framework for salient structure inference. We discuss the inference of curves and surfaces from undirected features in Section 4. We then extend the formalism to handle directed features. Inference of regions by incorporating polarity into our salient structure inference engine is presented

We address the problem of extracting segmented, structured information from noisy data obtained through local processing of images. A unified computational framework is developed for the inference of multiple salient structures such as junctions, curves, regions, and surfaces from any combination of point, curve element, and surface patch element inputs in 2D and 3D. The methodology is grounded in two elements: tensor calculus for representation and nonlinear voting for data communication. Each input site communicates its information (a tensor) to its neighborhood through a predefined (tensor) field and, therefore, casts a (tensor) vote. Each site collects all the votes cast at its location and encodes them into a new tensor. A local, parallel routine such as a modified marching cube/square process then simultaneously detects junctions, curves, regions, and surfaces. The proposed method is noniterative, requires no initial guess or thresholding, can handle the presence of multiple curves, regions, and surfaces in a large amount of noise while preserving discontinuities, and its only free parameter is scale. We present results of curve and region inference from a variety of inputs. © 1999 Academic Press

Key Words: shape inference; perceptual grouping; robust estimation.

1. INTRODUCTION

In computer vision, we often face the problem of identifying salient and structured information in a noisy data set. From greyscale images, edges are extracted by first detecting subtle local changes in intensity and then linking locations based on the noisy signal responses. From binocular images, surfaces are inferred by first obtaining depth hypotheses for points and/or edges using local correlation measures and then selecting and interpolating appropriate values for all points in the images. Similarly, in image sequence analysis, the estimation of motion and shape starts with local measurements of feature correspondences, which give noisy data for the subsequent computation of scene information. Hence, for a salient structure estimator to be useful in computer vision, it has to be able to handle the pres-

* Corresponding author. Current address: Philips Research, 345 Scarborough Road, Briarcliff Manor, NY 10510-2099. E-mail: [email protected].

1077-3142/99 $30.00 Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved.

GROUPING REGIONS, CURVES, AND JUNCTIONS

in Section 6. We conclude with a discussion of future work in Section 7.

2. PREVIOUS WORK

Based on the type of input, we can broadly classify previous work on the inference of curves, regions, and surfaces into three areas, namely, dot clustering in 2D, curve and contour inference in 2D, and surface reconstruction in 3D. Zucker and Hummel [31] were the first to address the problem of region inference from dot clusters. They developed an iterative relaxation labelling technique, which became the basic framework for many region inference algorithms. A summary of these methods, together with an algorithm developed by Ahuja and Tuceryan, can be found in [1]. Recently, Shi and Malik [24] propose to treat region inference as a graph-partitioning problem and present a recursive algorithm that uses the normalized cut as the global criterion to segment tokens into regions.

Most curve and contour inference methods use specific curve and contour models [22]. Only a few approaches infer generic curves. Parent and Zucker [20] use a relaxation labeling scheme that imposes co-circularity and constancy of curvature. Dolan and Weiss [5] demonstrate a hierarchical approach to grouping, relying on compatibility measures such as proximity and good continuation. Sha'ashua and Ullman [23] suggest the use of a saliency measure to guide the grouping process and to eliminate erroneous features in the image. The scheme prefers long curves with low total curvature and does so by using an incremental optimization scheme (similar to dynamic programming). An early work on illusory contour completion is Grossberg and Mingolla's [10] Boundary Contour System (BCS) Model, which accounts for illusory contours in a dynamic neural network architecture. The principle of operation involves a layer of oriented edge-sensitive cells and a higher-level cooperative cell layer which receives activation from the oriented cell layer by way of a bipolar receptive field in an orientation-specific manner.
The BCS mainly deals with completing collinear illusory contours. In [13], Heitger and von der Heydt presented a computational model for recovering illusory contours by combining convolutions of keypoint images with anisotropic, orientation-selective filters. Their model assumes perfect endpoints and T-junctions as inputs. Williams and Jacobs [29] propose to use the assumption that the prior distribution of completion shape can be modeled as a random walk. They employ vector field convolutions with stochastic extension properties to assign saliency to image locations. Again, this method assumes perfect keypoint inputs. In addressing the same problem, Geiger et al. [8] present a statistical physics approach which uses an optimization criterion based on a coherence measure between pairs of junctions.

In 3D, the input consists of points, curve elements, or surface elements, and the output should be a set of surfaces, possibly bounded by curves and junctions. Under some conditions, it may be possible to have an initial estimate of the target shape. The physics-based approaches [15, 17, 28], in which the initial


pattern deforms subject to some forces, may then be applicable. When we know in advance that the pattern consists of a single surface, methods such as Poggio and Girosi's [21] network learning approach can be used. In general, the problem is to fit surfaces to a cloud of possibly noisy 3D points. Fua and Sander [6] have proposed a local algorithm to describe surfaces from a set of points. Szeliski et al. [26] have proposed the use of a physically based dynamic local process evolving from each data point (particle) and subject to various forces. They are able to handle objects of unrestricted topology, but assume a single connected object. Hoppe et al. [14] and Boissonnat [3] use computational geometry to address the problem by treating the data as vertices of a graph and constructing edges based on local properties, which is most appropriate in the absence of noise.

Among all the grouping methods, Guy and Medioni's work is one of the few nonlinear, noniterative approaches able to handle both curve inference in 2D and surface inference in 3D from point, curve element, or surface patch element inputs, although each separately. The strength of their method comes from the fact that, while curve or surface interpolation can be accomplished by collecting orientation information in a local neighborhood, discontinuity detection and outlier identification can be realized as well by measuring the confidence (or saliency) and the consistency of the orientation estimations obtained from the neighbors. They introduced a noniterative technique called vector voting to gather orientation estimates in a local neighborhood. These orientation estimates are generated by rules easily captured by vector fields, allowing an efficient convolution-like implementation for data communication. The novelty of their method lies in the use of second-order moments in accumulating and interpreting the vector votes.
However, since this method uses vectors to represent data, it needs different rules to estimate orientation from different types of input. Moreover, due to the use of second-order moments, only the direction of the vector votes is used and, hence, the method is unable to handle the inference of directed curves and regions.

3. TENSORIAL FRAMEWORK FOR SALIENT STRUCTURE INFERENCE

To derive a general salient structure estimator which interpolates, detects discontinuities, and identifies outliers simultaneously, we introduce a construct called the saliency tensor, which is able to signal the presence of a salient structure, a discontinuity, or an outlier at any location. The design of the saliency tensor is based on the observation that a discontinuity is indicated by disagreement in surface or curve orientation estimation. As orientations are represented by vectors, it is the variation in the vector estimates that signifies a discontinuity, and this variation obviously cannot be represented by a vector. An appropriate choice hence is a tensor, which is commonly used for capturing variations of direction in the study of dynamics, elasticity, fluids, and electromagnetics,


LEE AND MEDIONI

among others. On the other hand, outliers are revealed by insufficient support in the estimation of surface or curve orientation and therefore can be detected by the number of consistent votes, which we call saliency. This orthogonality between the feature property and the confidence (or saliency) of the estimation has been noticed by Knutsson et al. [9], who introduced the use of second-order symmetric tensors for the estimation of local structure and orientation in the context of signal processing for computer vision. Combining the efficiency of voting with the strength of tensor representation, we develop a salient structure inference engine which makes use of tensor voting to extract salient junctions, curves, regions, and surfaces from noisy data.

In the following, we first outline the mechanism of our salient structure inference engine in Section 3.1. We then define the saliency tensor in Section 3.2. The details of tensor voting and vote interpretation are given in Section 3.3. We conclude this section with a discussion of performance in Section 3.4. In this paper, scalars are denoted by italic letters, e.g., l; tensors are denoted by bold capital letters, e.g., T; vectors are denoted by bold lower-case letters, e.g., e. The unit-length vector for e is denoted as ê.

3.1. Overview of the Inference Engine

Our salient structure inference engine is a noniterative procedure that makes use of tensor voting to infer salient geometric structures from local features. Figure 1 illustrates the mechanisms of this engine. In practice, the domain space is digitized into discrete cells, which are pixels in 2D and voxels in 3D. We use a construct called a saliency tensor, to be defined in Section 3.2, to encode the saliency of various features such as points, curve elements, and surface patch elements. Local feature

detectors provide an initial local measure of features. The engine then encodes the input into a sparse saliency tensor field. Saliency inference is achieved by allowing the elements in the tensor field to communicate through voting. A voting function specifies the voter's estimation of local orientation/discontinuity, and its saliency, regarding the collecting site, and is represented by tensor fields. The vote accumulator uses the collected information to update the saliency tensors at locations with features and to establish those at previously empty locations. From the densified saliency tensor field, the vote interpreter produces saliency maps for various features, from which salient structures can be extracted. By taking into account the different properties of different features, the integrator locally combines the features into a consistent salient structure. Once the generic saliency inference engine is defined, it can be used to perform many tasks simply by changing the voting function. In the following, we focus our discussion on the use of tensors in salient structure inference. The process of structure integration is the same as that used in Guy and Medioni's method, which can be found in [27].

3.2. Saliency Tensor

The design of the saliency tensor is based on the observation that a discontinuity is indicated by the variation of orientation when estimating a smooth structure, such as a surface or curve, locally. Intuitively, the variation of orientation can be captured by an ellipsoid in 3D and an ellipse in 2D. Note that it is only the shape of the ellipsoid/ellipse that describes the orientation variation; we can therefore encode saliency, i.e., the confidence in the estimation, as the size of the ellipsoid/ellipse. One way to represent an ellipsoid/ellipse is to use the distribution of points on the ellipsoid/ellipse as described by the covariance matrix. By decomposing a covariance matrix T into its eigenvalues λ1, λ2, λ3

FIG. 1. Flowchart of the salient structure inference engine.


FIG. 2. An ellipsoid and the eigensystem.

and eigenvectors ê1, ê2, ê3, we can rewrite T as

\[
\mathbf{T} = [\hat{e}_1\ \hat{e}_2\ \hat{e}_3]
\begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}
\begin{pmatrix} \hat{e}_1^T \\ \hat{e}_2^T \\ \hat{e}_3^T \end{pmatrix}. \tag{1}
\]

Thus, T = λ1 ê1ê1ᵀ + λ2 ê2ê2ᵀ + λ3 ê3ê3ᵀ, where λ1 ≥ λ2 ≥ λ3 and ê1, ê2, ê3 are the eigenvectors corresponding to λ1, λ2, λ3, respectively. The eigenvectors correspond to the principal directions of the ellipsoid/ellipse and the eigenvalues encode the size and shape of the ellipsoid/ellipse, as shown in Fig. 2. T is a linear combination of outer product tensors and, therefore, a tensor. This tensor representation of orientation/discontinuity estimation and its saliency allows the identification of features, such as points or curve elements, and the description of the underlying structure, such as tangents, to be computed at the same time. Since the local characterizing orientation, such as normal or tangent, varies with the target structure to be detected, such as curve or surface, the physical interpretation of the saliency tensor is specific to the structure. However, the inference of the saliency tensor is independent of the physical interpretation. This structure dependency merely means that different kinds of structures have to be inferred independently. In particular, one needs to realize that the physical meaning of a curve on a surface is different from that of a curve in space itself. The same applies to points on surfaces and curves. Figures 3 and 4 show the shape and


representation of the saliency tensors for various features under surface and curve inference, respectively. The inference of 2D structures is considered to be embedded in 3D and, thus, is not treated separately. Every saliency tensor thus has six parameters, λ1, λ2, λ3, θ, φ, and ψ, where θ, φ, ψ give the orientation of the principal axes (Fig. 2). This saliency tensor is sufficient to describe feature saliency for unpolarized structures such as nonoriented curves and surfaces. When dealing with boundaries of regions or with oriented data (edgels), we need to associate a polarity with the feature orientation to distinguish the inside from the outside. Polarity saliency, which relates the polarity of the feature to its significance, will be addressed when we discuss region inference in Section 5.

According to the spectral theorem [9], a general saliency tensor T can be expressed as a linear combination of a stick tensor, which describes a perfect orientation estimation, a plate tensor, which describes orientation uncertainty except in one direction, and a ball tensor, which describes total orientation uncertainty:

\[
\mathbf{T} = (\lambda_1 - \lambda_2)\,\hat{e}_1\hat{e}_1^T
+ (\lambda_2 - \lambda_3)\left(\hat{e}_1\hat{e}_1^T + \hat{e}_2\hat{e}_2^T\right)
+ \lambda_3\left(\hat{e}_1\hat{e}_1^T + \hat{e}_2\hat{e}_2^T + \hat{e}_3\hat{e}_3^T\right), \tag{2}
\]

where ê1ê1ᵀ describes a stick, (ê1ê1ᵀ + ê2ê2ᵀ) describes a plate, and (ê1ê1ᵀ + ê2ê2ᵀ + ê3ê3ᵀ) describes a ball. Figure 5 shows the shape of a general saliency tensor as described by the stick, plate, and ball components.

A typical example of a general tensor is one that represents the orientation estimates at a corner. Using orientation-selective filters, multiple orientations ûᵢ with strengths sᵢ can be detected at corner points. We can use the tensor sum Σᵢ sᵢ² ûᵢûᵢᵀ to compute the distribution of the orientation estimates. The addition of two saliency tensors simply combines the distributions of the orientation estimates and the saliencies. This

FIG. 3. Saliency tensors for surface inference.
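To make the saliency tensor decomposition concrete, the following is a minimal numpy sketch (the function name and sample values are ours, not from the paper) that rebuilds a tensor from its eigensystem as in Eq. (1) and splits it into stick, plate, and ball parts as in Eq. (2):

```python
import numpy as np

def stick_plate_ball(T):
    """Split a 3x3 saliency tensor into stick, plate, and ball parts (Eq. (2))."""
    w, E = np.linalg.eigh(T)
    w, E = w[::-1], E[:, ::-1]          # reorder so lambda1 >= lambda2 >= lambda3
    e1, e2 = E[:, 0], E[:, 1]
    stick = (w[0] - w[1]) * np.outer(e1, e1)
    plate = (w[1] - w[2]) * (np.outer(e1, e1) + np.outer(e2, e2))
    ball = w[2] * np.eye(3)             # e1 e1^T + e2 e2^T + e3 e3^T = identity
    return stick, plate, ball

# Corner-like tensor: two orientation estimates u_i with strengths s_i,
# accumulated as the tensor sum of s_i^2 * u_i u_i^T, as in the text.
u1, u2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
T = 1.0**2 * np.outer(u1, u1) + 0.5**2 * np.outer(u2, u2)

# Eq. (1): the eigensystem reproduces T exactly.
w, E = np.linalg.eigh(T)
assert np.allclose(E @ np.diag(w) @ E.T, T)

stick, plate, ball = stick_plate_ball(T)
```

For this corner-like input the ball part vanishes (λ3 = 0), and the stick and plate saliencies (λ1 − λ2 and λ2 − λ3) quantify how strongly one orientation dominates.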


FIG. 4. Saliency tensors for curve inference.

linearity of the saliency tensor enables us to combine votes and represent the result efficiently. Also, since the input to this voting process can be represented by saliency tensors, our saliency inference is a closed operation whose input, processing token, and output are the same.

3.3. Tensor Voting

As described in the previous section, by capturing the variation of orientation at each location we can represent in a unified manner the three possible roles of a location: on a smooth structure such as a surface, region, or curve; on a discontinuity; or an outlier. Using such a data representation, we can tackle the three subtasks of salient structure inference, namely, interpolation, discontinuity detection, and outlier identification, simultaneously, by estimating the variation of orientation at each location in the domain space. Unlike traditional approaches, which seek to obtain such an estimate by computing a scalar value that measures the fitness of the data set against some specific surface model, we instead obtain the estimate by combining the orientation estimates collected from every data item in a large neighborhood. Each individual estimate is generated according to a simple model that takes into account the orientation information

FIG. 5. A general saliency tensor.

of the data item and the relative position of the target location with respect to the data item. Thanks to the redundancy and the linearity of the saliency tensor, this seemingly expensive process can be implemented by an efficient convolution-like operation which we call tensor voting. This is similar to the vector voting technique developed by Guy and Medioni [11]. By using tensors as the data representation instead of vectors, however, we augment this nonlinear voting technique with a mathematical foundation that accounts for many ad hoc processes in vector voting and provides a unified framework for salient structure inference. Once the orientation certainty and the saliency are established, we can label each location in the domain space by identifying the most salient estimates in the local neighborhood.

The process of salient structure inference thus proceeds by first encoding the inputs as saliency tensors according to Section 3.2, producing a sparse feature saliency tensor field. To vote, each active site generates, at every location, a tensor value that describes the voter's estimate of the orientation at that location and the saliency of the estimate. The value of the tensor vote cast at a given location is given by the voting function, which characterizes the structure to be detected and is designed according to the "matter is cohesive" principle. The definition of the voting functions for surface and curve inference, together with a few illustrative examples, is given in Section 4. Regardless of the specific design of the voting function, due to the general properties of the function (see Appendix) that encode the smoothness constraint, the vote generation process can always be implemented as a convolution with three discrete saliency tensor fields: one encodes the votes due to the stick component of the voter, one the votes due to the plate component, and one the votes due to the ball component.
Figure 6 illustrates the vote generation process in 2D. Moreover, the plate voting field and the ball voting field are generated from the stick voting field, which unifies the process for handling various orientation information.


FIG. 6. Illustration of the voting process.

After all votes cast by active sites are collected by tensor addition at each location, the votes are interpreted by computing the eigensystem of the saliency tensor at each site, building feature saliency maps from the now densified saliency tensor field. According to Eq. (2), each saliency tensor can be broken down into three components: (λ1 − λ2) ê1ê1ᵀ corresponds to an orientation; (λ2 − λ3)(ê1ê1ᵀ + ê2ê2ᵀ) corresponds to a locally planar junction whose normal is represented by ê3; and λ3 (ê1ê1ᵀ + ê2ê2ᵀ + ê3ê3ᵀ) corresponds to an isotropic junction. For surface inference, surface saliency is therefore measured by (λ1 − λ2), with the normal estimated as ê1; the curve junction saliency is measured by (λ2 − λ3), with the tangent estimated as ê3. For curve inference, the curve saliency is measured by (λ1 − λ2), with the tangent estimated as ê1; the planar junction saliency is measured by (λ2 − λ3), with the normal estimated as ê3. In both cases, isotropic (or point) junction saliency is measured by λ3. Notice that the feature saliency values depend not only on the saliency of the feature estimation, but also on the distribution of orientation estimates. A few illustrative examples are given in Section 4, once the specific voting functions are defined.

As all voting functions have to capture the smoothness constraint, sites near salient structures will also have high feature saliency values. The closer a site is to a salient structure, the higher its feature saliency value. Structure features are thus located at the local maxima of the corresponding feature saliency map. The vote interpreter hence can extract features by nonmaxima suppression on the feature saliency map. To extract structures represented by oriented salient features, we use the method developed in [12]. Note that a maximal surface/curve is usually not an iso-surface/iso-contour.
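The vote collection step can be sketched as a brute-force accumulation loop (a toy 2D version with a placeholder mask and names of our choosing; a real implementation would use precomputed, orientation-dependent tensor fields):

```python
import numpy as np

def accumulate_votes(domain_shape, features, field):
    """Add one small voting-field mask (C cells of 2x2 tensors) per input
    feature into a dense tensor accumulator: O(C * n) tensor additions."""
    H, W = domain_shape
    k = field.shape[0] // 2                 # field has shape (2k+1, 2k+1, 2, 2)
    acc = np.zeros((H, W, 2, 2))            # densified saliency tensor field
    for (r, c) in features:                 # n input features
        for dr in range(-k, k + 1):         # C = (2k+1)^2 vote sites per feature
            for dc in range(-k, k + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < H and 0 <= cc < W:
                    acc[rr, cc] += field[dr + k, dc + k]
    return acc

# Interpretation then decomposes the tensor at each of the m cells once: O(m).
field = np.ones((3, 3, 2, 2))               # placeholder mask, not a real field
acc = accumulate_votes((10, 10), [(5, 5), (5, 6)], field)
```

Each of the two features deposits its 3 × 3 mask of tensor votes, so the cell at (5, 5) receives contributions from both voters; this is the O(Cn) part of the complexity analysis in Section 3.4.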
By definition, a maximal surface is a zero-crossing surface: the first derivative of the saliency changes sign as we traverse across the surface. By expressing a maximal surface as a zero-crossing surface, an iso-surface marching process can be used to extract it.

A similar discussion applies to maximal curve extraction. Details for extracting various structures from feature saliency maps can be found in [12, 27].

3.4. Complexity

The efficiency of the salient structure inference engine is determined by the complexity of tensor voting and vote interpretation. Given a voting function specified by the three discrete 3D tensor fields with C voxels each, tensor voting requires O(Cn) operations, where n is the number of input features. The size C depends on the scale of the voting function and is usually much smaller than the size of the domain space, m. It is therefore the density of the input features that usually determines the time complexity of tensor voting. Since the input features are usually sparse, the total number of operations in tensor voting is normally a few times the size of the domain space. In vote interpretation, the saliency tensor at each location of the domain space is decomposed into its eigensystem, which takes O(1) for each of the m elements in the domain space. The marching cube/square processes [18] take only O(m). Therefore the complexity of vote interpretation is O(m).

In summary, the time complexity of the salient structure inference engine is O(Cn + m), where C is the size of the tensor fields specifying the voting function, n is the number of input features, and m is the size of the domain space. Notice that all the processing is local and, thus, can be performed in parallel, which allows an O(1) implementation on a parallel computer architecture.

4. INFERENCE FROM SYMMETRIC FEATURES

While the salient structure inference engine defined above performs structure inference efficiently, it is the definition of the voting function that determines the effectiveness of the inference. For curve and surface inference, the voting function


specifies how curve or surface orientation is estimated by a voting feature with respect to each location in the domain space. To capture the smoothness constraint, the orientations of the votes should change gradually in a local neighborhood. Therefore, any curve element with an orientation, say û, must be connected to points in a local neighborhood through noncrossing, C1-continuous paths with initial orientations equal to û. From an energy consumption point of view, the circular path, which is the only C1-continuous path that has constant curvature and is defined for all pairs of points and orientations, is preferred over other smooth paths. Similarly, surface patch elements are connected to other points through spherical surfaces.

Besides the orientation of the votes, the voting function also specifies the saliency of the votes. In general, the influence of a voter must be smooth and must decrease with distance. Moreover, low curvature is preferred over high curvature. We use a Gaussian function to model the decay DF of the voter's influence with path length s and curvature ρ as

\[
DF(s, \rho) = e^{-(s^2 + c\rho^2)/\sigma^2},
\]

where c is a constant that reflects the relative weight of path length and curvature, and σ is the scale factor that determines the rate of attenuation. Following the arguments above, the voting field for the stick tensor T(1, 0, 0, 0, 0, 0) in polar coordinates is therefore

\[
V_1(p(s, \theta, \phi)) = DF(s, \rho)\,\hat{v}\hat{v}^T,
\]

where

\[
\rho = \frac{2\sin\theta}{l}, \qquad s = \frac{\theta l}{\sin\theta}, \qquad
\hat{v} = \begin{pmatrix} \cos 2\theta \\ \sin 2\theta \cos\phi \\ \sin 2\theta \sin\phi \end{pmatrix}
\]

in the case of curve inference, and

\[
\rho = \frac{2\cos\theta}{l}, \qquad s = \frac{\theta l}{\cos\theta}, \qquad
\hat{v} = \begin{pmatrix} -\cos 2\theta \\ -\sin 2\theta \cos\phi \\ -\sin 2\theta \sin\phi \end{pmatrix}
\]

in the case of surface inference. Having defined the stick tensor voting field, the voting fields for plate and ball tensors are obtained by combining the votes due to individual orientations, as described in the Appendix. Figure 7 depicts the voting function defined above for the inference of curves in 2D. Applying a similar argument, the voting field for surface inference is derived with a similar shape, with individual votes oriented orthogonally to those of the curve voting field, as illustrated in Fig. 8. Having defined the voting functions for curve and surface inference, we illustrate the tensor voting process in 2D. Figure 9

FIG. 7. Voting fields for stick and plate voters in 2D curve inference: (a) voting field for the stick component, tensor votes (left) and saliency (right); (b) voting field for the plate component, tensor votes (left) and saliency (right).
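A simplified 2D version of the stick field in Fig. 7, together with the derivation of an isotropic (ball) field from it by averaging over rotations, can be sketched as follows (our own parameterization and function names, not the authors' code):

```python
import numpy as np

def stick_vote(site, sigma=5.0, c=1.0):
    """Vote cast by a stick voter at the origin with tangent (1, 0), received
    at `site`: connect voter and site by a circular arc, decay with arc
    length s and curvature rho, and rotate the voted tangent by 2*theta."""
    x, y = site
    l = np.hypot(x, y)
    if l == 0.0:
        return np.zeros((2, 2))
    theta = np.arctan2(y, x)
    if theta > np.pi / 2:                  # a stick is undirected: fold to the
        theta -= np.pi                     # smaller angle with the tangent line
    elif theta < -np.pi / 2:
        theta += np.pi
    if abs(np.sin(theta)) < 1e-9:          # straight-ahead limit: s -> l
        s, rho = l, 0.0
    else:
        s = theta * l / np.sin(theta)      # arc length of the circular path
        rho = 2.0 * np.sin(theta) / l      # curvature of the circular path
    decay = np.exp(-(s * s + c * rho * rho) / sigma**2)
    v = np.array([np.cos(2 * theta), np.sin(2 * theta)])
    return decay * np.outer(v, v)

def ball_vote(site, n=36, sigma=5.0):
    """Ball-voter field: average of the stick field over n rotations."""
    T = np.zeros((2, 2))
    for psi in np.linspace(0.0, np.pi, n, endpoint=False):
        cs, sn = np.cos(psi), np.sin(psi)
        R = np.array([[cs, -sn], [sn, cs]])
        T += R @ stick_vote(R.T @ np.asarray(site, float), sigma) @ R.T
    return T / n
```

A site straight ahead of a stick voter receives a vote aligned with the tangent; the vote weakens with distance and with the curvature of the connecting arc, and the averaged ball field is rotationally symmetric, matching the qualitative behavior described in the text.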


a dense junction saliency map (Fig. 10d) are obtained. The underlying salient curves (Fig. 10c) and junctions (Fig. 10e) are then extracted using nonmaximal suppression. Figure 11 shows another example of curve inference in 2D by applying the same voting function. Figure 12 illustrates an example of surface inference in 3D, in which a peanut is intersected with a plane among noisy data

FIG. 8. Voting fields for stick, plate, and ball voters in 3D: (a) vote generation for the stick component; (b) a cut of the voting field for the stick component; (c) a cut of the voting field for the plate component; (d) a cut of the voting field for the ball component.

depicts two different configurations in which votes from two stick voters are combined. In the first configuration, the two stick voters are parallel and adjacent to each other. The saliency tensor field (Fig. 9b) obtained by combining the votes shown in Fig. 9a gives a map in which the best path for connecting the two voters is determined to be the straight line in the middle of the field. In the second configuration, the two stick voters are about 40° apart. The best path for connecting the two is shown in Fig. 9d, obtained by combining the votes depicted in Fig. 9c.

Applying the voting functions defined above, we have obtained results on sparse, noisy synthetic data. Figure 10a is a 128 × 128 binary image in which an ellipse and an open curve can be seen in a noisy background. After applying the tensor voting process to the image using the curve inference voting function with σ = 18.25, a dense curve saliency map (Fig. 10b) and

FIG. 9. Illustration of the tensor voting process: (a) the two superposed voting fields; (b) the resulting saliency tensor field; (c) the two superposed voting fields; (d) the resulting saliency tensor field.


FIG. 10. Inference of curves in 2D.

in a 50 × 50 × 50 cube. To infer the salient structures, we use a surface voting function with σ = 10. The junction curve extracted is shown in Fig. 12b and the integrated salient structure computed is shown in Fig. 12c.

Like other processes that rely on convolution, our method needs substantial execution time despite its linear complexity. Typically, on a 350 MHz Pentium II, it takes about 30 s to process 2D data in a 512 × 512 square, about 90 s to process 3D data in a 50 × 50 × 50 cube, and about 25 min to process 3D data in a

128 × 128 × 128 cube. To reduce this running time, we are investigating methods to perform the convolution with our basic fields as a cascade of convolutions with very small masks.

5. DIRECTED FEATURE PROCESSING

So far, we have only considered the inference of structures from undirected features. However, in real applications, we often have information about the true direction of a vector, instead


FIG. 11. Examples of 2D curve inference.

of its orientation only. It is precisely this polarity information that is expressed in the output of an edge detector, where the polarity of the directed edgel indicates which side is darker. As shown later, ignoring this information may be harmful. We therefore propose to augment our representation scheme to express this knowledge and to use it to perform directed curve grouping. We first discuss how polarity is incorporated into our salient structure inference engine for the inference of structure from directed features. In the spirit of our methodology, we attempt to establish at every site in the domain space a measure we call polarity saliency, which reflects the likelihood of a directed curve (in 2D) passing through the site. In particular, a site close to two parallel features with opposite polarities should have low polarity saliency, although its orientation saliency will be high.

Encoding polarity. Figure 13 depicts the situation when the voter possesses this polarity information. Since polarity is only defined for directed features, it is necessary to associate polarity only with the stick component of the saliency tensors. Thus, a scalar value ranging from −1 to 1 encodes polarity saliency. The sign of this value indicates which side of the stick component

of the tensor is the intended direction. The magnitude of this value measures the degree of polarization at a site. Initially, all sites with directed features have polarity saliency of 1, −1, or 0 only, corresponding to the two opposing directions of directed vectors, or to unoriented features, respectively.

Inferring polarity. Since polarity is associated with an orientation, we need to take the orientation into account when we infer polarity saliency. Note that despite the association between polarity and orientation, they are independent of each other and are thus inferred separately. An intuitive way to determine polarity is to compute the vector sum u(x, y) = Σᵢ vᵢ(x, y) of the directed votes vᵢ(x, y). The length of the resulting vector gives the degree of polarization, and its direction indicates the intended polarity. We thus assign to every site a polarity saliency PS(x, y) as

PS(x, y) = sgn(u(x, y) · ê1(x, y)) |u(x, y)|,

where ê1(x, y) is the major direction of the saliency tensor obtained at the site (x, y).
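This computation can be sketched in a few lines of Python. The formula is PS above; the function name `polarity_saliency` and the plain-list representation of the votes are our own conventions.

```python
import math

def polarity_saliency(votes, e1):
    """PS at a site: sgn(u . e1) * |u|, where u is the vector sum of the
    directed votes v_i = (vx, vy) received at the site, and e1 is the
    major direction of the saliency tensor there."""
    ux = sum(vx for vx, vy in votes)
    uy = sum(vy for vx, vy in votes)
    dot = ux * e1[0] + uy * e1[1]
    sign = (dot > 0) - (dot < 0)      # sgn, with sgn(0) = 0
    return sign * math.hypot(ux, uy)
```

Note that two parallel votes of opposite polarity cancel, giving zero polarity saliency even though the orientation saliency at that site would be high.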

FIG. 12. An example of surface inference in 3D.


FIG. 13. Encoding polarity saliency.

Extracting directed features. Directed features are those which have both high orientation and polarity saliencies. The natural way to measure directed feature saliency is to take the product of the unpolarized feature saliency and the polarity saliency. We thus assign to every site a directed curve saliency DCS(x, y) as

detected element a directed vector, normal to the oriented vector inferred there, with the convention that the denser side is to the left. This local discontinuity estimation then becomes similar to the orientation estimation for salient structure inference. The voting function V_b for boundary inference can be characterized as a radial pattern whose strength decays with distance from the center, as shown in Fig. 15b,

DCS(x, y) = λ1 · sgn(u(x, y) · ê1(x, y)) · |u(x, y)|,

where λ1 and ê1(x, y) correspond to the stick component of the orientation saliency tensor obtained at site (x, y). Figure 14b shows the resulting directed curve saliency map for the pair of directed circles that describe the filled disks in Fig. 14a. Compared to the curve saliency map shown in Fig. 14d, obtained for a pair of hollow disks in Fig. 14a, the high agreement between the circles, which is acceptable for the hollow disk pair but not for the filled disk pair, is eliminated by properly incorporating the polarity information. Regions bounded by such directed features can be extracted by the method described in Section 3.3 for extracting undirected features.

6. REGION BOUNDARY INFERENCE

Here we address the problem of inferring boundaries from point input. A typical input is shown in Fig. 16a, where boundaries correspond to a change in dot density. We seek to compute at every site in the domain space a measure we call boundary saliency, which reflects the likelihood that the site lies on the boundary of a region. A boundary point has the property that most of its neighbors are on one side. We can therefore identify boundary points by computing the directional distribution of neighbors at every data point, as illustrated in Fig. 15, and finding local maxima of this measure. For the purpose of grouping these boundary elements into curves, we then associate to each such

FIG. 14. Hollow and filled disks pairs.

V_b(x, y) = (e^(−d(x, y)²/σ²) / d(x, y)) · [x, y]^T,

where d(x, y) = √(x² + y²) is the distance of the location from the center of the field, and σ is the scale factor that determines the rate of attenuation. Since a point on the boundary will only receive votes from one side, while a point inside the region will receive votes from all directions, the norm of the vector sum of the polarized votes indicates the "boundariness" of a point, that is, the boundary saliency. Figures 16b and 16c present the result of this boundary inference, using a boundary voting field with σ = 12.5, on the 85 × 85 pixel 2D data set depicted in Fig. 16a. Observe that points are not labeled in absolute terms as borders or nonborders, but simply cast votes, as indicated by the boundary saliencies. Once the boundary points are identified, the problem is reduced to that of inferring directed curves. Using the boundary saliencies as the initial saliencies, and the corresponding directed orientation as the normal to the inferred vector with the convention mentioned above, the boundariness of the data points can be further verified by locating the surfaces/curves that describe the region boundaries. Figures 16d and 16e depict the computed directed curve saliency and junction saliency, using a curve voting field with σ = 12.5, from the boundary information shown in Figs. 16b and 16c. The integrated description of the dot cluster is shown in Fig. 16f. Depicted in Figs. 17 and 18 are the results of applying our method to infer multiple junctions, curves, and region boundaries from noisy data in a 240 × 180 image and a 128 × 128 image, respectively, using voting fields with σ = 15 in both cases. Notice that regions with very sparse data are not detected. As in all perceptual tasks, the detection of regions from sparse dot clusters is scale dependent. In this case, the proper choice
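The boundary voting function and the resulting boundary saliency can be sketched as follows. This is a minimal 2D sketch: the function names and the explicit neighbor list are our assumptions, and in practice the votes would be accumulated over a grid rather than per query point.

```python
import math

def boundary_vote(x, y, sigma):
    """Vote received at offset (x, y) from a voting data point:
    a radial vector field V_b whose strength decays with distance."""
    d = math.hypot(x, y)
    if d == 0.0:
        return (0.0, 0.0)
    w = math.exp(-d * d / (sigma * sigma)) / d
    return (w * x, w * y)

def boundary_saliency(point, neighbors, sigma):
    """Norm of the vector sum of boundary votes a point receives from
    its neighbors: large for one-sided (boundary) points, near zero
    for points surrounded on all sides."""
    ux = uy = 0.0
    for nx, ny in neighbors:
        vx, vy = boundary_vote(point[0] - nx, point[1] - ny, sigma)
        ux += vx
        uy += vy
    return math.hypot(ux, uy)
```

A point with neighbors all around receives canceling radial votes, whereas a point whose neighbors lie on one side accumulates a large net vector, exactly the "boundariness" measure described above.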


FIG. 15. Boundary inference.

FIG. 16. Example of region inference from dot density.

FIG. 17. Point set with varying dot density.


FIG. 18. Salient features automatically detected by the method.

of scale is related to the density of points inside the region. The point set shown in Fig. 17a is a typical example that consists of regions at multiple scales. When the scale of the boundary voting function is larger than that of the region, the detected boundaries are badly localized. When the scale of the boundary voting function is smaller than that of the region, the boundary is not detected at all. We believe this problem has to be solved by using multiple scales in detection, which is the subject of our current research.

7. CONCLUSION

We have presented a unified computational framework for the inference of multiple salient structures such as junctions, curves, regions, and surfaces from a sparse, noisy set of points, curve elements, and surface patch elements. The methodology is grounded on two complementary aspects:

• the data is represented as a tensor, which allows us to capture not only the local information, but also its associated confidence (saliency) and uncertainty;
• the communication between sites is performed by voting, which is efficiently implemented as a convolution-like operation.

The method is noniterative, requires no initial guess or thresholding, and its only free parameter is scale, which specifies the size of the neighborhood from which information is collected. We have demonstrated the approach here on a small number of synthetic examples in 2D and 3D. Our current research involves applying the methodology to different midlevel vision tasks, including edge detection, shape from stereo [16], image sequence analysis [7], and shape description. We also plan to conduct systematic quantitative experiments to evaluate the effectiveness of the method in terms of robustness and accuracy. By studying the reliable detection of simple patterns in varying amounts of noise, similar to the experiment shown in Fig. 19, we can determine the breaking point of the method. We are also working on the distribution of our implemented system for further quantitative evaluation. As briefly described in Section 6, another subject of future research is to address the scale issue. In our method, scale is manifest as the extent of the voting mask, which is the only free parameter in our framework. We suspect that salient groups occur across a range of scales, leading to an approach in which we would compute saliency at a discrete set of scales, instead of over a continuum. This is somewhat similar to the scale issues in edge detection.

APPENDIX

The voting function. The value of the tensor vote cast at a given location is given by the voting function V(S, p), which characterizes the structure to be detected, where p is the vector joining the voting site p_v and the receiving site p_s, and S is the tensor at p_v. Since all orientations are of equal significance, the voting function should be independent of the principal directions of the voting saliency tensor S. Since the immediate goal of voting is to interpolate smooth surfaces or curves, the influence of a voter must be smooth and must therefore decrease with distance. Moreover, votes for neighboring collectors must be of similar strength and shape. Thus, although the voting function has infinite extent, in practice it vanishes beyond a predefined distance from the voter. While the voting function should take into account the orientation estimates of the voter, it is only necessary to consider each of the voter's orientation estimates individually and then to use their combined effect for voting.

Representing the voting function by discrete tensor fields. We describe in the following two properties of the voting function that allow us to implement tensor voting efficiently using a convolution-like operation. The first property follows from the requirement that the voting function be independent of the principal directions of the voting saliency tensor. For any voting


FIG. 19. Experiment on a curve composed of 159 points embedded in various noisy backgrounds. For (a)–(f), the input is depicted in the upper image (256 × 256), and the output saliency map (normalized individually), obtained using σ = 18.25, is shown in the lower image.

saliency tensor S = T(λ1, λ2, λ3, θ, φ, ψ), the vote V(S, p) for a site p from the voter is

V(T(λ1, λ2, λ3, θ, φ, ψ), p)
  = V(Rψφθ Rψφθ⁻¹ T(λ1, λ2, λ3, θ, φ, ψ), Rψφθ Rψφθ⁻¹ p)
  = Rψφθ V(T(λ1, λ2, λ3, 0, 0, 0), Rψφθ⁻¹ p).   (a)

The second property is due to the linearity of the saliency tensor. Recall that every saliency tensor can be interpreted as a linear combination of a stick, a plate, and a ball, as described in Eq. (3.2). If the voting function is linear with respect to the saliency tensor S, every vote can be decomposed into three components as

V(S, p) = (λ1 − λ2) V(ê1ê1^T, p) + (λ2 − λ3) V(ê1ê1^T + ê2ê2^T, p) + λ3 V(ê1ê1^T + ê2ê2^T + ê3ê3^T, p)
        = (λ1 − λ2) V(S1, p) + (λ2 − λ3) V(S2, p) + λ3 V(S3, p),   (b)


where S1 = T(1, 0, 0, θ, φ, ψ), S2 = T(1, 1, 0, θ, φ, ψ), and S3 = T(1, 1, 1, θ, φ, ψ). Notice that S1 is a stick tensor, S2 is a plate tensor, and S3 is a ball tensor. Consequently, rather than dealing with infinitely many shapes of voting saliency tensors, we only need to handle three shapes of saliency tensors. Applying (a) to (b),

V(S, p) = (λ1 − λ2) Rψφθ V(T(1, 0, 0, 0, 0, 0), Rψφθ⁻¹ p)
        + (λ2 − λ3) Rψφθ V(T(1, 1, 0, 0, 0, 0), Rψφθ⁻¹ p)
        + λ3 Rψφθ V(T(1, 1, 1, 0, 0, 0), Rψφθ⁻¹ p).   (c)

This means that any voting function linear in S can be characterized by three saliency tensor fields: namely V1(p) = V(T(1, 0, 0, 0, 0, 0), p), the vote field cast by a stick tensor which describes the orientation [1 0 0]^T; V2(p) = V(T(1, 1, 0, 0, 0, 0), p), the vote field cast by a plate tensor which describes a plane with normal [0 0 1]^T; and V3(p) = V(T(1, 1, 1, 0, 0, 0), p), the vote field cast by a ball tensor. Since we only need to consider each of the voter's orientation estimates individually, V2 and V3 can in fact be obtained from V1 as

V2(p) = ∫₀^π Rψφθ V1(Rψφθ⁻¹ p) dψ |_(θ=0, φ=0)

V3(p) = ∫₀^π ∫₀^π Rψφθ V1(Rψφθ⁻¹ p) dψ dφ |_(θ=0).   (d)
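To make the decomposition of Eq. (b) concrete, here is a minimal sketch of its 2D analogue, where a saliency tensor splits into a stick part (λ1 − λ2) ê1ê1^T and a ball part λ2 I. The function names and the flat row-major tuple representation are our own conventions, not the paper's implementation.

```python
import math

def eigen_sym2(a, b, d):
    """Eigenvalues (l1 >= l2) and unit major eigenvector of the
    symmetric 2x2 tensor ((a, b), (b, d))."""
    tr, det = a + d, a * d - b * b
    gap = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    l1, l2 = tr / 2.0 + gap, tr / 2.0 - gap
    if b != 0.0:
        ex, ey = l1 - d, b        # a direction of the major eigenvector
    elif a >= d:
        ex, ey = 1.0, 0.0
    else:
        ex, ey = 0.0, 1.0
    n = math.hypot(ex, ey)
    return l1, l2, (ex / n, ey / n)

def decompose(a, b, d):
    """Split S into stick + ball: S = (l1 - l2) e1 e1^T + l2 I.
    Both parts are returned as flat row-major (xx, xy, yx, yy) tuples."""
    l1, l2, (ex, ey) = eigen_sym2(a, b, d)
    s = l1 - l2
    stick = (s * ex * ex, s * ex * ey, s * ey * ex, s * ey * ey)
    ball = (l2, 0.0, 0.0, l2)
    return stick, ball
```

Each component is then voted with its own field (V1 for the stick, the integrated fields for the plate and ball in 3D), weighted by (λ1 − λ2) and λ2, and the received votes are accumulated by componentwise tensor addition.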

Since the voting function is smooth and has a finite extent of influence, we can use a finite discrete version of these characterizing saliency tensor fields to represent the voting function efficiently.

The voting process. Given the saliency tensor fields which specify the voting function, voting proceeds, for each input site saliency tensor, by aligning each voting field with the principal directions of the voting saliency tensor and centering the fields at the corresponding location. Each tensor contribution is weighted as described in Eq. (2). This process is similar to a convolution with a mask, except that the output is a tensor instead of a scalar. Votes are accumulated by tensor addition. The initial assessment of feature saliencies at each site is kept implicitly in the form of a strong vote for the site. As a result, all sites, with or without an initial feature, are treated equally during the inference.

ACKNOWLEDGMENT

This research was supported by NSF under Grants IRI-9423144 and IRI-9811883.

REFERENCES 1. N. Ahuja and M. Tuceryan, Extraction of early perceptual structure in dot patterns: Integrating region, boundary, and component Gestalt, Comput. Vision Graphics Image Process. 48, 1989, 304–356.

2. A. Blake and A. Zisserman, Invariant surface reconstruction using weak continuity constraints, in Proc. Conf. Computer Vision and Pattern Recognition, Miami Beach, FL, 1986, pp. 62–67.
3. J. D. Boissonnat, Representation of objects by triangulating points in 3-D space, in Proc. 6th ICPR, 1982, pp. 830–832.
4. T. E. Boult and J. R. Kender, Visual surface reconstruction using sparse depth data, in Proc. Conf. Computer Vision and Pattern Recognition, Miami Beach, FL, 1986, pp. 68–76.
5. J. Dolan and R. Weiss, Perceptual grouping of curved lines, in Proc. Image Understanding Workshop, 1989, pp. 1135–1145.
6. P. Fua and P. Sander, Segmenting unstructured 3D points into surfaces, in ECCV 92, Santa Margherita Ligure, Italy, May 1992, pp. 676–680.
7. L. Gaucher and G. Medioni, Accurate motion flow estimation using a multilayer representation, in Interpretation of Visual Motion Workshop, CVPR98, Santa Barbara, June 1998.
8. D. Geiger, K. Kumaran, and L. Parida, Visual organization for figure/ground separation, in Proc. CVPR, San Francisco, CA, 1996, pp. 155–160.
9. G. H. Granlund and H. Knutsson, Signal Processing for Computer Vision, Kluwer Academic, Dordrecht, 1995.
10. S. Grossberg and E. Mingolla, Neural dynamics of form perception: Boundary completion, illusory figures, and neon color spreading, Psychol. Rev. 92, 1985, 173–211.
11. G. Guy and G. Medioni, Inferring global perceptual contours from local features, Int. J. Comput. Vision 20, No. 1/2, 1996, 113–133.
12. G. Guy and G. Medioni, Inference of surfaces, 3D curves, and junctions from sparse, noisy, 3-D data, IEEE Trans. Pattern Anal. Mach. Intell. 19, No. 11, 1997, 1265–1277.
13. F. Heitger and R. von der Heydt, A computational model of neural contour processing: Figure-ground segregation and illusory contours, in Proc. European Conf. Computer Vision, Berlin, Germany, 1993, pp. 32–40.
14. H. Hoppe, T. DeRose, T. Duchamp, J. McDonald, and W. Stuetzle, Surface reconstruction from unorganized points, Comput. Graphics 26, 1992, 71–78.
15. M. Kass, A. Witkin, and D. Terzopoulos, Snakes: Active contour models, Int. J. Comput. Vision, January 1988, 321–331.
16. M. S. Lee and G. Medioni, Inferring segmented surface description from stereo data, in Proc. Conf. Computer Vision and Pattern Recognition, Santa Barbara, CA, June 1998.
17. C. Liao and G. Medioni, Surface approximation of a cloud of 3-D points, in CAD94 Workshop, Pittsburgh, PA.
18. W. E. Lorensen and H. E. Cline, Marching cubes: A high resolution 3-D surface reconstruction algorithm, Comput. Graphics 21, No. 4, 1987.
19. D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, Freeman, San Francisco, 1982.
20. P. Parent and S. W. Zucker, Trace inference, curvature consistency, and curve detection, IEEE Trans. Pattern Anal. Mach. Intell. 11, No. 8, 1989, 823–839.
21. T. Poggio and F. Girosi, A theory of networks for learning, Science 247, 1990, 978–982.
22. S. Sarkar and K. L. Boyer, Integration, inference, and management of spatial information using Bayesian networks: Perceptual organization, IEEE Trans. Pattern Anal. Mach. Intell. 15, No. 3, 1993, 256–274.
23. A. Sha'ashua and S. Ullman, Structural saliency: The detection of globally salient structures using a locally connected network, in Proc. Int. Conf. on Computer Vision, 1988, pp. 321–327.
24. J. Shi and J. Malik, Normalized cuts and image segmentation, in Proc. Computer Vision and Pattern Recognition, June 1997, Puerto Rico, pp. 731–737.
25. C. V. Stewart, MINPRAN: A new robust estimator for computer vision, IEEE Trans. Pattern Anal. Mach. Intell. 17, No. 10, 1995, 925–938.


26. R. Szeliski, D. Tonnesen, and D. Terzopoulos, Modeling surfaces of arbitrary topology with dynamic particles, in Proc. CVPR, New York, NY, 1993, pp. 82–85.
27. C.-K. Tang and G. Medioni, Inference of integrated surface, curve, and junction descriptions from sparse 3-D data, IEEE Trans. Pattern Anal. Mach. Intell. 20, No. 11, 1998.
28. D. Terzopoulos, A. Witkin, and M. Kass, Constraints on deformable models: Recovering 3D shape and nonrigid motion, Artif. Intell. 36, 1988, 91–123.
29. L. R. Williams and D. Jacobs, Stochastic completion fields: A neural model of illusory contour shape and salience, in Proc. 5th ICCV, Cambridge, MA, June 1995, pp. 408–415.
30. S. W. Zucker, A. Dobbins, C. David, and L. Iverson, The organization of curve detection: Coarse tangent fields and fine spline coverings, in Proc. ICCV88, Tampa, FL, Dec. 1988, pp. 568–577.
31. S. W. Zucker and R. A. Hummel, Toward a low-level description of dot clusters: Labeling edge, interior, and noise points, Comput. Vision Graphics Image Process. 9, No. 3, 1979, 213–234.