Pattern Recognition 33 (2000) 767}785
Hybrid stereo matching with a new relaxation scheme of preserving disparity discontinuity Kyu-Phil Han, Tae-Min Bae, Yeong-Ho Ha* School of Electronic and Electrical Eng., Kyungpook Nat'l University, Taegu 702}701, South Korea Received 5 November 1998; accepted 29 March 1999
Abstract A hybrid stereo matching algorithm using a combined edge- and region-based method is proposed to take the advantage of each technique, i.e. an exactly matched point and a full resolution disparity map. Region-based matching is typically more e$cient than edge-based matching, however, a region-based matcher lacks the capability of generating an accurate "ne resolution disparity map. The generation of such a map can be better accomplished by using edge-based techniques. Accordingly, regions and edges both play important and complimentary roles in a binocular stereo process. Since it is crucial that an e$cient and robust stereo system utilizes the most appropriate set of primitives, a nonlinear Laplacian "lter is modi"ed to extract proper primitives. Since each pixel value of a second-order di!erentiated image includes important information for the intensity pro"le, information such as edge-, signed-, and zero-pixels obtained by the modi"ed nonlinear Laplacian "lter, is used to determine the matching strategy. Consequently, the proposed matching algorithm consists of edge-, signed-, and zero- or residual-pixel matching. Di!erent matching strategies are adopted in each matching step. Adaptive windows with variable sizes and shapes are also used to consider the local information of the pixels. In addition, a new relaxation scheme, based on the statistical distribution of matched errors and constraint functions which contain disparity smoothness, uniqueness, and discontinuity preservation, is proposed to e$ciently reduce mismatched points in unfavorable conditions. Unlike conventional relaxation schemes, the erosion in an abrupt area of a disparity map is considerably reduced because a discontinuity preservation factor based on a survival possibility function is added to the proposed relaxation. The relaxation scheme can be applied to various methods, such as block-, feature-, region-, object-based matching methods, and so on, by modifying the excitatory set of the smoothness constraint function. Experimental results show that the proposed matching algorithm is e!ective for various images, even if the image has a high content of noise and repeated patterns. The convergence rate of the relaxation and the output quality are both improved. ( 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. Keywords: Stereo matching; Edge- and region-based matching; Adaptive window; Relaxation; Smoothness and uniqueness constraints; Excitatory and inhibitory inputs; Disparity discontinuity preservation
1. Introduction In a pair of eyes, each eye receives slightly di!erent images of the world due to its distinct position. Di!erences between the left and right images create binocular
* Corresponding author. Tel.: #82-53-950-5535; fax: #8253-957-1194. E-mail address:
[email protected] (Y-H. Ha)
disparities. The Human Visual System (HVS) can detect and use these disparities to recover information about the three-dimensional structure of the scene being viewed. The stereo correspondence problem makes the explicit disparities of all points common to both images. A great deal of computer vision research has addressed this problem, because disparity contains useful information for various applications such as object recognition, inspection, and manipulation, etc. A range-sensing system for these tasks is often required for the accurate and e$cient provision of a complete
0031-3203/00/$20.00 ( 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S 0 0 3 1 - 3 2 0 3 ( 9 9 ) 0 0 0 9 5 - 3
768
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
depth or disparity map for an entire "eld of view. Range"nding techniques can be loosely classi"ed as either active or passive. Active techniques utilize arti"cial sources of energy, such as ultrasonic and laser, to illuminate the workspace, however, passive techniques do not require such energy sources. Popular active techniques include: contrived lighting approaches and direct range "nders based on time-of-#ight measurements. Common examples of passive techniques include: stereo, both binocular and photometric: shape from shading, shape from texture, and focusing methods. The contrived-lighting approach involves illuminating the scene with a controlled lighting pattern and interpreting the projection of the pattern to derive a depth pro"le of the scene [1,2]. Such active illumination can be disadvantageous in an outdoor or hostile environment. In these situations, this method may also fail because of the specular re#ectivity of the objects appearing in the scene. Passive techniques for range sensing typically require a simpler and less expensive setup than active approaches. The binocular stereo approach falls into this category. Yet the disadvantages of this approach are that it requires many photometric assumptions and has a high computational cost. In the binocular stereo approach, the di!erence between the relative positions of two digital or digitized images taken from di!erent points is measured to "nd range information like HVS. The existing techniques for stereo matching are grouped into two categories. One is feature-based methods and the other is intensity- or area-based methods [3]. The feature-based methods use zero-crossing points [4], edges [5], line segments, corner points, and conics [6], etc. Since these types of primitives are relatively sparse in images, a complicated interpolation process including occlusion modeling and disparity continuity should be taken into account to obtain a full resolution disparity map; plus they require more careful and explicit matching rules to eliminate false targets. However, they do have accurate disparity values at the feature points. Marr and Poggio [7], Grimson [4], Frisby and Pollard [8], etc. used these primitives. Since intensity-based methods use dense low-level features and intensity values themselves, a feature extraction and an interpolation process are not necessary and a dense disparity map can be obtained; however, they are sensitive to noise and small intensity di!erences. Consequently, recently proposed enhancements of stereo approaches include a coarse-to-"ne strategy [4,9,10] and some constraints [4,8,11,12] such as uniqueness, ordering, and smoothness, etc. Other matching strategies using a windowed Fourier phase [13], segmented regions [14], wavelet transformed images [12], chromatic information [15], neural networks [11,16], and a multiple-baseline [17] have been studied. In this paper, a hybrid approach including an edgeand region-based matching method is proposed. The
proposed method includes the advantages of both edgebased methods, which give accurate matched points, and region-based methods, which can produce a full-resolution disparity map [14]. In order to extract the proper features for stereo matching, the nonlinear Laplacian "lter [18] is modi"ed and used. The nonlinear Laplacian "lter is more e$cient than the family of Gaussian "lters, because it has no multiplication and can be easily implemented by mathematical morphology operators such as dilation and erosion. The Modi"ed Nonlinear Laplacian (MNL) operator includes an odd Hierarchical Discrete Correlation (HDC) for fast "ltering [19], thresholding for weak edge elimination, region growing for strong edge linking, and an edge re"nement process. After MNL "ltering, zero-crossing points, positive-, negative-, and zero-regions are used as the matching primitives in the proposed algorithm. Then, three matching strategies are accomplished according to the type of pixel, i.e. zero crossing, signed-, and zero-pixels. Since the primitives are obtained using a second-order di!erentiation, they include important topological, information, i.e. edge (zero-crossing point), signed-, and zero-pixels imply a transition point, a convex or concave area, and a smooth area of the intensity pro"le, respectively. The size and shape of the windows are also important factors for signal matching [12,20]; thus locally adaptive windows are used in each matching step. In addition, a relaxation algorithm is proposed which can reduce false matches based on a distribution of matched errors and a possibility value subject to various constraints including uniqueness, smoothness, and a discontinuity preservation factor of the disparity. The general scheme of stereo matching is mentioned in Section 2. The proposed feature extraction "lter and the proposed stereo matching algorithm are illustrated in Section 3. Section 4 presents the feasibility of the proposed algorithm demonstrated through experimental results for synthetic and real scene images, plus the e$ciency of the proposed relaxation scheme is evaluated. Finally, the conclusion is given in Section 5.
2. Stereo matching scheme Stereo matching is typically achieved by "ve steps which include (1) image formation, (2) feature extraction, (3) feature identi"cation under some criteria such as similarity and consistency, (4) disparity calculation, and (5) calculation of the actual range according to camera geometry. The third step, which deals with matching or correspondence, is the most important part of the binocular stereo approach. All approaches for image matching follow these procedures but use di!erent image features, matching measures, and strategies. In general, since matching measures and matching strategies strongly depend on the attribute of features, the selection of
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
a matching strategy according to a feature is important [3]. There are some additional schemes such as interpolation, relaxation [21], and dynamic programming [22], etc. An interpolation scheme based on a consistency criterion can obtain a dense disparity "eld from a sparse feature map. Relaxation schemes are commonly applied to acquire more #exible solutions in complex optimization problems and take both similarity and consistency into account. However, since most relaxation methods are iterative, dynamic programming either assists a relaxation method to speed up convergence or optimizes the cost function. Therefore, possible multi-schemes and both intensity- and feature-based methods may be considered to obtain stable and accurate matching results.
3. The proposed matching algorithm The proposed stereo matching algorithm consists of feature extraction, three matching steps, and relaxation, as shown in Fig. 1. First, in order to extract features suitable for stereo matching, some processes that decrease matching ambiguities are added to the nonlinear Laplacian "lter. Since the characteristics of edge, signed, and residual pixels extracted by MNL are di!erent, varying strategies are applied in each matching step. Locally adaptive windows varying in size and shape are also considered. If a point with a minimum matching error is determined as a disparity of the pixel, a result that only considers similarity can be achieved. Accordingly, to acquire stable results, a relaxation scheme based on some constraints is inserted to consider both similarity and
Fig. 1. Block diagram of the proposed stereo matching algorithm.
769
consistency. The Mean of the Absolute Di!erences (MADs) of intensity obtained in each pixel matching, is normalized according to the size of the matching window and then transformed into a possibility value based on the statistical distribution of the MADs. Finally, a disparity is determined by the reciprocal action between the possibility of the current point and its neighbor possibility values. 3.1. Feature extraction using a modixed nonlinear Laplacian xlter The matching primitives, used as features, including the edge, positive, negative, and zero pixels for a 1-D signal are illustrated in Fig. 2. They all contain topological information on the intensity pro"le, such as smooth and transition areas. A new "lter has thus been designed to extract these matching primitives from an image. The "lter consists of four parts, including low-pass "ltering using an odd HDC, second-order di!erentiation with a nonlinear Laplacian "lter, weak-edge elimination using local variance and strong edge linking by region growing, and edge and region determination.
Fig. 2. The relation between an intensity and a feature pro"le. (a) Original intensity pro"le, (b) after low pass "ltering, (c) after "rst-order di!erentiation, (d) after second-order di!erentiation.
770
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
3.1.1. Odd hierarchical discrete correlation A mathematical problem is de"ned to be well-posed in the Hadamard sense if its solution both exists and satis"es uniqueness and continuity for the initial data. However, there are some ill-posed conditions of the sense in a di!erentiation operator. Accordingly, a regularization process has been studied to minimize this di!erentiation problem. Torre and Poggio [23] found that a regularized di!erentiation of image data could be performed by convolving the data with the "rst derivative of a cubic spline "lter, which is very similar to the Gaussian function. Generally, the stabilizing operator, such as the Gaussian or some other low-pass "lter, is convolved with the original image before the "rst- or the second-order di!erentiation. There is a method for computing correlations which is particularly well suited for image processing [19]. This method, called Hierarchical Discrete Correlation, or HDC, is computationally e$cient, typically requiring one or two orders of magnitude fewer computational steps than direct correlation or correlation computed in the spatial frequency domain using the Fast Fourier Transform (FFT) [19]. In addition, the method simultaneously generates correlations for kernels of many sizes. Some of these kernels closely approximate the Gaussian probability distribution, so that the correlation is equivalent to low-pass "ltering. The principle underlying the HDC is that the correlation of function with certain large kernels can be computed as a weighted sum of correlations with smaller kernels, and these in turn can be computed as weighted sums of correlations with still smaller kernels. The kernels at each iteration of the HDC computation di!er in size by a factor r, or the order of the hierarchical correlation. Let f (x) be a function de"ned only at integer values
of x. Also let w(x) be a discrete weighting function de"ned at integral x and nonzero for !m)x)m. The odd hierarchical discrete correlation is de"ned as a set of correlation functions g (x) which are obtained from f and l w as follows: g (x)"f (x), o m g (x)" + w(i)g (x#irl~1) for l'1. l l~1 i/~m
(1)
Function g is obtained from f through l recursions of l a correlation-like operation using the weighting function w(x). Thus l is the level of g (x) in the HDC. And g (x) is l l de"ned as the sum of k"2m#1 values of g (x) which l~1 are separated by multiples of the distance rl~1. This sample distance grows geometrically by the factor r from level to level, so r is the order of the HDC and k is called the width of the generating kernel. This odd HDC is illustrated graphically in Fig. 3. In order to insure convergence and low-pass "ltering, the generating kernel must satisfy four constraints including unimodality, symmetry, normalization, and equal distribution. When a+0.4, the best "t Gaussian is obtained [19]. Therefore, a"0.4, b"0.25, and c"0.05 are used as the weights of the generating kernel in this paper. 3.1.2. A nonlinear Laplacian xlter A discrete version of a one-dimensional di!erentiation can be generally represented as *l(m, n) "I(m#l, n)!I(m, n) or I(m, n)!I(m!1, n) *m (2)
Fig. 3. Graphical representation of an odd HDC. The generating kernel is shown as a pattern of arrows between successive levels, sample values at level l are weighted by a, b, c and summed to obtain the value of a single sample at level l#1. The order, r, is 2 in this example.
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
where I(m, n) denotes the gray level in a (m, n) point of an image. A nonlinear gradient [18] is de"ned by NG[I(m, n)]" max [I(m#k, n#l)]!I(m, n) (k,l)|W or I(m, n)! min [I(m#k, n#l)] (3) (k,l)|W where = denotes M]M windows. And k and l represent a searching range in row and column directions respectively. Its characteristics include nonsensitivity to noise and granularity. It detects valleys as well as edges. A nonlinear Laplacian [18] can be de"ned by N¸[I(m, n)] " max [I(m#k, n#l)]!I(m, n)!MI(m, n) (k,l)|W ! min [I(m#k, n#l)]N. (4) (k,l)|W Its implementation is very simple because there is no multiplication and its responses are integer values. It also has a close relation to the mathematical morphological gradient operator and can, therefore, detect a correct edge point due to unbiased characteristics [18]. 3.1.3. Weak edge elimination and region growing When an edge operator is convolved in an image, its response relates to the window size of the operator. In general, a very sensitive and noisy response occurs with a small window as shown in Fig. 4, so that matching
771
ambiguities are increased. With a large window, even if it is insensitive to noise and small intensity di!erence, the edge pixel is shifted as shown in Fig. 5. Consequently, the intensity pro"le will not match with the edge image. This problem is critical in signal matching. In order to prevent edge pixels from shifting, a di!erentiation operator with a small window size should be used and weak edges must be eliminated to reduce matching ambiguities. Since a strong edge point includes large edgeness and a notable variation of intensity, the elimination process can be conducted by local variance. Accordingly, a simple threshold technique is adopted to eliminate a weak edge that has a small local variance. However, one edge contour can be separated into several segments by thresholding as shown in Fig. 6 and thus it is di$cult to "nd a proper threshold. Therefore, to reduce the in#uence of the threshold value and satisfy the connectivity of an edge, a region growing process is inserted starting from the remaining pixels after thresholding. The spacee$cient two-pass labeling algorithm [24] is used as the region growing method. In this paper, a region is de"ned as a blob that has the same sign or value after secondorder di!erentiation. Since zero-crossing points are typically detected by a sign change in the di!erentiated image, a region that includes signed- or zero-pixels is important as well as an edge. Several experiments were conducted to "nd a proper threshold. It was shown that the mean is appropriate as the threshold because the variance is quite diverse according to the image characteristics. Fig. 4 shows the
Fig. 4. Second-order di!erentiated images of (a) `girla, (b) `lennaa, and (c) `pentagona processed by a conventional nonlinear Laplacian operator with a 5]5 mask. White, gray, and black blobs denote the zero-, negative-, and positive-regions, respectively. (d), (e), and (f ) are edge images extracted from (a), (b), and (c), respectively. The size of all three images is 256]256.
772
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
Fig. 5. Region and edge images processed by a conventional nonlinear Laplacian operator with a 15]15 mask.
Fig. 6. Region and edge images after the elimination of weak edge points using the mean of local variances.
edge image obtained by a conventional NL operator with a 5]5 mask. Figs. 6 and 7 show images with eliminated weak point edges and processed in growing steps, respectively. 3.1.4. Merging of isolated zero-crossing points and edge determination After second-order di!erentiation, edges are typically formed at the intersection of two regions that have a different sign. However, after nonlinear Laplacian "ltering,
about 10% of all edge pixels are correctly quantized to zero between di!erent regions. Though these pixels are correct zero-crossing points and have accurate matching points, they rarely appear on the image. Therefore, these isolated points do not act as dominant pixels in the relaxation process. The reason is because there are no neighboring homogeneous pixels. In order to preserve these points in the latter relaxation process, these pixels are merged into the neighbor region that has the nearest value to zero. Fig. 8 shows the merging process. As
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
773
Fig. 7. Region and edge images after region growing.
Fig. 8. An example of isolated zero-crossing points merging. The signs of `#a, `!a, and `0a are the signs of the pixels after second-order di!erentiation, respectively. The circles denote the pixel which has the nearest value to zero among 8-neighbor pixels: (a) Before merging, (b) after merging.
a result, there are no isolated zero-crossing points on the entire edge. In conventional edge operators, edge determination only depends on the "ltered sign. In this paper, the pixel response as well as the sign is considered for edge determination. Thus, the pixel with the minimum value between two pixels with opposite signs is determined as the edge. An example of feature images obtained by a MNL "lter for matching is shown in Fig. 9. 3.2. Matching 3.2.1. Edge pixel matching General edge features such as direction and intensity are not the only features used in edge-based methods,
Fig. 9. Feature images of the `pentagona pair extracted by a MNL operator: (a) Left edge, (b) right edge, (c) left region, and (d) right region image, respectively.
as variable windows are also considered in edge pixel matching. The sign change of an edge has been used as a good feature, however, it often changes in occlusion boundaries. Consequently, sign is excluded from the feature set. Eight-directional compass operators based on a Sobel operator, as shown in Fig. 10, are used to "nd the direction of an edge pixel. The angle with the maximum response among eight masks is determined as the direction. Then, edge pixel matching is performed. Several
774
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
Fig. 10. Eight-directional compass operator.
Fig. 12. An example of making a matching mask in a signed region. `0a, `na, `pa, `ea, and `#a denote zero, negative, positive, edge pixels, and the center of the mask, respectively: (a) Feature image, (b) the generated mask. Fig. 11. Windows used in edge pixel matching. The sign `#a indicates the center of the window and the degree shows the direction of the edge.
points of the target image, which lie within the search range and have a !1 to #1 di!erence of direction, are selected as matching candidates. Okutomi and Kanade [20] simulated the relation between mask size and signal variance relative to noise ratio in a matching environment. They concluded that a small window is more appropriate for a disparity change region and a large window is more appropriate for a disparity smooth region. In general, since disparity changes are detected by intensity changes, it is assumed that disparity changes may or may not occur at an intensity edge. Thus, a small window is more e$cient than a large one in edge pixel matching. 3]5 windows, shown in Fig. 11, are used in this matching step. 3.2.2. Signed pixel matching There are signed pixels around an edge after secondorder di!erentiation and the intensity slopes of the pixels are either monotonous or #at. Though a pixel is located in a #at zone, the sign of the pixel "ltered by a Laplacian operator may not be zero since the window of the di!erential operator can include an inhomogeneous pixel beyond the edge. Since occluded areas generally exist in a disparity discontinuity region and the disparity discontinuity matches with the edge, a mask composed of pixels homogeneous to the center pixel is e$cient in this matching step. Fig. 12 shows a region shape as considered by a mask generation. Since the intensity of a zero region adjacent to a signed region is similar to that of a signed one, the pixel situated in a zero region is included in the mask generation. A 9]9 window whose origin point is at the center is used in the generation. Fig. 12(b) shows an example of the generated mask, which includes in the feature image pixels with the same sign as the center pixel with a zero value. When the size of the generated mask is smaller than 20 pixels, a 7]7 square mask is used in the
Fig. 13. Three-dimensional relaxation structure.
pixel matching to accommodate the insensitiveness of small variances. 3.2.3. Residual pixel matching After edge and signed pixel matching, the residuals are zero pixels. They exist far away from the edge and those intensity "gures are very smooth. If small windows are used in this matching step, the matching response will be sensitive to small di!erences. Thus, the size of the matching window should be large enough [20]. Square type windows varying only in size from 7]7 to 11]11 are used. The minimum MAD among the windows is selected as the matched error in each disparity. 3.3. Relaxation MADs, selected in each matching step, are normalized by the window size and they are stored into the relaxation structure as shown in Fig. 13. In the relaxation stage, the normalized MADs are transformed into initial possibilities and the possibilities are updated by three constraint functions. Finally, the point with the
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
775
A B
(5)
maximum possibility is determined as the disparity value after several iterations.
x2 x2 f (x)" exp ! , x'0 x p2 p2 and
3.3.1. Transformation In order to assign a possibility to each MAD, which denotes a selected grade as the correct matching point, the distribution of the matched di!erences should be considered. However, since the distribution varies with image characteristics, the analysis of the distribution in each matching is laborious work. Thus, the MADs obtained in several images are approximated to a certain distribution for a fast transformation. If MADs and squared MADs are accumulated while the matching proceeds, the approximate distribution of MADs can be calculated. Let X be the random variable of MAD, the variance of MADs becomes E[X2]!(E[X])2. Then the distribution is represented by the variance. It is approximated to the Rayleigh distribution from some experiments. Fig. 14 shows examples of actual MADs and the bold curves are the approximated Probability Distribution Functions (PDF). From these examples, it can be observed that the approximation of the distribution is a reasonable one. Also, it has the advantage that the Rayleigh distribution can represent a Laplacian to Gaussian probability distribution function by varying its variance. The probability distribution function and the Cumulative Distribution Function (CDF) of Rayleigh are expressed as
A B
x2 F (x)"1!exp ! , x'0, x 2p2
(6)
respectively. A CDF is used to transform the MADs into possibility. Since the value with the smaller di!erence must be mapped to the higher possibility, the transformation function, h (x), can be de"ned as d h (x)"1!F (x) d x
C A BD A B
x2 "1! 1!exp ! 2p2 x2 "exp ! . 2p2
(7)
All MADs, saved in the 3-D relaxation structure as shown in Fig. 13, are transformed into possibilities by Eq. (7). Fig. 15 shows the curve-"tted PDF using the Rayleigh distribution, the CDF, and the transformation function of `pentagona, respectively. 3.3.2. Update possibility As mentioned above, possibilities transformed by Eq. (7) are recursively updated by neighbor values. The
Fig. 14. Examples of the distribution of MADs using the proposed matching algorithm. (a) `Pentagona, (b) `Stripea, (c) 30% random dot stereogram which has `0a or `255a gray level. The bold lines are a curve-"tted graph of the actual distribution.
776
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
will be increased according to the amplitude of the mean value. If the center pixel is located in a region R , the c smoothness function is represented as
C
D
1 m m F (i, j, k)"w + + % (i, j, k) t s s N e i/~m j/~m
(i, j)O(0, 0)
and (i, j)3R , c
(9)
where w , N , and m are the weighting constants of the s e smoothness, the number of excitatory inputs, and the search range, respectively. The uniqueness implies that a pixel must be matched with one point. Therefore, other nodes on the disparity axis have to act exclusively with one another. Consequently, the point with the maximum possibility will remain.
C
D
1 m F (i, j, k)"!w + % (i, j, k) u u N t i k/~m
Fig. 15. An example of PDF, CDF, and a transformation function: (a) The PDF and CDF of Fig. 14(a), (b) and its transformation function.
possibility of a current node is updated by three constraint functions which include smoothness, uniqueness, and discontinuity preservation. The updating rule is similar to the cooperative algorithm of Marr and Poggio [7]. However, in the proposed method, a disparity discontinuity term is added to the updating function to preserve an edge of disparity. Let %, F , Fu, and F denote the possibility, smooths d ness, uniqueness, and discontinuity preservation functions, respectively. The possibility of each node in the next iteration is represented as % (i, j, k)"% (i, j, k)#F (i, j, k)#F (i, j, k) t`1 t s u #F (i, j, k), (8) d where i, j, k, and t represent row, column, disparity axis of the relaxation structure, and iteration number, respectively. Therefore, a possibility value in the next iteration is determined by the sum of the previous possibilities and the three constraint functions. Since disparities in a region are similar to one another, the smoothness function is strongly dependent on the region map, or the intensity pro"le. Thus, it has to be excited by the possibility of the pixel which is located in the same region as the center pixel on the image plane. If the mean value of the neighbor possibility included in the excitation set is large, the possibility of the current node
kO0
(10)
is de"ned as the uniqueness function, where w and N are u i the weighting factor and the number of inhibitory input, respectively. This reduces the possibility of the current node and relates only to the disparity axis. The last term of Eq. (7) is the discontinuity preservation function. It assigns a survival possibility to a node according to the appearance of possibilities in the current state to preserve edge points on a disparity map from erosion caused by consecutive iterations. Since this term has not been used in conventional relaxation algorithms, it is, therefore, di$cult to determine the weights of each constraint function and a proper stop condition to avoid over-smoothing. Thus, heuristic approaches are used in the determination. Generally, if there are many nodes surrounding the current node that have the maximum possibility on the disparity axis, it must have a positive survival possibility. A simple threshold function,
G
w Nl (i, j, k)*¹, k F (i, j, k)" d d !w Nl (i, j, k)(¹, d k
(11)
is used as the discontinuity preservation function. w , d Nl and ¹ are the weight, the number of nodes which k belong to the excitatory set and have the maximum possibility on k-direction at t-iteration, and the threshold value, respectively. When the center pixel of an odd window is in a corner point as shown in Fig. 16(a), in order to preserve the center point, the threshold is set as ¹"M(=!1)/2#1N2"(m#1)2,
(12)
where = represents the length of the window. If there are fewer pixels than the threshold, the possibility of the center node is going to decrease. Therefore, F (i, j, k) acts d as either an excitatory or an inhibitory input according to the appearance of neighbor possibilities. Plus, Nl k should be considered as in the same region as that of the
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
777
Fig. 16. Examples of excitation sets having a positive survival possibility when = is 3. The black boxes are the pixels which have the maximum possibility in each disparity direction.
applied to various matching algorithms such as block-, edge-, and region-based methods. For instance, if a block matching algorithm is used in the matching, the excitation set includes all the pixels within the block. Also, an edge-segment, region, and object, etc. can compose the excitatory inputs. Temporary disparity maps are calculated in each iteration to check the termination of the recursive process. The maps consist of the disparity point which has the maximum possibility along the disparity axis in the iteration. When there are no isolated spike pulses on the map, the iteration process is terminated.
4. Experimental results The proposed relaxation scheme and matching algorithm were tested by arti"cial and real scene stereoscopic images. Arti"cial images, with random and repeating patterns, and real scene images were used. The relaxation scheme was compared with the cooperative algorithm proposed by Marr and Poggio [7]. In the experiments with the matching algorithm, three methods classi"ed by the type of matching window and the presence of relaxation scheme were compared. 4.1. The relaxation scheme
Fig. 17. A region map and connection diagram where the search range is 2: (a) Region map, (b) connection diagram of (a).
Figs. 18 and 19 are Random Dot Stereograms (RDS) and Fig. 20 is the `stripea image pair. Table 1 shows the properties of these stereograms and the matching information. The updating rule of the cooperative algorithm was de"ned as
G
C(n`1)"p xyd center pixel. As shown in Fig. 16, the number of the maximum possibility node is important, not the pattern. Since both smoothness and the discontinuity preservation function relate to excitatory inputs, except for uniqueness, the connection diagram of a current node is depicted as shown in Fig. 17. The excitatory inputs can be modi"ed by a matching method, so that the proposed relaxation algorithm is
H
+ C(n) !e + C(n) #C(0) , x{y{d{ x{y{d{ xyd x{y{d{|S(xyd) x{y{d{|0(xyd) (13)
where C(n`1) represents the state of the node at position xyd (x, y) with disparity d at iteration n#1, S and O denote the excitation and inhibition set, e is the inhibition constant, and p is a sigmoid function. The results of the cooperative algorithm are shown in Figs. 21}23, when e is 2.0. The results are plotted before over-smoothing. If
778
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
p, e, C(0) and both S and O are carefully selected, the xyd outputs were better than those in Figs. 21}23. However, erosion due to smoothing at the disparity edge area was not eliminated. For the comparison of the relaxation scheme with Marr and Poggio's algorithm, matching strategies were not used and a simple Block Matching Algorithm (BMA) using a 3]3 window was used to "nd the corresponding point in this experiment. The results for arti"cial images are shown in Figs. 24}26. Disparities in each iteration were determined by the possibility which had the maximum value along the disparity axis and were displayed
Fig. 18. 30% random dot stereogram. 10% of the dots of the right image are randomly decorrelated.
Fig. 19. 50% random dot stereogram. 20% of the dots of the right image are randomly decorrelated.
Fig. 20. `Stripea image pair with white Gaussian noise (p"50).
Fig. 21. Results of the cooperative algorithm for a 30% random dot stereogram. Iteration number is (a) 0, (b) 1, (c) 6, and (d) 9.
Table 1 The properties of arti"cial stereograms and matching information. Image Item
30% RDS
50% RDS
`Stripea image
Size Noise type Actual disparity Searching range Matching method Excitatory set
128]128 Random noise (10%) 0}3 !4}8 BMA (3]3) All pixels within the considered range 0.1, 0.1, 0.1
128]128 Random noise (20%) 0}3 !4}8 BMA (3]3) All pixels within the considered range 0.1, 0.1, 0.1
128]128 Gaussian noise (p"50) 0}2 !4}8 BMA (3]3) All pixels within the considered range 0.1, 0.1, 0.1
Weights (w , w , w ) s u d
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
779
Fig. 22. Results of the cooperative algorithm for a 50% random dot stereogram. Iteration number is (a) 0, (b) 1, (c) 6, and (d) 8.
Fig. 23. Results of the cooperative algorithm for the `stripea image pair. Iteration number is (a) 0, (b) 10, (c) 11, and (d) 12.
Fig. 24. The proposed relaxation results for a 30% random dot stereogram. Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30.
with an intensity and height map. Since a small window was intentionally used in the matching, the initial results were very noisy as shown in Fig. 24(a), 25(a), and 26(a). The line patterns of the intensity map in Fig. 26(a) represent mismatched points which were matched to the next stripe, but they gradually disappear in the map through repeated iteration. Some experiments were executed to check the in#uence of the discontinuity preservation function. Figs. 27 and 28 show the results for a 30%
random dot stereogram without the discontinuity function. When this function is not used, the weight of the function and the iteration number must be carefully determined because oscillation or over-smoothing may occur in a disparity map according to the amplitude of the weight. However, if the discontinuity preservation function is inserted to the updating rule, the output is insensitive to the weights and over-smoothing will not occur after more than 100 iterations. When both w and s
780
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
Fig. 25. The proposed relaxation results for a 50% random dot stereogram. Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30.
Fig. 26. The proposed relaxation results for the `stripea image. Iteration number is (a) 0, (b) 2, (c) 5, and (d) 20.
w were between 0.1 to 0.3, there were few di!erences in u the results. 4.2. The proposed matching algorithm In the experiments with the arti"cial images, three methods were compared to evaluate the proposed matching algorithm. The methods were as follows. Method 1: Using a "xed size window (3]3) with relaxation, Method 2: Using a variable size window (3]3 to 11]11) without relaxation,
Method 3: Using a variable size window with relaxation (the proposed algorithm). In general, the intensity-based methods are similar to method 1 except for the di!erence of the relaxation scheme. Method 1 is the same as the proposed relaxation scheme as mentioned in the above section (Figs. 24}26). Since it is impossible for a random dot stereogram to use edge and region information, variable square windows, changing only in size, are used in the random dot stereogram matching. The results for methods 2 and 3 are shown in Figs. 29}32. To numerically compare the matching results, the Mean of the Squared Error (MSE) and the Sum of the Squared Error (SSE) between the true
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
781
Fig. 27. The results for a 30% random dot stereogram without the discontinuity preservation function (3]3 window). Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30 (w "w "0.05). s u
Fig. 28. The results for a 30% random dot stereogram without the discontinuity preservation function (3]3 window). Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30 (w "w "0.4). s u
disparity and the estimated one were used as the distance measure. N SSE" + (d !d) )2 i i i/1 and
(14)
1 N MSE" + (d !dK )2 (15) i i N i/1 are the measures, where N, d , and dK denote the number i i of disparity, true disparity, and the estimated one, respec-
tively. The MSEs and SSEs of the three methods are shown in Table 2. The initial disparity maps of method 3 are the same as those of method 2. The `beara and `pentagona image pairs were used for real scene stereo image matching. The images are shown in Figs. 33 and 34. Table 3 shows the parameters of the images and Figs. 35 and 36 are the results of the proposed matching method. There are some mismatched points, but stable outputs were obtained to a certain degree. As compared with Lee [11] and Kim's method [12], both the bookstand in the `beara image and the bridge in the `pentagona image disappeared using Lee's method and
782
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
Fig. 29. The results using method 2: (a) 30% random dot stereogram, (b) 50% random dot stereogram, (c) `Stripea image pair.
Fig. 30. The results for a 30% random dot stereogram using method 3. Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30.
the bookstand was not seen using Kim's method. The bookstand, ball, and bear appeared in the result of the `beara image pair. On the right-top side of the `pentagona image, the bridge became visible.
5. Conclusion A hybrid approach for a stereo matching method based on edge and region information was proposed. A modi"ed N¸ operator, including a HDC, N¸ "lter, and weak edge elimination, etc., was used to extract proper matching primitives. According to the type of the current pixel in feature image, di!erent matching strat-
egies using variable windows were applied to the pixel matching. The local information of input images was considered. To acquire more stable results under similarity and consistency constraints, normalized MADs, obtained in a matching step, were transformed into possibilities. Final disparities were determined by the reciprocal actions of neighbor possibilities in the relaxation step. Unlike conventional relaxation schemes, the proposed relaxation algorithm did not only use disparity smoothness and uniqueness, but also introduced a disparity preservation factor. Because of the use of the preservation factor, the erosion in abrupt areas of a disparity map was considerably reduced. In addition, the proposed relaxation could be applied to block-, feature-, region-, and
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
783
Fig. 31. The results for a 50% random dot stereogram using method 3. Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30.
Fig. 32. The results for the `stripea image. Iteration number is (a) 0, (b) 2, (c) 5, and (d) 20. Table 2 Matched errors of the three methods for synthetic images Method 1
Method 2
Method 3
Image
SSE
SSE
MSE
SSE
MSE
30% RDS 50% RDS `Stripea image pair
1094 0.067 1375 0.084 1048 0.064
2149 0.131 3012 0.184 1220 0.074
703 754 682
0.043 0.046 0.042
Table 3 Parameters of real scene stereograms and the matching information Image
MSE
segment-based matching methods by modifying the excitation set. In experiments using the proposed matching algorithm for random dot stereograms with a random pattern, the `stripea image with a repeating pattern, and
Item Size Noise type Actual disparity Searching range Matching method Excitatory set
`Beara image
200]200 None About 0 to 10 !15 to 25 The proposed Within the same region and the considering window Weights (w , w , w ) 0.1, 0.1, 0.1 s u d
`Pentagona image 512]512 None About !15 to 15 !25 to 25 The proposed Within the same region and the considering window 0.1, 0.1, 0.1
784
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785
indoor and outdoor images, stable outputs were obtained.
Acknowledgements This work was partially supported by the Korea Research Foundation under grant number 1997-001E00374. Fig. 33. `Beara image pair.
Fig. 34. `Pentagona image pair.
Fig. 35. The result map of the `beara image.
Fig. 36. The result map of the `pentagona image.
References [1] A.C. Kak, Handbook of Industrial Robotics, Chapter on Depth Perception for Robots, Wiley, New York, 1985. [2] G. Stockman, S. Chen, G. Hu, N. Shrikhande, Recognition of rigid objects using structured light, in Proceedings of 1987 IEEE International Conference on Systems Man and Cybernetics 1987, pp. 877}883. [3] R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, part 2, 1992, (Chapter 16). [4] W. Eric L. Grimson, Computational experiments with a feature based stereo algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 7 (1) (1985) 17}34. [5] G. Medioni, R. Nevatia, Segment-based stereo matching, Compu, Vision Graphics Image Process. 31 (1985) 2}18. [6] Song De Ma, Conics-based stereo, motion estimation, and pose determination, Int. J. Comput. Vision 10 (1) (1993) 7}25. [7] D. Marr, T. Poggio, Cooperative computation of stereo disparity, Science 194 (1976) 283}287. [8] J.P. Frisby, S.B. Pollard, Computational Issues in solving the stereo correspondence problem, Computational Models of Visual Processing, part 7, 1990, pp. 331}357 (Chapter 22). [9] D. Marr, T. Poggio, A computational theory of human stereo vision, Proc. Roy. Soc. London B204 (1979) 301}328. [10] D. De Vleeschauwer, An intensity-based, coarse-to-"ne approach to reliably measure binocular disparity, CVGIP: Image Understanding 57 (2) (1993) 204}218. [11] Jun-Jae Lee, Jae-Chang Shim, Yeong-Ho Ha, Stereo correspondence using hop"eld neural network of new energy function, Pattern Recognition 27 (1994) 1513}1522. [12] Yong-Suk Kim, Jun-Jae Lee, Yeong-Ho Ha, Stereo matching algorithm based on modi"ed wavelet decomposition process, Pattern Recognition 30 (1997) 929}952. [13] John (Juyang) Weng, Image matching using the windowed fourier phase, Int. J. Comput. Vision 11 (3) (1993) 211}236. [14] S.B. Marapane, M.M. Trived, Region-based stereo analysis for robotic applications, IEEE Trans. Systems Man Cybernet 19 (1989) 1447}1464. [15] J.R. Jordan, A.C. Bovik, Using chromatic information in edge-based stereo correspondence, CVGIP: Image Understanding 54 (1) (1991) 98}118. [16] A. Knotanzad, A. Bokil, Y.W. Lee, Stereopsis by constraint learning feed-forward neural networks, IEEE Trans. Neural Networks 4 (1993) 332}342. [17] M. Okutomi, T. Kanade, A multiple-basedline stereo, IEEE Trans. Pattern. Anal. Mach. Intell. 15 (4) (1993) 353}363.
K-P. Han et al. / Pattern Recognition 33 (2000) 767}785 [18] L.J. Van Vliet, I.T. Young, A nonlinear laplace operator as edge detector in noisy images, Comput. Vision Graphics Image Process. 45 (1989) 167}195. [19] P.J. Burt, Fast "lter transforms for image processing, Comput. Graphics Image Process. 16 (1981) 20}51. [20] M. Okutomi, T. Kanade, A locally adaptive window for signal matching, Int. J. Comput. Vision 7 (2) (1992) 143}162. [21] Kyeong-Hoon Do, Yong-Suk Kim, Tae-Uk Uam, YeongHo Ha, Iterative relaxational stereo matching based on
785
adaptive support between disparities, Pattern Recognition 31 (8) (1998) 1049}1059. [22] Shing-Huan Lee, Jin-Jang Leou, A dynamic programming approach to line segment matching in stereo vision, Pattern Recognition 27 (8) (1994) 961}986. [23] V. Torre, T. Poggio, On edge detection, IEEE Trans. Pattern. Anal. Mach. Intell. PAMI-8 (1986) 147}163. [24] R.M. Haralick, L.G. Shapiro Computell and Robot Vision, part 1 (1992) 37}48.
About the Author*KYU-PHIL HAN received the B.S. and M.S. degrees in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1993 and 1995, respectively, and is currently a Ph.D. student in the Department of Electronic Engineering of Kyungpook National University. He was a Researcher at the SindoRicoh Advanced Institute of Technology from 1995 to 1996. He was awarded a bronze prize in the 5th Samsung Humantech Thesis competition in February 1999. His main interests are in digital image processing, 3-D image compression, and computer vision. About the Author*TAE-MIN BAE received the B.S. and M.S. degrees in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1996 and 1998, respectively, and is currently a Ph.D. student in the Department of Electronic Engineering of Kyungpook National University. His main interests are in 3-D image compression and computer vision. About the Author*YEONG-HO HA received the B.S. and M.S. degrees in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1976 and 1978, respectively, and Ph.D. degree in Electrical and Computer Engineering from the University of Texas at Austin, TX, 1985. In March 1986, he joined the Department of Electronic Engineering of Kyungpook National Univeristy, as an Assistant Professor, and is currently a Professor. He served as TPC co-chair of 1994 IEEE International Conference on Intelligent Signal Processing and Communication Systems and he is now chairman of IEEE Taegu section. His main research interests are in image processing, computer vision, and video signal processing. He is a member of IEEE, Pattern Recognition Society, IS&T, Institute of Electronics Engineers of Korea and Korean Institute of Communication Sciences.