A modified stereo matching algorithm suitable for implementation on a convolution specialized hardware


Pattern Recognition Letters 13 (1992) 523-528, North-Holland
Volume 13, Number 7, July 1992

E. Stella, A. Distante, G. Attolico and T. D'Orazio
Istituto Elaborazione Segnali ed Immagini, C.N.R., Via Amendola 166/5, 70126 Bari, Italy

Received 2 July 1991

Abstract

Stella, E., A. Distante, G. Attolico and T. D'Orazio, A modified stereo matching algorithm suitable for implementation on a convolution specialized hardware, Pattern Recognition Letters 13 (1992) 523-528.

A new algorithm for stereo matching is presented. It is a modified version of the Marr-Poggio-Grimson technique for the stereo correspondence problem. Our algorithm is a coarse-to-fine strategy (like the Grimson technique) whose steps, including the matching phase, are formulated as convolution operations, making the algorithm suitable for implementation on a convolution-oriented board.

Keywords. Image processing, stereo matching, 3-D map.

1. Introduction

The advancements in machine vision technology have been substantial in recent years, with the introduction of faster processors and improvements in sensor technology. A depth map can be obtained with either direct or indirect methods. Direct methods recover depth from ranging devices; indirect methods recover 3-D information from cues extracted from the intensity image of the observed scene. Stereo techniques fall in the latter class. The binocular stereo correspondence problem is the problem of matching two images of the same scene taken from different viewing positions. Once the position of the same physical surface point is known in both images, its depth can be recovered. A large number of stereo algorithms have been developed in the literature, and two main approaches have been identified: region-based and edge-based [2,3,4]. Region-based techniques have the advantage of being more reliable but have a coarse spatial resolution. Edge-based techniques can be more precise but generally produce a sparse depth map. However, both approaches, when implemented on conventional hardware, have computing times that are impractical for real-time applications; advanced specialized hardware is needed to achieve real-time response. In this paper we present an implementation of a modified version of the stereo model proposed by D. Marr and T. Poggio. The aim of this algorithm is to reformulate each step of the Marr-Poggio technique in terms of convolution-based operations (especially the matching step). In this way the whole algorithm can be implemented on a convolution-specialized board, speeding up execution without resorting to expensive hardware such as the Connection Machine [5,6]. In Section 2 we review the background, in Section 3 we describe the modified stereo matching algorithm, and in Section 4 the experimental results on synthetic and real data are shown and discussed.

The work was supported by "Progetto Finalizzato Robotica" of the C.N.R. (Italy).

Correspondence to: E. Stella, Istituto Elaborazione Segnali ed Immagini, C.N.R., Via Amendola 166/5, 70126 Bari, Italy.

0167-8655/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved

2. Background

The Marr-Poggio algorithm for stereo matching is one of the most widely accepted methods for recovering 3-D information from a stereo pair. The resulting depth map is called 'sparse' because range information is provided only at a few points of the observed scene. The method is based on a coarse-to-fine strategy that generally consists of several steps. At each step an edge extraction is performed on the left and right images; a matching phase between the resulting images then yields the disparities. Edge detection is crucial for the matching phase. It is well known that intensity changes occur at different scales, so their detection requires operators of different sizes. Marr and Hildreth [1] argued that the most satisfactory operator fulfilling these conditions is the filter $\nabla^2 G$, whose size is determined by the value of $w$ in the Gaussian function ($w = 2\sqrt{2}\,\sigma$). Thus both stereo images are observed through channels of different resolution. At the coarse level, large variations in intensity are extracted using a Gaussian with a large $w$. The disparity produced at this level is then used as an offset at the next, finer level, where a Gaussian with a smaller $w$ is used. The process is iterated through several levels to produce the final sparse depth map. The bottleneck of the approach is clearly the matching phase. It is a purely sequential process: for each feature (i.e., each zero-crossing point) in the right image, the corresponding one is searched for in the left image under the epipolar constraint, and the search space is limited by the spatial extent of the Gaussian (the zero-crossing separability theorem).
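The relation between the channel width $w$ and the Gaussian $\sigma$ can be made concrete with a sampled $\nabla^2 G$ mask. The sketch below is only an illustration, assuming $w = 2\sqrt{2}\,\sigma$ and a square mask of roughly $3.5w$ pixels; the helper name `log_kernel`, the rounding to an odd size and the zero-sum correction are our own choices, so the mask sizes differ slightly from those quoted in Section 4.

```python
import numpy as np


def log_kernel(w):
    """Sampled Laplacian-of-Gaussian mask for channel width w (hypothetical helper).

    Assumes w = 2*sqrt(2)*sigma and a mask of about 3.5*w pixels,
    forced to an odd size so that it has a centre pixel.
    """
    sigma = w / (2.0 * np.sqrt(2.0))
    size = int(round(3.5 * w)) | 1          # odd size, e.g. 28 -> 29
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (x ** 2 + y ** 2) / (2.0 * sigma ** 2)
    k = (r2 - 1.0) * np.exp(-r2)            # proportional to del^2 G (positive factor dropped)
    return k - k.mean()                     # zero sum: flat regions give zero response


# Coarse-to-fine channels as in the experiments (w = 32, 16, 8).
for w in (32, 16, 8):
    print(w, log_kernel(w).shape)           # (113, 113), (57, 57), (29, 29)
```

The mean subtraction compensates for sampling and truncation, so that a constant image produces no spurious zero-crossings.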


3. The modified stereo algorithm

We have reformulated the whole process in terms of convolution operations, making the approach suitable for implementation on a specialized board (convolution board). The new formulation of the algorithm is still a coarse-to-fine strategy like the original one. The first stage is still the filtering of the stereo pair by the Laplacian of Gaussian in order to extract the edge points; this filtering is performed by convolving the two intensity images with a digital representation of the $\nabla^2 G$ operator. To speed up the matching phase, a particular data representation is adopted: well-defined kernels allow us to express the time-expensive steps of the matching process as convolutions. So, at each step of the process, we transform the two filtered images. The left and right images are handled differently. The right image is transformed as follows:

$$
R(i,j) = \begin{cases} +1 & \text{for a positive-negative zero-crossing,} \\ -1 & \text{for a negative-positive zero-crossing,} \\ 0 & \text{otherwise.} \end{cases}
$$

The left image is split in two:

$$
L_p(i,j) = \begin{cases} +1 & \text{for a positive-negative zero-crossing,} \\ 0 & \text{otherwise,} \end{cases}
\qquad
L_n(i,j) = \begin{cases} +1 & \text{for a negative-positive zero-crossing,} \\ 0 & \text{otherwise.} \end{cases}
$$
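A minimal sketch of how the three maps can be derived from a $\nabla^2 G$-filtered image follows. It assumes that only horizontal (scan-line) sign changes are considered, that a crossing is marked on the left pixel of the pair where the sign changes, and that exact zeros are ignored; the function name `zero_crossing_maps` is hypothetical.

```python
import numpy as np


def zero_crossing_maps(f):
    """Return (R, Lp, Ln)-style maps for a del^2 G-filtered image f (sketch).

    pos_neg[i, j] = 1 where f changes from positive to negative between
    columns j and j+1; neg_pos likewise for negative to positive.
    """
    f = np.asarray(f, dtype=float)
    pos_neg = np.zeros(f.shape, dtype=np.int8)
    neg_pos = np.zeros(f.shape, dtype=np.int8)
    pos_neg[:, :-1] = (f[:, :-1] > 0) & (f[:, 1:] < 0)
    neg_pos[:, :-1] = (f[:, :-1] < 0) & (f[:, 1:] > 0)
    R = pos_neg - neg_pos            # signed map: +1, -1 or 0
    return R, pos_neg, neg_pos       # pos_neg and neg_pos play the roles of Lp and Ln


f = np.array([[1.0, -1.0, -2.0, 3.0, 4.0, -5.0]])
R, Lp, Ln = zero_crossing_maps(f)
print(R.tolist())                    # [[1, 0, -1, 0, 1, 0]]
```

In use, only `R` would be kept from the filtered right image and only `Lp` and `Ln` from the filtered left image.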

Denoting by $w_c$ the size of the current channel, the Grimson implementation searches, for each nonzero point of $R(i,j)$, for the corresponding point in the left filtered image within the range $[-w_c, +w_c]$ centred on position $(i, j + D_p)$, where $D_p$ is the disparity value from the previous step. In our algorithm, if the examined point $R(i,j)$ is $+1$ we consider the representation $L_p$; otherwise we consider $L_n$. In this way only zero-crossings of the same sign are matched. We are now interested in those candidates of the appropriate left-image representation that are unique in the range $[-w_c/2, +w_c/2]$ centred on position $j$ of the same scan line $i$ (epipolarity assumption). For this purpose we define a 1-D kernel $K_1$ of $w_c$ elements, all equal to 1, and obtain two further representations:

$$
L_1 = L_p * K_1, \qquad L_2 = L_n * K_1 .
$$

The element at position $(i,j)$ of $L_1$ or $L_2$ represents the number of zero-crossings in the range $(j - w_c/2, j + w_c/2)$; we are interested only in the positions where this count equals 1. Next we need a representation of the possible disparity values to be used in the case of correct matches. For this purpose we define another 1-D kernel of $w_c$ elements:

$$
K_{dp}(d) = d - \frac{w_c}{2}, \qquad d = 1, 2, \ldots, w_c,
$$

i.e. the elements are $-(w_c/2 - 1), \ldots, -1$ for $d = 1, 2, \ldots, w_c/2 - 1$, $0$ for $d = w_c/2$, and $1, 2, \ldots, w_c/2$ for $d = w_c/2 + 1, w_c/2 + 2, \ldots, w_c$.
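To illustrate what the two 1-D kernels compute, the sketch below applies a window of ones (the role of $K_1$) and a ramp of signed offsets (the role of $K_{dp}$) along each scan line of a binary zero-crossing map such as $L_p$. It is a simplification under stated assumptions: it uses a symmetric window of $2\lfloor w_c/2 \rfloor + 1$ taps with offsets $-h, \ldots, h$ instead of the $w_c$-element kernels above, it applies the ramp directly to the binary map (our reading of how a per-position offset can be obtained; the text applies $K_{dp}$ to $L_1$), and the helper name `window_count_and_offset` is hypothetical. Where the count equals 1, the ramp response is exactly the displacement of the unique candidate.

```python
import numpy as np


def window_count_and_offset(Lbin, wc):
    """Row-wise 1-D convolutions of a binary zero-crossing map (sketch).

    count[i, j]  = number of crossings in columns j-h .. j+h   (role of K_1)
    offset[i, j] = sum of their signed displacements k         (role of K_dp)
    with h = wc // 2. Rows must have at least 2*h + 1 columns.
    """
    h = wc // 2
    k_ones = np.ones(2 * h + 1)
    k_dp = np.arange(-h, h + 1, dtype=float)    # -h, ..., -1, 0, 1, ..., h

    def conv_row(row, k):
        # np.convolve flips its kernel; passing k[::-1] yields sum_k row[j + k] * k.
        return np.convolve(row, k[::-1], mode="same")

    Lbin = np.asarray(Lbin, dtype=float)
    count = np.apply_along_axis(conv_row, 1, Lbin, k_ones)
    offset = np.apply_along_axis(conv_row, 1, Lbin, k_dp)
    return count.astype(int), offset.astype(int)


count, offset = window_count_and_offset([[0, 0, 1, 0, 1, 0, 0, 0]], wc=4)
print(count.tolist())    # [[1, 1, 2, 2, 2, 1, 1, 0]]
print(offset.tolist())   # [[2, 1, 2, 0, -2, -1, -2, 0]]
```

At columns 2 to 4 both crossings fall inside the window, so the count is 2 and the offset value is meaningless; this is exactly the ambiguity that the uniqueness test discards.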

Convolving $L_1$ and $K_{dp}$ we obtain a frame, say $L_{1d}$, whose element at position $(i,j)$ is the disparity value to be associated if and only if the point $R(i,j)$ correctly matches the point $L(i, j + D_p)$. The same step is performed on $L_2$, obtaining the frame $L_{2d}$. $L_{1d}$ and $L_{2d}$ can be combined in one frame, say $L_d$. Summarizing the steps above, the following information is available to start the matching:

$R$: zero-crossings of the right image;
$L_n$: negative zero-crossings of the left image;
$L_p$: positive zero-crossings of the left image;
$L_d$: frame of possible disparities.

Using this information, the matching can be performed as follows:

    for each channel size wc
        for each i, j
            if R(i, j) = +1 then
                if Lp(i, j + Dp) = 1 then
                    Dc(i, j) = Dp + Ld(i, j + Dp)
                endif
            endif
            if R(i, j) = -1 then
                if Ln(i, j + Dp) = 1 then
                    Dc(i, j) = Dp + Ld(i, j + Dp)
                endif
            endif

After these steps, the map stored in $D_c$ represents the disparity obtained for the current channel $w_c$ and is used for the next channel. $D_p$ is the disparity from the previous step; for the first step, $D_p$ depends on the stereo camera setup (e.g., parallel or convergent optical axes).
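The matching loop above can be written as a runnable sketch for a single channel, given precomputed frames. It rests on assumptions that are ours, not the paper's: the test printed as `Lp(i, j + Dp) = 1` is read as the uniqueness test on the window counts $L_1$ and $L_2$; separate offset frames are kept for the two signs instead of one combined $L_d$; $D_p$ is an integer map defined at every pixel; positions whose predicted column falls outside the image are skipped; and $D_c$ starts as a copy of $D_p$ so that the next channel has an offset everywhere. The function name `match_channel` is hypothetical.

```python
import numpy as np


def match_channel(R, L1, L2, L1d, L2d, Dp):
    """One matching pass for a single channel (hypothetical helper).

    R        : right-image map with values +1, -1, 0
    L1, L2   : window counts of positive-negative / negative-positive
               left zero-crossings (L_p * K_1, L_n * K_1)
    L1d, L2d : signed displacement of the candidate in that window
    Dp       : integer disparity map from the previous channel
    """
    rows, cols = R.shape
    Dc = Dp.copy()                           # assumption: unmatched points keep Dp
    for i in range(rows):
        for j in range(cols):
            if R[i, j] == 0:
                continue
            jj = j + Dp[i, j]                # predicted position on the left scan line
            if not 0 <= jj < cols:
                continue
            # Same-sign matching: +1 uses the L_p frames, -1 the L_n frames.
            count, offset = (L1, L1d) if R[i, j] == 1 else (L2, L2d)
            if count[i, jj] == 1:            # exactly one candidate in the window
                Dc[i, j] = Dp[i, j] + offset[i, jj]
    return Dc


R = np.array([[1, 0, -1, 1]])
Dp = np.array([[1, 0, 0, 0]])
L1 = np.array([[0, 1, 0, 2]])
L1d = np.array([[0, 2, 0, 5]])
L2 = np.array([[0, 0, 1, 0]])
L2d = np.array([[0, 0, -1, 0]])
print(match_channel(R, L1, L2, L1d, L2d, Dp).tolist())   # [[3, 0, -1, 0]]
```

The last right-image point is left at its previous disparity because two candidates fall in its window. On a convolution board the per-pixel test would be the only sequential part, as the text notes.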

4. Experimental results and discussion

The algorithm, a three-step coarse-to-fine strategy, has been implemented on a VAX 6310, and its results have been compared with those of the Grimson algorithm implemented on the same VAX. To evaluate the execution time, recall that both algorithms must execute three two-dimensional convolutions (w = 32, 16, 8) with a mask of about 3.5w pixels (in our case 112 × 112, 51 × 51 and 27 × 27 pixels, respectively) for each image of the stereo pair. In addition, after each 2-D convolution our algorithm performs six 1-D convolutions of dimension w on the left image followed by the final sequential phase, while Grimson's algorithm performs its matching phase.

To evaluate the algorithm, a synthetic stereo pair is considered first. Figure 1 shows a synthetic stereo pair of two squares: the leftmost square has a disparity of 10 pixels and the rightmost square a disparity of 20 pixels. Figure 2 shows the disparity maps produced by the two algorithms, and Figure 3 plots one line of both resulting disparity maps. The plots show that our algorithm produces the same results as Grimson's.

Figure 1. A synthetic stereo pair. The leftmost rectangle has a disparity of 10 pixels, while the other rectangle has a disparity of 20 pixels.

Figure 2. Disparity maps produced by Grimson's algorithm (left) and our algorithm (right).

Figure 3. Sections of the disparity maps in Figure 2. (a) Central line from the disparity map produced by Grimson's algorithm. (b) Central line from the disparity map produced by our algorithm. [Panels: (a) Grimson algorithm, (b) Our algorithm; horizontal axis: pixel.]

Figure 4 shows a stereo pair of images of a laboratory scene. The experimental setup consists of a pair of COMERSON TC 131 CCD TV cameras. Both cameras use the same lens system, with f = 50 mm. The two cameras have parallel optical axes, so the initial disparity is assumed equal to the physical camera separation.

Figure 4. A stereo pair of our laboratory.

Figure 5 shows the sparse disparity maps obtained by both algorithms, while Figure 6 shows that the histograms of the resulting disparity maps cover the same range. In the real case the two algorithms produce slightly different maps, so plotting a single line would not be meaningful. Table 1 compares the number of matched points obtained by the two algorithms.

Figure 5. Disparity maps obtained by Grimson's algorithm (left) and our algorithm (right).

Figure 6. Histograms of the disparity maps in Figure 5 (dotted line for the disparity map obtained by our algorithm). [Plot: disparity histogram; horizontal axis: bin.]

Table 1. A comparison between Grimson's and our algorithm

    Algorithm              Number of matched points
    Grimson's algorithm    2544
    Our algorithm          2137

Let us now compare the number of operations of Grimson's algorithm with that of our technique. The convolutions for edge detection are the same in both algorithms. During the matching phase, the number of operations of Grimson's method depends on the number of zero-crossings found, whereas our method works on all pixels of the image and therefore requires a greater number of operations. In spite of this, our algorithm is suitable for porting to convolution-specialized hardware, which speeds up the execution and makes it quasi-real-time. To test portability we used a MATROX MVP-AT/NP convolution board for an IBM PC/AT, programmed in C. Some problems arise because the board cannot handle floating-point arithmetic, only 8-bit integers. This requires the images and the convolution masks to be quantized to 8 bits, while the convolution results are 16 bits long. The disparity map obtained on the stereo pair of Figure 4 is shown in Figure 7, and Figure 8 shows its histogram. This disparity map has fewer points than the map produced by the VAX version of the algorithm, essentially because of the integer arithmetic of the board. Our stereo matching produces a final disparity map in 100 seconds on a real stereo pair of 256 × 256 pixels.

Figure 7. Disparity map obtained by our algorithm using the MATROX MVP-AT/NP convolution board from the stereo pair in Figure 4.

Figure 8. Histogram of the disparity map in Figure 7. [Plot: disparity histogram; horizontal axis: bin.]
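The execution-time discussion can be made more concrete with a back-of-the-envelope count of multiply-accumulate operations. The sketch assumes a 256 × 256 image (the size quoted for the board experiment), one multiply-accumulate per mask element per pixel, no border handling, the mask sizes quoted above and six 1-D convolutions of w taps per channel; the helper `mac_counts` is hypothetical. Under these assumptions the extra 1-D convolutions cost about 2% of the 2-D filtering of a single image, which is why the $\nabla^2 G$ convolutions dominate and why convolution hardware pays off.

```python
def mac_counts(rows=256, cols=256, masks=(112, 51, 27), widths=(32, 16, 8), n_1d=6):
    """Rough multiply-accumulate counts per image (hypothetical helper).

    conv2d: the three del^2 G convolutions with square masks.
    conv1d: the six extra 1-D convolutions of w taps per channel (left image).
    """
    pixels = rows * cols
    conv2d = pixels * sum(m * m for m in masks)
    conv1d = pixels * n_1d * sum(widths)
    return conv2d, conv1d


conv2d, conv1d = mac_counts()
print(conv2d, conv1d)    # 1040318464 22020096
```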


5. Conclusion

In this paper we have presented a new, straightforward algorithm for stereo matching based essentially on a set of convolution operations, so that it can be implemented on a convolution-specialized board. The algorithm is aimed at applications where a near-real-time 3-D map is required without the use of expensive hardware (e.g., the Connection Machine).

References

[1] Marr, D. and E. Hildreth (1980). Theory of edge detection. Proc. Roy. Soc. London B 207.
[2] Marr, D. (1982). Vision. Freeman, San Francisco, CA.
[3] Grimson, W.E.L. (1981). From Images to Surfaces. MIT Press, Cambridge, MA.
[4] Nishihara, H.K. (1983). PRISM: a practical real-time imaging stereo matcher. 3rd Int. Conf. on Robot Vision, Cambridge, MA.
[5] Nishimoto, Y. and Y. Shirai (1985). A parallel matching algorithm for stereo vision. Proc. IJCAI.
[6] Little, J.J., G. Blelloch and T. Cass (1987). Parallel algorithms for computer vision on the Connection Machine. Proc. 1st ICCV.