Stereo matching and occlusion detection with integrity and illusion sensitivity

Pattern Recognition Letters 24 (2003) 1143–1149 www.elsevier.com/locate/patrec

Stereo matching and occlusion detection with integrity and illusion sensitivity Qiuming Luo *, Jingli Zhou, Shengsheng Yu, Degui Xiao School of Computer Science and Technology, Huazhong University of Science and Technology, 8f, Huagong Keji Chanye Building 243# Luoyu Road, Wuhan, Hubei 430074, PR China Received 29 December 2001; received in revised form 16 September 2002

Abstract

Occlusion causes problems in stereo matching, especially when the scene contains narrow objects with large disparity and illusions. Disparity consistency is introduced to handle these complicated scenes. Its advantage over existing constraints is integrity and illusion sensitivity. An algorithm using disparity consistency is presented; with this consistency constraint, the problems in complex scenes are solved better than with existing constraints and other methods. Experimental results on synthetic and real-world images are presented. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Stereo vision; Integrity; Illusion sensitivity; Disparity consistency

1. Introduction

Depth or range information can be derived from the disparity map, and various matching algorithms can produce disparity maps. They fall mainly into two categories: area-based (e.g. Kanade and Okutomi, 1991, 1994; Intille and Bobick, 1995; Sun, 1998) and feature-based (e.g. Chen et al., 1999; Tsai and Katsaggelos, 1999; Candocia and Adjouadi, 1997).

* Corresponding author. Tel.: +86-27-87801079; fax: +86-27-87806221. E-mail addresses: [email protected] (Q. Luo), [email protected] (J. Zhou), [email protected] (S. Yu), [email protected] (D. Xiao).

Occlusion causes complicated problems in the matching process of all these algorithms. An occluded area appears in only one of the two cameras; in the case shown in Fig. 1, the occluded area appears in the image from the left camera only. If it is not identified as occluded, wrong correspondences will occur, which subsequently result in a wrong disparity map and a wrong depth map. This includes the problems of narrow objects with large disparity and of illusion. We focus on passive binocular stereo, although occlusion can also be remedied by using more images, more cameras, or motion information. Although some algorithms (Woodfill and Herzen, 1997; Hariyama et al., 2001; Werth and Scherer, 2000) do not take occlusion into account, in order to achieve high speed and simplicity (these methods are mostly used with special hardware,

0167-8655/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-8655(02)00284-2


Fig. 1. Scene: a simple case of occluding.

where complicated algorithms are not permitted), others manage to solve the problems caused by occlusion. Area-based methods, or a combination of the two (Tsai and Katsaggelos, 1999), are preferable when an accurate dense disparity map is desired. Feature-based methods can only produce a sparse disparity map (a dense map is possible by interpolation), which means they cannot identify an occluded area accurately when there are not enough features at its boundary. In early stereo-vision research, the segmentation into visible and occluded regions was treated as a secondary process, postponed until matching was completed and smoothing was underway (Dhond and Aggarwal, 1989). Because the pixels in an occluded area have no corresponding pixel in the peer image, trying to find correspondences for them is not proper. Recent research tends to execute matching and occlusion detection simultaneously (Toh and Forrest, 1990; Triantafyllidis et al., 2000; Zitnick and Kanade, 2000; Birchfield and Tomasi, 1999). Most of these methods exploit the difference in matching goodness between occluded and non-occluded areas through different measurements, such as a Bayesian approach (Triantafyllidis et al., 2000), variable windows (Kanade and Okutomi, 1991), or bi-directional matching (Werth and Scherer, 2000). The key element of those methods is the measurement of similarity/dissimilarity (Toh and Forrest, 1990). They do not utilize the essential property of occlusion: an occluding object is present between one eye/camera and the occluded object, as Fig. 1 shows. Therefore, it can be said

that they use an indirect cue of occlusion. They work well in most cases, but fail in complex ones. So some algorithms have begun to use the essential property of occlusion. Because exploiting occlusion in world coordinates is complex and irregular, owing to the conversion from disparity to depth, the disparity space is preferable; such algorithms include (Intille and Bobick, 1995; Zitnick and Kanade, 2000; Watanabe and Fukushima, 1999). In this paper, the philosophy of exploiting this property in disparity space is adopted. In the next section, we define the left and right view lines and the disparity consistency, which describe how pixels are located in disparity space and the nature of imaging and occluding. Disparity consistency is compatible with the uniqueness constraint, continuity constraint, disparity gradient limit and ordering constraint, and overcomes their shortcomings to some extent. In Section 3, an algorithm with the disparity consistency constraint is introduced, and experimental results are given in Section 4. Occlusion is detected by a direct cue, and a method is presented to solve the problems of narrow objects with large disparity and of illusion in complex cases. Conclusions are drawn in Section 5.

2. LVL, RVL and disparity consistency

2.1. LVL and RVL in disparity space

Some constraints are commonly used in stereo matching: uniqueness, continuity, ordering, the epipolar constraint and the disparity gradient limit. None of these constraints describes the nature of imaging and occluding. The left view line (LVL) and right view line (RVL) are capable of describing it. If the cameras are configured in parallel and the left image is the reference, the LVL is defined in disparity space as a line originating from a pixel with a gradient of +1, and the RVL as a line originating from a pixel with a gradient of −1. A 3D disparity space has dimensions column x, row y, and disparity d. This parameterization is different from 3D volumetric methods (Moravec,


Fig. 2. Disparity space, DSI, LVL and RVL.

1996) that use x, y, and z world coordinates as dimensions. A pixel with disparity d is located in disparity space at coordinate (x, y, d), where x, y are identical to the image coordinates. A matching cube is made up of the matching goodness for every possible disparity of every pixel. Goodness of matching is a measure of similarity; it can be SSD, SAD, correlation or others. A slice of the cube for one scan line is called a disparity-space image (DSI), and a matching cube calculated from correlation is called a correlation cube. The disparity space, DSI, LVL and RVL are shown in Fig. 2. If the right image is the reference, the definitions are similar. The LVL and RVL show that all points on the same view line correspond to one pixel in the imaging plane of a camera, and the nearest one occludes all the others. The following matching and occlusion detection is based on this understanding.

2.2. Disparity consistency

Disparity consistency is different from the commonly used constraints, although it is similar to them. It is stated as: there is one and only one pixel on any LVL or RVL in disparity space, except for those crossing an occluded area, where there might be no pixel. The upper part of the LVL and the left part of the RVL of a pixel are the occluded areas of that pixel. A pixel is illusion-like when its RVL and LVL cross a right-occlusion and a left-occlusion, respectively. Fig. 3(a)–(c) give an overview of conflictions that disobey disparity consistency, and Fig. 3(d) and (e) depict the cases when illusion occurs.

Fig. 3. Some cases of a pixel located in disparity space ((a)–(c) represent conflictions; (d) and (e) represent two kinds of illusion).
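The disparity-space bookkeeping above can be sketched in code. This is a minimal illustration, assuming the left image is the reference, so that disparity-space point (x, y, d) pairs left pixel (x, y) with right pixel (x − d, y); a simple per-pixel absolute difference stands in for the paper's windowed correlation, and all function names here are illustrative, not from the paper.

```python
import numpy as np

def matching_cube(left, right, max_disp):
    """Matching-goodness cube G[y, x, d]: negated absolute difference, so
    larger is better (the paper uses windowed correlation instead).
    -inf padding marks left pixels whose match would fall off the right
    image; the paper initializes its correlation array with -1."""
    H, W = left.shape
    G = np.full((H, W, max_disp + 1), -np.inf)
    for d in range(max_disp + 1):
        # left pixel (x, y) pairs with right pixel (x - d, y)
        G[:, d:, d] = -np.abs(left[:, d:].astype(float)
                              - right[:, :W - d].astype(float))
    return G

def lvl(x, y, max_disp):
    """Left view line of left pixel (x, y): one disparity-space point per
    candidate disparity, all seen by the same left-camera pixel."""
    return [(x, y, d) for d in range(max_disp + 1)]

def rvl(xr, y, max_disp, width):
    """Right view line of right pixel (xr, y): the points with x - d = xr,
    so x and d rise together (the unit slope seen in a DSI plot)."""
    return [(xr + d, y, d) for d in range(max_disp + 1) if xr + d < width]
```

Every disparity-space point lies on exactly one LVL and one RVL; the consistency constraint of Section 2.2 asks that each of these lines carry at most one accepted pixel.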

3. Disparity consistency algorithm

This algorithm is composed of four consecutive processes. The first is the fast calculation of the matching cube (here a correlation cube). The second produces an initial disparity estimate from this correlation cube. The third solves the problem of narrow objects with large disparity. The last eliminates the illusions.

3.1. Calculating the correlation cube

We use a fast calculation of the correlation cube as described in (Sun, 1998). It is achieved by using FIFO buffers to perform a pipelined operation; the complexity of the calculation is reduced to O(M · N · D), where M, N are the horizontal and vertical dimensions of the image and D is the largest disparity. The correlation values are stored in a matching goodness array G(x, y, d), which is initialized to −1, indicating invalid.

3.2. Initial disparity map

After the correlation cube is calculated, the initial disparity map needs to be established. An array Disp(x, y), used to store the disparity values, is initialized to −1, indicating invalid. As disparity consistency prescribes, the candidate disparities are selected from all possible


disparities along the LVL by the largest correlation value, to avoid confliction. A pixel P(x1, y1) has the disparity value d if:

G(x1, y1, d) = max{ G(x1, y1, di) | di ∈ (0, max_disparity) }

Fig. 4. TDisp(x, y) derived from Disp(x, y).

(1)

The disparity value of pixel P(x1, y1) is stored in array element Disp(x1, y1). Then the candidates are checked along the RVL to avoid disparity confliction. For any pixel P(x1, y1), if another candidate P(x2, y2) lies on the same RVL, a resolution is needed: the one whose matching goodness G exceeds the other's by dt (an empirical threshold) is chosen; otherwise both are abandoned by assigning them an invalid disparity value:

Disp(x1, y1) = −1   (2)

and

Disp(x2, y2) = −1   (3)

The result of selecting the best candidate from the LVL and RVL is similar to bi-directional matching, with the advantage that no reverse match needs to be calculated. A traditional disparity map TDisp(x, y) is obtained by searching from right to left, row by row. The searching path has only three directions: left, down and left-up. This conforms to the disparity gradient limit and skips the illusion-like areas. The search uses the valid Disp(x, y) entries as ground control points (GCPs) to help find the path, as in the dynamic programming algorithm of (Intille and Bobick, 1995). Searching from right to left in those directions avoids accepting illusion-like areas into the traditional disparity map. The result is stored in TDisp(x, y).
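The selection of Eq. (1) and the RVL conflict check of Eqs. (2) and (3) can be sketched as follows, assuming a goodness cube G[y, x, d] where larger values are better. `dt` is the empirical threshold from the text; the single left-to-right pass per row is a simplification of the paper's procedure, and the function name is illustrative.

```python
import numpy as np

def initial_disparity(G, dt):
    """Winner-take-all along each LVL (Eq. (1)), then resolve candidates
    sharing an RVL, abandoning ambiguous pairs (Eqs. (2) and (3))."""
    H, W, D = G.shape
    disp = np.full((H, W), -1)
    best = G.argmax(axis=2)          # Eq. (1): best disparity along each LVL
    for y in range(H):
        owner = {}                   # right column x - d -> (x, goodness)
        for x in range(W):
            d = int(best[y, x])
            xr = x - d               # candidates with equal x - d share an RVL
            if xr < 0:
                continue
            g = G[y, x, d]
            if xr not in owner:
                owner[xr] = (x, g)
                disp[y, x] = d
            else:
                px, pg = owner[xr]
                if g > pg + dt:      # clearly better: replace the earlier one
                    disp[y, px] = -1
                    disp[y, x] = d
                    owner[xr] = (x, g)
                elif pg > g + dt:    # clearly worse: keep the earlier one
                    pass
                else:                # ambiguous: abandon both (Eqs. (2), (3))
                    disp[y, px] = -1
    return disp
```

Resolving conflicts along the RVL in the reference cube is what makes the explicit reverse (right-to-left) matching pass unnecessary.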

3.3. Dealing with narrow and large-disparity objects

When there are narrow objects with large disparity in the scene, some parts of the background will be missing from the traditional disparity map TDisp(x, y), although they do exist in Disp(x, y), as Fig. 4 shows. Only a few algorithms treat this case seriously (Zitnick and Kanade, 2000; Dhond and Aggarwal, 1992); others assume that it will not happen (Intille and Bobick, 1995). In (Dhond and Aggarwal, 1992) a dynamic disparity search method is used, and Zitnick and Kanade (2000) use a cooperative algorithm. The ordering constraint is not valid in this case (Tsai and Katsaggelos, 1999). According to disparity consistency, these areas are legal searching areas, and they are (as pointed out by disparity consistency) illusion-like areas. To keep the integrity of the disparity map, more effort must be taken.

The traditional disparity map is then separated into a highly reliably matched set and an unmatched set. Accordingly, the disparity space is divided in two: the space occupied by the LVLs and RVLs of reliably matched pixels is marked as inhibition area, and the remaining areas are marked as searching areas. This is achieved by scanning TDisp(x, y) for each valid disparity value d of a pixel P(X, Y) and making the assignments

G(X, Y, di) = MaxGoodness + 1,   di ∈ (0, max_disparity)   (4)

G(X + di − d, Y, di) = MaxGoodness + 1,   di ∈ (0, max_disparity)   (5)

The inhibition areas and searching areas are as Fig. 5 shows. The searching areas will be analyzed; integrity is achieved by searching these areas. We get the illusion-like areas by subtracting the traditional disparity map TDisp(x, y) from Disp(x, y) to get

Fig. 5. A DSI separated into two kinds of areas: the highly reliably matched area and the unmatched searching areas (shadowed).
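The inhibition marking of Eqs. (4) and (5) might look like the sketch below, assuming a goodness cube G[y, x, d] and a traditional disparity map TDisp with −1 marking invalid entries; MaxGoodness + 1 flags a cell as off-limits for later searches. The function name is an assumption for illustration.

```python
import numpy as np

def mark_inhibition(G, TDisp):
    """Block the LVL and RVL of every reliably matched pixel so that later
    searches stay inside the unmatched (searching) areas."""
    H, W, D = G.shape
    max_goodness = G.max()           # MaxGoodness, taken before marking
    for y in range(H):
        for x in range(W):
            d = TDisp[y, x]
            if d < 0:
                continue             # unmatched pixel: leave searchable
            # Eq. (4): inhibit the LVL of P(x, y) (every candidate disparity)
            G[y, x, :] = max_goodness + 1
            # Eq. (5): inhibit the RVL, i.e. all points with x' - d' = x - d
            for di in range(D):
                xi = x + di - d
                if 0 <= xi < W:
                    G[y, xi, di] = max_goodness + 1
    return G
```

Cells still holding values no greater than MaxGoodness afterwards form the searching areas of Fig. 5.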

IDisp(x, y). If there is no valid IDisp(x, y) entry in a searching area, there is no need to search it. The valid IDisp(x, y) entries are used as GCPs in the searching operation, and the search is limited to the local space only; it is not a global search. Search results are stored in IDisp(x, y). When all the illusion-like areas have been processed, the integrity problem of narrow objects with large disparity is solved.

3.4. Illusion elimination

When integrity is obtained, illusions are accepted too: there is no firm evidence to discriminate the background between a narrow object and its occluded area from a real illusion, nor to discriminate a genuinely protruding real object from an illusion. For every area in IDisp(x, y), a check is performed as Fig. 6 shows: a block in IDisp(x, y) is accepted if its area is large enough or it is concatenated to a highly reliably matched area; the others are eliminated as illusions. Of course, one can instead keep them in the disparity map with a separate marking to distinguish them from usual matches. Although this method is not perfect, it provides more information than simply accepting them as usual matches, as (Zitnick and Kanade, 2000) does. Unmatched areas are marked as occluded; there is no separate process to handle occlusion.

Fig. 6. Decision flow for illusion-like areas.

4. Experimental results

The algorithm was run on a Pentium-II 350 with 64 MB of memory. The calculation of the correlation cube for a 256 × 256 image with 32 disparity levels takes about 300 ms.

4.1. Synthetic stereo pair

The stereo pair used is a random dot pair, generated by shifting blocks of pixels by a disparity value and then adding noise. There are four kinds of objects: (A) a long horizontal object in the foreground; (B) three narrow objects with large disparity in the foreground; (C) four concatenated blocks in an order of alternating occluding and occluded; (D) an illusion caused by two occluded areas. Fig. 7 shows that the occluded area equals the occluding area, which is not always satisfied by other methods or algorithms. The vertical narrow objects and shadows are detected accurately without expanding the shadow areas, and this makes the pixels located between the occluding and occluded pixels in area B detectable. Dynamic programming cannot treat this problem properly; it tends to treat all of them as occluded by selecting the left-up path (Intille and Bobick, 1995). The illusion is caused by two identical blocks, each located in the left and right occluded areas and projected to the left and right cameras, respectively. It is detected successfully and eliminated in the final disparity map, as Fig. 7(c) and (d) show. Although some areas of B are also illusion-like, they are kept because they are large enough and concatenate smoothly to the background.

Fig. 7. Random dot stereo pair, disparity and occluded areas ((a) left image, (b) right image, (c) initial disparity map, (d) final disparity map).
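The acceptance test of Section 3.4 (Fig. 6) can be sketched as a flood fill over candidate blocks, assuming IDisp holds the illusion-like candidates and RDisp the highly reliable matches (−1 for invalid pixels); the 4-connectivity and the `min_area` threshold are assumptions for illustration, not values from the paper.

```python
import numpy as np

def accept_blocks(IDisp, RDisp, min_area):
    """Keep a candidate block if it is large enough or touches a reliably
    matched pixel; otherwise eliminate it as an illusion (left at -1)."""
    H, W = IDisp.shape
    out = np.full((H, W), -1)
    seen = np.zeros((H, W), bool)
    for sy in range(H):
        for sx in range(W):
            if IDisp[sy, sx] < 0 or seen[sy, sx]:
                continue
            stack, block = [(sy, sx)], []   # flood-fill one candidate block
            seen[sy, sx] = True
            touches = False
            while stack:
                y, x = stack.pop()
                block.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < H and 0 <= nx < W:
                        if RDisp[ny, nx] >= 0:
                            touches = True  # concatenated to a reliable area
                        if IDisp[ny, nx] >= 0 and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
            if len(block) >= min_area or touches:
                for y, x in block:
                    out[y, x] = IDisp[y, x]  # accepted into the final map
            # else: eliminated as illusion; pixels stay marked occluded
    return out
```

This is why the areas of B survive in Fig. 7(d): they are large and border the reliably matched background, whereas the isolated illusion block satisfies neither condition.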

Fig. 8 shows the DSI at y = 218, where the illusion is located, and Fig. 8(b) shows that the LVL and RVL of that illusion cross the left occluded area and the right occluded area simultaneously. The statistics of the experimental results are shown in Table 1.

Fig. 8. (a) DSI at y = 218, where the illusion (within a circle) is located, and (b) the disparity map before illusion elimination.

Table 1
Statistics for the experimental results (the numbers are pixel counts)

                                                  Result    Real     Accuracy (%)
Area A        Occluding                            4751      5200     91.4
              Occluded                             1143      1300     87.6
Area B        Occluding                            2160      2400     90.0
              Occluded                             2138      2400     89.1
Area C        Occluding                            1533      1800     85.5
              Occluded                             1201      1500     80.1
Background                                        48,326    50,936    94.9
Pixels between the occluded and occluding
  pixels in area B                                 3580      4000     89.5
Illusion detected                                    35        27     77.1
Illusion eliminated                                  27        27    100

Fig. 9. Result on a real image ((a) and (b) are the left and right images, respectively; (c) is a 3D view of the disparity map).


4.2. Real picture

The experiments on ordinary stereo pairs also show good performance. The edges at disparity discontinuities are sharp and the other regions are smooth, as Fig. 9 shows. It should be mentioned that no low-pass filter is used.

5. Conclusion

This method of stereo matching and occlusion detection is based on analysis in disparity space, in the form of a correlation cube. All the constraints are subsumed under disparity consistency, which exploits how pixels are located in disparity space and the nature of imaging and occluding, and which is more accurate than their original statements. Uniqueness is preserved. Continuity is kept by resolving disparity-consistency conflicts rather than by a kind of low-pass filtering. The gradient limit becomes more precise in the form of the RVL, and the ordering constraint is abandoned. Moreover, integrity and illusion sensitivity are achieved, because a direct cue of imaging and occluding is used rather than an indirect one. The experimental results show that the integrity problem caused by narrow-object occlusion is solved successfully, and that illusions are specially treated rather than simply accepted as usual matches.

References

Birchfield, S., Tomasi, C., 1999. Depth discontinuities by pixel-to-pixel stereo. International Journal of Computer Vision 35 (3), 269–293.
Candocia, F., Adjouadi, M., 1997. A similarity measure for stereo feature matching. IEEE Transactions on Image Processing 6 (10), 1460–1464.
Chen, T., Bovik, A.C., Cormack, L.K., 1999. Stereoscopic ranging by matching image modulations. IEEE Transactions on Image Processing 8 (6), 785–797.
Dhond, U.R., Aggarwal, J.K., 1989. Structure from stereo: a review. IEEE Transactions on Systems, Man and Cybernetics 19 (6), 1489–1510.
Dhond, U.R., Aggarwal, J.K., 1992. Analysis of the stereo correspondence process in scenes with narrow occluding objects. In: Proceedings of the 11th IAPR International Conference on Pattern Recognition, Vol. 1, pp. 470–473.
Hariyama, M., Takeuchi, T., Kameyama, M., 2001. VLSI processor for reliable stereo matching based on adaptive window-size selection. In: Proceedings of the IEEE International Conference on Robotics and Automation, Vol. 2, pp. 1168–1173.
Intille, S.S., Bobick, A.F., 1995. Disparity-space images and large occlusion stereo. M.I.T. Media Lab Perceptual Computing Group, Technical Report No. 220.
Kanade, T., Okutomi, M., 1991. A stereo matching algorithm with an adaptive window: theory and experiment. In: Proceedings of the IEEE International Conference on Robotics and Automation, Vol. 2, pp. 1088–1095.
Kanade, T., Okutomi, M., 1994. A stereo matching algorithm with an adaptive window: theory and experiment. IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (9), 920–932.
Moravec, H., 1996. Robot spatial perception by stereoscopic vision and 3D evidence grids. CMU Robotics Institute, Technical Report CMU-RI-TR-96-34.
Sun, C., 1998. Multi-resolution rectangular subregioning stereo matching using fast correlation and dynamic programming techniques. CSIRO Mathematical and Information Sciences, Report No. 246.
Toh, P.S., Forrest, A., 1990. Occlusion detection in early vision. In: Proceedings of the International Conference on Computer Vision (ICCV), pp. 126–132.
Triantafyllidis, G.A., Tzovaras, D., Strintzis, M.G., 2000. Occlusion and visible background and foreground areas in stereo: a Bayesian approach. IEEE Transactions on Circuits and Systems for Video Technology 10 (4), 563–575.
Tsai, C., Katsaggelos, A.K., 1999. Dense disparity estimation with a divide-and-conquer disparity space image technique. IEEE Transactions on Multimedia 1 (1), 18–29.
Watanabe, O., Fukushima, K., 1999. Stereo algorithm that extracts a depth cue from interocularly unpaired points. Neural Networks.
Werth, P., Scherer, S., 2000. A novel bidirectional framework for control and refinement of area-based correlation techniques. In: Proceedings of the IEEE 15th International Conference on Pattern Recognition, Vol. 3, pp. 730–733.
Woodfill, J., Herzen, B.V., 1997. Real-time stereo vision on the PARTS reconfigurable computer. In: IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 242–250.
Zitnick, C.L., Kanade, T., 2000. A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (7), 675–684.