Pattern Recognition Letters 24 (2003) 2421–2429 www.elsevier.com/locate/patrec
An efficient method to build panoramic image mosaics Dae-Hyun Kim *, Yong-In Yoon, Jong-Soo Choi Department of Image Engineering, Graduate School of AIM, Chung-Ang University, 221 Huksuk-Dong, Dongjak-Ku, Seoul 156-756, South Korea Received 10 September 2002; received in revised form 4 March 2003
Abstract This paper describes an efficient method to build panoramic image mosaics from multiple images. Conventional algorithms used geometrical feature points and optimization to compute the projective transformation, which is the relation between two consecutive images. However, building a panoramic image was very time consuming because of the iterative computation involved. The proposed method computed the projective transformation in the overlapped areas of the two given images by using four seed points. A seed point is a highly textured point in the overlapped area of the reference image, which is extracted by using phase correlation. Because the region of interest (ROI) was restricted to the overlapped areas of the two images, more accurate correspondences were obtained. Before selecting the seed points, the histograms of the overlapped areas were equalized to mitigate the variation of the illumination conditions. After selecting the seed points, the weighted block matching algorithm (BMA) was used to minimize image distortion caused by camera rotation. An experiment was performed employing the proposed method with various images, and the results were compared using the peak signal-to-noise ratio (PSNR). Results showed that the proposed method built high-quality panoramic image mosaics at high speed. © 2003 Elsevier B.V. All rights reserved. Keywords: Panoramic image mosaics; Projective transform; Image warping
1. Introduction

The automatic construction of image mosaics is an active area of research in the fields of photogrammetry, computer vision, and computer graphics. Image mosaics can be used for different applications. Szeliski (1996) and Szeliski and Shum (1997) used them for the construction of virtual environments and virtual travel, Hansen et al. (1994) for scene stabilization, and Irani et al. (1995a,b) and Irani and Anandan (1998) for scene change detection, video compression, and video indexing. There are a number of ways to build image mosaics. One is to record an image onto a long filmstrip using a panorama camera. Another is to use a lens with a very large field of view, such as a fish-eye lens, or a parabolic mirror. Although these methods are able to directly capture the

* Corresponding author. Tel.: +82-282-05295; fax: +82-28262505. E-mail address: [email protected] (D.-H. Kim).
0167-8655/03/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S0167-8655(03)00071-0
panoramic image, they are rather hardware-intensive and expensive. On the other hand, image mosaicking algorithms are a less hardware-intensive approach: many regular pictures are taken so as to cover the whole viewing space, and then these images are aligned and stitched together. To build the panoramic image mosaic, the projective transform between two images has to be computed. Dai and Khorram (1999) and Imad et al. (1997) used geometrical features such as corners, edges, and lines to derive the projective transform. This makes it possible to compute a more accurate transformation, but it demands a high computational cost because of the iterative search for corresponding points. Shum and Szeliski (1998), Szeliski (1994, 1996) and Szeliski and Shum (1997) directly minimized the discrepancy in intensities between pairs of images after applying the recovered transformation. This algorithm is very simple and has the advantage of not requiring any easily identifiable feature points. However, it takes a great deal of processing time because it computes the projective transform iteratively in order to optimize it. In this paper, four pairs of seed points were used to compute the projective transform in the overlapped areas of two images. A seed point is the most highly textured pixel in the overlapped area of the reference image; its corresponding point was detected in the overlapped area of the target image by using the block matching algorithm (BMA). For more robust detection of the corresponding points, the histograms of the two overlapped areas were equalized before selecting seed points, to mitigate the variation of illumination conditions. Furthermore, a weight function was added to the BMA in order to minimize image distortion due to camera rotation. Fig. 1 shows how the proposed algorithm operates. First, the overlapped areas of the given images were extracted, and then histogram equalization was applied to these areas.
Then the overlapped area of the reference image was divided into four subareas. One seed point was selected in each subarea. Their correspondences were detected in the overlapped area of the target image using the weighted BMA. Once four pairs of seed points were obtained, the projective transformation and
Fig. 1. The proposed algorithm.
the camera focal length could be estimated. Using the estimated focal length, input images were warped into the cylindrical coordinates and blended into the complete panoramic image. The bilinear weight function was used to remove the boundary that occurred in blending multiple warped images.
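The first step of the pipeline above, extracting the overlapped areas, relies on phase correlation (described in Section 2.1). As an illustrative sketch only, not the authors' code, the 2D translation between two images can be estimated with NumPy as follows (the function name and the integer circular-shift assumption are my own):

```python
import numpy as np

def phase_correlation(img_a, img_b):
    """Estimate (dy, dx) such that img_b is approximately img_a circularly
    shifted by (dy, dx), i.e. img_b ~ np.roll(img_a, (dy, dx), axis=(0, 1))."""
    fa = np.fft.fft2(img_a)
    fb = np.fft.fft2(img_b)
    # Cross-power spectrum: normalize magnitudes so only the phase remains.
    cross = fb * np.conj(fa)
    cross /= np.maximum(np.abs(cross), 1e-12)
    # The inverse transform peaks at the translation offset.
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peaks in the far half of each axis to negative shifts.
    h, w = corr.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

For real overlapping photographs (rather than exact circular shifts), windowing and cropping would be needed in practice; the recovered (dy, dx) then delimits the overlapped areas of the two images.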
2. Computation of the projective transform

Since the region of interest (ROI) was restricted to the overlapped areas of the two images, more accurate correspondences were obtained. Four pairs of seed points were first identified, and then the projective transformation was computed.

2.1. Extracting overlapped areas

Brown (1992) and Reddy and Chatterji (1996) described an elegant method, called phase correlation, to align two images that are shifted relative to each other. This method estimates the 2D translation between a pair of images by taking the 2D Fourier transform of each image, computing the phase difference at each frequency, and then taking the inverse Fourier transform. From this 2D translation, the overlapped areas of the two given images could be estimated.

2.2. Selecting seed points

In many vision applications, texture-based features are often used instead of geometrical features. Shi and Tomasi (1994) used texture-based features, good features, to track feature points. In addition, Lhuillier (1999) and Lhuillier and Quan (1999) used seed points to develop a quasi-dense matching algorithm. To compute the projective transform, four seed points must exist in the overlapped area, and no three of them may be collinear. The overlapped area of the reference image was therefore divided into four subareas, and the central pixel of the block with the maximum variance was selected as the seed point in each subarea. In Eq. (1), the seed point q_i is defined by the kth block that has the maximum variance in the ith subarea:

q_i = arg max_k [ σ²_{k,i} ],   σ²_{k,i} = Σ_{g=0}^{G_MAX} (g − M_{k,i})² h_g,   0 ≤ i ≤ 3        (1)

where σ²_{k,i} is the variance and M_{k,i} is the mean value of the kth block in the ith subarea, h_g is the histogram of graylevel g, and G_MAX is the maximum graylevel, often 255.

2.3. Detecting corresponding points

Since histogram equalization distributes the graylevels of an image uniformly over the whole grayscale, all images had approximately the same brightness and contrast. As a result, images could be compared on an equal footing, without the bias caused by differences in perceived contrast and brightness. Therefore, the histogram of the overlapped area of each given image was equalized. For more robust detection of correspondences, the weighted BMA was used. As the camera was rotated, the image became distorted, and this distortion increased monotonically as the searched location moved away from the direction of minimum distortion (Huang and Zhuang, 1995; Jong et al., 1994). Therefore, a weight function w_{i,j} was applied which decreases the error in areas of small distortion and increases it in areas of large distortion. Eq. (2) defines the weighted block-matching criterion, and Eq. (3) defines the weight function, which is proportional to the distance from the center of the block:

E(d_x, d_y) = Σ_{(i,j)∈B} | I(x+i, y+j, k) − I(x+i+d_x, y+j+d_y, k+1) | · w_{i,j}        (2)

w_{i,j} = d_{i,j} / D,   −7 ≤ i, j ≤ 7,   d_{i,j} = √(i² + j²)        (3)

where I(x, y, k) is the graylevel at (x, y) in the kth image, d_x and d_y stand for the displacement of the block, d_{i,j} is the distance from the center of the block, and D represents its maximum value.

3. Building the panoramic image mosaic

The cylindrical panoramic image is often used because of its ease of construction. If the camera focal length is known, each perspective image can be warped into cylindrical coordinates. Szeliski and Shum (1997) described in detail how to estimate the focal length from the projective transformation.

3.1. Transformation to cylindrical coordinates

To build the cylindrical panoramic image, the world coordinate P(X, Y, Z) was mapped to the 2D cylindrical screen coordinate (θ, v) using

θ = tan⁻¹(X / Z),   v = Y / √(X² + Z²)        (4)

where θ is the panning angle and v is the scanline. However, Eq. (4) must be rewritten in terms of screen coordinates, because only the screen coordinate p(x, y) is known. Substituting the camera equations x = fX/Z and y = fY/Z into Eq. (4) and simplifying, the transform of screen coordinates is obtained as shown in Eq. (5):

θ = tan⁻¹(x / f),   v = y / √(x² + f²)        (5)

3.2. Blending warped images

Before blending the cylindrically warped images, the displacement between the two warped images had to be estimated. In this procedure, the phase correlation described in Section 2.1 was exploited again. After estimating the displacement, the warped images can be blended and the complete panoramic image constructed. To reduce visible artifacts, images were weighted more heavily toward their centers and blended together using the bilinear weighting function.
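The cylindrical warp of Eq. (5) can be sketched as follows. This is my own illustration, not the authors' implementation; it assumes NumPy, nearest-neighbour sampling, and a principal point at the image center:

```python
import numpy as np

def warp_to_cylinder(image, f):
    """Warp a perspective image into cylindrical coordinates (Eq. (5)).

    Each output pixel corresponds to a (theta, v) pair; inverting Eq. (5)
    gives the source screen coordinates x = f*tan(theta) and
    y = v*sqrt(x^2 + f^2), which are then sampled from the input image.
    """
    h, w = image.shape
    yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
    theta = (np.arange(w) - xc) / f              # panning angle per column
    v = (np.arange(h) - yc) / f                  # scanline per row
    x = f * np.tan(theta)                        # invert theta = atan(x / f)
    y = v[:, None] * np.sqrt(x ** 2 + f ** 2)[None, :]  # invert v = y / sqrt(x^2 + f^2)
    xs = np.broadcast_to(np.rint(x + xc).astype(int), (h, w))
    ys = np.rint(y + yc).astype(int)
    out = np.zeros_like(image)
    valid = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    out[valid] = image[ys[valid], xs[valid]]     # nearest-neighbour sampling
    return out
```

As the focal length grows, the cylinder flattens and the warp approaches the identity, which is a convenient sanity check; blending then proceeds in these (θ, v) coordinates after estimating the inter-image displacement with phase correlation.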
4. Experimentation

An experiment employing the proposed algorithm was conducted with four pairs of images (Fig. 2). The experimental environment was an Intel Pentium 4 1.4 GHz CPU with 256 MB RAM. A 15 × 15 block was used for selecting the seed point, and the search range was 32 × 32. The peak signal-to-noise ratio (PSNR) was estimated in order to evaluate the performance of the proposed algorithm, and the proposed algorithm was also compared with other algorithms. PSNR was estimated only in the region where the two warped images were synthesized. The more accurate the projective transform, the larger the PSNR of the synthesized image; it is therefore important to compute the projective transform accurately.

4.1. Comparison by histogram equalization

Fig. 3 shows the two overlapped images and their histograms for Fig. 2(b). Their means and standard deviations are given in Table 1. From Fig. 3(a) and Table 1, it can be seen that the two images have different brightness. After histogram equalization, however, the brightness of the two images became similar, as shown in Fig. 3(b) and Table 1. Even though there were large differences in both the means and the standard deviations before histogram equalization, they showed similar values afterwards. Therefore, differences in brightness due to changes in the illumination conditions can be mitigated. Fig. 4 depicts the PSNR of the reconstructed image mosaics before and after histogram equalization. For most images, histogram equalization
Fig. 2. Experimental image pairs: (a) and (b) were acquired from http://www.cs.unc.edu/~ibr/ while (c) and (d) were captured by Olympus C-1400L digital camera.
Table 1
Means and standard deviations before and after histogram equalization

                 Before histogram equalization     After histogram equalization
                 Left        Right                 Left        Right
Mean             145.91      104.58                127.96      127.95
Std. deviation   43.86       39.43                 73.87       73.66
improved the quality of the synthesized image by about 1 dB, except for Fig. 2(d).
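The two measurements used in this section can be sketched from first principles. The code below is an illustrative NumPy version, not the authors' implementation, of plain histogram equalization and of the PSNR used throughout the experiments:

```python
import numpy as np

def equalize_hist(img):
    """Histogram-equalize an 8-bit grayscale image via its normalized CDF."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                               # normalize CDF to [0, 1]
    lut = np.rint(cdf * 255.0).astype(np.uint8)  # graylevel remapping table
    return lut[img]

def psnr(a, b):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```

Equalizing both overlapped areas with such a remapping before matching removes most of the brightness bias shown in Table 1; the 1 dB improvement reported above is the paper's measurement, not reproduced here.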
Fig. 3. Two overlapped images and their histograms before (a) and after (b) histogram equalization.

Fig. 4. Performance comparison before and after histogram equalization.

4.2. Weighted block matching algorithm

Fig. 5 presents the experimental results according to whether or not the weight function was used in detecting the correspondences. Even though there was no quantitative difference, a slight qualitative difference was observed: when the typical BMA was used, a ghosting artifact appeared in the boxed region of Fig. 6(a), but it almost disappeared when the weighted BMA was used (Fig. 6(b)).

Fig. 5. Performance comparison between typical BMA and weighted BMA.

Fig. 6. Mosaicking images of Fig. 2(b): (a) typical BMA, and (b) weighted BMA.

4.3. Comparison of image mosaicking algorithms

Imad's algorithm, which used geometrical feature points, and Szeliski's algorithm, which used Levenberg–Marquardt iterative optimization, are representative software-intensive algorithms. In this section, the proposed algorithm is compared with these two algorithms. Instead of completely implementing Imad's algorithm, corners were first detected in the given images using the Harris corner detector, and a human animator then manually assigned their corresponding points. In general, this approach was expected to give a better-quality synthesized image than the other two algorithms. Fig. 7 presents the reconstructed image mosaics. In Fig. 7(a), the human animator directly assigned the corresponding points; Fig. 7(b) and (c) are the results of Szeliski's algorithm and the proposed algorithm, respectively. All three mosaics were good, but Fig. 7(b) still had some defects. For example, the bottom of the stairs was curved in a different direction. This means that Szeliski's algorithm did not reflect the geometrical features well, because it searches for the minimum gradient of the graylevel in the overlapped areas. This defect does not appear in Fig. 7(a) and (c). Fig. 7(d) shows the PSNR of each algorithm. The best quality was obtained when the human animator directly assigned corresponding points among the detected corners, while Szeliski's algorithm gave the worst quality. The proposed algorithm showed better quality than Szeliski's algorithm and was similar to Imad's algorithm for most images. In addition, it built the panoramic image mosaic automatically, without any human interaction.

4.4. Further analysis of the proposed algorithm

Both Imad's algorithm and Szeliski's algorithm minimized the sum of squared intensity errors as
E = Σ_i [ I′(x′_i, y′_i) − I(x_i, y_i) ]² = Σ_i e²_i        (6)

where i ranges over the corresponding pairs of pixels that lie inside both images, I(x_i, y_i) is the reference image, and I′(x′_i, y′_i) is the warped target image.

Fig. 7. Mosaicking images of Fig. 2(a): (a) image mosaic by human animator, (b) image mosaic by Szeliski's algorithm, (c) image mosaic by the proposed algorithm, and (d) PSNR of each algorithm.

In Szeliski's algorithm, the Levenberg–Marquardt iterative optimization minimized Eq. (6). It required computing the partial derivatives of e_i with respect to the unknown elements of the projective transform. In Imad's algorithm, four matches were needed within the overlapped images, but this number was usually overestimated to ensure that four good matches would be found; the best projective transform was the one that minimized Eq. (6) among the combinations of detected corners. Table 2 summarizes the features required to compute the projective transform. M and N are the width and height of the image, respectively, and w is the size of the search range. T is the number of iterations needed to minimize Eq. (6). P refers to the permutation of detected corners, and its subscripts m and n are the numbers of corners detected in the two overlapped images. In Imad's algorithm, the number of operations increased with the number of detected corners. In Szeliski's algorithm, it increased with the number of iterations performed during the optimization. The proposed algorithm, however, performed no iterative operation to compute the projective transform, except for the repeated computation of the weighted BMA in the four subareas; this amount of computation was not significant because the search range was much smaller than the resolution of the image. Table 3 summarizes the processing time and the number of iterations for computing the projective transform. Szeliski's algorithm could not avoid iterative computation because of the Levenberg–Marquardt optimization, and its processing time varied with the number of iterations. The proposed algorithm, however, needed no iterative computation and on average took only 1.4 s. As a result, the proposed algorithm sped up the process by roughly a factor of three. Figs. 8 and 9 show the reconstructed image mosaics and cylindrical panoramic image mosaics of Fig. 2(c) and (d), respectively; the resulting images are of visibly satisfactory quality.
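For concreteness, the weighted BMA of Eqs. (2) and (3), the only exhaustive search the proposed method performs (once per subarea), can be sketched as follows. This is an illustrative NumPy version under my own assumptions about boundary handling and search-range interpretation, not the authors' code:

```python
import numpy as np

def weighted_bma(ref, tgt, seed, block=15, search=32):
    """Displacement (dy, dx) of the block around `seed` from `ref` into `tgt`.

    Implements the weighted SAD criterion of Eq. (2): each absolute
    difference is multiplied by w_ij = d_ij / D (Eq. (3)), the normalized
    distance from the block center, so that the periphery of the block,
    where rotational distortion is largest, dominates the match score.
    """
    r = block // 2
    y0, x0 = seed
    ii, jj = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1), indexing="ij")
    d = np.sqrt(ii ** 2 + jj ** 2)
    w = d / d.max()                                        # Eq. (3)
    ref_blk = ref[y0 - r:y0 + r + 1, x0 - r:x0 + r + 1].astype(np.float64)
    best_err, best = np.inf, (0, 0)
    for dy in range(-search // 2, search // 2 + 1):
        for dx in range(-search // 2, search // 2 + 1):
            y, x = y0 + dy, x0 + dx
            if y - r < 0 or x - r < 0 or y + r >= tgt.shape[0] or x + r >= tgt.shape[1]:
                continue                                   # candidate block out of bounds
            cand = tgt[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
            err = np.sum(np.abs(ref_blk - cand) * w)       # Eq. (2)
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best
```

With a 15 × 15 block and a 32 × 32 search range, each seed point costs on the order of w² weighted SAD evaluations, consistent with the 4w² operation count given for the proposed algorithm in Table 2.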
5. Conclusion An efficient method was proposed to build a panoramic image mosaic with four seed points. In
Table 2
Features for computing the projective transform

                                   Imad's algorithm    Szeliski's algorithm    Proposed algorithm
Needs iterative computation?       Yes                 Yes                     No
Number of iterations               mP4 × nP4           Unspecified             4
Kind of features used              Harris corners      None                    Texture
Number of operations needed        T × M × N           T × M × N               4w²
Table 3
Quantitative comparison between Szeliski's algorithm and the proposed algorithm

          Szeliski's algorithm                     Proposed algorithm
Fig. 2    Processing time (s)   # of iterations    Processing time (s)   # of iterations
(a)       3.36                  17                 1.41                  0
(b)       5.34                  34                 1.41                  0
(c)       7.02                  39                 1.44                  0
(d)       2.38                  8                  1.31                  0
Fig. 8. Reconstructed image mosaics of Fig. 2(c) and (d), respectively.
Fig. 9. Reconstructed cylindrical image mosaics of Fig. 2(c) and (d), respectively.
order to select the seed points, the overlapped areas of the two images were first extracted using phase correlation. Then the overlapped area of the reference image was divided into four subareas, and the most highly textured pixel in each subarea was selected as a seed point. Extracting the overlapped areas decreased both the mismatch rate and the range of computation. Before selecting the seed points, the histograms of the overlapped areas were equalized, because histogram equalization not only mitigates the variation of the illumination conditions but also allows images to be compared without bias. After selecting the seed points, their correspondences were detected using the weighted BMA in order to minimize image distortion. As soon as the four correspondences were identified, the projective transformation was computed and the panoramic image mosaic was completed. The whole procedure involved no iteration of the kind required by Imad's algorithm and Szeliski's algorithm, which means that the panoramic image mosaic could be constructed more quickly than with the previous methods; moreover, its quality was also better. So far, only scenes generated from a fixed viewpoint have been considered. In future studies, view interpolation will be investigated in order to generate novel scenes that follow the movement of the viewer's gazing direction.

Acknowledgements

This research was supported by the Ministry of Education, Seoul, Korea, under the BK21 project, and the Ministry of Science and Technology, Seoul, Korea, under the NRL project (M10204000079-02J0000-07310).

References

Brown, L.G., 1992. A survey of image registration techniques. ACM Comput. Surv. 24 (4), 325–376.
Dai, X., Khorram, S., 1999. A feature-based image registration algorithm using improved chain-code representation combined with invariant moments. IEEE Trans. Geosci. Remote Sensing 37 (5), 2351–2362.
Hansen, M., Anandan, P., Dana, K., van der Wal, G., Burt, P., 1994. Real-time scene stabilization and mosaic construction. In: IEEE Workshop on Applications of Computer Vision, December, Sarasota, FL, pp. 54–62.
Huang, Y., Zhuang, X., 1995. An adaptively refined block-matching algorithm for motion compensated video coding. IEEE Trans. Circ. Syst. Video Technol. 5 (1), 56–59.
Imad, Z., Faugeras, O., Deriche, R., 1997. Using geometric corners to build a 2D mosaic from a set of images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Puerto Rico, pp. 420–425.
Irani, M., Anandan, P., Hsu, S., 1995a. Mosaic based representations of video sequences and their applications. In: Proceedings of the 5th International Conference on Computer Vision, June, Cambridge, MA, pp. 605–611.
Irani, M., Hsu, S., Anandan, P., 1995b. Video compression using mosaic representations. Signal Process.: Image Commun. 5 (3), 529–552.
Irani, M., Anandan, P., 1998. Video indexing based on mosaic representations. Proc. IEEE 86 (5), 905–921.
Jong, H.M., Chen, L.G., Chiueh, T.D., 1994. Accuracy improvement and cost reduction of 3-step search block matching algorithm for video coding. IEEE Trans. Circ. Syst. Video Technol. 4 (1), 88–90.
Lhuillier, M., 1999. Joint view triangulation for two views. In: Proceedings of the Vision Interface Conference, May, Trois-Rivieres, Que.
Lhuillier, M., Quan, L., 1999. Image interpolation by joint view triangulation. In: Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, Fort Collins, USA.
Reddy, B.S., Chatterji, B.N., 1996. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 5 (8), 1266–1271.
Shi, J., Tomasi, C., 1994. Good features to track. In: Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, pp. 593–600.
Shum, H.Y., Szeliski, R., 1998. Construction and refinement of panoramic mosaics with global and local alignment. In: Proceedings of the 6th International Conference on Computer Vision, pp. 953–958.
Szeliski, R., 1994. Image mosaicing for Tele-Reality applications. In: IEEE Workshop on Applications of Computer Vision, December, Florida, pp. 44–53.
Szeliski, R., 1996. Video mosaics for virtual environments. IEEE Comput. Graph. Appl., 22–30.
Szeliski, R., Shum, H.Y., 1997. Creating full view panoramic image mosaics and environment maps. In: Proceedings SIGGRAPH, pp. 251–258.