Robust rigid registration by scanning multiple phase correlation peaks


Optik 126 (2015) 5115–5118


Alfonso Alba*, Edgar R. Arce-Santana
Facultad de Ciencias, UASLP, Av. Salvador Nava Mtz. S/N, Zona Universitaria, 78290 San Luis Potosí, Mexico
* Corresponding author. Tel.: +52 4448262486x2906. E-mail addresses: [email protected] (A. Alba), [email protected] (E.R. Arce-Santana).

Article info
Article history: Received 4 November 2014; accepted 11 September 2015.
Keywords: Phase correlation; rigid image registration; image registration.

Abstract

Rigid image registration is an important image processing problem with applications in medical imaging, biometrics, image mosaicking, video stabilization, and others. A popular technique for rigid image registration consists in estimating the location of the maximum of the phase correlation function. However, multiple motions, non-rigid transformations, non-textured regions and noise introduce artefacts and spurious peaks that may confound the phase correlation method. The solution proposed here is to explore multiple peaks and to estimate an error measure for each plausible transformation, in order to find the correct one. Our tests, performed with both artificially generated and real cases, show that the proposed approach can increase the probability of successful registration by 25–40% and reduce the registration error by up to 35%.

© 2015 Elsevier GmbH. All rights reserved.

1. Introduction

The phase correlation method is a popular technique for translational and rigid image alignment [1,2]. In recent years, it has been used for registration of medical images [3], aerial and satellite images [4], image mosaicking and creation of super-resolution images, fingerprint and iris recognition [5,6], and many other applications. It has also been used in various computer vision problems such as stereo vision and motion estimation [7,8]. On the other hand, various authors have studied multiple refinements to the basic phase correlation method in order to increase its accuracy and robustness. For instance, to estimate the phase correlation with sub-pixel accuracy, some authors fit a continuous model, such as a Gaussian, a Dirichlet kernel, or a quadratic model, to the discrete phase correlation data [9,10]. It is also common to apply a windowing function (e.g., Hanning or Blackman) to the input images to reduce border effects.

The phase correlation method is based on the properties of the Fourier transform. Consider two one-dimensional discrete signals f[x] and g[x], defined for x = 0, ..., N − 1. The normalized discrete cross-spectrum R[k] between f and g is defined as

R[k] = F[k]G*[k] / |F[k]G*[k]|,   (1)

where F and G are the discrete Fourier transforms of f and g, respectively, G* is the complex conjugate of G, and |z| represents the magnitude of a complex number z. The phase correlation (PC) function r[x] is defined as the inverse discrete Fourier transform of R[k]. It is easy to see that if g is a circularly shifted version of f (i.e., g[x] = f[x − d]), then G[k] = F[k] exp{−j(2πk/N)d}, where N is the length of both signals and d is the displacement between them, and thus R[k] = exp{j(2πk/N)d} and r[x] = δ(x + d). In other words, if one of the images is a perfect circular shift of the other, the phase correlation function between them would be a single peak located at the displacement −d required to re-align the images. These notions can be easily generalized to 2D signals (images) and higher-dimensional signals.

In practice, however, the displacement between two signals or images is rarely circular, and it often occurs that different regions in the image suffer different displacements, for instance, when the background remains static while the foreground object moves. These issues introduce border artefacts which, along with the presence of noise and untextured regions, will distort the ideal phase correlation peak and/or introduce additional spurious peaks that may confound the basic phase correlation algorithm. On the other hand, due to the periodic nature of the phase correlation function, a peak observed at position −d in r[x] may represent a displacement of either d or d − N pixels between both signals. In many applications, such as motion estimation, it may be reasonable to assume that shorter displacements are more likely, so that one would choose d when d ≤ N/2, or choose d − N if d > N/2. In other applications, such as image stitching, the correct choice will depend on each case.
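
To make these ideas concrete, the following sketch (Python with NumPy, written for illustration; it is not the authors' implementation) computes the phase correlation surface of Eq. (1) for two images and recovers an integer circular shift, using the shorter-displacement assumption discussed above.

import numpy as np

def phase_correlation(f, g, eps=1e-12):
    # Normalized cross-spectrum R of Eq. (1), followed by its inverse FFT (the PC surface r).
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = F * np.conj(G)
    R = cross / (np.abs(cross) + eps)   # eps avoids division by zero in flat regions
    return np.fft.ifft2(R).real

def estimate_shift(f, g):
    # Integer displacement d such that g[x] ~ f[x - d], per axis (row, column).
    r = phase_correlation(f, g)
    peak = np.unravel_index(np.argmax(r), r.shape)
    shift = []
    for p, n in zip(peak, r.shape):
        # The peak sits at -d (mod n); keep the shorter of the two candidate displacements.
        shift.append(-p if p <= n // 2 else n - p)
    return tuple(shift)

# Example: if g = np.roll(f, (5, -3), axis=(0, 1)), then estimate_shift(f, g) returns (5, -3).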


In this article, we propose a robust rigid image registration approach based on the exploration of multiple peaks of the phase correlation function in order to obtain a set of plausible transformations, including multiple motions which correspond to the same phase correlation peak due to periodicity. Each transformation from this set is then evaluated by computing the root mean square error, over the overlapping area, between the gray values of the aligned images, and the transformation that minimizes this error is selected. To demonstrate the advantages of the proposed method, a quantitative evaluation is carried out using artificially transformed images with known transformations, and with real images for which the transformation is unknown. From these results, our conclusions are drawn.

2. Methodology

The proposed methodology is based on Reddy and Chatterji's algorithm for rigid image registration [2]. Given two images f1 and f2 of size Nx × Ny, we first compute their Fourier transforms F1 and F2 and the log-magnitude of their spectra, Mi[k, l] = W[k, l] log|Fi[k, l]|, where k and l represent the frequency indices and W[k, l] is the frequency response of a high-pass filter. Following Reddy and Chatterji's suggestion, we use W[k, l] = (1 − X[k, l])(2 − X[k, l]) with X[k, l] = cos(πk/Nx) cos(πl/Ny).

The filtered spectra are then transformed from Cartesian coordinates [k, l] to log-polar coordinates [ρ, θ], given by ρ = K log√(k² + l²) and θ = atan2(l, k), where K is an adequate scaling factor which controls the resolution of the radius (ρ) axis. Here we use K = (Nx + Ny)/8. The phase correlation rRS[ρ, θ] between M1[ρ, θ] and M2[ρ, θ] is then computed, and the set SRS of the locations of the P most significant maxima of rRS is obtained with sub-pixel accuracy by minimizing the gradient of the phase correlation function, as described in [11]. Each element of SRS represents a candidate pair for the scaling and rotation parameters of the unknown rigid transformation between the two input images. For each (ρ, θ) ∈ SRS, the scale parameter is recovered as s = exp(ρ/K) and the rotation angle as 2πθ/Ny. The image f1 is aligned to f2 by computing f̃1[x, y] = f1(Ts,θ(x, y)), where Ts,θ is a linear transformation defined by

Ts,θ(x, y) = ( s cos θ   −s sin θ ) ( x )
             ( s sin θ    s cos θ ) ( y ).   (2)
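
The following sketch illustrates this scaling/rotation stage (Python/NumPy, not taken from the authors' C++/OpenCV implementation): it builds the high-pass-filtered log-magnitude spectra, resamples them onto a log-polar grid with nearest-neighbour interpolation, and maps the phase correlation peak between the two log-polar images back to (s, θ). It reuses estimate_shift from the sketch in Section 1, uses an integer peak instead of the sub-pixel refinement of [11], and its sign conventions (which image is the reference) are a modelling choice rather than the paper's.

import numpy as np

def logmag_spectrum(f):
    # High-pass-filtered log-magnitude spectrum M[k, l], with the DC term at the center.
    Ny, Nx = f.shape
    F = np.fft.fftshift(np.fft.fft2(f))
    k = np.fft.fftshift(np.fft.fftfreq(Nx))        # normalized frequencies in [-0.5, 0.5)
    l = np.fft.fftshift(np.fft.fftfreq(Ny))
    X = np.cos(np.pi * l)[:, None] * np.cos(np.pi * k)[None, :]
    W = (1.0 - X) * (2.0 - X)                      # high-pass emphasis filter W[k, l]
    return W * np.log1p(np.abs(F))                 # log1p avoids log(0)

def to_logpolar(M, K):
    # Resample M onto a log-polar grid: column i -> radius exp(i/K), row j -> angle 2*pi*j/Ny.
    Ny, Nx = M.shape
    cy, cx = Ny / 2.0, Nx / 2.0
    radius = np.exp(np.arange(Nx) / K)[None, :]
    angle = (2.0 * np.pi * np.arange(Ny) / Ny)[:, None]
    xi = np.round(cx + radius * np.cos(angle)).astype(int)
    yi = np.round(cy + radius * np.sin(angle)).astype(int)
    valid = (xi >= 0) & (xi < Nx) & (yi >= 0) & (yi < Ny)
    out = np.zeros((Ny, Nx))
    out[valid] = M[yi[valid], xi[valid]]           # nearest-neighbour sampling, zeros outside
    return out

def estimate_scale_rotation(f1, f2, K):
    # Recover (s, theta) from the displacement of the highest log-polar phase correlation peak.
    M1 = to_logpolar(logmag_spectrum(f1), K)
    M2 = to_logpolar(logmag_spectrum(f2), K)
    Ny, Nx = M1.shape
    py, px = estimate_shift(M1, M2)                # sketch from Section 1
    return np.exp(px / K), 2.0 * np.pi * py / Ny   # s = exp(rho/K), rotation = 2*pi*peak/Ny

The paper sets K = (Nx + Ny)/8; with nearest-neighbour sampling and no sub-pixel refinement, this sketch is only a coarse approximation of the pipeline described above.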

In order to estimate the most plausible translations for the given scale and rotation parameters (s, θ), the phase correlation rT is computed between f̃1 and f2. Again, the locations (with sub-pixel accuracy) of the P most prominent maxima of rT are obtained as the set ST(s, θ), taking into account multiple choices due to the periodicity of the phase correlation function. That is, if (tx, ty) ∈ ST(s, θ), then (tx − Nx, ty), (tx, ty − Ny) and (tx − Nx, ty − Ny) will also be included in ST(s, θ). The set of all plausible transformations is simply obtained as

Ω = {(s, θ, tx, ty) : (s, θ) ∈ SRS, (tx, ty) ∈ ST(s, θ)},   (3)

and its cardinality is at most 4P², but it is often less because the sub-pixel refining algorithm may lead to equal results for different integer-valued starting points.
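
For instance, once the candidate (s, θ) pairs and the translation peaks for each of them have been extracted, the wrapped alternatives and the candidate set of Eq. (3) could be enumerated as follows (illustrative Python; translation_peaks_for is a hypothetical helper that returns the peak locations of rT for a given (s, θ), not a routine from the paper).

def wrapped_translations(tx, ty, Nx, Ny):
    # The four displacements that a phase correlation peak at (tx, ty) may represent.
    return [(tx, ty), (tx - Nx, ty), (tx, ty - Ny), (tx - Nx, ty - Ny)]

def candidate_set(scale_rot_peaks, translation_peaks_for, Nx, Ny):
    # Omega of Eq. (3): every (s, theta) peak combined with every wrapped translation peak.
    omega = []
    for s, theta in scale_rot_peaks:                      # up to P peaks of r_RS
        for tx, ty in translation_peaks_for(s, theta):    # up to P peaks of r_T
            for wtx, wty in wrapped_translations(tx, ty, Nx, Ny):
                omega.append((s, theta, wtx, wty))
    return omega                                          # at most 4 * P * P candidates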

The final step consists in selecting which transformation in Ω is the most appropriate solution. To do this, one can apply the transformation (or its inverse) to one of the images in order to align it against the other image, and estimate a similarity or error measure over the overlapping region of the two aligned images. To obtain the overlapping region, one can use two binary images g1[x, y] = g2[x, y] = 1, and inversely transform g2 according to the transformation under evaluation, i.e., g̃2 = g2(T⁻¹(x, y)), where T⁻¹ represents the inverse of T. Then one applies a pixel-wise logical AND operation between g1 and g̃2 to obtain a binary mask mT[x, y] = g1[x, y] g̃2[x, y], which represents the overlapping region for a given transformation T. Once this mask has been obtained, we compute the root mean square error (RMSE) between the two images for a given transformation T as

ET = √[ Σx Σy mT[x, y] (f1[x, y] − f̂2[x, y])² / Σx Σy mT[x, y] ],   (4)

where f̂2[x, y] = f2(T⁻¹(x, y)). On the other hand, if a transformation is such that the overlapping area is very small, it may result in a spuriously low RMSE. To avoid this problem, we only take into account transformations for which the proportion of overlapping area is larger than a given threshold A. For a given transformation T ∈ Ω, the proportion of overlapping area is given by

aT = (1/(Nx Ny)) Σx Σy mT[x, y].   (5)

Therefore, the estimated transformation T* is chosen as

T* = arg min{T ∈ Ω, aT > A} ET.   (6)
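
A minimal sketch of this evaluation and selection step is given below (Python/NumPy; warp is a hypothetical helper that resamples an image through T⁻¹, and omega is a list of candidate transformations such as the one built in the previous sketch; neither is part of the paper).

import numpy as np

def evaluate_candidate(f1, f2, T, warp, A=0.20):
    # Returns (E_T, a_T) of Eqs. (4)-(5); E_T is set to infinity when the overlap a_T <= A.
    Ny, Nx = f1.shape
    f2_hat = warp(f2, T)                    # f2 resampled through T^{-1}
    g2_hat = warp(np.ones(f2.shape), T)     # warped all-ones image g2
    m = (g2_hat > 0.5).astype(float)        # overlap mask m_T (g1 is identically 1)
    a = m.sum() / (Nx * Ny)                 # overlap fraction a_T, Eq. (5)
    if a <= A:
        return np.inf, a
    e = np.sqrt(np.sum(m * (f1 - f2_hat) ** 2) / m.sum())   # RMSE E_T, Eq. (4)
    return e, a

def select_best(f1, f2, omega, warp, A=0.20):
    # T* = argmin of E_T over candidates whose overlap exceeds A, Eq. (6).
    best_T, best_e = None, np.inf
    for T in omega:
        e, _ = evaluate_candidate(f1, f2, T, warp, A)
        if e < best_e:
            best_T, best_e = T, e
    return best_T, best_e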

3. Results and discussion

In this section, a quantitative evaluation of the proposed method is presented. The algorithm was implemented in C++ using the OpenCV library [12], and all tests were performed on a 2.6 GHz Intel Core i5 processor, although the implementation does not take advantage of multiple cores. In all cases, P = 8 phase correlation peaks were explored in both the scaling/rotation estimation stage and the translation estimation stage; therefore, a maximum of 4P² = 256 possible transformations were obtained for each case. The minimum area of overlap was set to A = 20% of the input image size, to avoid spurious solutions with small overlap.

3.1. Tests with artificially generated transformations

For this test, 100 known random transformations were applied to a 512 × 512 grayscale version of the Lena image (see Fig. 1a). The parameters of these transformations were chosen in the following ranges: scaling factor s ∈ [2^−0.5, 2^0.5] (approximately between 0.7 and 1.4), rotation angle θ ∈ [−30°, 30°], and translations tx, ty ∈ [−32, 32]. These choices produce some difficult transformations among the test cases, especially because the transformed image is cropped with respect to the frame of the original image, which may result in a considerable amount of missing data.


Fig. 1. Lena image used for synthetic tests: (a) original image, (b) transformed image with large overlapping area, (c) transformed image with small overlapping area, (d) same as (b) with added Gaussian noise, and (e) same as (c) with added Gaussian noise.


Table 1
Results obtained from 100 artificially transformed images, first without noise (SNR = ∞) and then with Gaussian noise (SNR = 2).

                             Success rate (%)   Median relative error   Median RMSE
SNR = ∞    Single peak       78                 0.0114                  0.0336
           Multiple peaks    99                 0.0073                  0.0260
SNR = 2    Single peak       68                 0.0156                  0.0778
           Multiple peaks    97                 0.0107                  0.0692

Fig. 2. Histograms of phase correlation peak rankings corresponding to the best solution found by the proposed multi-peak method for artificially generated transformations with SNR = ∞ and SNR = 2. Histograms are shown for both the scaling/rotation peak (SR Peak) and the translation peak (T Peak).

The RMSE between the aligned images was also computed. Fig. 1b shows the transformed image for the least extreme case (largest overlapping area), while Fig. 1c shows the most extreme case (smallest overlapping area). For each case, the proposed method was used to estimate a transformation T̂ with parameters ŝ, θ̂, t̂x and t̂y, from which a relative error measure was computed as follows:

E = (1/4) [ |ŝ − s|/|s| + |θ̂ − θ|/|θ| + |t̂x − tx|/|tx| + |t̂y − ty|/|ty| ].   (7)
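
As a small illustration (not taken from the paper's code), the relative error of Eq. (7) and the success rate used in the comparison below can be computed as:

def relative_error(est, true):
    # Eq. (7): est and true are (s, theta, tx, ty) tuples; assumes nonzero true parameters.
    return sum(abs(e - t) / abs(t) for e, t in zip(est, true)) / 4.0

def success_rate(cases, tol=0.05):
    # Percentage of (estimated, true) pairs whose relative error falls below the tolerance.
    ok = sum(1 for est, true in cases if relative_error(est, true) < tol)
    return 100.0 * ok / len(cases)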

For comparison purposes, registration was also performed using only the maximum of the phase correlation function (also computed with sub-pixel accuracy) at each stage: scaling/rotation and translation. We consider a registration successful when its relative error is less than 0.05. One can thus compute the success rate of a registration algorithm as the percentage of successful registrations.

A second experiment was carried out by adding Gaussian noise to the transformed images in order to obtain a signal-to-noise ratio (SNR) of 2 (i.e., the standard deviation of the noise is half the standard deviation of the image). Fig. 1d and e exemplify two transformed images (corresponding to the largest and smallest overlapping area, respectively) with added noise. Relative error and RMSE were also computed for each case, as well as the success rate.

Numerical results are shown in Table 1, where the proposed method (multiple peaks) is shown to achieve a considerably higher success rate (27% higher for SNR = ∞, and 42% higher for SNR = 2), as well as a lower median relative error (36% lower for SNR = ∞, 31% lower for SNR = 2) and a lower median RMSE (22% lower for SNR = ∞, 11% lower for SNR = 2). It is worth noting that 99 out of 100 cases with SNR = ∞ were correctly registered by the multi-peak approach with a relative error below 0.05; the remaining case had a relative error of 0.056, although its images were in fact aligned correctly. Moreover, the multi-peak method shows increased robustness to noise: its success rate is only 2% lower for SNR = 2 with respect to SNR = ∞, whereas for the single-peak approach the presence of noise decreases the success rate by almost 13%.

In these experiments, the P = 8 most prominent peaks of the phase correlation function are explored. These peaks can be ranked with respect to their height (i.e., the value of the phase correlation function at that location), so that the highest peak is assigned rank 1, the second highest peak rank 2, and so on. It is interesting to see how the best solutions found by the proposed approach are ranked in this regard. Fig. 2 shows the rank histograms of the solutions, for each stage of the registration method (scaling/rotation and translation), and for the cases without noise (SNR = ∞) and with noise (SNR = 2). The frequency with which the scaling/rotation peak (SR Peak) of the best solution is the rank 1 peak (the highest peak of the phase correlation function) is clearly correlated with the success rate of the single-peak approach. The frequency of lower-ranked peaks decreases rapidly with the rank, which means that as P increases the benefit becomes less and less significant, while the computational cost increases considerably.

Furthermore, the rank histograms for the translation peak (T Peak) are completely concentrated on the rank 1 peak (there is one sample with rank 8 at SNR = 2, but this case corresponds to an unsuccessful registration, possibly due to a wrong estimation of the scaling/rotation peak). On one hand, this suggests that the scaling/rotation stage is much more sensitive to artefacts produced by noise, artificial borders, etc. On the other hand, it also reflects the importance of estimating the scaling and rotation parameters with the highest possible accuracy, so that the estimation of the translation between the rectified images becomes an easier problem.

3.2. Tests with real images

To complement the tests with synthetic transformations, the proposed algorithm has also been applied to real image pairs for which the true transformation is unknown, and in some cases is not even rigid. Since the ground truth is not known, it is not possible to estimate the relative error given by Eq. (7), but one can still measure the RMSE between the registered images (considering only the overlapping area). Six test image pairs have been obtained from different sources: Boston [13], Aerial 1 [14], Aerial 2, Face, MRI and Bark [14]. Fig. 3 shows the six pairs, along with the registered images, where one image is displayed using the red color channel while the other uses the green channel; therefore, matching pixels are shown in yellow.

Table 2 shows the RMSE values prior to registration, after registration using a single peak, and after registration using multiple peaks. Values shown with an asterisk (only in the case of the single-peak approach) correspond to cases where the algorithm was not able to find an adequate solution; in other words, cases where the highest peak of the phase correlation function, for either the scaling/rotation stage or the translation stage, does not approximate the true transformation between the images. Half of these image pairs could not be correctly aligned by considering only the highest peak of the phase correlation; among the other three cases, there were two (Boston and Aerial 1) for which the multiple-peak method was able to find better solutions than the single-peak approach. Interestingly, these two examples correspond to cases where the true transformation between the two input images is a projective transformation; the phase correlation approaches under study will only attempt to align some of the objects in the scene.

Table 2
RMSE values between the unregistered and registered image pairs for the real test cases. Registration was performed using both a single peak and multiple peaks of the phase correlation function. Unsuccessful registrations are marked with an asterisk. The last two rows show the ranking of the phase correlation peak that produced the best result for the multiple-peak method, both for the scaling/rotation stage (RS Peak) and for the translation stage (T Peak).

                Boston   Aerial 1   Aerial 2   Face     MRI     Bark
Unregistered    0.246    0.165      0.118      0.112    0.124   0.133
Single peak     0.090    0.134      0.117*     0.230*   0.066   0.144*
Multi peak      0.080    0.107      0.014      0.036    0.066   0.057
RS Peak         2        2          1          5        1       7
T Peak          2        3          1          2        1       1


Finally, the rankings of the scaling/rotation and translation peaks corresponding to the solution obtained with the multi-peak approach are also shown in Table 2. Unlike the artificially generated cases, the translation peak of these solutions does not always correspond to the highest peak (rank 1) of the phase correlation function, but it clearly tends to have a higher ranking than the scaling/rotation peak. This reinforces the idea that the multi-peak approach is sometimes crucial for a robust estimation of the scaling and rotation parameters, but might not be as important for the translation parameters.

3.3. An efficient approach

A final test was performed with a more efficient approach, where only the scaling and rotation parameters are estimated by exploring the P = 8 most prominent peaks of the phase correlation function, while the translation parameters are estimated from the highest peak alone. Multiple translation choices due to periodicity are still taken into account. Therefore, the number of candidates to be evaluated in order to find the best solution is 4P, which makes this approach considerably more efficient than the previous one. RMSE results for the real test cases using this approach are shown in Table 3. All image pairs were successfully aligned, although in some cases the RMSE was higher when compared to the full multi-peak approach.
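
A sketch of this reduced search is shown below, reusing wrapped_translations from the Section 2 sketch (best_translation_peak_for is a hypothetical helper that returns only the highest peak of rT for a given (s, θ); it is not part of the paper).

def candidate_set_efficient(scale_rot_peaks, best_translation_peak_for, Nx, Ny):
    # Reduced candidate set: P scale/rotation peaks, each paired with the four periodic
    # wraps of the single most prominent translation peak, i.e. at most 4P candidates.
    omega = []
    for s, theta in scale_rot_peaks:
        tx, ty = best_translation_peak_for(s, theta)
        for wtx, wty in wrapped_translations(tx, ty, Nx, Ny):
            omega.append((s, theta, wtx, wty))
    return omega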

Fig. 3. Results of the proposed multi-peak scanning method for real image pairs (see text for details).

Table 3
RMSE values between the registered image pairs for the real test cases, exploring multiple peaks for the estimation of the scaling and rotation parameters but only a single peak for the translation parameters.

        Boston   Aerial 1   Aerial 2   Face    MRI     Bark
RMSE    0.086    0.132      0.014      0.043   0.066   0.057

4. Conclusions

In this paper, a heuristic for improving the robustness of image registration methods based on phase correlation has been proposed. The idea consists of a brute-force search among multiple peaks of the phase correlation function, taking into account multiple translations that may be related to the same peak due to the periodicity of the phase correlation. Among these choices, the one with the smallest root mean square error, measured across the overlapping area of the registered images, is selected. Our results show that the proposed approach has between 25% and 40% more probability of successfully aligning the input images and up to 35% less estimation error, with respect to the typical approach based on simply selecting the highest phase correlation peak. Moreover, the proposed approach appears to be more robust to the presence of noise and to difficult non-rigid transformations.

Acknowledgements

A. Alba was supported by CONACyT grant #154623. E.R. Arce was supported by CONACyT grant #168140. The authors would like to thank Otniel Garcia for providing the Face images, Aldo Mejia for the MRI images used in Section 3.2, and El Colegio de la Frontera Sur (ECOSUR) for the Aerial 2 images.

References

[1] C.D. Kuglin, D.C. Hines, The phase correlation image alignment method, in: Proc. IEEE Int. Conf. on Cybernetics and Society, 1975, pp. 163–165.
[2] B.S. Reddy, B.N. Chatterji, An FFT-based technique for translation, rotation, and scale-invariant image registration, IEEE Trans. Image Process. 5 (8) (1996) 1266–1271.
[3] A. Nikaido, E. Kosuge, R. Kawamata, I. Kashima, A dental radiograph recognition system using phase-only correlation for human identification, IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 91 (1) (2008) 298–305.
[4] S. Leprince, S. Barbot, F. Ayoub, J.-P. Avouac, Automatic and precise orthorectification, coregistration, and subpixel correlation of satellite images, application to ground deformation measurements, IEEE Trans. Geosci. Remote Sens. 45 (6) (2007) 1529–1558.
[5] H. Nakajima, K. Kobayashi, T. Higuchi, A fingerprint matching algorithm using phase-only correlation, IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 87 (3) (2004) 682–691.
[6] K. Miyazawa, K. Ito, T. Aoki, K. Kobayashi, H. Nakajima, An effective approach for iris recognition using phase-based image matching, IEEE Trans. Pattern Anal. Mach. Intell. 30 (10) (2008) 1741–1756.
[7] K. Takita, M.A. Muquit, T. Aoki, T. Higuchi, A sub-pixel correspondence search technique for computer vision applications, IEICE Trans. Fundam. E87-A (8) (2004) 1913–1923.
[8] A. Alba, E.R. Arce-Santana, R.M. Aguilar-Ponce, D.U. Campos-Delgado, Phase-correlation guided area matching for realtime vision and video encoding, J. Real-Time Image Process. 9 (4) (2014) 621–633.
[9] H. Foroosh, J.B. Zerubia, M. Berthod, Extension of phase correlation to subpixel registration, IEEE Trans. Image Process. 11 (3) (2002) 188–200.
[10] V. Argyriou, T. Vlachos, A study of sub-pixel motion estimation using phase correlation, in: BMVC, Citeseer, 2006, pp. 387–396.
[11] A. Alba, R.M. Aguilar-Ponce, J.F. Vigueras-Gomez, E. Arce-Santana, Phase correlation based image alignment with subpixel accuracy, in: I. Batyrshin, M. González Mendoza (Eds.), Advances in Artificial Intelligence, Lecture Notes in Computer Science, vol. 7629, Springer, Berlin/Heidelberg, 2013, pp. 171–182.
[12] G. Bradski, The OpenCV library, Dr. Dobb's J. Softw. Tools 25 (11) (2000) 120–126.
[13] G. Yang, C.V. Stewart, M. Sofka, C.-L. Tsai, Registration of challenging image pairs: initialization, estimation, and decision, IEEE Trans. Pattern Anal. Mach. Intell. 29 (11) (2007) 1973–1989.
[14] University of Oxford Visual Geometry Group, http://www.robots.ox.ac.uk/~vgg/data/, 2004.