Computers and Electrical Engineering 38 (2012) 1346–1357
Local Upsampling Fourier Transform for accurate 2D/3D image registration

Cailing Wang a,b,*, Xiaoyuan Jing a, Chunxia Zhao b

a School of Automation, Nanjing University of Posts & Telecommunications, Nanjing 210003, PR China
b School of Computer Science, Nanjing University of Science & Technology, Nanjing 210094, PR China

Article history: Available online 8 May 2012
Abstract

An accurate image registration method based on the Local Upsampling Fourier Transform (LUFT) is developed in this paper. It uses a hierarchical strategy to estimate an image pair's registration parameters more accurately, consisting of a coarse estimation stage followed by a robust and efficient refinement stage. An initial parameter estimate is obtained with a conventional Phase Only Correlation (POC) method in the coarse stage, and is then refined by the Local Upsampling Fourier Transform in the frequency domain to achieve higher accuracy. As shown in extensive experiments, the LUFT achieves more accurate translation and rotation estimation, is efficient and robust to noise, and can be applied to accurate 2D and 3D rotation and translation estimation.

© 2012 Elsevier Ltd. All rights reserved.
1. Introduction

Image registration is the process of transforming different sets of data into one coordinate system. The data may be multiple photographs from different sensors, from different times, or from different viewpoints. High-accuracy image registration is an important and fundamental task in many fields such as computer vision [1], medical imaging [2], image analysis [3] and video processing [4]. The main approaches used to study this problem are pixel-domain gradient methods [5], cross-correlation techniques [6], and Discrete Fourier Transform (DFT) based algorithms [7,8,4]. Image registration using Phase-Only Correlation (POC) in the Fourier domain has many merits and has attracted much attention. The POC utilizes the shift property of the Fourier Transform (FT), which enables robust estimation of translations using normalized phase correlation [9,1]. The method works remarkably well for images degraded by noise, blur and illumination variation because of the immunity of phase to these factors. Furthermore, the POC also performs well when the images overlap only partially. Indeed, the POC has become one of the predominant methods for image or signal translational parameter estimation. On the basis of POC, many extensions have been developed. In combination with the log-polar transform [10], pseudo-polar transform [8], pseudo-log-polar transform [11], etc., rotation and scaling estimation is reduced to translational parameter estimation using POC. That is to say, POC has become a basic element of the image registration process. Moreover, the POC is not only used to estimate rigid transform parameters in 2D image space, but is also one of the important methods for non-rigid registration [12,13] and 3D registration [14,15].
In non-rigid image registration, the non-rigid transform can be reduced to a combination of blocked rigid transforms [12], where the pair of images to be registered (from different wavelengths) is partitioned into multiple overlapping regions of interest (ROIs) and the standard 2D POC is used to find the registration parameters in each of these areas. For the case of 3D registration, 3D POC is used to estimate
* Corresponding author at: School of Automation, Nanjing University of Posts & Telecommunications, Nanjing 210003, PR China. Tel.: +86 2584315751.
E-mail addresses: [email protected] (C. Wang), [email protected] (X. Jing), [email protected] (C. Zhao).

0045-7906/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compeleceng.2012.04.005
the 3D translation, as in video alignment [16], and 3D rotation is usually transformed to a translation estimation using the Cylindrical Phase Correlation Method (CPCM) in a cylindrical coordinate system [14,15]. In a word, the POC is an important method for non-rigid and 3D registration. However, the conventional POC technique usually serves only as a rough or initial estimate before other, more accurate methods are applied. The most commonly used approach to subpixel registration is based on interpolation, such as correlation interpolation, intensity interpolation and phase correlation interpolation [7]; these methods usually work in the spatial domain, are sensitive to noise, and their accuracy depends highly on the quality of the interpolation algorithm. A second possible approach, which avoids interpolation, is based on optimizing an objective function for subpixel registration. These methods iterate by gradient descent, depend strongly on the image intensity conservation assumption, and fail when applied across multiple spectral bands or when considerable luminance variation is present between frames. Recently, the POC has been extended to subpixel alignment by means of an analytic expression of the POC of downsampled images in the temporal domain [17]. However, that method uses a sine function to approximate the Dirac delta function, which limits its accuracy. Another method is zero-padding of the phase correlation spectrum in the frequency domain [18]; however, it is a global operator, and its memory and computation time increase rapidly as the required accuracy increases. In this paper, we develop a Local Upsampling Fourier Transform (LUFT) method in the frequency domain for accurate image registration. The technique uses a matrix-form upsampling transform in a local region around an initial peak after the Fourier Transform to achieve robust, efficient and accurate parameter estimation.
It works after a standard POC on image pairs, using the POC peak as a parameter to construct an upsampling transform matrix; a matrix-form inverse Fourier Transform is then applied to a small region around the POC peak to obtain a delta translation and achieve subpixel registration. Because it works in the frequency domain and operates as a fast inverse Fourier Transform on a local neighborhood, it is efficient and insensitive to noise; moreover, the translation estimation precision depends on the initial peak and a factor of the upsampling step. In many other applications, combining the LUFT with the log-polar transform [10], pseudo-polar transform [8], pseudo-log-polar transform [11], etc., accurate rotation parameters can be estimated in a similar way. The LUFT can also be extended to the registration of 3D data sets: accurate 3D registration for object matching can be achieved through an LUFT in cylindrical coordinates [14,15,19], and accurate 3D registration for video alignment [16] can be achieved directly through a 3D LUFT. The remainder of the paper is organized as follows. In Section 2, we give the basic image motion estimation theory based on POC. In Section 3, we describe the LUFT and the accurate image registration algorithm. Experimental results on 2D and 3D data sets are then given in Section 4. Finally, in Section 5, some concluding remarks are provided.

2. Frequency response of image motion and basic motion estimation theory

When an image is converted by the DFT, defined as Eq. (1), the M × N image f(x, y) in the spatial domain is mapped to the amplitude spectrum and the phase spectrum of F(u, v) in the frequency domain:

F(u, v) = (1/√(MN)) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) exp[−j2π(ux/M + vy/N)],
u = 0, 1, …, M − 1;  v = 0, 1, …, N − 1    (1)
Images have many useful properties when transformed into the frequency domain. For example, the amplitude spectrum contains the relative weights of the frequency components, while the phase spectrum localizes these frequency components in space.

2.1. Frequency response of image motion

The Fourier shift theorem shows that a translational motion in the spatial domain is a linear phase difference in the Fourier domain. Moreover, the theory of time–frequency analysis tells us that rotation and scaling in the spatial domain are synchronous with those of the amplitude spectrum in the frequency domain. The rotation and scaling of an image are therefore homogeneous with translations of its amplitude spectrum in log-polar space, which is the frequency response phenomenon of image rotation and scaling. That is to say, rotation, scaling and translation are decoupled when the signal is transformed to the frequency domain, and the three components can be estimated separately [20]. The cases of 1D and 3D rigid transformation are similar to the 2D case.

2.2. Basic motion estimation theory

The inverse interpretation of the Fourier shift theorem has been the most essential theoretical principle for 1D, 2D or 3D signal motion estimation in the frequency domain. The POC is a method commonly used to estimate the motion parameters between two signals in the frequency domain [21]; the basic POC theory [20] is described as follows. Given two images f(x, y) and g(x, y) with f(x, y) = g(x − x0, y − y0), let F(u, v) and G(u, v) be the Fourier Transforms of f(x, y) and g(x, y), respectively. Then, by the Fourier shift theorem, F(u, v) and G(u, v) satisfy:
F(u, v) = G(u, v) exp[−i2π(ux0/M + vy0/N)]    (2)
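Eq. (2) can be checked numerically; a minimal NumPy sketch (array and variable names are ours, for illustration only), using a circular shift so the relation holds exactly:

```python
import numpy as np

# Numerical check of Eq. (2): shifting an image by (x0, y0) multiplies
# its DFT by a linear phase term.
M = N = 32
rng = np.random.default_rng(1)
g = rng.standard_normal((M, N))
x0, y0 = 4, 7
f = np.roll(g, (x0, y0), axis=(0, 1))      # f(x, y) = g(x - x0, y - y0), circularly

F, G = np.fft.fft2(f), np.fft.fft2(g)
u = np.fft.fftfreq(M)[:, None]             # u/M in cycles per sample, FFT order
v = np.fft.fftfreq(N)[None, :]
predicted = G * np.exp(-2j * np.pi * (u * x0 + v * y0))
print(np.allclose(F, predicted))           # True: the spectra agree to machine precision
```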
The normalized cross-correlation spectrum based on POC theory is

R_fg(u, v) = F(u, v)G*(u, v) / |F(u, v)G*(u, v)| = exp[−i2π(ux0/M + vy0/N)]    (3)
where (*) denotes complex conjugation, and M and N are the image dimensions. The phase of the cross-correlation spectrum is the phase difference between the two images. The approach to recover (x0, y0) is to inverse Fourier Transform Eq. (3) to get the Response Function of the Cross-Correlation Spectrum (RFCCS), which is a 2D Dirac delta function δ(x − x0, y − y0) centered at (x0, y0). To estimate rotation, the two images' amplitude spectra A(u, v) and A′(u, v), where A(u, v) = A′(R(Δα)(u, v)) and R(Δα) is a rotation matrix, are projected to log-polar space; then A_l(α, r) = A′_l(α + Δα, r), where the subscript l denotes the log-polar coordinate system. This is also called the Fourier–Mellin Transform [11]. That is to say, the rotation is transformed to a translation, which can be estimated by POC. The registration parameters of the translational and rotational transform between two 2D signals can thus be estimated by the standard 2D POC method, using their amplitude and phase spectra. Moreover, the registration precision is determined by the localization precision of the peak of the RFCCS, which is limited by the Signal-to-Noise Ratio (SNR), and the conventional POC method can only reach pixel level.

The use of frequency-domain methods for 3D data registration is novel and suggests a new approach to this classical problem [15]. The 3D registration method using the 3D POC, or the POC in a cylindrical transformation [14], is similar to the 2D case [16]. Just as rotation and translation decouple in the 2D Fourier domain, the 3D rotation, which is transformed to a rotation of the amplitude spectrum, can be decoupled from the 3D translation in the 3D Fourier domain [19]; we therefore introduce the rotation estimation of 3D amplitude spectra in detail. A 3D rotation matrix has three degrees of freedom: two of them define the direction of the rotation axis and the third defines the angular displacement about this axis [15]. That is to say, once the rotation axis has been recovered, the rotation estimation is simplified to estimating the rotation angle about that axis, which can be done on the 2D plane perpendicular to it. An algorithm for estimating an arbitrary rotation axis of 3D data by determining the radial projections of the difference image through the (0, 0, 0) frequency point is presented by Luca Lucchese [15,19]; for convenience of presentation, we assume the rotation axis is known and suppose it is the w-axis. The cylindrical coordinate transformation about the w-axis is m_C(α, r, w) = m(r cos α, r sin α, w), where the subscript C denotes the cylindrical coordinate system. For 3D amplitude spectra f(ũ) and g(ũ), g(ũ) = f(R_w(Δα)ũ), where R_w(Δα) is the rotation matrix about the w-axis [14]. The rotation of the amplitude f(ũ) by an angle Δα is transformed to the shift Δũ_C = (Δα, 0, 0) of f_C in the cylindrical coordinate system, as shown in Eq. (4):
g_C(ũ_C) = f_C(ũ_C + Δũ_C)    (4)
Now it is clear that the rotation angle Δα of the 3D data can be transformed to a translation by the cylindrical coordinate transformation; the single parameter Δα simplifies this problem to a 1D case, and it is then estimated by POC.

3. Image registration using Local Upsampling Fourier Transform

Based on the POC theory in Section 2, the precision of the peak of the RFCCS determines the precision of image registration. An accurate image registration method called LUFT, based on 2D local upsampling in the frequency domain, is developed in this section. Classical subpixel registration is based on interpolation or optimization, such as correlation interpolation, intensity interpolation, phase correlation interpolation and sine-function approximation, which have not completely overcome their shortcomings in efficiency and robustness. The LUFT adopts a local upsampling matrix Fourier transform to detect the optimal peak of the cross-correlation spectrum, to estimate the translational displacement with higher precision, and subsequently to achieve accurate image rotation estimation in combination with polar-like transforms. Another common POC approach to estimating subpixel displacement is to compute a globally upsampled cross-correlation, known as zero padding, between the spectrum to be registered and a reference spectrum by means of the Fast Fourier Transform (FFT), and to locate its peak [22]. The computational burden and memory requirement of these approaches increase sharply as the required estimation accuracy increases; as a result, their applications are limited. Instead of upsampling the global cross-correlation spectrum, the LUFT upsamples the cross-correlation spectrum only in a very small neighborhood around the initial estimate of the peak through the local upsampling matrix Fourier Transform, so its memory and computation burden decrease sharply.
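The coarse POC stage that produces the initial peak can be sketched in a few lines; a minimal NumPy sketch (function names are ours, not from the paper), implementing Eq. (3) and locating the integer-pixel peak with an inverse FFT:

```python
import numpy as np

def poc_spectrum(f, g):
    """Normalized cross-correlation spectrum R_fg of Eq. (3)."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    X = F * np.conj(G)
    return X / np.maximum(np.abs(X), 1e-12)   # guard against zero magnitude

def coarse_translation(f, g):
    """Integer-pixel translation (x0, y0) from the RFCCS peak."""
    r = np.real(np.fft.ifft2(poc_spectrum(f, g)))
    i, j = np.unravel_index(np.argmax(r), r.shape)
    M, N = r.shape
    x0 = i - M if i > M // 2 else i           # unwrap circular indices to signed shifts
    y0 = j - N if j > N // 2 else j
    return x0, y0

# f is g shifted circularly by (+3, -5) pixels, so POC should recover (3, -5)
rng = np.random.default_rng(0)
g = rng.standard_normal((64, 64))
f = np.roll(g, (3, -5), axis=(0, 1))
print(coarse_translation(f, g))               # (3, -5)
```

This is the coarse stage only; the refinement that makes it subpixel-accurate is the subject of Section 3.1.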
It is important to note that the local frequency upsampling technique introduces no additional approximation: although it may seem that we discard the information outside the valid region, the result is an efficient inverse Fourier Transform without any iteration. The detailed theory and algorithm are presented in the following.

3.1. Upsampling matrix inverse Fourier Transformation and LUFT algorithm

Given a 1D discrete signal f(X), X = (x0, …, x_{N−1})^T, the Fourier transform of the signal in matrix form is defined as
F(U) = E f(X)    (5)

where U = (u0, …, u_{N_B−1})^T and
E = [ exp(−i2π x0 u0)         …  exp(−i2π x_{N−1} u0)
      …                       …  …
      exp(−i2π x0 u_k)        …  exp(−i2π x_{N−1} u_k)
      …                       …  …
      exp(−i2π x0 u_{N_B−1})  …  exp(−i2π x_{N−1} u_{N_B−1}) ]    (6)
Eq. (5) is the Fourier Transform (FT) in matrix form, deduced from the Riemann-sum representation of the continuous FT; if N_B is equal to N, Eq. (5) is the DFT of the signal. Now let us extend the above definition to 2D. Given an N_A × N_A 2D signal f(X, Y), with X = (x0, …, x_{N_A−1})^T and Y = (y0, …, y_{N_A−1})^T, the matrix FT of the signal is defined as
F(U, V) = E1 f(X, Y) E2^T    (7)
where E1 = exp(−i2π U X^T), E2 = exp(−i2π V Y^T), U = (u0, …, u_{N_B−1})^T, V = (v0, …, v_{N_B−1})^T. Based on the definition of the matrix FT, the upsampling matrix inverse Fourier Transformation is defined as follows. Given an input m (usually 1 ≤ m < 2), the size of the upsampled region; k, the precision coefficient in the spatial domain; and (x_M, y_M), the initial estimate of the peak location of the RFCCS from Eq. (3), construct the range L
L = {(x′, y′) : x_M − m/2 ≤ x′ ≤ x_M + m/2, y_M − m/2 ≤ y′ ≤ y_M + m/2}    (8)
For an N_A × N_A frequency spectrum F(U, V), the upsampling matrix inverse Fourier transform on L is defined on the basis of the matrix FT:
f(X′, Y′) = exp(i2π X′U^T) F(U, V) exp(i2π V Y′^T),  if X′ ∈ L and Y′ ∈ L
f(X′, Y′) = 0,  if X′ ∉ L or Y′ ∉ L    (9)
where X′ and Y′ are the spatial sampling positions of the upsampling; as shown in Fig. 1, L is indicated by the broken-line rectangle centered at (x_M, y_M), and X and Y are the spatial sampling positions of the RFCCS. Since a discrete frequency spectrum corresponds to a continuous periodic signal in the spatial domain, a higher-resolution spatial-domain signal can be obtained by increasing the sampling rate. It is assumed that the sampling step of x′ and y′ is 1/k < 1 pixel (k > 0 is the precision coefficient, which can be specified by users). The upsampling matrix inverse Fourier Transformation operator restricts the support of the RFCCS; the inner rectangle [−m/2, m/2] × [−m/2, m/2] in Fig. 1 is the interval of uncertainty of the initial estimate, and the phase correlation peak might lie anywhere within it. From Eq. (6), we find that the construction of the bases exp(i2π X′U^T) and exp(i2π V Y′^T) is essential for upsampling the cross-correlation spectrum. First, after (x_M, y_M) is estimated and the upsampling neighborhood is defined according to m, the neighborhood is resampled with step 1/k to get the vectors X′ and Y′; U and V are the frequency sampling positions of the cross-correlation spectrum. The bases are then constructed in the variable space [X′, Y′, U, V] based on Eq. (6). The usual DFT and FFT require that N_B (N_B = m·k) equal N_A and that dx·du = N_A^{−1} (usually m = 1–2 pixels, where dx is the step in the spatial domain and du the step in the frequency domain). However, N_B is not always equal to N_A, so Eq. (9) cannot be replaced by the DFT or FFT. In the procedure of image translation estimation, the cross-correlation spectrum of the image pair is constructed by Eq. (3); its inverse Fourier Transformation is the Dirac delta function (Fig. 2(a)), and the location of its peak, denoted (ξ0, η0), is the initial estimate of the translation. The cross-correlation spectrum is then resampled in the neighborhood around (ξ0, η0) to get the locally upsampled cross-correlation spectrum, depicted by the color surface in Fig. 2(b); the location of its peak,
Fig. 1. The range of the upsampling.
Fig. 2. RFCCS of POC and LUFT (m = 2, k = 10): (a) RFCCS of POC; (b) RFCCS after local upsampling.
denoted (Δξ0, Δη0), is the offset of the translational displacement. The final estimate of the translation of the amplitude spectrum, (ξ̃0, η̃0), is given by Eq. (10):

(ξ̃0, η̃0) = (ξ0, η0) + (Δξ0, Δη0)    (10)

where (ξ̃0, η̃0) is the translation estimate of the image pair. This estimation method is called the LUFT method for image registration.
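The refinement described by Eqs. (8)–(10) can be sketched concretely; a minimal NumPy sketch under our own naming (the two exponential matrices play the role of the paper's E1 and E2; this is an illustration, not the authors' implementation):

```python
import numpy as np

def luft_refine(R, xM, yM, m=2.0, k=10):
    """Locally upsampled inverse FT of the cross-power spectrum R (Eq. (9)).

    R is the N x N normalized cross-power spectrum in FFT order; (xM, yM)
    is the coarse integer peak. The peak of the upsampled surface over
    [xM - m/2, xM + m/2] x [yM - m/2, yM + m/2], sampled with step 1/k,
    is returned as the refined translation (Eq. (10))."""
    M, N = R.shape
    u = np.fft.fftfreq(M) * M                             # integer frequency bins
    v = np.fft.fftfreq(N) * N
    xs = xM + np.arange(-m / 2, m / 2 + 1e-9, 1.0 / k)    # fine spatial grid X'
    ys = yM + np.arange(-m / 2, m / 2 + 1e-9, 1.0 / k)    # fine spatial grid Y'
    E1 = np.exp(2j * np.pi * np.outer(xs, u) / M)         # exp(i2pi X'U^T)
    E2 = np.exp(2j * np.pi * np.outer(v, ys) / N)         # exp(i2pi V Y'^T)
    r = np.real(E1 @ R @ E2)                              # upsampled RFCCS on L
    i, j = np.unravel_index(np.argmax(r), r.shape)
    return xs[i], ys[j]

# Demo: shift a smooth image by a subpixel amount via a Fourier phase ramp,
# then recover it with coarse POC followed by the LUFT refinement.
n = 64
fy = np.fft.fftfreq(n)[:, None]
fx = np.fft.fftfreq(n)[None, :]
rng = np.random.default_rng(0)
g = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n)))
                         * np.exp(-(fy**2 + fx**2) / (2 * 0.05**2))))  # low-pass field
x0, y0 = 3.3, -2.7
f = np.real(np.fft.ifft2(np.fft.fft2(g) * np.exp(-2j * np.pi * (fy * x0 + fx * y0))))

F, G = np.fft.fft2(f), np.fft.fft2(g)
X = F * np.conj(G)
R = X / np.maximum(np.abs(X), 1e-12)
r = np.real(np.fft.ifft2(R))
i, j = np.unravel_index(np.argmax(r), r.shape)
xM = i - n if i > n // 2 else i
yM = j - n if j > n // 2 else j
print(luft_refine(R, xM, yM, m=2.0, k=20))    # approx. (3.3, -2.7)
```

Note that only the (m·k)² points of L are ever evaluated, which is the source of the memory and computation savings over global zero-padding.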
Fig. 2(a) shows the RFCCS of POC. The location of the peak on this spectrum, which has pixel-level accuracy, is the initial estimate from the cross-correlation spectrum; high-accuracy translation estimation cannot be achieved from it alone. In Fig. 2(b), the locally upsampled cross-correlation spectrum, depicted by the color surface, and the cross-correlation spectrum of POC, denoted by the gray surface, are plotted together. The comparison shows that the location of the peak is more accurate after local upsampling.

3.2. LUFT based image registration procedure

As depicted in the flow chart of Fig. 3, the detailed process of image translation estimation based on LUFT is as follows:

(1) Input the image pair.
(2) Use the POC algorithm to construct the cross-correlation spectrum, and estimate the coarse translation T̂.
(3) Check the precision coefficient k: if k > 1, continue; if k ≤ 1, set ΔT̂ to zero and go to step (7).
(4) Construct an upsampled region around the initial estimate of the peak location of the cross-correlation spectrum.
(5) Construct the kernel functions according to T̂ and the upsampled region.
(6) Compute the locally upsampled cross-correlation spectrum and estimate the refined translation offset ΔT̂.
(7) Correct the coarse estimate T̂ with the refined offset estimate ΔT̂, and save the final estimate.
To estimate image rotation, the amplitude spectra of the image pair are transformed to polar space by the log-polar transform, pseudo-polar transform, etc., and the rotational and scaling estimates can then be derived from the translational estimate based on the viewpoints in Section 2. Accurate rotation estimation is achieved through an inverse log-polar transform of the translation estimated by LUFT. Furthermore, the LUFT idea for 2D image registration can be extended to the 3D case. The detailed procedure for 3D rotation estimation is as follows:

(1) Transform the two data sets to the 3D Fourier domain, and get the amplitude spectra.
(2) Estimate the rotation axis according to Lucchese's method [15].
Fig. 3. The flow chart of image translation estimation based on LUFT.
(3) Project the amplitude spectra to the cylindrical coordinate system (α, r, w), whose w-axis coincides with the rotation axis of the data. The rotation is transformed to a 1D translation.
(4) Estimate the accurate translation by LUFT.
(5) Project the translation in the cylindrical coordinate system back to the rotation angle in the Cartesian coordinate system.

The 3D data rotation is then determined by the rotation axis and the estimated rotation angle in the Cartesian coordinate system.

4. Experimental results and discussion

In this section, the performance, robustness, efficiency, and applications of LUFT for image registration are examined systematically. First, we present the precision characteristics of the LUFT algorithm for image translation and rotation estimation, including comparisons with the phase-difference-based [1] and sine-function-approximation-based [17] translation estimation methods and the PPFT-based rotation estimation algorithm [8], as well as experiments on the selection of the precision coefficient k. We then present experimental results on its immunity to the most typical noise types, followed by a computational complexity analysis of the LUFT algorithm. Finally, we present applications of LUFT to image mosaicing and to 3D problems such as video alignment and volume registration.

4.1. The performance of the LUFT algorithm

Comparison experiments between classical subpixel registration methods and LUFT are designed to evaluate the accuracy of LUFT for translation and rotation estimation. Comparisons between Foroosh's subpixel method, the phase difference method [1], and our LUFT method for accurate translation estimation are carried out on the classical sample images "Pentagon" and "Paris", shown in Fig. 4, and the results of the different registration methods are listed in Table 1.
From Table 1 it can be seen that, among these subpixel methods, the LUFT exceeds the method of Foroosh, which uses analytic expressions for the phase correlation spectrum in the temporal domain to get the subpixel translation, and the LUFT's precision is close to, and slightly better than, that of the phase difference method. A statistical comparison between the LUFT with log-polar transform based rotation estimation and the PPFT-based algorithm [8], a classical accurate rotation estimation algorithm, is given in Table 2, where the test images shown in Fig. 5 are the same as those used for the PPFT-based algorithm. Each image is rotated by θ (0° ≤ θ ≤ 175°, stepped by 5°), and the images are translated randomly by (Δx, Δy) in the range [−20, 20] × [−20, 20] pixels. Table 2 lists the average (Mean) and maximal (Max.) rotation estimation errors of the conventional POC-based algorithm, the LUFT-based algorithm and the PPFT-based algorithm. In the LUFT experiments, the precision coefficient k is 10 and the neighborhood size m is 1.5 image pixels. The Mean error ε_h is the average error, where θ̂_hi and θ_hi are the estimated angle and the ground-truth angle at the ith stepped angle, and ε_h is defined as:
Fig. 4. Sample images for subpixel translation estimation comparison.
Table 1
The subpixel registration results comparison of different methods.

Image      Ground truth     Foroosh's method   Phase difference [1]   LUFT
Pentagon   (0.500, 0.500)   (0.480, 0.510)     (0.496, 0.493)         (0.500, 0.500)
(k = 100)  (0.250, 0.500)   (0.280, 0.490)     (0.255, 0.498)         (0.240, 0.500)
           (0.250, 0.500)   (0.250, 0.520)     (0.250, 0.520)         (0.250, 0.500)
           (0.000, 0.750)   (0.000, 0.800)     (0.000, 0.744)         (0.000, 0.750)
Paris      (0.167, 0.500)   (0.152, 0.490)     (0.163, 0.510)         (0.165, 0.500)
(k = 200)  (0.670, 0.250)   (0.690, 0.330)     (0.679, 0.242)         (0.670, 0.250)
           (0.330, 0.167)   (0.320, 0.150)     (0.331, 0.159)         (0.330, 0.165)
           (0.330, 0.330)   (0.325, 0.320)     (0.333, 0.329)         (0.330, 0.330)
Table 2
Statistical errors of rotation estimation (°).

        POC              PPFT-based algorithm [8]            LUFT (k = 10)
                         X                XN (normalized)
Error   Mean     Max.    Mean     Max.    Mean     Max.      Mean     Max.
a       0.82     1.48    0.68     1.27    0.72     1.13      0.097    0.492
b       0.89     1.32    0.73     1.77    0.71     1.28      0.102    0.617
c       0.86     1.50    0.89     1.43    0.86     1.21      0.100    0.766
d       0.87     1.29    0.83     2.13    0.79     1.67      0.100    0.766
Fig. 5. Image samples for rotation estimation comparison experiments.
ε_h = ( Σ_{i=1}^{M} |θ̂_hi − θ_hi| ) / M    (11)
where h = a, b, c, d and M = 36 in the experiments. As seen from the above experiments, the Mean errors of LUFT are about 0.1° and the Max. errors are lower than 0.8° at the given image SNR and precision coefficient. The proposed algorithm achieves higher statistical precision than the PPFT-based algorithm for image rotation estimation. Furthermore, Section 3 shows that the precision coefficient k affects the registration accuracy. To characterize the relationship between k and the LUFT accuracy for rotation estimation, we test the effect of k on the rotation estimation error for the 256 × 256 Lena image. The rotation estimation errors for precision coefficients k from 1 to 50 are shown in Fig. 6. When k increases from 1 to 5, the registration error decreases sharply; once k exceeds 20, the estimation error has essentially converged. In brief, when the precision coefficient k lies in the suggested interval of 5–20, which is a nearly linear interval, a preferable rotation estimation can be achieved.
4.2. The robustness of LUFT to noise

Another important factor that impacts the registration precision is the Signal-to-Noise Ratio (SNR) of the images. To ensure statistical validity, we run 50 repeated trials to test the robustness of LUFT to white Gaussian noise, salt & pepper noise and speckle noise. We test the rotation estimation error of the algorithm on the 256 × 256 "cameraman" image under these noise types. The random noise is added to the image pair after one of the images has been rotated by a certain angle; adding Gaussian noise to Fig. 7(a) yields Fig. 7(b), whose SNR is about 0.7 dB. The LUFT algorithm is then used to estimate the rotation angle between the noisy image pair. The average errors are shown in Table 3. From the experiments, we find that as the SNR degrades from 20 to 0.7 dB, the LUFT is more robust to Gaussian and speckle noise than to salt & pepper noise. Moreover, the statistical rotation estimation errors are evaluated under different powers of random Gaussian noise, and the error curve is shown in Fig. 7(c), which gives the ranges and trend of the statistical estimation error as the SNR of the image pair varies from 26.8 to 0.689 dB. The average error is less than about 0.2° when the SNR is above 0.689 dB, and when the noise power is lower than 24 dBW the registration error varies little. On the whole, the proposed algorithm is robust to noise.
Fig. 6. The effect of the precision coefficient k on the rotation estimation error.
Fig. 7. The robustness experiment of LUFT to noise; (c) the statistical error of rotation estimation versus noise power (dBW).
Table 3
Average rotation estimation error (°) under different noise types.

Noise type       SNR
                 20 dB      12 dB      8 dB       2 dB        0.7 dB
Gaussian         0.01563    0.01563    0.01563    0.02484     0.21328
Salt & Pepper    0.01563    0.05781    0.10438    0.273047    2.09641
Speckle          0.01563    0.01563    0.01563    0.04891     0.44047
4.3. The efficiency of LUFT for image registration

From Section 3, the LUFT based registration algorithm comprises two key steps: a coarse estimation and a refinement stage. Assuming the image is N × N, the computational complexity of the coarse estimation, which includes a standard POC and the transformation of rotation and scale to translation, is O(N² log N), because the Fast Fourier Transform (FFT) forms the main computational cost of the procedure [23]. The refinement technique, however, is the kernel of the LUFT method, and we analyze its computational cost in the following. The cost of the upsampling matrix inverse Fourier Transform, the critical part of the local frequency upsampling, involves the two complex matrix products shown in Eq. (9). For an N × N image pair, each element of a product requires N complex multiplications and N − 1 complex additions, i.e. 8N − 2 floating point operations. Since the two kernel matrices E1 and E2 can be calculated beforehand, in a similar fashion to the FFT twiddle factors, the total number of operations for the matrix inverse Fourier transform is therefore:
8(N² N_B + N N_B²) − 2N N_B − 2N_B²    (12)
where N_B = k·m and m is the size of the upsampled region; the computational complexity of the local frequency upsampling of LUFT is thus O(N²k) for precision coefficient k, usually k > 1. Note that the computational complexity of the FFT is O(N² log N²), which is comparable to that of the local frequency upsampling for N ≤ 1024 and 1 ≤ k ≤ 20. Therefore, the LUFT based algorithm does not remarkably increase the computational burden relative to most POC based algorithms, which require at least one 2D FFT. On the whole, compared with other POC based subpixel methods, such as zero-padding [18], interpolation-based methods [7] and fitting-based methods [17], the LUFT is not more time-consuming.
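The counts above can be compared directly; a small sketch (the FFT flop constant is a rule-of-thumb assumption of ours, roughly 5·M·log₂ M for an M-point transform, not a figure from the paper):

```python
import math

def luft_refine_ops(N, k, m=2.0):
    """Floating point operations of the two matrix products in Eq. (9),
    per Eq. (12), with N_B = k * m."""
    NB = int(round(k * m))
    return 8 * (N * N * NB + N * NB * NB) - 2 * N * NB - 2 * NB * NB

def fft2_ops(N):
    """Rough flop count of one N x N 2D FFT (~5 M log2 M for M points)."""
    M = N * N
    return 5 * M * math.log2(M)

# For N = 256 and k in the suggested 5-20 range, the refinement cost stays
# within a small factor of one 2D FFT, consistent with the O(N^2 k) claim.
for k in (5, 10, 20):
    print(k, luft_refine_ops(256, k) / fft2_ops(256))
```

By contrast, global zero-padding by a factor k costs on the order of a (kN)² FFT, which grows far faster in both time and memory.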
4.4. The applications of LUFT

The LUFT method can be applied in many fields that require accurate image registration, such as 2D precise image mosaicing and 3D accurate video alignment and volume registration. The 2D application to precise image mosaicing is shown in Fig. 8, where the image samples, provided by Gehua Yang, are considered one of the challenging image pairs for mosaicing [24]. The result in Fig. 8(c) indicates that the LUFT provides a more precise mosaicing scheme even under asymmetric illumination variation. The 3D application to video alignment, which aligns two videos in space and time, is usually based on a POC based method [14]. For video alignment, the 2D image space and the 1D time axis form a 3D space, in which we use the 3D extension of LUFT and the 3D Fourier transform to estimate the subframe displacement in time and the subpixel alignment in image space. We test the LUFT on two real-world videos, provided by C. Dai, where the ground truth of alignment is 63.931 frames [14]; the temporal displacement estimated by our LUFT is 63.9 frames with precision coefficient k = 10, and the results are shown in Fig. 9. Because of the integer limit of the video frame number, the subframe image alignment cannot be rendered directly, so the walking man cannot be aligned seamlessly, but the background can be aligned at the subpixel level after LUFT processing. After the two videos are aligned with subframe accuracy, super-resolution of the images can be achieved. LUFT for accurate video alignment and super-resolution will be treated in detail in our further study.
Fig. 8. The application for precise image mosaicing.
Fig. 9. The application for accurate video alignment.
Fig. 10. 3D brain data and a slice example, volume size is 157 × 248 × 246 voxels with 938 μm spacing.
Fig. 11. Comparison of rotation estimation errors between LUFT and CPCM for 3D brain data (rotation angle 0–350°, estimation error in degrees).
The LUFT can also be extended to volume registration [25], which aligns two 3D volumes reconstructed from brain MRI slice images, with volume size 157 × 248 × 246 voxels and 938 μm spacing [26], shown in Fig. 10. In the experiment, the volume data (Fig. 10(a)) is rotated around a fixed axis (the Z axis) by a certain angle to obtain a rotated version (Fig. 10(b)); Fig. 10(c) is a slice image perpendicular to the rotation axis. The accurate rotation between the two volumes is then estimated through the main steps of 3D Fourier transform, cylindrical transform, LUFT and inverse cylindrical transform. We rotate the 3D brain data by θ (0° ≤ θ < 360°, stepped by 10°). To ensure statistical validity, we run 50 trials for every rotation angle. The average estimation errors of LUFT and CPCM [14] are shown in Fig. 11, from which it can be seen that the LUFT-based method achieves higher statistical precision than the CPCM-based algorithm.
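A minimal 2D-slice sketch of the rotation-estimation idea (not the paper's full cylindrical-transform pipeline, and without the LUFT angular refinement): for a rotation about the Z axis, every z-slice rotates in-plane, so the summed slice magnitude spectra rotate by the same angle, and that angle can be read off by correlating the spectra in polar coordinates. All names and parameters below are our own; magnitude spectra carry a 180° ambiguity that the full method must resolve.

```python
import numpy as np
from scipy import ndimage

def to_polar(mag, n_theta, radii):
    """Resample a centred (fftshifted) 2D spectrum onto a (radius, angle) grid."""
    cy, cx = mag.shape[0] // 2, mag.shape[1] // 2      # DC bin after fftshift
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    ys = cy + radii[:, None] * np.sin(theta)[None, :]
    xs = cx + radii[:, None] * np.cos(theta)[None, :]
    return ndimage.map_coordinates(mag, [ys, xs], order=1)

def rotation_about_z(v1, v2, n_theta=360):
    """Rotation (degrees) of volume v2 vs. v1 about the Z axis, up to the
    180-degree ambiguity inherent in magnitude spectra."""
    ny, nx = v1.shape[1:]
    w = np.outer(np.hanning(ny), np.hanning(nx))       # taper slice borders
    mags = []
    for v in (v1, v2):
        F = np.fft.fft2((v - v.mean()) * w, axes=(1, 2))
        mags.append(np.abs(np.fft.fftshift(F, axes=(1, 2))).sum(axis=0))
    radii = np.arange(3.0, min(ny, nx) // 2 - 2)       # skip DC and corners
    P1, P2 = (to_polar(m, n_theta, radii) for m in mags)
    # 1D circular correlation along the angle axis, accumulated over radii
    S = (np.fft.fft(P2, axis=1) * np.conj(np.fft.fft(P1, axis=1))).sum(axis=0)
    return np.argmax(np.abs(np.fft.ifft(S))) * 360.0 / n_theta

# demo: anisotropic test volume rotated in-plane by 30 degrees
rng = np.random.default_rng(1)
vol = ndimage.gaussian_filter(rng.random((8, 64, 64)), sigma=(0, 1, 6))
vol_rot = ndimage.rotate(vol, 30, axes=(1, 2), reshape=False)
est = rotation_about_z(vol, vol_rot)
```

The paper's cylindrical transform plays the role of `to_polar` here but operates on the full 3D spectrum, and the LUFT refinement then sharpens the angular peak beyond the 1°-per-bin resolution of this sketch.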
5. Conclusions

In this paper, we have developed a novel Local Upsampling Fourier Transform (LUFT) method for accurate image registration. The high-accuracy algorithm is based on local frequency upsampling of the Fourier transform, which estimates translation accurately within a local neighborhood and serves as the refinement stage of LUFT. The experiments and analysis presented in this paper show that LUFT yields more accurate translation and rotation estimates, and that it is efficient and robust to noise. Furthermore, the experiments on image mosaicing indicate its potential for accurate 2D translation and rotation estimation, and the experiments on video alignment and volume registration indicate its applicability to accurate 3D translation and rotation registration.

Acknowledgements

The authors would like to thank the Natural Science Foundation of China for support under Project No. 90820306, and Prof. Xin Li of West Virginia University for generously sharing his videos and resources.

References

[1] Murat B, Foroosh H. Subpixel registration directly from the phase difference. EURASIP J Appl Signal Process 2006;2006:1–11.
[2] Ito K, Aoki T, Kosuge E, Kawamata R, Kashima I. Medical image registration using phase-only correlation for distorted dental radiographs. In: 19th International Conference on Pattern Recognition; 2008. p. 1–4.
[3] Haghighat MBA, Aghagolzadeh A, Seyedarabi H. Multi-focus image fusion for visual sensor networks in DCT domain. Comput Electr Eng 2011;37(5):789–97.
[4] Ohyun K, Jeongho S, Joonki P. Video stabilization using Kalman filter and phase correlation matching. In: Proc. ICIAR 2005, Lecture Notes in Computer Science; 2005. p. 141–8.
[5] Joshua G. Gradient field distributions for the registration of images. In: Proc. IEEE Int. Conf. on Image Processing; 2003. p. 691–4.
[6] Francesco I, Maurizio P. A fast and robust image registration method based on an early consensus paradigm. Pattern Recognit Lett 2004;25(8):943–54.
[7] Barbara Z, Jan F. Image registration methods: a survey. Image Vision Comput 2003;21:977–1000.
[8] Keller Y, Shkolnisky Y, Averbuch A. The angular difference function and its application to image registration. IEEE Trans Pattern Anal Mach Intell 2005;27(6):969–76.
[9] Ville O, Janne H. Image registration using blur-invariant phase correlation. IEEE Signal Process Lett 2007;14(7):449–52.
[10] Zokai S, Wolberg G. Robust image registration using log-polar transform. IEEE Int Conf Image Process 2000;1:493–6.
[11] Liu H, Guo B, Feng Z. Pseudo-log-polar Fourier transform for image registration. IEEE Signal Process Lett 2006;13(1):17–20.
[12] Erives H, Fitzgerald GJ, Clarke TR. Non-rigid registration of hyperspectral imagery for analysis of agronomic scenes. Biosyst Eng 2007;98:267–75.
[13] Lee D, Nam WH, Lee JY, Ra JB. Non-rigid registration between 3D ultrasound and CT images of the liver based on intensity and gradient information. Phys Med Biol 2011;56:117–37.
[14] Bican J, Flusser J. 3D rigid registration by cylindrical phase correlation method. Pattern Recognit Lett 2009;30:914–21.
[15] Lucchese L, Doretto G, Cortelazzo GM. A frequency domain technique for range data registration. IEEE Trans Pattern Anal Mach Intell 2002;24(11):1468–84.
[16] Dai C, Zheng Y, Li X. Accurate video alignment using phase correlation. IEEE Signal Process Lett 2006;13(12):737–40.
[17] Foroosh H, Zerubia JB. Extension of phase correlation to subpixel registration. IEEE Trans Image Process 2002;11(3):188–200.
[18] van Vliet LJ, Luengo Hendriks CL. Improving spatial resolution in exchange of temporal resolution in aliased image sequences. In: 11th Scandinavian Conference on Image Analysis; 1999. p. 493–9.
[19] Curtis P, Payeur P. A frequency domain approach to registration estimation in three-dimensional space. IEEE Trans Instrum Meas 2008;57(1):110–20.
[20] Reddy BS, Chatterji BN. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans Image Process 1996;5(8):1266–71.
[21] Kuglin CD, Hines DC. The phase correlation image alignment method. In: Proc. IEEE Int. Conf. on Cybernetics and Society; 1975. p. 163–5.
[22] Soummer R, Pueyo L, Sivaramakrishnan A, Vanderbei RJ. Fast computation of Lyot-style coronagraph propagation. Opt Express 2007;15(24):15935–51.
[23] Shi Y, Gao Y, Wang R. Real-time abnormal event detection in complicated scenes. In: International Conference on Pattern Recognition; 2010. p. 3653–6.
[24] Yang GH, Stewart CV, Sofka M, Tsai CL. Registration of challenging image pairs: initialization, estimation, and decision. IEEE Trans Pattern Anal Mach Intell 2007;29(11):1973–89.
[25] Mikula S, Trotts I, Stone J, Jones EG. Internet-enabled high resolution brain mapping and virtual microscopy. NeuroImage 2007;35(1):9–15.
[26] Jolesz FA. Future perspectives for intraoperative MRI. Neurosurg Clin N Am 2005;16(1):201–13.

Cailing Wang received the Ph.D. degree from Nanjing University of Science & Technology, China in 2012. Her research interests include image processing and pattern recognition.

Xiaoyuan Jing received the Ph.D. degree from Nanjing University of Science & Technology, China in 1998. His research interests include pattern recognition, image processing, intelligent information processing and machine learning.

Chunxia Zhao received the Ph.D. degree from Harbin Institute of Technology, China in 1998. Her research interests include image analysis, image understanding, and computer vision.