Registration of images with affine geometric distortion based on maximally stable extremal regions and phase congruency

PII: S0262-8856(15)00015-3
DOI: 10.1016/j.imavis.2015.01.008
Reference: IMAVIS 3397

To appear in: Image and Vision Computing
Received date: 26 July 2014
Revised date: 3 January 2015
Accepted date: 10 January 2015


Qiang Zhang a,b, Yabin Wang a,b, Long Wang c

a Key Laboratory of Electronic Equipment Structure Design (Xidian University), Ministry of Education, Xi'an Shaanxi 710071, China
b Center for Complex Systems, School of Mechano-electronic Engineering, Xidian University, Xi'an Shaanxi 710071, China
c Center for Systems and Control, College of Engineering, Peking University, Beijing 100871, China

Abstract: This paper proposes a novel method to address the registration of images with affine transformation. Firstly, the Maximally Stable Extremal Region (MSER) detection method is performed on the reference image and the image to be registered, respectively, and the coarse affine transformation matrix between the two images is estimated from the matched MSER pairs. Two circular regions containing roughly the same image content are also obtained by fitting and normalizing the centroids of the matched MSERs from the two images. Secondly, a scale-invariant and approximately affine-invariant feature point detection algorithm based on the Gabor filter decomposition and phase congruency is performed on the two coarsely aligned regions, yielding two feature point sets. Finally, the affine transformation matrix between the two feature point sets is obtained by using a probabilistic point set registration algorithm, and the final affine transformation matrix between the reference image and the image to be registered is achieved according to the coarse affine transformation matrix and the affine transformation matrix between the two feature point sets. Several sets of experiments demonstrate that our proposed method performs competitively with the classical scale-invariant feature transform (SIFT) method for images with scale changes, and performs better than the traditional MSER and affine-SIFT (ASIFT) methods for images with affine distortions. Moreover, the proposed method shows higher computation efficiency and robustness to illumination change than some existing area-based or feature-based methods.

Key words: Image registration, Affine transformation, Maximally Stable Extremal Region, Phase congruency, Point set registration

1. Introduction

Image registration is the process of overlaying two or more images of the same scene taken at different times, from different viewpoints or by different sensors [1]. It has been widely applied in many fields, such as image stitching [2], image fusion [3] and 3D reconstruction [4]. Because of the different structures caused by different viewpoints, the alignment of image pairs with affine geometric distortion is much more difficult than that of images with some simple transformations (e.g., translation, rotation and scaling). This paper proposes a novel method to address the registration of images with affine transformation in a two-step way, i.e., by combining a coarse alignment based on feature region detection with a refinement based on feature point detection.

So far, a number of image registration algorithms have been proposed, which can be divided into two categories [1], i.e., area-based and feature-based methods. The area-based image registration methods [5-9] generally employ some optimization algorithms to search for the transformation model between two input images, in which some windows of predefined size or even the whole image is used [1]. However,

Corresponding author. Address: P.O. Box 183, Department of Automatic Control, Xidian University, No.2 South TaiBai Road, Xi'an, Shaanxi Province, 710071, China. Tel: +86 029 88231936. Email address: [email protected] (Q. Zhang)

if there is a large affine distortion between the images, these area-based methods will produce unsatisfactory results. In addition, the huge parameter space of the transformation model often leads to high computation complexity [1]. Differently, the feature-based methods [10-15] utilize salient features instead of image intensities to determine the transformation model between the two input images. They are more robust to some complex geometric deformations, such as the affine distortion, and have higher computation efficiency than the area-based ones [1]. Therefore, the feature-based methods are more widely used to align images with complex geometric distortions [1].

Especially, many feature-based methods have been proposed to align images with affine geometric distortion. For example, Flusser and Suk [13] utilized closed-boundary region features to align SPOT and Landsat images, in which moment-based invariants were used to describe the regions. Yang and Cohen [14] computed the affine geometric invariants of the border triangles generated by the object convex hull. They successfully recovered the affine transformation and mapped the reference object into the test object domain. Moreover, some affine invariant region detectors, such as Harris-Laplace [16], Hessian-Laplace [17], Maximally Stable Extremal Region (MSER) [18] and the improved MSER [19], have been proposed. These detectors can effectively detect invariant regions from the image pairs, and the centroids of the detected regions are often used to establish the geometrical relationship and estimate the affine transformation model. But the shapes of the detected regions are usually unstable because of some image distortions, such as affine distortion, scale change, illumination change and blur. This influences the localization accuracy of each detected region's centroid, and reduces the final registration precision.

Compared with region features, point features, such as Harris corners [20] and Scale Invariant Feature Transform (SIFT) [21] points, usually have more stable localization against the image distortions mentioned above. Higher registration precision may be achieved by using point features instead of region features to estimate the transformation model between the image pairs. For instance, Lin, Du et al. [15] used the correlation between the edge images to determine the homologous corner points, and then established the affine transformation model between the two input images. It should be noted that most of the feature point detection methods are not affine invariant, although some methods (e.g., SIFT) can obtain many correspondences from images with a narrow range of affine transformation. Generally, a low feature repeatability rate [22]¹ is obtained when these feature point detectors are directly performed on images with affine geometric distortion, especially when the affine distortion is too large. This reduces the registration precision to some extent. Recently, Morel and Yu [23] proposed a fully affine invariant feature detection method, i.e., Affine-SIFT (ASIFT), by which good registration results can be obtained on images with large affine geometric distortion. However, the computation complexity of ASIFT is very high, which limits its application in aligning images of large size.

Considering the affine invariance of region features and the high localization accuracy of point features, more point correspondences with a high feature repeatability rate can be obtained if the feature point detection is performed on the detected affine invariant regions instead of on the original image pairs. Correspondingly, the registration results will also be improved by the combination of point and region detections. In this paper, a novel affine image registration method is proposed based on the Maximally Stable Extremal Region (MSER) [18] and phase congruency [24, 25]. In the proposed algorithm, two circular image regions are firstly obtained by fitting and normalizing the matched MSERs, and the coarse affine transformation matrix between the two input images is thus estimated. Secondly, a point detection algorithm (GDPC, for short) based on the Gabor filter decomposition and the maximal phase congruency moment [26] is performed on the two circular regions. Two sets of points with scale and approximate affine invariance are thus obtained from the reference image and the image to be registered, respectively. Finally, the point set registration method in [27] is performed on the two point sets and the transformation matrix between them is obtained. The final affine transformation matrix between the input image pairs is achieved according to the coarse affine transformation matrix and the affine transformation matrix between the two feature point sets.

1 The feature repeatability rate is the percentage of the total observed features that are detected in both images. More details can be seen in [22].

The proposed method is an adaptation of a number of commonly used techniques, but it should be noted that it is not a simple combination of existing methods. Three main contributions are made in this paper. (1) The proposed registration method is implemented in a two-step way, i.e., a region feature based coarse alignment followed by a point feature based refined registration. By the combination of the two steps, the proposed method makes the best use of the affine invariance of the region detectors and the localization accuracy of the point detectors, and thus outperforms some traditional registration methods, such as the SIFT, MSER and ASIFT methods, for images with scale or affine distortions. (2) A novel MSER based registration method is proposed in the coarse alignment step. Instead of directly using the matched MSER pairs to estimate the transformation matrix, we first obtain two coarsely aligned circular regions by fitting and normalizing the matched MSER pairs in the reference image and the image to be registered, and then obtain the coarse estimate of the transformation matrix according to the relationship between the two circular regions. In this way, the parameters of the transformation matrix model are more easily estimated, due to the fact that mainly scale and rotation geometric distortions remain between the two circular regions after normalization. More feature points and a higher feature repeatability rate in the subsequent refinement step also benefit from the two coarsely aligned circular regions, considering that they contain more common image content than any individual matched MSER pair. In addition, we present a new method to exclude the mismatched MSER pairs by using a voting mechanism in the coarse alignment of the sub-regions. (3) A phase congruency based feature point detection method (i.e., the GDPC method) and a point set registration method are jointly employed to refine the coarse alignment. The employed GDPC point detector improves the robustness of the proposed registration method to illumination change, and the employed point set registration algorithm avoids the construction of descriptors for the detected feature points in the refinement step and increases the computation efficiency of the proposed method to some extent.

The rest of the paper is organized as follows. Section 2 gives a brief introduction to the related work. In Section 3, the diagram of the proposed image registration approach is provided, which consists of three parts; the three parts are then described in detail in Section 4, Section 5 and Section 6, respectively. Experimental results are presented in Section 7 and some conclusions are drawn in Section 8.

2. Related work

In general, the feature-based image registration methods consist of four basic steps [1], i.e., feature detection, feature matching, transformation model estimation, and image re-sampling and transformation, among which feature detection and matching receive the most attention due to their crucial roles.

In the past several decades, a number of feature point detection methods have been proposed, and most of them are based on image gradient. For example, the Harris corner detector [20], proposed by Harris and Stephens, is rotation invariant and is regarded as the best one for extracting "L" junctions [28]. However, the Harris detector is sensitive to contrast and noise and is not scale invariant. Scale Invariant Feature Transform (SIFT) [21], a landmark in feature detection, is robust to noise, scale and illumination change to some extent. It also performs well on images with a narrow range of affine transformation [21]. Various improved algorithms based on the SIFT have also been proposed, such as Speeded Up Robust Features (SURF) [29], Principal Components Analysis SIFT (PCA-SIFT) [30] and Gradient Location-Orientation Histogram (GLOH) [31]. However, these methods may fail in case of large affine distortions. Recently, a fully affine invariant algorithm, i.e., Affine-SIFT (ASIFT), has been proposed [23]. The ASIFT firstly simulates several possible affine distortions caused by varying the orientation parameters of the camera axis and then applies the SIFT method to detect feature points on each simulated image, by which a large number of affine invariant feature points are obtained. Sufficient correct matches are obtained by the ASIFT, but many more useless feature points are also produced, which leads to a low feature repeatability rate. Furthermore, these redundant feature points cost a huge amount of memory space and query time, which limits its applications in detecting features from large size images.

In addition, these image gradient based feature point detection methods are sensitive to some image distortions, including geometric distortion (e.g., scale, rotation and affine transformations), illumination change and noise. In [24], Kovesi proposed a dimensionless measure for the significance of a local structure, i.e., phase congruency. Rather than defining features at points with sharp changes in intensity, the phase congruency indicates that image points whose Fourier components are maximally in phase can be regarded as significant features. The phase congruency has been widely used in many image processing tasks [25, 32-34]. Particularly, in [25], Kovesi applied the phase congruency to 'corner' point detection by using the maximum and minimum moments. The proposed detector shows good localization and high robustness to contrast, illumination and noise. However, similar to the Harris corner detector, it is not scale invariant.

Besides the point-based feature detection algorithms, some other feature detection methods have been proposed, such as the edge-based [35, 36] and region-based ones [16-18]. Especially, the region-based feature detection methods, for example, Harris-Laplace [16], Hessian-Laplace [17] and MSER [18], are usually more robust to affine geometric distortion and have high computation efficiency. However, the shapes of the detected regions are usually unstable, which leads to lower localization accuracy (e.g., in the computation of the detected region's centroid) than the detected points.

In addition to the feature detectors, the descriptor, i.e., how to describe the detected features uniquely and stably, is another important factor in these feature detection and matching methods. In order to improve the distinctiveness and correct matching rate of the detected features, high-dimensional descriptor vectors are generally required. For example, the dimension of the SIFT descriptor vector is 128, which increases the computation complexity of the subsequent processing tasks (e.g., feature matching). Recently, in order to achieve lower computational cost, some fast binary descriptors such as Binary Robust Invariant Scalable Keypoints (BRISK) [37], Binary Robust Independent Elementary Features (BRIEF) [38], and Fast Retina Keypoint (FREAK) [39] were proposed. But these methods often compromise between the matching accuracy and the calculating speed [40]. Differently, in [27], a novel point set registration algorithm was presented for the affine transformation model, in which descriptors for the feature points are not required and only the probability distributions of the feature points are employed. Moreover, the transformation and correspondences between the two point sets can be simultaneously determined by the algorithm.

3. Diagram of the proposed image registration method

As shown in Fig. 1, the proposed image registration method consists of three main parts: sub-region coarse matching, feature point extraction and transformation matrix estimation. (1) Sub-region coarse alignment based on MSER detection and matching: MSERs [18] are first extracted and matched from the reference image and the image to be registered, respectively. Then two circular regions are obtained by fitting and normalizing the matched MSER pairs in each image, and the coarse affine transformation matrix between the two input images is calculated. (2) Feature point detection based on Gabor filter decomposition and phase congruency: A series of band-pass Gabor filters are first constructed to decompose the circular regions obtained above into several sub-bands. Then, two sets of feature points are obtained by performing the maximum phase congruency moment based feature detection algorithm on the two sets of sub-bands, respectively. (3) Final transformation matrix estimation: A probabilistic model [27] is first built to describe the distributions of the two feature point sets (i.e., the two sets of feature points detected above). Then, according to the distributions of the two point sets, the parameters of the affine transformation model between the two point sets are estimated. Finally, the final transformation matrix between the two input images is calculated from the coarse affine transformation matrix between the two input images and the affine transformation matrix between the two feature point sets. In the following sections, we will discuss each part in detail.

Fig. 1 Diagram of the proposed image registration algorithm.

4. Sub-regions coarse alignment based on MSER detection and matching

In general, each detected or matched MSER in the reference image or the image to be registered can be directly employed in the subsequent processing (e.g., point detection and transformation matrix estimation). However, one MSER often contains limited features, which will increase the transformation matrix estimation error or reduce the number of detected feature points. Therefore, in the proposed algorithm, all of the matched MSERs are firstly fitted and normalized into one circular region in the reference image and the image to be registered, respectively. Then the coarse affine transformation matrix between the two input images is calculated according to the two circular regions. Moreover, the two circular regions will be employed again in the subsequent feature point detection to detect feature points with a high repeatability rate, which will be discussed in Section 5.

4.1 MSER detection and matching

The Maximally Stable Extremal Region (MSER) detector [18] is based on the watershed segmentation and is affine invariant. In addition, it has the advantages of high computation efficiency and detection accuracy over some other affine invariant region detectors [41], such as Harris-Laplace [16] and Hessian-Laplace [17]. Therefore, we employ the MSER detector to coarsely estimate the affine transformation matrix between the two input images.

In the MSER extraction algorithm [18], MSERs are defined by an extremal property of the intensity function in an image region and on its outer boundary. Let Q_t be an extremal region of an image: the pixel values inside it are either strictly higher or strictly lower than those on its boundary, where the intensity value is exactly equal to t. Then the stability of Q_t is defined as

\Psi(Q_t) = \frac{A(Q_t)}{\left| \mathrm{d}A(Q_t)/\mathrm{d}t \right|}    (1)

where A(Q_t) denotes the area of Q_t. The region Q_t is called maximally stable if \Psi(Q_t) has a local maximum at t. By the MSER algorithm, two sets of MSERs are detected from the reference image and the image to be registered, respectively. More details can be seen in [18].

Each detected MSER with irregular shape is firstly fitted into an elliptical region and then normalized into a circular one, which is further described by a 128-element vector. In this paper, we employ the same matching strategy as that in [21] to obtain the matched MSERs, which are represented as

\{(\Omega_i, \Omega_i') \mid \Omega_i \subset I,\ \Omega_i' \subset I',\ i = 1, 2, \ldots, K\}    (2)

where I and I' are the reference image and the image to be registered, respectively, (\Omega_i, \Omega_i') represents the i-th matched MSER pair, and K denotes the total number of matched pairs.
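As a concrete illustration of this step, the sketch below detects MSERs and fits each region with an ellipse using OpenCV; it is a minimal sketch, not the authors' implementation: OpenCV's default MSER parameters stand in for those of [18], the 128-element description and the matching strategy of [21] are omitted, and the file names are placeholders.

```python
import cv2
import numpy as np

def detect_mser_ellipses(gray):
    """Detect MSERs and fit each irregular region with an ellipse."""
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    # cv2.fitEllipse returns ((cx, cy), axes, angle) and needs >= 5 points
    return [cv2.fitEllipse(r.reshape(-1, 1, 2)) for r in regions if len(r) >= 5]

# "reference.png" and "sensed.png" are placeholder file names
I = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
I_prime = cv2.imread("sensed.png", cv2.IMREAD_GRAYSCALE)
ellipses_ref, ellipses_test = detect_mser_ellipses(I), detect_mser_ellipses(I_prime)
centroids_ref = np.array([e[0] for e in ellipses_ref])  # the (x_i, y_i) of Eq. (3)
```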

4.2 Fitting of matched MSERs

As discussed in the earlier part of this section, each matched MSER \Omega_i (or \Omega_i') may contain limited features. Therefore, all of the MSERs in the reference image or the image to be registered are fitted into a big elliptical region to provide more useful information before being further processed. The fitting of the matched MSER pairs is computed as follows.

Firstly, compute the region centroids (x_i, y_i)^T and (x_i', y_i')^T for each pair of matched MSERs \Omega_i and \Omega_i' in Eq. (2), and obtain the point sets P and P' for the reference image and the image to be registered, respectively, i.e.,

P = \{\mathbf{x}_i \mid \mathbf{x}_i = (x_i, y_i)^T,\ \mathbf{x}_i \in I,\ i = 1, 2, \ldots, K\}    (3)

P' = \{\mathbf{x}_i' \mid \mathbf{x}_i' = (x_i', y_i')^T,\ \mathbf{x}_i' \in I',\ i = 1, 2, \ldots, K\}    (4)

Secondly, for the point sets P and P', compute their means and covariance matrices as

\mu = \frac{1}{K}\sum_{\mathbf{x}_i \in P}\mathbf{x}_i, \quad U = \frac{1}{K}\sum_{\mathbf{x}_i \in P}(\mathbf{x}_i - \mu)(\mathbf{x}_i - \mu)^T    (5)

\mu' = \frac{1}{K}\sum_{\mathbf{x}_i' \in P'}\mathbf{x}_i', \quad U' = \frac{1}{K}\sum_{\mathbf{x}_i' \in P'}(\mathbf{x}_i' - \mu')(\mathbf{x}_i' - \mu')^T    (6)

where \mu and U represent the mean and the covariance matrix of P, respectively, and \mu' and U' represent the mean and the covariance matrix of P'. In general, both U and U' are symmetric and invertible matrices². Finally, fit all of the points in P and P' (i.e., the matched MSERs \{\Omega_i \mid i = 1, 2, \ldots, K\} and \{\Omega_i' \mid i = 1, 2, \ldots, K\}) into two elliptical regions determined by Eq. (7) and Eq. (8), respectively:

(\mathbf{x} - \mu)^T U^{-1} (\mathbf{x} - \mu) = 1    (7)

(\mathbf{x} - \mu')^T U'^{-1} (\mathbf{x} - \mu') = 1    (8)

2 The covariance matrix U or U' is not invertible when all of the points in P or P' lie on a straight line. However, this case rarely happens.

Fig. 2 MSER detection and matching results on two images with affine distortion. (a) The initially fitted elliptical region in the reference image; (b) the initially fitted elliptical region in the image to be registered. (a) and (b) have the same size of 640 × 800. The points marked on the images are the centroids of the matched MSERs. Parts of the image contents contained in the two elliptical regions are different because of many false matches; (c) the finally fitted elliptical regions after the false matches are removed. As shown in (c), there are only four pairs of mismatches, which are labeled with green lines. But these four false matches have little effect on the shapes of the two ellipses.

Fig. 2 illustrates an example of the fitted elliptical regions on a pair of images with affine distortion. Fig. 2(a) and Fig. 2(b) illustrate the fitted regions for the reference image and the image to be registered, respectively. However, as shown in Fig. 2(a) and Fig. 2(b), parts of the image contents in the two elliptical regions fitted by the point sets P and P' are different. This is due to the fact that there are many mismatches in Eq. (2). Therefore, the mismatched MSER pairs should be eliminated before the fitting so that the image contents in the two fitted elliptical regions are almost the same. In this paper, a method is presented to eliminate the mismatched points based on a voting mechanism, which is similar to the Hough transform [42]. The specific implementation can be seen in Appendix A.

After the mismatches are eliminated, two new sets of points \hat{P} and \hat{P}' are obtained:

\hat{P} = \{\hat{\mathbf{x}}_i \mid \hat{\mathbf{x}}_i = (\hat{x}_i, \hat{y}_i)^T,\ \hat{\mathbf{x}}_i \in I,\ i = 1, 2, \ldots, L\}    (9)

\hat{P}' = \{\hat{\mathbf{x}}_i' \mid \hat{\mathbf{x}}_i' = (\hat{x}_i', \hat{y}_i')^T,\ \hat{\mathbf{x}}_i' \in I',\ i = 1, 2, \ldots, L\}    (10)

where L denotes the number of points remaining in P or P' after the mismatches are removed. Subsequently, two new elliptical regions corresponding to \hat{P} and \hat{P}' are also obtained:

(\mathbf{x} - \hat{\mu})^T \hat{U}^{-1} (\mathbf{x} - \hat{\mu}) = 1    (11)

(\mathbf{x} - \hat{\mu}')^T \hat{U}'^{-1} (\mathbf{x} - \hat{\mu}') = 1    (12)

where \hat{\mu} and \hat{U} represent the mean and the covariance matrix of \hat{P}, respectively, and \hat{\mu}' and \hat{U}' represent the mean and the covariance matrix of \hat{P}'.

As shown in Fig. 2(c), the two elliptical regions contain roughly the same image contents.

4.3 Normalization of matched MSERs and coarse estimation of affine transformation matrix

Due to the affine geometric distortion, the structures of two images captured from the same scene are generally different, and it is difficult to calculate the affine transformation model between the two input images by directly utilizing Eq. (11) and Eq. (12). Therefore, in order to better estimate the affine relationship between the two input images, the two elliptical regions should be further normalized into circular ones, between which mainly scale and rotation changes remain. In addition, the two elliptical regions contain different features (e.g., edges and corners), so the feature repeatability rate would be reduced if the feature point detection method, to be introduced in Section 5, were directly performed on them. After the normalization of the elliptical regions, however, more feature points with a high repeatability rate can be detected on the circular regions, which will also improve the subsequent image registration precision. As shown in Fig. 3, the computation procedure of the normalization is implemented as follows.

Fig. 3 Diagram of the normalization of elliptical regions and coarse alignment.

Step 1: Normalize the two ellipses defined by Eq. (11) and Eq. (12) into circles, which are denoted by Eq. (13) and Eq. (14), respectively. The areas of the two circular regions remain the same as those of their corresponding elliptical regions:

\mathbf{z}^T\mathbf{z} = (\hat{\lambda}_1\hat{\lambda}_2)^{-1/2}, \quad \mathbf{z} = (\hat{\lambda}_1\hat{\lambda}_2)^{-1/4}\,\hat{\Lambda}^{1/2}\hat{H}\,(\mathbf{x} - \hat{\mu})    (13)

\mathbf{z}'^T\mathbf{z}' = (\hat{\lambda}_1'\hat{\lambda}_2')^{-1/2}, \quad \mathbf{z}' = (\hat{\lambda}_1'\hat{\lambda}_2')^{-1/4}\,\hat{\Lambda}'^{1/2}\hat{H}'\,(\mathbf{x}' - \hat{\mu}')    (14)

where \hat{H} is a symmetric orthogonal matrix and \hat{\Lambda} = \mathrm{diag}\{\hat{\lambda}_1, \hat{\lambda}_2\}. They are obtained by performing the singular value decomposition (SVD) on the matrix \hat{U}^{-1}, i.e., \hat{U}^{-1} = \hat{H}^T\hat{\Lambda}\hat{H} (here, \hat{U}^{-1} is a symmetric matrix, so the right and left orthogonal matrices are the same in the SVD of \hat{U}^{-1}). Correspondingly, \hat{H}' and \hat{\Lambda}' = \mathrm{diag}\{\hat{\lambda}_1', \hat{\lambda}_2'\} are the SVD results on the matrix \hat{U}'^{-1}. \hat{\lambda}_1 and \hat{\lambda}_2 are the two singular values of \hat{U}^{-1}; \hat{\lambda}_1' and \hat{\lambda}_2' are the two singular values of \hat{U}'^{-1}.

Step 2: Compute the scale transformation factor s and the rotation transformation matrix R between the two circular regions determined by \mathbf{z}^T\mathbf{z} = (\hat{\lambda}_1\hat{\lambda}_2)^{-1/2} and \mathbf{z}'^T\mathbf{z}' = (\hat{\lambda}_1'\hat{\lambda}_2')^{-1/2}. Owing to the fact that the main geometric transformations between the two normalized circular regions are rotation and scale changes, the following equation is achieved:

\mathbf{z} = sR\mathbf{z}'    (15)

where R = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} denotes the rotation between the two regions. Different from Arun's method [43], which estimates the rotation matrix between two point sets based on SVD, a novel method based on the voting mechanism is presented for the computation of the rotation angle \theta in this paper (see Appendix A). The scale factor s is computed as the square root of the area ratio between the two circles, i.e., s = (\hat{\lambda}_1'\hat{\lambda}_2')^{1/4} / (\hat{\lambda}_1\hat{\lambda}_2)^{1/4}.



In this step, two coarsely aligned circular regions determined by zT z  ˆ1ˆ2



1/ 2

1/ 2

and

are obtained, i.e., the region D1 corresponding to the reference image and the region

MA



zT z  ˆ1ˆ2



D2 corresponding to the image to be registered. And the two circular regions will be further used to

D

refine the affine transformation matrix between the two input images in Section 5 and Section 6. It should be noted that the regions D1 and D2 are sampled to almost the same scale after the coarse

TE

alignment step, which makes our proposed method reach a sub-pixel accuracy after the refinement step.



Step 3: Substitute \mathbf{z} = (\hat{\lambda}_1\hat{\lambda}_2)^{-1/4}\hat{\Lambda}^{1/2}\hat{H}(\mathbf{x} - \hat{\mu}) and \mathbf{z}' = (\hat{\lambda}_1'\hat{\lambda}_2')^{-1/4}\hat{\Lambda}'^{1/2}\hat{H}'(\mathbf{x}' - \hat{\mu}') into Eq. (15), and obtain

\hat{\Lambda}^{1/2}\hat{H}(\mathbf{x} - \hat{\mu}) = R\,\hat{\Lambda}'^{1/2}\hat{H}'(\mathbf{x}' - \hat{\mu}')    (16)

Let A_{c1} = \begin{bmatrix} \hat{\Lambda}^{1/2}\hat{H} & -\hat{\Lambda}^{1/2}\hat{H}\hat{\mu} \\ \mathbf{0}^T & 1 \end{bmatrix} and A_{c2} = \begin{bmatrix} R\hat{\Lambda}'^{1/2}\hat{H}' & -R\hat{\Lambda}'^{1/2}\hat{H}'\hat{\mu}' \\ \mathbf{0}^T & 1 \end{bmatrix}. Then the coarse affine transformation A_c between the two input images is computed as

A_{c1}\begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = A_{c2}\begin{bmatrix} \mathbf{x}' \\ 1 \end{bmatrix}, \quad A_c = A_{c1}^{-1}A_{c2} = \begin{bmatrix} \hat{H}^T\hat{\Lambda}^{-1/2}R\hat{\Lambda}'^{1/2}\hat{H}' & \hat{\mu} - \hat{H}^T\hat{\Lambda}^{-1/2}R\hat{\Lambda}'^{1/2}\hat{H}'\hat{\mu}' \\ \mathbf{0}^T & 1 \end{bmatrix}    (17)

where \mathbf{x} = (x, y)^T and \mathbf{x}' = (x', y')^T represent the image points in the reference image and the image to be registered, respectively.

In general, MSERs are irregular and the computation of their centroids is also inaccurate. Therefore, the transformation matrix between the two input images should be further refined. In the following sections, we will use the GDPC method to detect feature points on the two coarsely aligned regions (i.e., D_1 and D_2 in Fig. 3) and then obtain the refined transformation matrix by performing the probabilistic point set registration algorithm on the detected feature point sets.
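To make the composition concrete, the sketch below builds A_c of Eq. (17) from the whitening transforms of Eqs. (13)-(14); it is a sketch under the reconstruction above: the rotation R is assumed to be supplied by the voting scheme of Appendix A (not reproduced here), and the radius factors of Eqs. (13)-(14) cancel exactly as Eq. (16) shows, so they are omitted.

```python
import numpy as np

def whitening(mu, U):
    """Lambda^{1/2} H from the SVD of U^{-1} (cf. Eq. (13))."""
    _, lam, H = np.linalg.svd(np.linalg.inv(U))    # U^{-1} = H^T diag(lam) H
    return np.diag(np.sqrt(lam)) @ H

def coarse_affine(mu, U, mu_p, U_p, R):
    """A_c = A_c1^{-1} A_c2 of Eq. (17): sensed -> reference coordinates."""
    W, W_p = whitening(mu, U), whitening(mu_p, U_p)
    A_c1 = np.block([[W, (-W @ mu).reshape(2, 1)],
                     [np.zeros((1, 2)), np.ones((1, 1))]])
    A_c2 = np.block([[R @ W_p, (-(R @ W_p) @ mu_p).reshape(2, 1)],
                     [np.zeros((1, 2)), np.ones((1, 1))]])
    return np.linalg.inv(A_c1) @ A_c2
```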

5. Feature point detection based on Gabor filter decomposition and phase congruency

After the coarse alignment, some traditional area-based methods, such as the mutual information (MI) [44] and cross-correlation (CC) [45] based ones, can be performed on the circular regions obtained above, and the precision of the final registration results will be further improved. However, one of the two coarsely aligned regions will be enlarged and thus blurred if there exists a large scale change between the reference image and the image to be registered. The contrast of the zoomed image will also be greatly reduced. Correspondingly, the final registration accuracy of these area-based methods will be reduced. Some traditional feature-based methods, such as the SIFT [21] and ASIFT [23] based ones, can also be employed in our refinement step, and satisfactory results will be obtained in most cases. However, these methods are usually based on image gradient; in some special cases, such as for images with illumination change or blur, their final registration precision will also be affected.

Compared with image gradient, phase congruency is more robust to noise, illumination and contrast change. In [25], Kovesi presented a feature point detector using the maximum moment of the image phase congruency. Under varying illumination conditions, the detector shows high stability. However, as discussed in the earlier Section 1, the detector is not scale invariant: far fewer points will be detected from one of the input images than from the other if there exists a large scale change between the two images. Fig. 4 illustrates the feature point detection results on two images with large scale change. As shown in Fig. 4(b), there are 130 detected points in the rectangle region of Fig. 4(a). But as shown in Fig. 4(d), there are only 75 detected points in the interpolated region of Fig. 4(c). This may result from the fact that some mirror high frequency components are introduced when an image is interpolated, and the computation of the phase congruency is thus affected to some extent.

Fig. 4 Detection results of the phase congruency based point detector [25] on two image regions with large scale change. (a) Image captured with a short focal length; (b) detected feature points on the rectangle region in (a); (c) image captured with a long focal length; (d) detected feature points on the rectangle region in (c). The image in (d) is the zoomed result of the rectangle region in (c) by interpolation, and it has the same size as the image in (b).

To overcome the shortcomings of the point detector in [25], a novel feature point detection algorithm (i.e., GDPC) was proposed in our previous work [26] to detect corners from images captured with different focal lengths or different focuses. Here, we apply the GDPC algorithm to solve the problem of image blurring caused by enlargement and interpolation. For completeness of our proposed registration method, the GDPC detector is briefly introduced in the following.

In GDPC, the input image is first decomposed into many sub-bands by using a set of band-pass filters with different center frequencies and bandwidths. Then the maximum phase congruency moment based point detector [25] is performed on each sub-band. The detected points from the sub-band that contains the maximum number of detected points are taken as the final detection results.

Studies show that Gabor filters can well simulate the frequency and orientation selectivity mechanism of the human visual system (HVS) [46]. In addition, compared with some other commonly used band-pass filters, such as the Laplacian of Gaussian band-pass filters, Gabor filters have better spectral distinctiveness [47]. Therefore, the Gabor filters are employed to decompose an image into different sub-bands.

A 2D circular symmetrical Gabor function g_{\sigma,\theta}(x, y) and its Fourier transform G_{\sigma,\theta}(\omega_x, \omega_y) are expressed as

g_{\sigma,\theta}(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\, e^{i\frac{\pi}{\sigma}(x\cos\theta + y\sin\theta)}    (18)

G_{\sigma,\theta}(\omega_x, \omega_y) = e^{-\frac{\sigma^2}{2}\left[\left(\omega_x - \frac{\pi\cos\theta}{\sigma}\right)^2 + \left(\omega_y - \frac{\pi\sin\theta}{\sigma}\right)^2\right]}    (19)

where \sigma > 0 determines the bandwidth of the Gabor function and \theta denotes its orientation. \omega_x and \omega_y are frequency variables. To construct a band-pass Gabor filter, all of the directional filters with the same bandwidth (i.e., the same \sigma) defined by Eq. (18) are merged as [47]

g_\sigma(x, y) = \int_\theta \mathrm{real}\left(g_{\sigma,\theta}(x, y)\right)\,\mathrm{d}\theta    (20)

Practically, \theta takes some discrete values. Then the integration operation in Eq. (20) is substituted with the sum operation, i.e.,

g_\sigma(x, y) = \frac{\pi}{N}\sum_{k=-N}^{N-1}\mathrm{real}\left(g_{\sigma,\theta_k}(x, y)\right)    (21)

where \theta_k = k\pi/N with k = -N, -N+1, \ldots, N-1, and 2N denotes the number of orientations. In this paper, N is set to 6. The Fourier transformation of the band-pass Gabor filter g_\sigma(x, y) is thus expressed by

G_\sigma(\omega_x, \omega_y) = \frac{\pi}{N}\sum_{k=-N}^{N-1} e^{-\frac{\sigma^2}{2}\left[\left(\omega_x - \frac{\pi\cos\theta_k}{\sigma}\right)^2 + \left(\omega_y - \frac{\pi\sin\theta_k}{\sigma}\right)^2\right]}    (22)

Fig. 5 Magnitude-frequency response of a Gabor filter with \sigma = 2\sqrt{2}. (a) 3D plot of G_{2\sqrt{2}}(\omega_x, \omega_y); (b) 1D slice of G_{2\sqrt{2}}(\omega_x, \omega_y).

By setting \sigma to different values, a set of band-pass Gabor filters with different bandwidths is constructed. Fig. 5 illustrates the magnitude-frequency response of a Gabor filter with \sigma = 2\sqrt{2}. Correspondingly, a set of filtered images I_\sigma(x, y) is obtained by

I_\sigma(x, y) = F^{-1}\left[G_\sigma(\omega_x, \omega_y) \cdot F\left[I_0(x, y)\right]\right]    (23)

where I_0(x, y) is the original image, and F[\cdot] and F^{-1}[\cdot] denote the forward and inverse Fourier transforms, respectively. In this paper, \sigma is experimentally set to the following five values: \sigma = 2^{n/2} (with n = 1, 2, 3, 4, 5). Subsequently, the maximum phase congruency moment based corner detector [25] is performed on each of the five filtered images, and the image \hat{I} that contains the maximum number of detected points is picked out, i.e.,

\hat{I} = \arg\max_{I_\sigma} \mathrm{Num}\left(\varphi(I_\sigma) > th_1\right)    (24)

where I_\sigma denotes the filtered image, \varphi(I_\sigma) represents the maximum phase congruency moment of I_\sigma, and th_1 is the threshold for detecting the feature points, set to 0.4 in this paper. Finally, the feature points obtained from \hat{I} are regarded as the final detection results.

Fig. 6 Feature point detection results on different sub-bands obtained by the band-pass Gabor filter decomposition. (a) Feature point detection result on the original blurred image; (b) ~ (f) feature point detection results on the sub-bands obtained by the Gabor band-pass filter decomposition with \sigma = \sqrt{2}, 2, 2\sqrt{2}, 4, 4\sqrt{2}, respectively. The numbers of the feature points in (a) to (f) are 75, 112, 122, 105, 80 and 19, respectively.

To illustrate the validity of the GDPC method, the image in the above Fig. 4(d) is employed again. As shown in Fig. 6, the numbers of the phase congruency feature points detected in Fig. 6(a) to Fig. 6(f) are 75, 112, 122, 105, 80 and 19, respectively. In the 2nd sub-band, i.e., Fig. 6(c), which corresponds to \sigma = 2, the number of detected phase congruency feature points is maximal, so the detected feature points in Fig. 6(c) are taken as the final detection result. Moreover, for the feature points detected in Fig. 4(b) and Fig. 4(d), there are 31 correct matches and the feature repeatability rate is 0.4133. By virtue of the Gabor band-pass decomposition, 69 correct matches are obtained and the corresponding feature repeatability rate reaches 0.5476. This demonstrates that the GDPC detector performs better than the original phase congruency based point detector in [25] for images with large scale change.

6. Refined transformation matrix estimation based on the point set registration

In the classical SIFT based and ASIFT based image registration methods, a descriptor is first constructed for each feature point. Then some matching strategies (e.g., the Euclidean distance) are used to search for the best matches. Finally, the Least Squares (LS) or the RANdom SAmple Consensus (RANSAC) method [48] is applied to remove the outliers and estimate the transformation matrix. As discussed in the earlier Introduction part, the construction of descriptors for each feature point has an important influence on the final matching (or registration) results and is still an open issue.

Moreover, the RANSAC method is time-consuming when the number of matches is very large. Different from the traditional feature-based registration methods, in this paper we employ a probabilistic model based point set registration approach [27] to estimate the transformation matrix between the two feature point sets, in which there is no need to build descriptors for the feature points. In addition, the search for the best matches and the estimation of the transformation matrix are implemented simultaneously, which improves the computation efficiency of the proposed image registration method.

6.1 Point set registration

In Subsection 4.3, two circular regions containing roughly the same image contents are obtained, i.e., the reference region D_1 denoted by Eq. (13) and the tested region D_2 denoted by Eq. (14). Two feature point sets X and Y are then obtained by performing the GDPC algorithm introduced in Section 5 on D_1 and D_2, respectively. The X and Y are defined as follows:

X = \begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_N^T \end{bmatrix} = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots \\ x_N & y_N & 1 \end{bmatrix}_{N \times 3}, \quad Y = \begin{bmatrix} \mathbf{y}_1^T \\ \mathbf{y}_2^T \\ \vdots \\ \mathbf{y}_M^T \end{bmatrix} = \begin{bmatrix} x_1' & y_1' & 1 \\ x_2' & y_2' & 1 \\ \vdots & \vdots & \vdots \\ x_M' & y_M' & 1 \end{bmatrix}_{M \times 3}    (25)

where N and M are the numbers of feature points detected from D_1 and D_2, respectively, and (x_n, y_n) and (x_m', y_m') represent the coordinates of the n-th and m-th feature points in D_1 and D_2.

Assume that the geometrical relationship between the two point sets is modeled by an affine transformation, i.e., the correspondence of \mathbf{x}_n = (x_n, y_n, 1)^T and \mathbf{y}_m = (x_m', y_m', 1)^T satisfies

\begin{bmatrix} x_n \\ y_n \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_m' \\ y_m' \\ 1 \end{bmatrix} = A_r\,\mathbf{y}_m    (26)

where the 3 × 3 matrix A_r denotes the affine transformation between the two feature point sets. Then A_r can be estimated by the probabilistic model based point set registration algorithm in [27].

Here, the points in set Y are treated as the centroids of a Gaussian mixture model (GMM) [49], and the estimation of A_r is to find the optimal model that generates the data points closest to X. The conditional probability density function (PDF) of point \mathbf{x}_n generated by the centroid \mathbf{y}_m is expressed as

P(\mathbf{x}_n \mid \mathbf{y}_m, \Theta) = (2\pi\sigma^2)^{-3/2}\exp\left(-\frac{\|\mathbf{x}_n - A_r\mathbf{y}_m\|^2}{2\sigma^2}\right)    (27)

where \Theta denotes the parameters in A_r and the covariance \sigma^2. All of the GMM components are supposed to have identical values of \sigma^2. Then the GMM PDF of point \mathbf{x}_n generated by all the centroids is expressed as

P(\mathbf{x}_n \mid \Theta) = \sum_{m=1}^{M} w(\mathbf{y}_m)\,P(\mathbf{x}_n \mid \mathbf{y}_m, \Theta)    (28)

where w(\mathbf{y}_m) is the weight of the m-th GMM component, simply set to 1/M for all m. For all of the points in set X, we totally obtain N GMM PDFs to generate them. The log-likelihood of the GMM function is

l(\Theta) = \sum_{n=1}^{N}\log P(\mathbf{x}_n \mid \Theta) = \sum_{n=1}^{N}\log\left(\frac{1}{M}\sum_{m=1}^{M}P(\mathbf{x}_n \mid \mathbf{y}_m, \Theta)\right)    (29)

Then the problem of estimating A_r is reduced to seeking the solution that maximizes l(\Theta). The Expectation Maximization (EM) algorithm [49] is adopted here to estimate the parameters. More details can be seen in [27].
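For illustration, a compact EM loop in the spirit of [27] is sketched below; it is not the reference implementation: it assumes the uniform weights w(y_m) = 1/M above and works directly on 2-D coordinates (Eq. (27) is written for the 3-D homogeneous vectors), with X and Y the N x 2 and M x 2 point arrays from D1 and D2.

```python
import numpy as np

def em_affine(X, Y, iters=50):
    """Estimate A_r of Eq. (26) by EM on the GMM likelihood of Eq. (29)."""
    A = np.eye(3)
    Yh = np.hstack([Y, np.ones((len(Y), 1))])              # homogeneous y_m
    var = np.mean((X[:, None, :] - Y[None, :, :]) ** 2)    # initial sigma^2
    for _ in range(iters):
        # E-step: posteriors over the centroids, from the Gaussians of Eq. (27)
        d2 = ((X[:, None, :] - (Yh @ A[:2].T)[None, :, :]) ** 2).sum(-1)
        P = np.exp(-d2 / (2 * var))
        P /= P.sum(axis=1, keepdims=True) + 1e-12
        W = P / P.sum()
        # M-step: weighted least squares for the 2 x 3 affine block of A_r
        G = np.einsum("nm,mi,mj->ij", W, Yh, Yh)
        b = np.einsum("nm,ni,mj->ij", W, X, Yh)
        A[:2] = b @ np.linalg.inv(G)
        resid = ((X[:, None, :] - (Yh @ A[:2].T)[None, :, :]) ** 2).sum(-1)
        var = max(float((W * resid).sum()) / 2, 1e-8)      # update sigma^2
    return A
```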

6.2 Estimation of transformation matrix

After the coarse sub-region matching and the feature point set registration, two affine transformation matrices, A_c = A_{c1}^{-1}A_{c2} and A_r, are obtained. Then Eq. (17) can be precisely expressed as

A_{c1}\begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = A_r A_{c2}\begin{bmatrix} \mathbf{x}' \\ 1 \end{bmatrix}    (30)

where \mathbf{x} = (x, y)^T and \mathbf{x}' = (x', y')^T represent the image points in the reference image and the image to be registered, respectively.

Let B = \begin{bmatrix} a & b \\ d & e \end{bmatrix} denote the linear transformation and \mathbf{t} = \begin{bmatrix} c \\ f \end{bmatrix} denote the translational transformation; then the affine transformation matrix A_r in Eq. (26) is simply rewritten as

A_r = \begin{bmatrix} B & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}    (31)

Thus, the final affine transformation matrix T between the two input images is calculated by

T = A_{c1}^{-1} A_r A_{c2} = \begin{bmatrix} \hat{H}^T\hat{\Lambda}^{-1/2} B R \hat{\Lambda}'^{1/2}\hat{H}' & \hat{H}^T\hat{\Lambda}^{-1/2}\left(\mathbf{t} - B R \hat{\Lambda}'^{1/2}\hat{H}'\hat{\mu}'\right) + \hat{\mu} \\ \mathbf{0}^T & 1 \end{bmatrix}    (32)

The image to be registered is finally transformed by using the affine transformation matrix T, and the image values at non-integer coordinates are computed by the bilinear interpolation technique.
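A short sketch of this composition and resampling is given below, assuming A_c1, A_c2 and A_r are available as 3 x 3 numpy arrays; cv2.warpAffine with INTER_LINEAR supplies the bilinear interpolation mentioned above.

```python
import cv2
import numpy as np

def final_transform(A_c1, A_r, A_c2):
    return np.linalg.inv(A_c1) @ A_r @ A_c2    # T of Eq. (32)

def register_image(sensed, T, out_shape):
    """Warp the image to be registered into the reference frame (x = T x')."""
    h, w = out_shape
    return cv2.warpAffine(sensed, T[:2], (w, h), flags=cv2.INTER_LINEAR)
```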

7. Experiments and analysis

In this section, several sets of experiments are performed to demonstrate the validity of the proposed image registration algorithm. Firstly, a set of experiments is performed to verify the validity of the proposed method on the alignment of images with scale changes. Secondly, several sets of images with different affine distortions are aligned to test the feasibility of the proposed algorithm. Finally, a set of experiments is performed to test the robustness of the proposed algorithm to illumination change. All the experiments run on a personal computer with a 2.83 GHz Intel Core CPU, 3.48 GB RAM, and the Windows XP operating system.

7.1 Alignment of images with scale change

In this subsection, several sets of images with scale changes are used to test the validity of the proposed image registration method. Besides the proposed method, the classical SIFT based image registration method [21] (SIFT method, for short) and another three methods (MSER-SIFT, MSER-MI and MSER-CC methods, for short, respectively) that employ the same coarse alignment step as in Section 4 but different refinement steps are performed for comparison. In the latter three methods, the SIFT based [21], the mutual information based [44] and the cross correlation based [45] methods are employed in the refinement step, respectively. In the SIFT and MSER-SIFT methods, the matched feature points between the reference image and the image to be registered are firstly obtained; then the transformation matrix (or model parameters) is estimated by using the RANSAC algorithm. The default values of the parameters in [21] are employed in the SIFT feature point detection and matching step.

7.1.1 Alignment of man-made images with scale change

Fig. 7 Man-made test images with scale changes. The image in the upper left corner is the reference image and the other images

AC

CE P

TE

D

are sampled from the reference image with scale factors of 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.2, 1.4, 1.7 and 2, respectively.

Fig. 8 Comparison of different methods on the images with simulated scale changes.

In this subsection, a set of man-made images with scale changes is used to test the performance of the different registration methods above. Fig. 7 illustrates the test images. Considering the fact that all of the images to be registered in Fig. 7 are derived from the reference image, it is easy to obtain accurate ground-truth control point pairs. Therefore, the registration precision of different methods can be evaluated by the root mean square error (RMSE) of the displacement of the control point pairs. As suggested in [33], we also employ 20 control points in the computation of the RMSE.

Fig. 8 gives the RMSE values of the different registration methods on the images with a variety of scale factors. As shown in Fig. 8, the SIFT, MSER-SIFT and our proposed methods obviously outperform the other three methods, especially for images with large scale changes (i.e., corresponding to bigger or smaller scale factors). It can also be found that the MSER-MI and MSER-CC methods perform better than the coarse alignment method in most cases. But for images with large scale changes, the registration precisions of the two area-based methods obviously decrease, and they even perform worse than the coarse alignment alone (e.g., for scale factors less than 0.7). As discussed in the earlier Section 5, the zoomed image will be greatly blurred after the interpolation and enlargement if there exists a large scale change between the two input images. The registration precisions of these two area-based methods are thus reduced to some extent in this case.

7.1.2 Alignment of real world images with scale change

In this subsection, a set of real-world images with scale changes is employed to demonstrate the validity of our proposed method. Fig. 9 illustrates the test images downloaded from www.robots.ox.ac.uk [50], and Fig. 10 displays the registration results obtained by the SIFT and our proposed methods³. As shown in Fig. 10, both the SIFT and our proposed registration methods can align these images successfully, and the registered results have insignificant differences.

Fig. 9 Real-world test images with scale changes. (a) Reference image; (b) ~ (e) images to be registered. They have the same size of 680 × 850 and the scale change gradually increases from (b) to (e).

Fig. 10 Registration results obtained by the SIFT and the proposed methods on the images in Fig. 9. (a) Registration results obtained by the SIFT method; (b) Registration results obtained by the proposed method.

3 The registered results obtained by other methods, such as MSER-SIFT, MSER-MI and MSER-CC, have insignificant differences with those obtained by the proposed method, so they are not displayed here due to space limitation.

In order to evaluate the performance of the different methods, some other objective metrics [31] in addition to the RMSE metric are employed here, which include the number of total matches (N_T), the number of correct matches (N_C), the feature repeatability rate (FR) of detected features and the correct matching rate (CR). Here, given the transformation matrix T⁴, a matched point pair (x, y) ↔ (x', y') is regarded as a correct one if it satisfies

\sqrt{(x'' - x')^2 + (y'' - y')^2} \le th_2, \quad (x'', y'', 1)^T = T\,(x, y, 1)^T    (33)

where th_2 is a threshold, experimentally set to 50 in all of the experiments. The metrics FR and CR are defined as

FR = N_C / \min(N_{ref}, N_{test})    (34)

CR = N_C / N_T    (35)

where N_{ref} and N_{test} denote the total numbers of features detected from the reference image and the image to be registered, respectively.

4 Here, an affine transformation matrix is employed.
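Under the reconstruction of Eq. (33) above, the metrics can be computed as in the following sketch, where `pts_ref` and `pts_test` are the N_T matched pairs as N x 2 arrays (placeholder names), T is the ground-truth affine matrix, and th2 = 50 as in the experiments.

```python
import numpy as np

def match_metrics(T, pts_ref, pts_test, N_ref, N_test, th2=50.0):
    """FR and CR of Eqs. (34)-(35) from the correctness test of Eq. (33)."""
    proj = np.hstack([pts_ref, np.ones((len(pts_ref), 1))]) @ T.T   # (x'', y'', 1)
    err = np.linalg.norm(proj[:, :2] - pts_test, axis=1)
    N_C, N_T = int((err <= th2).sum()), len(pts_ref)
    return N_C / min(N_ref, N_test), N_C / N_T                      # FR, CR
```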

Table 1 Registration results of different methods on the real-world images with scale changes in Fig. 9. "--" denotes that the experimental data do not exist for the corresponding methods. The first RMSE values for the proposed method are the refinement results and the second RMSE values, i.e., the values in brackets, are the coarsely aligned results. In the other tables, the meaning of the symbol "--" and the representation of the RMSE values for the proposed method are the same.

Test image | Metric  | SIFT    | MSER-SIFT | MSER-MI | MSER-CC | Proposed
Fig. 9(b)  | N_C/N_T | 533/608 | 578/590   | --      | --      | 409/409
           | CR      | 0.8766  | 0.9797    | --      | --      | 1.0
           | FR      | 0.3978  | 0.6394    | --      | --      | 0.6944
           | RMSE    | 1.1909  | 1.2017    | 3.2509  | 3.0863  | 1.2104 (3.5556)
Fig. 9(c)  | N_C/N_T | 452/522 | 652/669   | --      | --      | 418/418
           | CR      | 0.8695  | 0.9746    | --      | --      | 1.0
           | FR      | 0.3767  | 0.6834    | --      | --      | 0.7232
           | RMSE    | 1.6657  | 1.6449    | 2.5537  | 2.4846  | 1.6349 (3.3105)
Fig. 9(d)  | N_C/N_T | 199/224 | 462/484   | --      | --      | 382/414
           | CR      | 0.8884  | 0.9545    | --      | --      | 0.9227
           | FR      | 0.2823  | 0.4421    | --      | --      | 0.5411
           | RMSE    | 2.0929  | 2.1265    | 10.4303 | 9.8736  | 2.1300 (13.1564)
Fig. 9(e)  | N_C/N_T | 123/156 | 344/384   | --      | --      | 340/350
           | CR      | 0.7885  | 0.8958    | --      | --      | 0.9714
           | FR      | 0.1692  | 0.3678    | --      | --      | 0.4920
           | RMSE    | 2.8781  | 2.7545    | 3.2307  | 3.2079  | 2.9886 (4.9464)

According to the RMSE values in Table 1, it can be concluded that the SIFT, MSER-SIFT and our proposed methods perform competitively and significantly outperform the MSER-MI and MSER-CC methods in registration accuracy for images with scale changes. By a further comparison, it can also be found that the MSER-SIFT and our proposed methods, especially the latter, obtain higher correct matching rates and repeatability rates than the SIFT method does. In addition, the MSER-SIFT and our proposed methods perform more stably than the SIFT method. For example, the SIFT method obtains satisfactory results when the scale changes between the reference image and the image to be registered are small (e.g., for Fig. 9(b) and Fig. 9(c)). However, the number of correct matches and the repeatability rate obtained by the SIFT method reduce greatly when the scale changes are large (e.g., for Fig. 9(d) and Fig. 9(e)). Differently, the matching results obtained by the MSER-SIFT and our proposed methods remain stable with the increase of scale changes between the reference image and the one to be registered. This may owe to the coarse alignment step employed in our proposed method.

7.2 Alignment of images with affine transformation

In this subsection, several sets of experiments are performed to test the feasibility of the proposed registration method on images with affine distortions. In addition to the proposed, SIFT, MSER-SIFT, MSER-MI and MSER-CC methods mentioned in Subsection 7.1, another three feature-based image registration methods, i.e., the MSER [18], ASIFT [23] and MSER-ASIFT⁵ methods, are performed for comparison. In the MSER, ASIFT and MSER-ASIFT methods, the feature regions or points are first detected and described for each input image. Then the matches are obtained by using the same strategy as in [21]. Finally, the transformation matrix between the two input images is estimated by using the RANSAC algorithm. The default values of the parameters in [18] are employed in the MSER method, and the default values of the parameters in [23] are employed in the ASIFT and MSER-ASIFT methods.

7.2.1 Alignment of man-made images with affine distortion

Fig. 11 Man-made images with affine distortions. (a) Reference image; (b) ~ (e) affine transformed images derived from the reference image by using the transformation matrix T in Eq. (36) with η = 0.2, 0.4, 0.6, 0.8, respectively.

In this subsection, a set of man-made test images with affine distortions is employed to verify our proposed method. Fig. 11 illustrates the test images, in which Fig. 11(a) is a front view image shot [51] and is taken as the reference image in the following experiments. Fig. 11(b) ~ (e) are four images to be registered, which are obtained by performing the affine transformation defined by Eq. (36) on Fig. 11(a) with η = 0.2, 0.4, 0.6, 0.8, respectively:

T = \begin{bmatrix} 1 & 0.4 & 0 \\ \eta & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}    (36)

5 Similarly, the MSER-ASIFT method employs the MSER based alignment method introduced in Section 4 as a coarse step and the ASIFT based registration method in [23] as a refinement step.

The same objective metrics mentioned in the above subsection are employed here, and Table 2 gives the experimental results obtained by the different methods. As shown in Table 2, all of the methods using a two-step way (i.e., the MSER-SIFT, MSER-ASIFT, MSER-MI, MSER-CC and our proposed methods) obtain higher registration precision than the coarse alignment results in terms of the RMSE metric. Moreover, the MSER-SIFT, MSER-ASIFT and our proposed methods perform better than the two area-based methods. In particular, the proposed method yields the lowest RMSE values among all the mentioned methods.

Table 2 Registration results of different methods on the man-made images with affine distortions in Fig. 11. "×" denotes that the method fails for the corresponding test images.

Test image | Metric  | MSER    | SIFT    | ASIFT     | MSER-SIFT | MSER-ASIFT | MSER-MI | MSER-CC | Proposed
Fig. 11(b) | N_C/N_T | 212/215 | 158/217 | 1827/2180 | 533/544   | 5504/5511  | --      | --      | 367/367
           | CR      | 0.9860  | 0.7281  | 0.8381    | 0.9798    | 0.9987     | --      | --      | 1.0
           | FR      | 0.4988  | 0.1415  | 0.0776    | 0.9221    | 0.3059     | --      | --      | 0.9039
           | RMSE    | 1.2610  | 1.3186  | 1.3796    | 0.3996    | 0.3707     | 1.2130  | 1.1573  | 0.0670 (1.6455)
           | T (s)   | 8.94    | 11.30   | 101.86    | 25.55     | 60.72      | 29.66   | 86.32   | 51.02
Fig. 11(c) | N_C/N_T | 196/203 | 31/81   | 1248/1564 | 507/523   | 5338/5343  | --      | --      | 271/271
           | CR      | 0.9655  | 0.3827  | 0.7980    | 0.9694    | 0.9991     | --      | --      | 1.0
           | FR      | 0.4600  | 0.0335  | 0.0625    | 0.8637    | 0.3085     | --      | --      | 0.6826
           | RMSE    | 0.9696  | 1.8873  | 1.6711    | 0.4254    | 0.4339     | 2.7429  | 2.3277  | 0.1596 (3.0174)
           | T (s)   | 9.03    | 11.83   | 99.64     | 24.53     | 61.94      | 30.62   | 140.81  | 49.43
Fig. 11(d) | N_C/N_T | 172/179 | 9/45    | 755/1037  | 453/472   | 4837/4852  | --      | --      | 312/312
           | CR      | 0.9609  | 0.2     | 0.7281    | 0.9597    | 0.9969     | --      | --      | 1.0
           | FR      | 0.4268  | 0.0115  | 0.0465    | 0.8147    | 0.2948     | --      | --      | 0.8168
           | RMSE    | 0.9358  | 43.3784 | 1.7923    | 0.6014    | 0.4932     | 4.7373  | 3.6594  | 0.2316 (6.2300)
           | T (s)   | 8.84    | 10.40   | 96.64     | 24.07     | 58.48      | 27.58   | 329.22  | 46.00
Fig. 11(e) | N_C/N_T | 143/153 | 0/35    | 437/640   | 375/393   | 3877/3885  | --      | --      | 240/240
           | CR      | 0.9346  | 0       | 0.6828    | 0.9542    | 0.9979     | --      | --      | 1.0
           | FR      | 0.3505  | 0       | 0.0339    | 0.8352    | 0.2991     | --      | --      | 0.7362
           | RMSE    | 1.1708  | ×       | 1.4215    | 1.1853    | 1.0860     | 2.0938  | 2.1731  | 0.2405 (3.5940)
           | T (s)   | 8.92    | ×       | 93.28     | 22.63     | 53.03      | 31.27   | 38.31   | 44.81

In terms of the metrics CR and FR, the MSER-SIFT method and, especially, our proposed method also perform better than the other feature-based methods, such as the MSER, SIFT, ASIFT and MSER-ASIFT ones. The SIFT method performs the worst among the feature-based methods in the four sets of experiments: it obtains fewer correct matches and a lower correct matching rate than the other methods. In particular, the SIFT based registration method obtains few or even no matches when the affine distortion is larger (e.g., when α is 0.6 or 0.8). This may result from the fact that the SIFT based point detection and matching algorithm is sensitive to large affine distortion [21]. Compared with the SIFT method, the MSER-SIFT method achieves a significant improvement in terms of the correct matches, the correct matching rate and the feature repeatability rate by virtue of the proposed coarse alignment step. It can also be found that the ASIFT and MSER-ASIFT methods obtain a large number of correct matches but low repeatability rates. That is to say, only a small proportion of the feature points detected by the ASIFT algorithm are useful and many more are useless, which greatly increases the computation complexity and wastes memory space [23]. The run times (i.e., T (s)) of the different methods are also provided in Table 2, which shows that the MSER and SIFT methods are much faster than the other methods, while the ASIFT, MSER-ASIFT and MSER-CC methods run the slowest among the mentioned registration methods. The run time of our proposed method is about half of that of the ASIFT method.
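Since the ground-truth matrix of Eq. (36) is known for these man-made images, a registration result can be scored directly against it. The sketch below does so, assuming RMSE is the root-mean-square displacement between the ground-truth and estimated mappings over all pixel coordinates; this hedged definition stands in for the exact one given in Subsection 7.1.

```python
# Hedged sketch: RMSE between a ground-truth and an estimated affine mapping,
# measured over every pixel coordinate of an h x w reference image. Both
# matrices are 3x3 homogeneous affine transforms.
import numpy as np

def affine_rmse(T_true, T_est, h, w):
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N
    diff = (T_true - T_est) @ pts   # per-pixel displacement between mappings
    return np.sqrt(np.mean(np.sum(diff[:2] ** 2, axis=0)))
```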

7.2.2 Alignment of real-world images with affine transformation

In this subsection, five pairs of real-world images with affine distortions are aligned to further demonstrate the validity of our proposed method. Fig. 12 illustrates the test images6 and the registered ones obtained by our proposed method7, and Table 3 gives the experimental data of the different methods.

Fig. 12 Registration results of the proposed method on the real-world images with affine distortions. (a) ~ (e) Five pairs of real-world test images with affine distortions and their registration results obtained by our proposed method. The images from left to right are the reference images, the images to be registered and the registration results, respectively.

6 The test images in Fig. 12(a) were shot by the IKONOS camera. The test images in Fig. 12(b) and Fig. 12(e) are downloaded from http://lear.inrialpes.fr/people/mikolajczyk/Database/index.html [51]. The test images in Fig. 12(c) come from Google Earth, and the test images in Fig. 12(d) are downloaded from www.robots.ox.ac.uk [50].

7 The registration results obtained by the MSER, ASIFT, MSER-SIFT and MSER-ASIFT methods on all of the real-world images, and those obtained by the SIFT method on the images displayed in Fig. 12(a) and Fig. 12(b), differ insignificantly from the corresponding ones obtained by the proposed method, so they are not displayed here due to space limitations.

Similar results can be found in the experimental data in Table 3. In terms of the correct matching rate, the proposed method performs competitively with the MSER-SIFT and MSER-ASIFT methods and better than the MSER, SIFT and ASIFT methods in all five experiments. Except for the MSER-SIFT method, the proposed method obtains the highest repeatability rate among the remaining feature-based methods. The ASIFT and MSER-ASIFT methods obtain a large number of correct matches but low repeatability rates. For images with affine distortions (e.g., Fig. 12(b) ~ Fig. 12(e)), the SIFT based registration method performs the worst and even fails. In terms of the metric RMSE, the MSER-ASIFT method performs slightly better than the MSER-SIFT and our proposed methods, and these three methods outperform the other methods. Finally, in terms of the metric T, the proposed method runs faster than the ASIFT, MSER-ASIFT and MSER-CC methods and slower than the other methods. In summary, comprehensively considering the computational efficiency and the registration precision, it can be concluded that the proposed and MSER-SIFT methods perform better than the others on aligning images with affine distortions.

Table 3 Registration results of different methods on the real-world images with affine distortions in Fig. 12. The symbol "×" has the same meaning as in Table 2.

Test image | Metric  | MSER    | SIFT      | ASIFT     | MSER-SIFT | MSER-ASIFT | MSER-MI | MSER-CC | Proposed
Fig. 12(a) | N_C/N_T | 212/223 | 1035/1060 | 3258/3285 | 437/441   | 4225/4226  | --      | --      | 224/224
           | CR      | 0.9507  | 0.9764    | 0.9918    | 0.9909    | 0.9998     | --      | --      | 1.0
           | FR      | 0.1688  | 0.3838    | 0.0995    | 0.8585    | 0.3053     | --      | --      | 0.8854
           | RMSE    | 1.2501  | 1.0219    | 1.6661    | 0.8364    | 0.7223     | 1.9426  | 2.1475  | 0.8615 (2.2292)
           | T (s)   | 23.64   | 17.79     | 425.68    | 30.76     | 61.45      | 33.53   | 111.74  | 48.05
Fig. 12(b) | N_C/N_T | 86/92   | 256/291   | 2347/2357 | 407/424   | 4341/4344  | --      | --      | 197/197
           | CR      | 0.9348  | 0.8797    | 0.9958    | 0.9599    | 0.9993     | --      | --      | 1.0
           | FR      | 0.1463  | 0.2282    | 0.0752    | 0.8532    | 0.3068     | --      | --      | 0.6655
           | RMSE    | 1.2565  | 1.0378    | 1.3280    | 0.9576    | 0.8997     | 1.2188  | 1.5200  | 1.0378 (2.6887)
           | T (s)   | 12.47   | 18.89     | 96.65     | 39.99     | 53.97      | 25.63   | 134.77  | 45.48
Fig. 12(c) | N_C/N_T | 57/62   | 2/15      | 1329/1413 | 256/272   | 2263/2266  | --      | --      | 166/167
           | CR      | 0.9194  | 0.1333    | 0.9406    | 0.9421    | 0.9987     | --      | --      | 0.9940
           | FR      | 0.2058  | 0.0022    | 0.0436    | 0.7420    | 0.2864     | --      | --      | 0.8384
           | RMSE    | 1.9170  | ×         | 2.1224    | 1.1836    | 1.2027     | 1.6554  | 1.8173  | 1.2749 (2.9506)
           | T (s)   | 7.17    | ×         | 46.68     | 27.06     | 33.42      | 105.45  | 225.94  | 25.10
Fig. 12(d) | N_C/N_T | 76/94   | 7/18      | 1254/1313 | 238/256   | 2276/2282  | --      | --      | 146/146
           | CR      | 0.8085  | 0.3889    | 0.9551    | 0.9297    | 0.9974     | --      | --      | 1.0
           | FR      | 0.2262  | 0.0128    | 0.0555    | 0.7190    | 0.2938     | --      | --      | 0.7487
           | RMSE    | 1.7471  | ×         | 1.6587    | 1.3452    | 1.0963     | 1.5955  | 1.6949  | 1.1712 (2.7535)
           | T (s)   | 9.78    | ×         | 42.97     | 18.34     | 26.73      | 22.42   | 132.82  | 19.48
Fig. 12(e) | N_C/N_T | 90/106  | 0/27      | 1375/1420 | 252/259   | 2247/2251  | --      | --      | 137/137
           | CR      | 0.8491  | 0         | 0.9683    | 0.9730    | 0.9982     | --      | --      | 1.0
           | FR      | 0.1466  | 0         | 0.0448    | 0.8372    | 0.3031     | --      | --      | 0.7874
           | RMSE    | 3.1916  | ×         | 2.2813    | 2.0687    | 1.9442     | 2.1292  | 2.1605  | 2.1612 (2.8566)
           | T (s)   | 13.41   | ×         | 65.15     | 22.56     | 37.64      | 21.40   | 195.43  | 31.11

7.3 Alignment of images with illumination change

To further demonstrate the validity of our proposed method, a set of images with different illumination changes is employed in this experiment. The test images, which are downloaded from www.robots.ox.ac.uk [50], and the aligned images obtained by our proposed method are illustrated in Fig. 13. Similarly, the registration results obtained by the other methods on these images differ insignificantly from the corresponding ones obtained by the proposed method, so they are also not displayed here. Quantitative results obtained by the different methods are given in Table 4, which shows that all of the feature-based methods obviously perform better than the two area-based methods in this case. Among the feature-based methods, our proposed method performs the best in terms of the metrics CR, FR and RMSE in most cases. This demonstrates that our proposed method is more robust to illumination change than the other methods, which owes to the GDPC detector employed in our proposed refinement step.

Fig. 13 Registration results of the proposed method on the images with illumination changes. (a) Reference image; (b) ~ (f) Images to be registered; (g) ~ (k) Registration results obtained by the proposed method.

Table 4 Registration results of different methods on the images with illumination changes in Fig. 13.

Test image | Metric  | MSER   | SIFT    | ASIFT     | MSER-SIFT | MSER-ASIFT | MSER-MI | MSER-CC | Proposed
Fig. 13(b) | N_C/N_T | 51/69  | 160/173 | 2491/2502 | 94/108    | 848/853    | --      | --      | 132/132
           | CR      | 0.7391 | 0.9249  | 0.9956    | 0.8704    | 0.9941     | --      | --      | 1.0
           | FR      | 0.2394 | 0.5263  | 0.2136    | 0.5465    | 0.2438     | --      | --      | 0.7719
           | RMSE    | 1.2586 | 0.9718  | 0.9577    | 0.8322    | 0.8149     | 4.2714  | 4.4406  | 0.8593 (7.9612)
Fig. 13(c) | N_C/N_T | 36/48  | 127/140 | 1925/1933 | 66/82     | 644/648    | --      | --      | 122/122
           | CR      | 0.7500 | 0.9071  | 0.9959    | 0.8049    | 0.9641     | --      | --      | 1.0
           | FR      | 0.1690 | 0.4792  | 0.2052    | 0.5238    | 0.2111     | --      | --      | 0.7531
           | RMSE    | 1.4814 | 1.0263  | 1.1455    | 1.0364    | 1.2533     | 6.1130  | 5.1007  | 0.8685 (10.9612)
Fig. 13(d) | N_C/N_T | 42/58  | 95/104  | 1878/1886 | 38/44     | 451/457    | --      | --      | 108/108
           | CR      | 0.7241 | 0.9134  | 0.9958    | 0.8636    | 0.9869     | --      | --      | 1.0
           | FR      | 0.2246 | 0.3653  | 0.1869    | 0.4691    | 0.2275     | --      | --      | 0.7714
           | RMSE    | 1.2768 | 1.7957  | 1.3191    | 1.3318    | 1.3195     | 8.0378  | 11.6401 | 1.1994 (14.9092)
Fig. 13(e) | N_C/N_T | 24/36  | 70/78   | 1041/1051 | 44/50     | 419/429    | --      | --      | 113/113
           | CR      | 0.6667 | 0.8974  | 0.9905    | 0.8800    | 0.9767     | --      | --      | 1.0
           | FR      | 0.1727 | 0.3911  | 0.1755    | 0.4731    | 0.1835     | --      | --      | 0.7019
           | RMSE    | 1.9439 | 2.1894  | 1.4283    | 1.3255    | 1.4072     | 9.298   | 8.794   | 1.3048 (12.8535)
Fig. 13(f) | N_C/N_T | 13/25  | 56/63   | 650/660   | 29/33     | 243/246    | --      | --      | 69/71
           | CR      | 0.5200 | 0.8889  | 0.9848    | 0.8788    | 0.9878     | --      | --      | 0.9718
           | FR      | 0.1048 | 0.4029  | 0.1491    | 0.6042    | 0.2128     | --      | --      | 0.7320
           | RMSE    | 2.0646 | 1.7209  | 1.4823    | 1.3688    | 1.4240     | 8.6431  | 7.7223  | 1.5108 (11.4559)

8. Conclusions

In this paper, a novel image registration method is proposed to align images with affine distortions in a two-step manner. The coarse affine transformation matrix between the reference image and the image to be registered is first estimated by using an improved MSER based registration method. Then the final affine transformation matrix is refined by using a phase congruency based point detector and a probabilistic point set registration method. We show that this two-step combination makes the proposed method outperform some traditional area-based methods (e.g., the MSER-MI and MSER-CC methods) and feature-based methods (e.g., the MSER, SIFT, ASIFT and MSER-ASIFT methods) for images with scale changes or affine distortions. Especially, with the help of the phase congruency based point detector in the refinement step, the proposed method shows higher robustness to illumination change than those existing methods do. Owing to the employed point set registration algorithm, the construction of descriptors is avoided in the refinement step, and the computation complexity of our proposed registration method is also reduced to some extent. Experimental results demonstrate that the run time of our proposed method is about half of that of the ASIFT method.
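To make the two-step structure concrete, the short sketch below shows how a coarse matrix and a refinement matrix compose into the final transformation in homogeneous coordinates. The variable names, the example numbers and the convention that points are mapped by the coarse matrix first are illustrative assumptions, not taken from the paper's implementation.

```python
# Minimal sketch of the two-step composition: the final affine matrix applies
# the refinement on top of the coarse MSER-based alignment. All matrices are
# 3x3 homogeneous affine transforms.
import numpy as np

def compose_affine(T_coarse, T_refine):
    # x_final = T_refine @ (T_coarse @ x)  =>  T_final = T_refine @ T_coarse
    return T_refine @ T_coarse

# Example with made-up numbers: a coarse shear estimate followed by a small
# corrective transform produced by the point set registration step.
T_coarse = np.array([[1.00, 0.38,  2.1],
                     [0.41, 1.00, -1.3],
                     [0.00, 0.00,  1.0]])
T_refine = np.array([[ 1.01, 0.02, -0.4],
                     [-0.01, 0.99,  0.6],
                     [ 0.00, 0.00,  1.0]])
T_final = compose_affine(T_coarse, T_refine)
```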

Acknowledgement

This work is supported by the National Natural Science Foundation of China under Grant No. 61104212, by the Fundamental Research Funds for the Central Universities under Grants No. K5051304001 and No. NSIY211416, and by the China Scholarship Council under Grant No. 201306965005. The image data sets used in Section 7.1 and Section 7.3 are downloaded from www.robots.ox.ac.uk and are kindly provided by the Department of Engineering Science, University of Oxford.

Appendix A

In this appendix, the proposed methods for removing the falsely matched MSER pairs and determining the rotation angle $\theta$ between the two corresponding point sets are discussed in detail.

Suppose that there are two coarsely matched point sets $P$ and $P'$, which are fitted into two ellipses expressed by Eq. (7) and Eq. (8), respectively, and that the two ellipses are normalized into circles in the same way as in Eq. (13) and Eq. (14). Then another two point sets $S$ and $S'$ corresponding to the two circles are obtained, i.e.,

$$S = \{z_i \mid z_i = (u_i, v_i)^T,\ i = 1, 2, \ldots, N\} \qquad \text{(A.1)}$$

$$S' = \{z'_i \mid z'_i = (u'_i, v'_i)^T,\ i = 1, 2, \ldots, N\} \qquad \text{(A.2)}$$

where $(u_i, v_i)^T$ and $(u'_i, v'_i)^T$ are a matched point pair and $N$ is the total number of the matches.

Fig. A1 Illustration of the phase angles and the angle difference.

As shown in Fig. A1, for each matched point pair $z_i = (u_i, v_i)^T$ and $z'_i = (u'_i, v'_i)^T$, the phase angles $\phi_i$ and $\phi'_i$ are computed by Eq. (A.3) and Eq. (A.4), respectively:

$$\phi_i = \mathrm{mod}\left(\mathrm{atan2}(v_i, u_i),\ 2\pi\right), \quad \phi_i \in [0, 2\pi) \qquad \text{(A.3)}$$

$$\phi'_i = \mathrm{mod}\left(\mathrm{atan2}(v'_i, u'_i),\ 2\pi\right), \quad \phi'_i \in [0, 2\pi) \qquad \text{(A.4)}$$

where $\mathrm{atan2}(\cdot)$ denotes the four-quadrant inverse tangent function. Then the set $\Phi$ of angle differences over all of the matched point pairs in $S$ and $S'$ is obtained, i.e.,

$$\Phi = \{\Delta\phi_i \mid \Delta\phi_i = \mathrm{mod}(\phi_i - \phi'_i,\ 2\pi),\ i = 1, 2, \ldots, N\}, \quad \Delta\phi_i \in [0, 2\pi) \qquad \text{(A.5)}$$

The mean of the elements in the set $\Phi$ can be regarded as the rotation angle $\theta$ between $S$ and $S'$. However, there may be some mismatches in $P$ and $P'$, which would lead to a large deviation between the true rotation angle and the computed result. Therefore, the false matches should be removed before the rotation angle is estimated. Similar to the idea behind the Hough transform [42], we present a method based on a voting mechanism to remove the mismatched MSER pairs and determine the rotation angle $\theta$.

Firstly, each matched pair votes for its corresponding angular interval. The interval $[0, 2\pi)$ is divided into 36 sub-intervals of equal size8. For the $n$th ($n = 1, 2, \ldots, 36$) sub-interval, a new subset $\Psi_n$ is defined by

$$\Psi_n = \left\{\Delta\phi_{n,j} \,\middle|\, \Delta\phi_{n,j} \in \Phi \text{ and } \Delta\phi_{n,j} \in \left[\tfrac{\pi}{18}(n-1), \tfrac{\pi}{18}n\right),\ j = 1, 2, \ldots, a_n\right\} \qquad \text{(A.6)}$$

where $a_n$ is the total number of the elements that fall into the sub-interval $\left[\tfrac{\pi}{18}(n-1), \tfrac{\pi}{18}n\right)$.

Then, the main interval $\left[\tfrac{\pi}{18}(m-1), \tfrac{\pi}{18}m\right)$ containing the maximum number of elements is obtained, where $m$ is determined by

$$m = \arg\max_{n = 1, 2, \ldots, 36} a_n \qquad \text{(A.7)}$$

As shown in Fig. A2, the interval $\left[\tfrac{\pi}{18}(m-1), \tfrac{\pi}{18}m\right)$ marked in orange represents the main interval.

Fig. A2 Illustration of the main interval and singular intervals.

Finally, the mismatched pairs are rejected based on the voting results. The matched pairs from $S$ and $S'$ are regarded as the correct matches if their angle differences fall into the main interval. However, considering the computation error, the matched point pairs whose angle differences fall into the intervals around the main one (e.g., the intervals marked in brown in Fig. A2) may also be regarded as correct pairs8.

8 More than 36 sub-intervals could be used, but this would yield only slight improvements. Thus, the number of sub-intervals is set to 36 in our experiments by comprehensively considering the computation complexity and the accuracy.

In this paper, the matched pairs whose angle differences fall into the interval $\left[\tfrac{\pi}{18}(m-3), \tfrac{\pi}{18}(m+2)\right)$ are finally regarded as the correct ones. The rest of the matched pairs are regarded as mismatched and are removed from $S$ and $S'$. Correspondingly, the rotation angle $\theta$ between $S$ and $S'$ is computed by

$$\theta = \mathrm{mod}\left(\frac{\sum_{p=m-2}^{m+2}\sum_{q=1}^{a_p}\Delta\phi_{p,q}}{\sum_{p=m-2}^{m+2} a_p},\ 2\pi\right) \qquad \text{(A.8)}$$

However, there will be some computation errors in Eq. (A.8) when the main interval is near 0 or $2\pi$ (i.e., $m \ge 35$ or $m \le 2$). For example, in the case of $m = 36$, some of the angles $\Delta\phi_{p,q}$ (e.g., the angles in the blue regions in Fig. A2) are near $2\pi$, but the others (e.g., the angles in the green regions in Fig. A2) will be near 0. The mean of these angles then has a large deviation from $2\pi$, and a false result will be obtained. In addition, some values of $p$ will exceed the range $[1, 36]$ in this case. Therefore, in order to avoid such errors, $p$ and $\Delta\phi_{p,q}$ are modified as in Eq. (A.9) or Eq. (A.10) during the computation of $\theta$ for $m \ge 35$ or $m \le 2$:

$$\begin{cases} p \leftarrow p - 36 \\ \Delta\phi_{p,q} \leftarrow \Delta\phi_{p,q} + 2\pi \end{cases}, \quad \text{when } m \ge 35 \text{ and } p \ge 37 \qquad \text{(A.9)}$$

$$\begin{cases} p \leftarrow p + 36 \\ \Delta\phi_{p,q} \leftarrow \Delta\phi_{p,q} - 2\pi \end{cases}, \quad \text{when } m \le 2 \text{ and } p \le 0 \qquad \text{(A.10)}$$
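For concreteness, the following NumPy sketch implements the voting scheme above. It is a plausible rendering of Eqs. (A.3)–(A.10) rather than the authors' code; in particular, the bin bookkeeping (index wrapping in place of the explicit substitutions of Eqs. (A.9) and (A.10)) is an implementation choice.

```python
# A minimal NumPy sketch of the voting procedure of Appendix A: estimate the
# rotation angle between two normalized point sets and reject mismatches.
import numpy as np

def estimate_rotation(S, S_prime, n_bins=36):
    """S, S_prime: (N, 2) arrays of matched points on the normalized circles."""
    phi = np.mod(np.arctan2(S[:, 1], S[:, 0]), 2 * np.pi)                # Eq. (A.3)
    phi_p = np.mod(np.arctan2(S_prime[:, 1], S_prime[:, 0]), 2 * np.pi)  # Eq. (A.4)
    dphi = np.mod(phi - phi_p, 2 * np.pi)                                # Eq. (A.5)

    width = 2 * np.pi / n_bins                  # pi/18 for 36 sub-intervals
    bins = (dphi // width).astype(int) + 1      # 1-based bin index, Eq. (A.6)
    counts = np.bincount(bins, minlength=n_bins + 1)
    m = counts[1:].argmax() + 1                 # main interval, Eq. (A.7)

    # Accept pairs whose difference falls into bins m-2 .. m+2 (wrapping),
    # i.e., the interval [pi/18 (m-3), pi/18 (m+2)) of the text.
    accepted = np.zeros(len(dphi), dtype=bool)
    theta_sum, n_acc = 0.0, 0
    for p in range(m - 2, m + 3):
        p_wrapped = ((p - 1) % n_bins) + 1      # Eqs. (A.9)/(A.10), index part
        sel = bins == p_wrapped
        angles = dphi[sel]
        if p > n_bins:                          # wrapped past 2*pi: shift up
            angles = angles + 2 * np.pi
        elif p < 1:                             # wrapped below 0: shift down
            angles = angles - 2 * np.pi
        accepted |= sel
        theta_sum += angles.sum()
        n_acc += sel.sum()

    theta = np.mod(theta_sum / n_acc, 2 * np.pi)  # Eq. (A.8)
    return theta, accepted                        # accepted == inlier pairs
```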

References

[1] B. Zitova, J. Flusser, Image registration methods: a survey, Image Vis. Comput. 21 (11) (2003) 977-1000.
[2] M. Brown, D. Lowe, Automatic panoramic image stitching using invariant features, Int. J. Comput. Vis. 74 (1) (2007) 59-73.
[3] Q. Zhang, L. Wang, H. Li, Z. Ma, Similarity-based multimodality image fusion with shiftable complex directional pyramid, Pattern Recognit. Lett. 32 (13) (2011) 1544-1553.
[4] B. Kratochvil, L. Dong, L. Zhang, B. Nelson, Image-based 3D reconstruction using helical nanobelts for localized rotations, J. Microsc. 237 (2) (2010) 122-135.
[5] A. Cole-Rhodes, K. Johnson, J. LeMoigne, I. Zavorin, Multiresolution registration of remote sensing imagery by optimization of mutual information using a stochastic gradient, IEEE Trans. Image Process. 12 (12) (2003) 1495-1511.
[6] X. Wang, J. Tian, Image registration based on maximization of gradient code mutual information, Image Anal. Stereol. 24 (1) (2005) 1-7.
[7] A. Goshtasby, G. C. Stockman, C. V. Page, A region-based approach to digital image registration with subpixel accuracy, IEEE Trans. Geosci. Remote Sens. 24 (3) (1986) 390-399.
[8] X. Lu, S. Zhang, H. Su, Y. Chen, Mutual information-based multimodal image registration using a novel joint histogram estimation, Comput. Med. Imaging Graph. 32 (3) (2008) 202-209.
[9] J. P. Heather, M. I. Smith, Multimodal image registration with applications to image fusion, in: 2005 8th International Conference on Information Fusion, IEEE, 2005, pp. 1-8.
[10] T. Kim, Y. Im, Automatic satellite image registration by combination of matching and random sample consensus, IEEE Trans. Geosci. Remote Sens. 41 (5) (2003) 1111-1117.
[11] X. Dai, S. Khorram, A feature-based image registration algorithm using improved chain-code representation combined with invariant moments, IEEE Trans. Geosci. Remote Sens. 37 (5) (1999) 2351-2362.
[12] O. Cordón, S. Damas, J. Santamaría, Feature-based image registration by means of the CHC evolutionary algorithm, Image Vis. Comput. 24 (5) (2006) 525-533.
[13] J. Flusser, T. Suk, A moment-based approach to registration of images with affine geometric distortion, IEEE Trans. Geosci. Remote Sens. 32 (2) (1994) 382-387.
[14] Z. Yang, F. Cohen, Image registration and object recognition using affine invariants and convex hulls, IEEE Trans. Image Process. 8 (7) (1999) 934-946.
[15] H. Lin, P. Du, W. Zhao, L. Zhang, H. Sun, Image registration based on corner detection and affine transformation, in: 2010 3rd International Congress on Image and Signal Processing (CISP), IEEE, 2010, pp. 2184-2188.
[16] K. Mikolajczyk, C. Schmid, An affine invariant interest point detector, in: Computer Vision—ECCV 2002, Springer Berlin Heidelberg, 2002, pp. 128-142.
[17] K. Mikolajczyk, C. Schmid, Scale & affine invariant interest point detectors, Int. J. Comput. Vis. 60 (1) (2004) 63-86.
[18] J. Matas, O. Chum, M. Urban, T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions, Image Vis. Comput. 22 (10) (2004) 761-767.
[19] R. Kimmel, C. Zhang, A. M. Bronstein, M. M. Bronstein, Are MSER features really interesting?, IEEE Trans. Pattern Anal. Mach. Intell. 33 (11) (2011) 2316-2320.
[20] C. Harris, M. Stephens, A combined corner and edge detector, in: Alvey Vision Conference, 1988, pp. 15-50.
[21] D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91-110.
[22] C. Schmid, R. Mohr, C. Bauckhage, Evaluation of interest point detectors, Int. J. Comput. Vis. 37 (2) (2000) 151-172.
[23] J. M. Morel, G. Yu, ASIFT: A new framework for fully affine invariant image comparison, SIAM J. Imaging Sci. 2 (2) (2009) 438-469.
[24] P. Kovesi, Image features from phase congruency, Videre: J. Comput. Vis. Res. 1 (3) (1999) 1-26.
[25] P. Kovesi, Phase congruency detects corners and edges, in: The Australian Pattern Recognition Society Conference: DICTA 2003, 2003, pp. 309-318.
[26] Y. Wang, Q. Zhang, L. Wang, Corner detection based on Gabor filter decomposition and phase congruency, in: 2014 10th International Conference on Intelligent Unmanned Systems, accepted.
[27] A. Myronenko, X. Song, Point set registration: Coherent point drift, IEEE Trans. Pattern Anal. Mach. Intell. 32 (12) (2010) 2262-2275.
[28] J. Noble, Finding corners, Image Vis. Comput. 6 (2) (1988) 121-128.
[29] H. Bay, A. Ess, T. Tuytelaars, L. Gool, Speeded-up robust features (SURF), Comput. Vis. Image Underst. 110 (3) (2008) 346-359.
[30] Y. Ke, R. Sukthankar, PCA-SIFT: A more distinctive representation for local image descriptors, in: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), IEEE, 2004, pp. 506-513.
[31] K. Mikolajczyk, C. Schmid, A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 27 (10) (2005) 1615-1630.
[32] A. Wong, W. Bishop, Efficient least squares fusion of MRI and CT images using a phase congruency model, Pattern Recognit. Lett. 29 (3) (2008) 173-180.
[33] A. Wong, J. Orchard, Robust multimodal registration using local phase-coherence representations, J. Signal Process. Syst. 54 (1-3) (2009) 89-100.
[34] D. Fan, Y. Ye, L. Pan, A remote sensing adapted image registration method based on SIFT and phase congruency, in: 2011 International Conference on Image Analysis and Signal Processing (IASP), IEEE, 2011, pp. 326-331.
[35] D. Marr, E. Hildreth, Theory of edge detection, Proc. R. Soc. Lond. B: Biol. Sci. 207 (1167) (1980) 187-217.
[36] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679-698.
[37] S. Leutenegger, M. Chli, R. Y. Siegwart, BRISK: Binary robust invariant scalable keypoints, in: 2011 International Conference on Computer Vision (ICCV), IEEE, 2011, pp. 2548-2555.
[38] M. Calonder, V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha, et al., BRIEF: Computing a local binary descriptor very fast, IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) (2012) 1281-1298.
[39] A. Alahi, R. Ortiz, P. Vandergheynst, FREAK: Fast retina keypoint, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, pp. 510-517.
[40] T. Song, H. Li, Local polar DCT features for image description, IEEE Signal Process. Lett. 20 (1) (2013) 59-62.
[41] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, et al., A comparison of affine region detectors, Int. J. Comput. Vis. 65 (1-2) (2005) 43-72.
[42] P. Hough, Method and means for recognizing complex patterns, U.S. Patent No. 3,069,654, 18 Dec. 1962.
[43] K. S. Arun, T. S. Huang, S. D. Blostein, Least-squares fitting of two 3-D point sets, IEEE Trans. Pattern Anal. Mach. Intell. 9 (5) (1987) 698-700.
[44] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, P. Suetens, Multimodality image registration by maximization of mutual information, IEEE Trans. Med. Imaging 16 (2) (1997) 187-198.
[45] R. Berthilsson, Affine correlation, in: Proceedings of the 4th International Conference on Pattern Recognition, IEEE, 1998, pp. 1458-1460.
[46] P. Daniel, D. Whitteridge, The representation of the visual field on the cerebral cortex in monkeys, J. Physiol. 159 (2) (1961) 203-221.
[47] W. Xu, X. Huang, Y. Liu, W. Zhang, A local characteristic scale selection method based on Gabor wavelets, J. Image Graph. 16 (1) (2011) 72-78.
[48] M. Fischler, R. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381-395.
[49] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, USA, 1995.
[50] Oxford robotics database, http://www.robots.ox.ac.uk/ 2012.
[51] K. Mikolajczyk, Personal homepage, http://lear.inrialpes.fr/people/mikolajczyk/Database/index.html 2012.

Figure captions:

Fig. 1 Diagram of the proposed image registration algorithm.
Fig. 2 MSER detection and matching results on two images with affine distortion. (a) The initially fitted elliptical region in the reference image; (b) The initially fitted elliptical region in the image to be registered. (a) and (b) have the same size of 640 × 800. The points marked on the images are the centroids of the matched MSERs. Parts of the image contents contained in the two elliptical regions are different because of many false matches; (c) The finally fitted elliptical regions after the false matches are removed. As shown in (c), there are only four pairs of mismatches, which are labeled with green lines, but these four false matches have little effect on the shapes of the two ellipses.
Fig. 3 Diagram of the normalization of elliptical regions and coarse alignment.
Fig. 4 Detection results of the phase congruency based point detector [25] on two image regions with large scale change. (a) Image captured with short focal length; (b) Detected feature points on the rectangle region in (a); (c) Image captured with long focal length; (d) Detected feature points on the rectangle region in (c). The image in (d) is the zoomed result of the rectangle region in (c) obtained by interpolation, and it has the same size as the image in (b).
Fig. 5 Magnitude-frequency response of a Gabor filter with $\sigma = 2\sqrt{2}$. (a) 3D plot of $G_{2\sqrt{2}}(\omega_x, \omega_y)$; (b) 1D slice of $G_{2\sqrt{2}}(\omega_x, \omega_y)$.
Fig. 6 Feature point detection results on different sub-bands obtained by the band-pass Gabor filter decomposition. (a) Feature point detection result on the original blurred image; (b) ~ (f) Feature point detection results on the sub-bands obtained by the Gabor band-pass filter decomposition with $\sigma = \sqrt{2}, 2, 2\sqrt{2}, 4, 4\sqrt{2}$, respectively. The numbers of the feature points in (a) to (f) are 75, 112, 122, 105, 80 and 19, respectively.
Fig. 7 Man-made test images with scale changes. The image in the upper left corner is the reference image and the other images are sampled from the reference image with scale factors of 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.2, 1.4, 1.7 and 2, respectively.
Fig. 8 Comparison of different methods on the images with simulated scale changes.
Fig. 9 Real-world test images with scale changes. (a) Reference image; (b) ~ (e) Images to be registered. They have the same size of 680 × 850 and the scale change gradually increases from (b) to (e).
Fig. 10 Registration results obtained by the SIFT and the proposed methods on the images in Fig. 9. (a) Registration results obtained by the SIFT method; (b) Registration results obtained by the proposed method.
Fig. 11 Man-made images with affine distortions. (a) Reference image; (b) ~ (e) Affine transformed images derived from the reference image by using the transformation matrix T in Eq. (36) with α = 0.2, 0.4, 0.6, 0.8, respectively.
Fig. 12 Registration results of the proposed method on the real-world images with affine distortions. (a) ~ (e) Five pairs of real-world test images with affine distortions and their registration results obtained by our proposed method. The images from left to right are the reference images, the images to be registered and the registration results, respectively.
Fig. 13 Registration results of the proposed method on the images with illumination changes. (a) Reference image; (b) ~ (f) Images to be registered; (g) ~ (k) Registration results obtained by the proposed method.
Fig. A1 Illustration of the phase angles and the angle difference.
Fig. A2 Illustration of the main interval and singular intervals.

Table captions:
Table 1 Registration results of different methods on the real-world images with scale changes in Fig. 9. "--" denotes that the experimental data do not exist for the corresponding methods. The first RMSE values for the proposed method are the refinement results and the second RMSE values, i.e., the values in the brackets, are the coarsely aligned results. In the other tables, the meaning of the symbol "--" and the representation of the RMSE values for the proposed method are the same.
Table 2 Registration results of different methods on the man-made images with affine distortions in Fig. 11. "×" denotes that the method fails for the corresponding test images.
Table 3 Registration results of different methods on the real-world images with affine distortions in Fig. 12. The symbol "×" stands for the same meaning as that in Table 2.
Table 4 Registration results of different methods on the images with illumination changes in Fig. 13.

Highlights

- A combination of coarse alignment and refinement is employed in the method.
- Input images are coarsely aligned by fitting and normalizing the matched MSERs.
- A phase congruency and point set registration based refinement step is used.
- The method accurately and efficiently aligns images with affine distortions.
- The method is robust to illumination changes.
