Information Sciences 295 (2015) 323–336
Feature Guided Biased Gaussian Mixture Model for image matching

Kun Sun a, Peiran Li a, Wenbing Tao a,*, Yuanyan Tang b

a National Key Laboratory of Science and Technology on Multi-spectral Information Processing, School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China
b Department of Computer and Information Science, University of Macau, Macau, China
* Corresponding author. E-mail address: [email protected] (W. Tao).

Article history: Received 20 June 2014; Received in revised form 4 October 2014; Accepted 11 October 2014; Available online 22 October 2014.

Keywords: Image matching; Feature Guided Biased GMM; TPS; Deterministic Annealing; EM

Abstract

In this article we propose a Feature Guided Biased Gaussian Mixture Model (FGBG) for image matching. We formulate the matching task as a Maximum a Posteriori (MAP) problem by seeing one point set as the centroid of a Gaussian Mixture Model (GMM) and the other point set as the data. A Thin Plate Spline (TPS) transformation between the two point sets is learnt so that the GMM can best fit the data. Our main contribution is to assign each Gaussian mixture component a different weight. This is where our model differs from the traditional Self Governed Balanced Gaussian Mixture Model (SGBG), whose Gaussian mixture components have equal coefficients. The new weight is defined as a value related to feature similarity, which can be computed by simply decomposing a distance matrix in the feature space. In this way, both feature similarity and spatial arrangement are considered. The feature descriptor is introduced as a reasonable prior to guide the matching, and the spatial transformation offers a global constraint so that local ambiguity can be alleviated. We solve this MAP problem in a framework similar to [16], in which Deterministic Annealing and the Expectation Maximization (EM) algorithms are used. We show that our FGBG algorithm is robust to outliers, deformation and rotation. Extensive experiments on self-collected and the latest open access data sets show that FGBG can boost the number of correct matches.

© 2014 Elsevier Inc. All rights reserved.
1. Introduction

Finding corresponding points between two images is one of the fundamental problems in computer vision and a key ingredient in a wide range of applications including model fitting [19,45,56], motion estimation [7,55], shape recovery [58,41,20], object recognition [15,37,13] and 3D reconstruction [21,2]. Corresponding points are the projections of the same scene point and can be Harris/DoG corners or SIFT features in real applications. However, image feature point matching is not an easy task for two reasons: (1) the extracted features contain many outliers and only a small portion of them can be correctly matched; (2) the transformation between these points is complex because 3D world points at different depths project onto the 2D image plane. Consequently the matching results are either too sparse or contain too many mismatches. In this article we propose a new method that considers both pairwise feature similarity and overall spatial alignment to address the image feature point matching problem. On the one hand, the correct feature matches can provide
reasonable guidance for the spatial alignment even though the transformation is complex and many outliers exist. This is intuitive because outliers usually have much lower feature similarity than inliers, and it is easier for the algorithm to handle a complex transformation under the guidance of feature similarity. On the other hand, the spatial arrangement based method requires the motion to be regularized, which can rectify the mistakes made in the feature space. The method penalizes non-smooth transformations so that neighboring points move coherently. Compared with correct matches, wrong matches are disorganized. Even if a wrong feature match with a high similarity score provides misleading guidance, it is still expected to be rectified by the motion constraint.

Fig. 1 is a simple example of our method. We consider two images of a scene with repeated patterns. SIFT key points and their descriptors are extracted. We then match them using (a) SIFT [53], (b) RPM [17] and (c) our FGBG algorithm. As the recurring patterns produce many locally similar regions, a local feature descriptor based matching method such as SIFT does not work well. From Fig. 1 we can see that the mismatches of SIFT relate two parts which have similar local appearance but are globally irrelevant. RPM is unable to find correct matches due to large geometry changes and outliers in the original SIFT feature points. Our method not only finds the largest number of correct matches, but also achieves satisfactory precision. This shows that our idea of using feature similarity to guide the matching procedure while imposing a spatial arrangement constraint is feasible and can enhance the result.

The Gaussian Mixture Model for point set registration has been studied for a long time. It treats one point set as the centroids of the GMM and points from the other point set as the data described by this GMM. However, in previous work each model point is assigned the same Gaussian mixture coefficient, so a data point can move to any model point with equal chance. We refer to this kind of method as Self-Governed Balanced GMM (SGBG). "Self-Governed" means that it relies on nothing but the spatial arrangement of the points themselves, and "Balanced" indicates that all the GMM components are assigned equal weights. Even though the SGBG algorithm has achieved great success in point set registration, it suffers in the following cases due to the lack of information other than the spatial arrangement: (1) When the point set contains a high ratio of outliers. In the context of image matching, the initial features usually contain a large
Fig. 1. Illustration of our basic idea. From top to bottom: matching using SIFT [53] (TP: 27, TN: 8), matching using RPM [17] (TP: 4, TN: 70) and matching using our method (TP: 50, TN: 2). True Positives (TP) are in green and True Negatives (TN) are in red. (a) The images present several repeated patterns so that SIFT will mismatch these locally similar parts. (b) Due to outliers and geometry differences, the RPM algorithm is unable to find correct matches. (c) Our idea of combining feature similarity with spatial arrangement can enhance the result. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
portion of outliers, which can easily disable the SGBG algorithm. This is especially severe in wide baseline cases. (2) When the geometry changes are large, e.g. under large rotations, which are common in multi-view images. The SGBG method cannot handle such cases because the transformation corresponding to a large rotation incurs a large smoothness penalty. As a result, the SGBG method returns a seemingly smooth transformation but the correspondences are totally wrong. (3) When the shape of the point set is flat or symmetric. The difficulty in aligning this kind of point set is the uncertainty of the transformation. Since more than one transformation could spatially align the two point sets, the SGBG method prefers a simpler and smoother transformation, which may not be the most appropriate one.

To tackle this, we propose a Feature Guided Biased GMM (FGBG) for image feature point matching. The word "Biased" means that we treat each Gaussian component differently rather than equally as in SGBG. Our main contribution is to assign each GMM component a different weight. Specifically, for a given data point, we compute a different weight for each GMM component according to the feature similarity between the data point and the model point corresponding to that Gaussian component. This is achieved by simply decomposing a distance matrix in the feature space [48]. The following steps are the same as in SGBG: a smooth transformation is learnt to update the centroids of the GMM so that the model can best fit the data. The biggest difference between SGBG and FGBG is that, by assigning each GMM component a different weight based on feature similarity, we use the feature information to guide the spatial alignment. In other words, both feature similarity and spatial smoothness are considered in FGBG, while SGBG considers only the latter. The motivations for doing this are:

1. If two points are similar in the feature space, they are more likely to form a correct match. So a GMM component should be given a larger weight if its corresponding model point has a higher feature similarity with the data point.
2. The feature similarity constraint, which provides a reasonable prior for the alignment, and the spatial smoothness constraint are encoded in a unified model. As a result, the performance can be enhanced.

We develop our FGBG method in a framework similar to [16], in which the Deterministic Annealing technique [22] and the EM algorithm are used. The main difference is that [16] considers only the spatial arrangement and uses a balanced GMM, while FGBG integrates the feature information into this framework by proposing a biased GMM. Even though the scene is rigid, the transformation between its projections on two image planes is non-rigid when depth discontinuities exist. We use the Thin Plate Spline (TPS) as the transformation model. The EM algorithm is an elegant method for the MAP problem of the GMM. It alternates between finding correspondences in the E-step and estimating the transformation in the M-step. A simple way to find the optimal parameters in the M-step is to differentiate the objective function with respect to each parameter and set the derivative to zero. However, as Chui pointed out in [16], including extra variables as free parameters can result in more local minima and makes the optimization harder. So Deterministic Annealing is used as an alternative to make the method robust and insensitive to initialization.
The annealing temperature also serves as the covariance of each Gaussian component. The Deterministic Annealing algorithm begins with a high temperature and gradually reduces it until it reaches a preset value. In this way the correspondences are searched in a global-to-local manner.

The remainder of this paper is organized as follows: we give a brief introduction to the related work in Section 2. In Section 3 we introduce our proposed feature guided biased GMM and formulate the matching task as a MAP problem. In Section 4 we solve the problem using a framework similar to [16], in which Deterministic Annealing and the EM algorithm are used. After this, we summarize the algorithm and give the detailed parameter settings in Section 5. Section 6 presents the experiments: after a robustness test and an experimental analysis of the parameters, the method is evaluated on a variety of datasets. Finally we conclude in Section 7.
2. Related work

Among all kinds of matching methods, three classes deserve to be mentioned: the feature descriptor based methods, the spatial arrangement based methods, and the methods considering both. Relying on detecting and matching salient features between images, the feature descriptor based methods are intuitive and simple. For each detected feature point, a high dimensional descriptor which represents the image appearance in its local neighborhood is built. The features should be highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features [39]. By this means the matching task in the two-dimensional image domain is transformed into a higher dimensional feature space. Lowe [39] proposed the Scale Invariant Feature Transform (SIFT). Ke and Sukthankar [30] improved upon the SIFT descriptor by applying Principal Component Analysis (PCA) to the normalized gradient patch instead of using smoothed weighted histograms. Li et al. [34] extended SIFT to multi-spectral images. Bay et al. [3] proposed a much faster descriptor called SURF by relying on integral images and a Hessian-matrix based detector. The BRIEF descriptor [11] is an n-dimensional binary bit string computed from pairwise intensity comparisons, and based on it a new oriented descriptor called ORB was defined by Rublee et al. [46]. Pele and Werman [44] proposed a new distance metric, and a linear time algorithm for computing it, to establish SIFT matches in the descriptor space. Hauagge and Snavely [26] designed a specific descriptor for matching symmetric images. Mikolajczyk and Schmid [40] evaluated a variety of approaches and concluded that the SIFT based features perform best. Another problem is that computing pairwise distances between large SIFT sets is cumbersome. Gilinsky and Manor [23] proposed SIFTpack to save both storage and run time. Hartmann et al. [25] proposed to predict, from a machine learning perspective, whether a feature point can be matched. The learned classifier will discard
most unmatchable features, thus boosting efficiency. However, although a lot of work has been done to improve the performance of descriptors, the results are either too sparse or contain unavoidable mismatches. There are two main reasons for the sparsity. The first is that only a small portion of the initial features can be matched; in some extreme cases, outliers make up to 80% of the initial features. The other reason is that the matching criterion requires a point to be abandoned if ambiguity arises, e.g. when its nearest neighbor and second nearest neighbor are not clearly distinguished. On the other hand, mismatches are mainly caused by the nearest neighbor (NN) rule and the locality of the descriptor. The former shows up in the case of occlusion and the latter when the image presents several repeated patterns.

The second class of methods considers no other information but the spatial layout. These methods solve the matching problem by point set registration, in which two best aligned points denote a match. The ICP algorithm [5] is a famous and simple method for rigid registration. ICP alternates between finding the closest point and updating the transformation. However, the requirement that the two shapes be close enough at the beginning greatly limits the application of ICP and its variants [47]. Recently a popular view treats the alignment of two point sets as a MAP problem of the Gaussian Mixture Model (GMM). The core idea is to describe one point set by the GMM and learn a transformation to align them [17,42], or to model both point sets by GMMs and minimize their divergence [28]. Chui and Rangarajan [17] extended ICP from binary assignment to soft assignment by introducing a matching probability matrix whose rows and columns each sum to one. This can be seen as an implicit GMM representation. With the help of the Deterministic Annealing technique, the algorithm jointly estimates one-to-one correspondences and a Thin Plate Spline (TPS) [6] transformation between the two point sets. The CPD algorithm [42] explicitly models one point set by the GMM and the other set as the data generated by this model. The model points are forced to move coherently towards the data points until they are best aligned, and a non-rigid transformation in the form of a Gaussian Radial Basis Function (GRBF) [43] is learnt. Jian and Vemuri [28] further investigated the registration problem and proposed to model both point sets by GMMs and then minimize their discrepancy. They leveraged the closed form expression for the L2 distance between two densities, leading to an efficient registration algorithm. Lian and Zhang [35] proposed a concave optimization approach for the robust point matching problem when no outliers exist. They further extended their work to handle outliers in [36] by reducing the original objective function to a function with fewer nonlinear terms. In spite of their success in point set registration, the methods mentioned above are seldom used in image matching. This is because (1) an abundance of outliers exists in the initial features, and (2) the transformation between two image feature point sets is complex due to the projection of the scene at different depths onto the image plane, especially in wide baseline cases.

The third class of methods takes both feature similarity and spatial arrangement into consideration simultaneously. Among them, graph matching is a hot topic.
An attributed graph is constructed in which a node attribute describes the local appearance of a feature and an edge attribute describes the geometric relationship between two features. The task of graph matching is to learn a mapping between two graphs that preserves the structure between them. This is an integer quadratic programming problem that optimizes a unary term, which reflects local appearance consistency, as well as a pairwise term, which indicates pairwise geometric compatibility. Leordeanu and Hebert [32] built an adjacency matrix whose nodes represent possible correspondences and whose edges denote pairwise agreement between them. The correct correspondences are recovered according to the principal eigenvector of the adjacency matrix. Cour et al. [18] also used the spectral relaxation technique to approximate the solution, but normalized the graph compatibility function, which greatly improves the matching accuracy. Cho and Lee [14] proposed a novel progressive framework which combines the probabilistic progression of graphs with the matching of graphs: based on the current graph matching result, the algorithm explores the space of graphs beyond the current graphs. The elaborate objective function of Torresani et al. [51] is an instance of graph matching, and they proposed to use dual decomposition to globally optimize this energy. Zhou and De la Torre [57] found that by factorizing the pairwise affinity matrix the local structure of the nodes in each graph is decoupled, so that global geometric constraints can be imposed. A fast and scalable approximate spectral matching algorithm is proposed in [29] to address the memory problem when dealing with large data. Wang et al. [54] proposed to use a local graph density estimator, which enables graph matching to handle both outliers and many-to-many object correspondences. However, even though many approximate or progressive methods have been proposed to control the computational complexity, the application of graph matching methods is still limited. Cai et al. [9,10] proposed to learn spatio-temporal dependencies from low-level features of similar images and to reduce feature dimensions to improve the computational efficiency for both feature similarity and spatial smoothness. Some works formulate matching as an energy minimization problem: Isack and Boykov [27] jointly solved the feature matching and multi-model fitting problems by minimizing one energy function, and Liu et al. [38] decomposed deformation into three basic components and then formulated the matching problem as a two-dimensional label Markov Random Field. Binary descriptors have been widely used in vision tasks for their fast computation and low memory consumption; Zhuang et al. [59] proposed a new binary code that encodes both "Intensity Difference Quantization" and "Weakly Spatial Context Coding". Other works [50,49,24] utilize both kinds of information by embedding the image coordinates and feature descriptors into a unified subspace. To this end, the authors define a weight matrix whose diagonal blocks indicate the structure of each point set while the off-diagonal blocks reflect the feature similarity across the point sets. The embedded result is the solution of the Laplacian Embedding problem defined by this weight matrix. An element of this subspace can be seen as a new point defined by both its spatial and its feature information, and the matching problem is then solved in the learnt subspace.
A pairwise matching algorithm PW [49] was proposed in which a spectral decomposition for the affinity matrix in the subspace is adopted to find matches. In [50] each image is represented by a set of coordinates in the subspace and a manifold is learnt based on the Hausdorff distance for object classification. The
work of Torki and Elgammal was then improved by Hamid et al. [24]. In that work, random projection is used to approximate the subspace learning, and matches with high confidence are used to guide the procedure for dense matching. However, the matching in the learned subspace is based on the Euclidean distance, which may also produce mismatches.

3. Mathematical formulation for the Feature Guided Biased GMM

Suppose we have two point sets $X = \{(x_i, g_i) \mid x_i \in \mathbb{R}^2, g_i \in \mathbb{R}^D\}_{i=1}^{M}$ and $Y = \{(y_j, h_j) \mid y_j \in \mathbb{R}^2, h_j \in \mathbb{R}^D\}_{j=1}^{N}$. The former element of each two-tuple is the 2-dimensional image coordinate and the latter is its corresponding D-dimensional feature descriptor. M and N are the numbers of points in each point set, respectively. Our goal is to learn a transformation f that can best align f(X) and Y and then to find matches. To achieve this, we model the points from X by a Gaussian Mixture Model so that the probability of a data point from Y is

$$p(y_j) = \sum_{i=1}^{M} v_{ij}\,\mathcal{N}(y_j; x_i, \sigma^2, f), \qquad (1)$$
where

$$\mathcal{N}(y_j; x_i, \sigma^2, f) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\|y_j - f(x_i)\|^2}{2\sigma^2}} \qquad (2)$$

is the component of the GMM and $v_{ij}$ is the weight of each component. In order to account for outliers in the point set, we introduce another term
$$\mathcal{N}(y_j; x_0, \sigma_0^2) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-\frac{\|y_j - x_0\|^2}{2\sigma_0^2}} \qquad (3)$$

to Eq. (1). Here $x_0 = \frac{1}{N}\sum_{j=1}^{N} y_j$ and $\sigma_0^2 = \max \|y_a - y_b\|^2$ are the center and covariance of the outlier component, respectively. Then Eq. (1) is extended to the following form
$$p(y_j) = (1-\theta)\sum_{i=1}^{M} v_{ij}\,\mathcal{N}(y_j; x_i, \sigma^2, f) + \theta\,\mathcal{N}(y_j; x_0, \sigma_0^2), \qquad (4)$$
in which $\theta$ is the ratio of outliers that we assume the point set may contain. In SGBG, $\{v_{ij}\}_{i=1}^{M}$ is set to a constant for each data point $y_j$, so only the spatial arrangement of the point set is used to constrain the alignment. Under this setting, a data point has the same opportunity to move towards each of the model points. This treatment of $v_{ij}$ is simple, but some useful information is neglected. Different from SGBG, we compute $\{v_{ij}\}_{i=1}^{M}$ according to the feature similarity between $\{x_i\}_{i=1}^{M}$ and $y_j$. The intuition is that if a model point and a data point are similar in the feature space, they are more likely to form a correct match, so the component represented by this model point should be assigned a larger weight. By doing so, the feature similarity is utilized together with the spatial arrangement and provides a reasonable prior for the alignment. However, one problem is that two non-matching points may have a high feature similarity score; that is, the prior is not guaranteed to be completely correct. In our method, such mistakes are expected to be addressed by the constraints on the spatial transformation. To conclude, we introduce the feature similarity to guide the spatial alignment by computing a different $\{v_{ij}\}_{i=1}^{M}$ for each $y_j$, and use the spatial transformation constraints to rectify the mistakes in the feature space. Denote $d(\cdot,\cdot)$ as a metric that measures the pairing score between two features; then $v_{ij}$ has the following form:
$$v_{ij} = d(g_i, h_j). \qquad (5)$$
Suppose V is an $M \times N$ matrix with elements $v_{ij}$. Then the jth column of V is the weight vector of the GMM describing $y_j$. There are many candidates for the choice of d. Here we leverage the SLH algorithm [48] to compute V. It computes a pairing matrix whose elements indicate the matching confidence of two features. Larger values indicate more likely matches, while smaller (or even negative) values indicate very unlikely matches. In addition, the values are highly distinctive, which means that a point cannot be strongly associated with more than one point. This allows us to use the feature information to tendentiously guide the spatial alignment. Specifically, we compute a "proximity" matrix G with elements $G_{ij} = e^{-\frac{\|g_i - h_j\|^2}{2\beta^2}}$ and then take its singular value decomposition $G = TDU$, where D is an $M \times N$ matrix satisfying $D_{ij} = 0\ (i \neq j)$ and $D_{ij} > 0\ (i = j)$. We then replace D with another matrix E and compute the matrix V from $V = TEU$. E is obtained by replacing each $D_{ij} > 0\ (i = j)$ with one. If D is a square matrix, i.e. $M = N$, this is equivalent to replacing D with an identity matrix of the same size. Compared with G, the matrix V has two good properties: (1) V reflects the "proximity" between features because it originates from G, and (2) the "exclusion" rule, which prefers one-to-one matches, is also considered in V because of its orthogonality. We can see that V is more robust and less ambiguous than G. However, V cannot be directly used as the weight in our application since it contains negative elements. The negative values in V can be set to zero because they indicate very unlikely or even wrong matches according to [48]. Since the corresponding mixture coefficients are set to zero, these matches will not provide guidance in our FGBG framework.
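To make the construction of V concrete, the following is a minimal NumPy sketch of the weight computation described above. The function name slh_weights, the dense pairwise-distance computation and the clamping of negative entries inside the same routine are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def slh_weights(g_feat, h_feat, beta):
    """Compute the biased mixture weights V from the SLH-style decomposition.

    g_feat: (M, D) array of descriptors g_i for the model points X.
    h_feat: (N, D) array of descriptors h_j for the data points Y.
    Returns an (M, N) weight matrix V.
    """
    # "Proximity" matrix G_ij = exp(-||g_i - h_j||^2 / (2 beta^2))
    sq_dist = ((g_feat[:, None, :] - h_feat[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-sq_dist / (2.0 * beta ** 2))

    # SVD G = T D U; replacing the (positive) singular values by one gives V = T E U
    T, _, U = np.linalg.svd(G, full_matrices=False)
    V = T @ U

    # Negative entries mark very unlikely matches [48]; remove their guidance
    V[V < 0] = 0.0
    return V
```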
Suppose the data points are independent and identically distributed; then the joint probability distribution of the whole data set Y is

$$p(Y \mid X, f, \sigma^2) = \prod_{j=1}^{N} p(y_j). \qquad (6)$$
Eq. (6) is also known as the likelihood function. We assume as prior knowledge that the transformation function should be smooth. According to regularization theory, one function is said to be smoother than another when it has less energy at high frequencies [17,42]. Denoting L as an operator that extracts the high-frequency part of the function f, we take a prior of the form

$$p(f) = e^{-\frac{\lambda}{2}\|Lf\|^2}. \qquad (7)$$
Thus, according to the Bayes rule, the posterior probability is

$$p(f \mid Y, X, \sigma^2) \propto p(f)\, p(Y \mid X, f, \sigma^2). \qquad (8)$$
4. The solution based on EM and Deterministic Annealing

In Section 3 we introduced our feature guided biased GMM and converted the matching problem into a MAP problem. In this part, we describe how to solve it. Two main techniques are used: the EM algorithm and the Deterministic Annealing method of [16]. As is well known, maximizing Eq. (8) is equivalent to minimizing the following negative logarithm energy function

$$E_1(f, \sigma^2) = -\log p(Y \mid X, f, \sigma^2) - \log p(f). \qquad (9)$$
However, Eq. (9) is difficult to solve because it does not offer a closed form solution for the parameters. The EM algorithm is an elegant way to solve the problem in Eq. (9). It alternates between two steps: in the E-step, the correspondences are estimated based on the current parameters, and in the M-step, the parameters are updated according to the current correspondences. As Chui pointed out in [16], including extra variables as free parameters can result in more local minima and makes the optimization harder, so we do not treat $\sigma$ as a free parameter. Instead, we leverage the Deterministic Annealing technique to make our method robust and insensitive to initialization. Specifically, we replace the parameter $\sigma^2$ with a newly introduced temperature parameter T and gradually reduce it during the matching process. As a result, after introducing the Deterministic Annealing algorithm into the EM framework, our goal is to minimize the following energy function:
$$E_2(P, f) = \lambda T \|Lf\|^2 + \sum_{j=1}^{N}\sum_{i=1}^{M} p_{ij}\|y_j - f(x_i)\|^2 + T \log T \sum_{j=1}^{N}\sum_{i=1}^{M} p_{ij} + T \sum_{j=1}^{N}\sum_{i=1}^{M} p_{ij}\log p_{ij}, \qquad (10)$$
where P is a matrix with elements $p_{ij}$ and T is the annealing temperature.

E-step: in the E-step, a probability matrix P is estimated. Each of its elements is the posterior of a GMM component, computed as

$$p_{ij} = \frac{v_{ij}\,\mathcal{N}(y_j; x_i, T, f)}{\sum_{k=1}^{M} v_{kj}\,\mathcal{N}(y_j; x_k, T, f) + c_0}, \qquad (11)$$

where $c_0 = \frac{\theta}{1-\theta}\,\mathcal{N}(y_j; x_0, \sigma_0^2)$ is a constant. $p_{ij}$ in Eq. (11) indicates to what extent a data point $y_j$ corresponds to a model point $x_i$. Each column of P is then the matching score vector of a certain data point against all the GMM components. In our algorithm, we determine matches according to P. Note that, different from the SGBG method, $p_{ij}$ in Eq. (11) is modulated by the introduced $v_{ij}$: $p_{ij}$ takes a large value only when $x_i$ and $y_j$ are close to each other in both the spatial and the feature space. To this end, our FGBG integrates both spatial information and feature information to find matches.
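The E-step of Eq. (11) can be written compactly. Below is a minimal NumPy sketch under our own assumptions: the helper name e_step, the evaluation of $c_0$ per data point, and the one-dimensional Gaussian normalization of Eqs. (2) and (3) are illustrative choices, not the authors' code.

```python
import numpy as np


def e_step(x_warped, Y, V, T, theta=0.5):
    """One E-step of Eq. (11): posterior matrix P with elements p_ij.

    x_warped: (M, 2) current positions f(x_i) of the model points.
    Y:        (N, 2) data points y_j.
    V:        (M, N) biased mixture weights.
    T:        current annealing temperature, playing the role of the variance.
    theta:    assumed outlier ratio (0.5 in the paper).
    """
    # Gaussian terms N(y_j; x_i, T, f) for all (i, j), Eq. (2) with sigma^2 = T
    sq_dist = ((x_warped[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)   # (M, N)
    gauss = np.exp(-sq_dist / (2.0 * T)) / np.sqrt(2.0 * np.pi * T)

    # Outlier term of Eq. (3), scaled by theta / (1 - theta) as in c_0
    x0 = Y.mean(axis=0)
    sigma0_sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2).max()
    out = np.exp(-((Y - x0) ** 2).sum(axis=1) / (2.0 * sigma0_sq))
    c0 = theta / (1.0 - theta) * out / np.sqrt(2.0 * np.pi * sigma0_sq)

    num = V * gauss                                        # v_ij N(y_j; x_i, T, f)
    return num / (num.sum(axis=0, keepdims=True) + c0)     # column-wise Eq. (11)
```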
M-step: in the M-step, the transformation function f is updated based on the P estimated in the E-step by minimizing

$$E_3(f) = \lambda T \|Lf\|^2 + \sum_{i=1}^{M} \|z_i - f(x_i)\|^2, \qquad (12)$$

where $z_i = \sum_{j=1}^{N} p_{ij} y_j$ can be seen as the newly estimated position of the data. The non-rigid transformation f, parameterized by the Thin Plate Spline (TPS), is
$$f(x_i; d, \omega) = x_i d + \phi(x_i)\,\omega, \qquad (13)$$

where d and $\omega$ are the affine and non-affine transformation matrices, respectively. Each model point $x_i$ is represented by its homogeneous coordinates. $\phi(x_i)$ is a vector with elements computed from the kernel $\phi_a(x_i) = \|x_a - x_i\|^2 \log \|x_a - x_i\|$. Substituting Eq. (13) into Eq. (12) turns it into

$$E_4(f) = \lambda\,\mathrm{trace}(\omega^T \Phi\, \omega) + \|Z - Xd - \Phi\,\omega\|^2, \qquad (14)$$
where Z, X and $\Phi$ are the concatenated matrix forms of $z_i$, $x_i$ and $\phi(x_i)$, respectively. To solve for $\omega$ and d, we first apply the QR decomposition to X,

$$X = [Q_1\; Q_2]\begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad (15)$$

and substitute it into Eq. (14). Then the optimal solutions of $\omega$ and d are

$$\omega = Q_2\left(Q_2^T \Phi\, Q_2 + \lambda I_{(M-3)}\right)^{-1} Q_2^T Z \qquad (16)$$

and

$$d = R^{-1} Q_1^T (Z - \Phi\,\omega). \qquad (17)$$
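For reference, a minimal NumPy sketch of the M-step defined by Eqs. (13)-(17) is given below. The helper names tps_kernel and m_step_tps are ours, homogeneous coordinates are built as [1, x, y], and the smoothness weight lam is assumed to already absorb the temperature T of Eq. (12); none of this is taken from the authors' code.

```python
import numpy as np


def tps_kernel(A, B):
    """TPS kernel phi(a, b) = ||a - b||^2 log ||a - b|| (zero at coincident points)."""
    r2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    r2 = np.maximum(r2, 1e-12)
    return 0.5 * r2 * np.log(r2)          # r^2 log r = 0.5 r^2 log r^2


def m_step_tps(X, Z, lam):
    """Solve Eqs. (16)-(17) for the TPS parameters (d, w) given targets Z.

    X:   (M, 2) model points x_i.
    Z:   (M, 2) targets z_i = sum_j p_ij y_j from the E-step.
    lam: smoothness weight, assumed to absorb the temperature T of Eq. (12).
    Returns (d, w) such that f(X) = Xh d + Phi w, with Xh the homogeneous coordinates.
    """
    M = X.shape[0]
    Xh = np.hstack([np.ones((M, 1)), X])        # (M, 3) homogeneous coordinates
    Phi = tps_kernel(X, X)                      # (M, M) kernel matrix

    # QR decomposition of Eq. (15): Xh = [Q1 Q2] [R; 0]
    Q, R = np.linalg.qr(Xh, mode='complete')
    Q1, Q2, R1 = Q[:, :3], Q[:, 3:], R[:3, :]

    # Eq. (16): w = Q2 (Q2^T Phi Q2 + lam I_{M-3})^{-1} Q2^T Z
    w = Q2 @ np.linalg.solve(Q2.T @ Phi @ Q2 + lam * np.eye(M - 3), Q2.T @ Z)

    # Eq. (17): d = R^{-1} Q1^T (Z - Phi w)
    d = np.linalg.solve(R1, Q1.T @ (Z - Phi @ w))
    return d, w
```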
We run the above matching process and return the probability matrix P after convergence. As mentioned above, each column of P indicates to what extent a certain data point matches each of the model points. Suppose the maximum of each column is stored in a vector $\{d_n \mid d_n = p_{mn},\ n = 1,\ldots,N,\ m \in [1, M]\}$, which means that the maximum of the nth column is its mth element. Then we declare $x_m$ and $y_n$ a match. To this end, $d_n$ can be seen as the matching confidence. We say that a match is "strong" if its matching confidence is high and "weak" otherwise. To achieve more robust results, we introduce a threshold parameter $\tau$ on the matching confidence $d_n$ and discard the matches with confidence lower than $\tau$.

5. Algorithm and implementation details

5.1. Summary of the FGBG algorithm

The flowchart of our FGBG method is shown in Fig. 2. At the beginning of our algorithm, two images I1 and I2 are read from disk. Feature points are then extracted, including their image coordinates and feature descriptors. Next, the matrix V is computed using the SLH algorithm.
Fig. 2. The flowchart of our FGBG algorithm.
Fig. 3. Robustness to outliers and deformation tested on the image pair "Zleby4" [12]. (a) The "Zleby4" pair and the matches selected. (b) Number of correct matches versus the outlier ratio. (c) Number of correct matches versus the deformation parameter $\epsilon$. Outliers are randomly selected points and deformation perturbations are generated by $\mathcal{N}(0, \epsilon^2)$. The results show that our method has a strong ability to recover matches under a high ratio of outliers or large deformation.
Fig. 4. Robustness to rotation tested on the NewYork sequence [1]. The number of correct matches is plotted against the image index for CPD [42], RPM [17], PGM [14], PW [49], SLH [48] and FGBG.
Each column of V is the weight vector of the GMM describing a certain data point. We set the starting temperature to a relatively high value $T_H$ and initialize the EM energy to infinity. The main part of our algorithm consists of two nested loops. The inner loop is the EM iteration, during which the temperature T is constant. In the E-step, the probability matrix P is estimated from Eq. (11), and in the M-step the transformation model f is updated according to Eqs. (13), (16) and (17). After each iteration we compute the new EM energy $E_{new}$. The relative change between $E_{new}$ and $E_{old}$ is denoted by $\alpha$. If $\alpha$ is smaller than a threshold $\mu$, the energy has become stable and the EM algorithm has converged; otherwise the E- and M-steps are repeated. The outer loop is the Deterministic Annealing iteration. Different from the inner loop, the outer loop gradually decreases T by a rate $\gamma$. The annealing procedure ends when T reaches a relatively low temperature $T_L$.
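The two nested loops can be summarized by the following sketch, which reuses the hypothetical helpers slh_weights, e_step, m_step_tps and tps_kernel from the earlier sketches and plugs in the parameter values of Section 5.2; it illustrates the control flow only and is not the authors' implementation.

```python
import numpy as np


def fgbg_match(X, Y, g_feat, h_feat, beta, lam0=0.1, theta=0.5, gamma=0.93, tau=0.5):
    """Sketch of the full FGBG loop: deterministic annealing outside, EM inside."""
    M, N = X.shape[0], Y.shape[0]
    V = slh_weights(g_feat, h_feat, beta)       # biased weights (Section 3)
    lam = lam0 * M                              # lambda = lambda_0 times the size of X

    # T_H: largest squared distance in Y; T_L: mean nearest-neighbour squared distance
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    T_H, T_L = sq.max(), np.sort(sq, axis=1)[:, 1].mean()

    Xh = np.hstack([np.ones((M, 1)), X])
    Phi = tps_kernel(X, X)
    x_warped, T = X.copy(), T_H
    while T > T_L:                              # deterministic annealing (outer loop)
        for _ in range(5):                      # EM iterations at a fixed temperature
            P = e_step(x_warped, Y, V, T, theta)          # Eq. (11)
            Z = P @ Y                                      # z_i = sum_j p_ij y_j
            d, w = m_step_tps(X, Z, lam)                   # Eqs. (16)-(17)
            x_warped = Xh @ d + Phi @ w                    # Eq. (13)
        T *= gamma                              # cool down

    # A match (x_m, y_n) is kept if the column maximum of P exceeds tau
    m_best, conf = P.argmax(axis=0), P.max(axis=0)
    return [(m_best[n], n) for n in range(N) if conf[n] > tau]
```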
Fig. 5. (a) The image pair "castle". (b) The number of correct matches with different $\lambda_0$ (ranging from 10000 down to 1e-6). The best performance is reached when $\lambda_0$ is around 0.1.
Then FGBG returns the probability matrix P computed in the E-step, and correspondences are built by finding the maximum of each column of P. Finally, we discard matches whose matching confidence $d_n$ is smaller than the threshold $\tau$.

5.2. Analysis and setting of the parameters

We now explain our parameter settings. We follow the setting of $\beta$ in [48]. In Eq. (4) the parameter $\theta$ is the ratio of outliers that we assume the point set may contain. Since this ratio may vary across instances and is difficult to know in advance, we set it to 0.5, which assumes an equal split of inliers and outliers. The starting temperature $T_H$, the ending temperature $T_L$ and the annealing rate $\gamma$ are set as described in [16]. We set $T_H = \sigma_0^2 = \max \|y_a - y_b\|^2$, where $a, b \in [1,\ldots,N]$, and $T_L = \frac{1}{N}\sum_{s=1}^{N} \min_t \|y_s - y_t\|^2$, where $t \in [1,\ldots,N]$ and $t \neq s$. This means that at the beginning a data point is allowed to match all the model points with even probability, while at the final stage a data point should precisely match only one model point. The annealing rate $\gamma$ is set to 0.93. $\lambda$ is a very important parameter that controls the smoothness of f. A large $\lambda$ is less tolerant to perturbations of the transformation, while a small $\lambda$ may lead to a disordered motion. In the former case the transformation is smooth but the data may be under-fitted; in the latter case we may fall into the trap of over-fitting and the learned transformation is too complex. We set $\lambda$ to a moderate value based on the experimental analysis in Section 6.2. Another important parameter is $\tau$, which is closely related to the precision and the number of matches. We set its value to 0.5 in our experiments. In our implementation, we do not use $\mu$ to control the convergence of EM. Instead, we fix the number of EM iterations to 5 for each temperature. Since correspondences are searched in a global-to-local way, this simplification does not lead to a significant performance reduction.

6. Experimental results

6.1. Robustness test of the FGBG algorithm

We first carry out experiments to show that our FGBG algorithm is robust to outliers and deformation. Given the image pair "Zleby4" [12] in Fig. 3(a), we manually mark 83 points on each image so that they compose 83 one-to-one true positive matches, which are treated as the ground truth. After computing the SIFT feature descriptor for each point, we re-find the
Fig. 6. The results on the "house" and "hotel" sequences. (a) The sampled frames from the "house" and "hotel" sequences. (b) and (c) The number of correct matches on each sequence (12 image pairs each) compared with PGM [14] and PW [49].
correspondences between the two point sets and compare the result with the ground truth. Outliers are added to both point sets by randomly selecting pixels and computing their SIFT descriptors. We gradually increase the ratio of outliers from 0% to 90%. On the other hand, we regard deformation as Gaussian white noise added to the position of each point. Specifically, we add to each point a perturbation generated by a Gaussian $\mathcal{N}(0, \epsilon^2)$ with zero mean and standard deviation $\epsilon$. We gradually increase $\epsilon$ so that the original point set becomes more deformed. Since the selection of outliers and the generation of Gaussian white noise are stochastic, we repeat 50 trials for both the outlier and the deformation tests and then plot the mean and variance. For both tests, we compare our method with state-of-the-art methods such as RPM [17], CPD [42], PGM [14] and PW [49]. Two other methods, SLH [48] and SIFT [53], are not compared in the deformation test because only the positions of the points are changed while their feature descriptors remain the same. From Fig. 3(b) we can see that as the ratio of outliers increases, the performance of FGBG drops a little, but it is still the best compared with the others. In Fig. 3(c) our method always finds all the correct matches as the deformation parameter increases. This shows that our method has a strong ability to recover correct matches under a high outlier ratio and large deformation.

Then we adopt the NewYork sequence [1] to test the robustness to rotation. This sequence contains 35 images, with rotation angles from 0° to 360°. We match the first image to all the other 34 images and evaluate the results using the provided ground truth. From Fig. 4 we can see that our method finds the most correct matches for all the image pairs. The PW [49] algorithm is slightly worse than FGBG around 180°. The SLH [48] algorithm can always find a certain number of matches but is quite unstable. Large rotation angles lead to collapse for RPM [17], CPD [42] and PGM [14].

6.2. The analysis of parameter $\lambda$

$\lambda$ is the weight coefficient that controls the smoothness of the TPS transformation. A small $\lambda$ will lead to over-fitting, which is sensitive to outliers, while a large $\lambda$ may result in under-fitting, which is known as "fuzzy correspondence". In this paper we set $\lambda = \lambda_0 N$, where N is the number of points in X and the constant $\lambda_0$ is the modulation coefficient of $\lambda$. So we need to find the optimal $\lambda_0$. We run FGBG on the image pair "castle" [31] and evaluate the results for different $\lambda_0$ values. Fig. 5 shows the impact of different $\lambda_0$ on the performance of the algorithm. There is a peak in the number of correct matches when $\lambda_0$ is around 0.1. In our experiments we follow this conclusion and set $\lambda_0$ to 0.1.
Fig. 7. The matching results of 6 image pairs with several repeated patterns as well as apparent geometry differences. True Positives (TP) are in green while True Negatives (TN) are in red. Our method performs the best when considering both the number of correct matches and the precision. TP/TN counts for the six pairs: (a) SIFT [53]: 42/4, 106/4, 67/11, 76/9, 27/8, 78/6; (b) RPM [17]: 0/60, 3/119, 1/81, 1/114, 4/70, 3/91; (c) PGM [14]: 33/16, 108/3, 11/66, 9/92, 8/57, 32/50; (d) PW [49]: 49/8, 109/7, 69/13, 88/21, 42/27, 80/9; (e) FGBG: 53/1, 110/0, 69/3, 95/0, 50/2, 84/2. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
6.3. Results on real images

First we employ the "house" and "hotel" sequences [8] to test our algorithm. Each sequence contains several frames that show the 3D motion of a simple object. On each frame, 30 manually labeled landmarks are provided. In our application, we obtain 12 frames by sampling every 9 frames for the "hotel" sequence and every 10 frames for the "house" sequence. Then we match the first frame to each of the other frames. The feature descriptor used here is the Shape Context [4]. Fig. 6(a) shows the sampled frames. In (b) and (c) we plot the number of matches found by PGM [14], PW [49] and FGBG. It shows that our method outperforms the other two methods, which also consider both feature similarity and spatial arrangement.

We would then like to display more examples to visually show the advantage of our method in Fig. 7. The images were taken by us with a digital camera. Feature descriptor based methods may produce many false matches since each of these images contains several repeated patterns. Besides, there exist apparent geometry differences as well as outliers between a pair of images, which may also defeat the spatial arrangement based methods. True positives and true negatives are manually labeled. From the results we can see that most mismatches of SIFT [53] relate two parts that have similar local appearance but differ from each other globally. RPM [17] also collapses due to outliers and geometry differences. PGM [14] and PW [49] are another two state-of-the-art methods which consider both spatial and feature constraints, but their results are still not as good as ours. Our method finds the most correct matches and at the same time includes the fewest mismatches. We use this experiment to show that our idea of dual constraints can enhance the result when using only the feature descriptor or only the spatial arrangement is insufficient.

We further test our algorithm on four other datasets used in recent works: "Cech [12]", "Lebeda [31]", "Li [33]" and "Tuytelaars [52]". A variety of state-of-the-art methods, including RPM [17], CPD [42], SLH [48], SIFT [53], PGM [14], LTHM [44], PW [49] and PMRP [24], are compared with our FGBG algorithm.
Table 1
The average running time for different methods (in seconds).

Method      Time     Method      Time      Method      Time
SIFT [53]   1.65     SLH [48]    2.19      LTHM [44]   6.13
PW [49]     12.37    PMRP [24]   54.33     PGM [14]    31.43
CPD [42]    97.75    RPM [17]    334.36    FGBG        86.51
CPD[42]
SLH[48]
SIFT[53]
PGM[14]
LTHM[44]
PW[49]
PMRP[24]
FGBG
number of correct matches
1200
1000
800
600
400
zl eb y
ee va lb on ne ve te v Vy se hr ad Zl eb y3 Zl eb y4
q2
tr
pl an t
o
h rc la
yo t K
ap
ad
m lo v
ru K
he
he
D
d1 et en ic e D re sd en fe ed er
r ca
ap pl e
0
Lo le nd s on Lo −a si nd a on Lo −r nd el on ie f −v ic to ria
200
(a) Cech [12]
(c) Li [33]
100
h
ee
se
ex
as w
tr
oa
m
ur
to
0
ch
50
au
number of correct matches
RPM[17] CPD[42] SLH[48] SIFT[53] PGM[14] LTHM[44] PW[49] PMRP[24] FGBG
150
ch
ee ttr bo
zo om
ro tu nd a sh ou t va lb on ne
rr
pa
co
m ka
le st
bo x
ca
sh
w al l
ok
bo
(b) Lebeda [31]
ro
0
0
vo
100
lie
100
200
re
200
300
ar
300
400
be
400
RPM[17] CPD[42] SLH[48] SIFT[53] PGM[14] LTHM[44] PW[49] PMRP[24] FGBG
ba xi a
RPM[17] CPD[42] SLH[48] SIFT[53] PGM[14] LTHM[44] PW[49] PMRP[24] FGBG
number of correct matches
number of correct matches
200
500
500
(d) Tuytelaars [52]
Fig. 8. The result on another four datasets: ‘‘Cech [12]’’, ‘‘Lebeda [31]’’, ‘‘Li [33]’’ and ‘‘Tuytelaars [52]’’. The number of correct matches is evaluated by the fundamental matrix.
For each image pair we robustly estimate a fundamental matrix. Specifically, we manually select evenly distributed correct matches across the images and then apply RANSAC to obtain the result. We then use the estimated fundamental matrix to evaluate the matching results. The key assumption is that if the fundamental matrix is correctly estimated, it can act as the ground truth. In practice, we down-sample some of the high resolution images for computational efficiency. Fig. 8 displays the number of correct matches for all the images. It can be seen that for the majority of image pairs, our FGBG finds the most matches. It is worth mentioning that for "box" in (b) and "church", "mex" and "tree" in (d), FGBG boosts the number of correct matches by more than a factor of two compared with the other methods. More correct matches will be of great use in many other applications.

Finally, we analyze the efficiency of our algorithm. All the methods are tested with their default parameter settings on a 2.6 GHz, 4 GB RAM PC. The average running times of the different methods are listed in Table 1. SIFT [53], SLH [48], LTHM [44] and PW [49] finish in a few seconds. PGM [14], PMRP [24], CPD [42] and FGBG run for tens of seconds. RPM is the most time consuming because the Deterministic Annealing technique is relatively slow. It should be noticed that although the FGBG method also exploits the Deterministic Annealing technique, the feature guidance strategy can greatly accelerate convergence, so the proposed FGBG algorithm is more efficient than the RPM algorithm.

7. Conclusion

Our main contribution is a new Feature Guided Biased Gaussian Mixture Model for image matching. First, image feature points together with their descriptors are extracted from a pair of images. One point set is modeled as the centroids of a GMM and the other point set is treated as the data. We assign each Gaussian mixture component a different weight according to the feature similarity between the data point and each of the model points. This is achieved by simply decomposing a distance matrix in the feature space. By doing so, both feature similarity and spatial arrangement are considered at the same time. The feature descriptor is introduced as a reasonable prior to guide the matching, and the spatial transformation offers a global constraint so that local ambiguity can be alleviated. The MAP problem is solved in a framework similar to [16]. Comparison results with state-of-the-art methods on a variety of datasets show that our FGBG method can boost the number of correct matches.
Acknowledgement

We would like to thank the reviewers for their time and valuable comments. This work is supported by the National Natural Science Foundation of China (Grants 61371140 and 61273279), and in part by the National High-Tech Research and Development Program of China (863 Program).

References

[1] http://lear.inrialpes.fr/people/Mikolajczyk/Database/rotation.html.
[2] A. Albarelli, E. Rodol, A. Torsello, Imposing semi-local geometric constraints for accurate correspondences selection in structure from motion: a game-theoretic perspective, Int. J. Comput. Vis. 97 (1) (2012) 36–53.
[3] H. Bay, A. Ess, T. Tuytelaars, L.V. Gool, Speeded-up robust features (SURF), Comput. Vis. Image Underst. 110 (3) (2008) 346–359.
[4] S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell. 24 (4) (2002) 509–522.
[5] P. Besl, N.D. McKay, A method for registration of 3-D shapes, IEEE Trans. Pattern Anal. Mach. Intell. 14 (2) (1992) 239–256.
[6] F.L. Bookstein, Principal warps: thin-plate splines and the decomposition of deformations, IEEE Trans. Pattern Anal. Mach. Intell. 11 (6) (1989) 567–585.
[7] T. Brox, J. Malik, Large displacement optical flow: descriptor matching in variational motion estimation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (3) (2011) 500–513.
[8] T.S. Caetano, J.J. McAuley, L. Cheng, Q.V. Le, A.J. Smola, Learning graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 31 (6) (2009) 1048–1058.
[9] Q. Cai, Y. Yin, H. Man, DSPM: dynamic structure preserving map for action recognition, in: 2013 IEEE International Conference on Multimedia and Expo (ICME), 2013, pp. 1–6.
[10] Q. Cai, Y. Yin, H. Man, Learning spatio-temporal dependencies for action recognition, in: 2013 20th IEEE International Conference on Image Processing (ICIP), 2013, pp. 3740–3744.
[11] M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: binary robust independent elementary features, in: Proc. European Conf. Computer Vision, vol. 6314, 2010, pp. 778–792.
[12] J. Cech, J. Matas, M. Perdoch, Efficient sequential correspondence selection by cosegmentation, IEEE Trans. Pattern Anal. Mach. Intell. 32 (9) (2010) 1568–1581.
[13] M. Cho, J. Lee, J. Lee, Feature correspondence and deformable object matching via agglomerative correspondence clustering, in: IEEE International Conference on Computer Vision, 2009, pp. 1280–1287.
[14] M. Cho, K.M. Lee, Progressive graph matching: making a move of graphs via probabilistic voting, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 398–405.
[15] M. Cho, Y.M. Shin, K.-M. Lee, Unsupervised detection and segmentation of identical objects, in: IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 1617–1624.
[16] H. Chui, A. Rangarajan, A feature registration framework using mixture models, in: Proc. IEEE Workshop on Math. Methods in Biomedical Image Analysis, 2000, pp. 190–197.
[17] H. Chui, A. Rangarajan, A new point matching algorithm for non-rigid registration, Comput. Vis. Image Underst. 89 (2–3) (2003) 114–141.
[18] T. Cour, P. Srinivasan, J. Shi, Balanced graph matching, in: Advances in Neural Information Processing Systems, vol. 19, 2006, pp. 313–320.
[19] A. Delong, A. Osokin, H. Isack, Y. Boykov, Fast approximate energy minimization with label costs, Int. J. Comput. Vis. 96 (1) (2012) 1–27.
[20] J. Fayad, C. Russell, L. Agapito, Automated articulated structure and 3D shape recovery from point correspondences, in: IEEE International Conference on Computer Vision, 2011, pp. 431–438.
[21] Y. Furukawa, J. Ponce, Accurate, dense, and robust multiview stereopsis, IEEE Trans. Pattern Anal. Mach. Intell. 32 (8) (2010) 1362–1376.
[22] D. Geiger, F. Girosi, Parallel and deterministic algorithms from MRFs: surface reconstruction, IEEE Trans. Pattern Anal. Mach. Intell. 13 (5) (1991) 401–412.
[23] A. Gilinsky, L. Manor, SIFTpack: a compact representation for efficient SIFT matching, in: IEEE International Conference on Computer Vision, 2013, pp. 777–784.
[24] R. Hamid, D. Decoste, C.J. Lin, Dense non-rigid point-matching using random projections, in: IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2914–2921.
[25] W. Hartmann, M. Havlena, K. Schindler, Predicting matchability, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[26] D. Hauagge, N. Snavely, Image matching using local symmetry features, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 206–213.
[27] H. Isack, Y. Boykov, Energy based multi-model fitting and matching for 3D reconstruction, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[28] B. Jian, B. Vemuri, Robust point set registration using Gaussian mixture models, IEEE Trans. Pattern Anal. Mach. Intell. 33 (8) (2011) 1633–1645.
[29] U. Kang, M. Hebert, S. Park, Fast and scalable approximate spectral graph matching for correspondence problems, Inform. Sci. 220 (2013) 306–318.
[30] Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2004, pp. 506–513.
[31] K. Lebeda, J. Matas, O. Chum, Fixing the locally optimized RANSAC, in: Proceedings of the British Machine Vision Conference, 2012, pp. 95.1–95.11.
[32] M. Leordeanu, M. Hebert, A spectral technique for correspondence problems using pairwise constraints, in: IEEE International Conference on Computer Vision, vol. 2, 2005, pp. 1482–1489.
[33] X. Li, Z. Hu, Rejecting mismatches by correspondence function, Int. J. Comput. Vis. 89 (1) (2010) 1–17.
[34] Y. Li, W. Liu, X. Li, Q. Huang, X. Li, GA-SIFT: a new scale invariant feature transform for multispectral image using geometric algebra, Inform. Sci. 281 (2014) 559–572.
[35] W. Lian, L. Zhang, Robust point matching revisited: a concave optimization approach, in: Proc. European Conf. Computer Vision, 2012, pp. 259–272.
[36] W. Lian, L. Zhang, Point matching in the presence of outliers in both point sets: a concave optimization approach, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[37] H. Liu, S. Yan, Common visual pattern discovery via spatially coherent correspondences, in: IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 1609–1616.
[38] K. Liu, J. Zhang, K. Huang, T. Tan, Energy based multi-model fitting and matching for 3D reconstruction, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[39] D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2) (2004) 91–110.
[40] K. Mikolajczyk, C. Schmid, A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 27 (10) (2005) 1615–1630.
[41] F. Moreno-Noguer, J. Porta, Probabilistic simultaneous pose and non-rigid shape recovery, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1289–1296.
[42] A. Myronenko, X. Song, Point set registration: coherent point drift, IEEE Trans. Pattern Anal. Mach. Intell. 32 (12) (2010) 2262–2275.
[43] A. Myronenko, X. Song, M.A.C. Perpinan, Non-rigid point set registration: coherent point drift, in: Advances in Neural Information Processing Systems, 2006, pp. 1009–1016.
[44] O. Pele, M. Werman, A linear time histogram metric for improved SIFT matching, in: Proc. European Conf. Computer Vision, vol. 5304, 2008, pp. 495–508.
[45] T.T. Pham, T.-J. Chin, J. Yu, D. Suter, The random cluster model for robust geometric fitting, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 710–717.
[46] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, ORB: an efficient alternative to SIFT or SURF, in: IEEE International Conference on Computer Vision, 2011, pp. 2564–2571.
[47] S. Rusinkiewicz, M. Levoy, Efficient variants of the ICP algorithm, in: International Conference on 3-D Digital Imaging and Modeling, 2001, pp. 145–152.
[48] G.L. Scott, H.C. Longuet-Higgins, An algorithm for associating the features of two images, Proc. Roy. Soc. Lond. B Biol. Sci. 244 (1309) (1991) 21–26.
[49] M. Torki, A. Elgammal, One-shot multi-set non-rigid feature-spatial matching, in: IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3058–3065.
[50] M. Torki, A. Elgammal, Putting local features on a manifold, in: IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 1743–1750.
[51] L. Torresani, V. Kolmogorov, C. Rother, A dual decomposition approach to feature correspondence, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2) (2013) 259–271.
[52] T. Tuytelaars, L. Van Gool, Matching widely separated views based on affine invariant regions, Int. J. Comput. Vis. 59 (1) (2004) 61–85.
[53] A. Vedaldi, B. Fulkerson, VLFeat: An Open and Portable Library of Computer Vision Algorithms, 2008.
[54] C. Wang, L. Wang, L. Liu, Improving graph matching via density maximization, in: IEEE International Conference on Computer Vision, 2013, pp. 3424–3431.
[55] Y. Wang, J. Gong, D. Zhang, C. Gao, J. Tian, H. Zeng, Large disparity motion layer extraction via topological clustering, IEEE Trans. Image Process. 20 (1) (2011) 43–52.
[56] J. Yu, T.-J. Chin, D. Suter, A global optimization approach to robust multi-model fitting, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2041–2048.
[57] F. Zhou, F. De la Torre, Deformable graph matching, in: IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2922–2929.
[58] J. Zhu, S. Hoi, M. Lyu, Nonrigid shape recovery by Gaussian process regression, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1319–1326.
[59] D. Zhuang, D. Zhang, J. Li, Q. Tian, Binary feature from intensity quantization and weakly spatial contextual coding for image search, Inform. Sci. (2014).