Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
Contents lists available at ScienceDirect
Neurocomputing journal homepage: www.elsevier.com/locate/neucom
Geodesic-like features for point matching Deheng Qian a, Tianshi Chen b,c, Hong Qiao c,d,n a
Samsung Research Institute China - Beijing (SRC-B), Beijing 100028, China State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China Chinese Academy of Sciences Centre for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China d State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China b c
art ic l e i nf o
a b s t r a c t
Article history: Received 18 May 2015 Received in revised form 24 August 2015 Accepted 29 August 2016 Communicated by Zidong Wang
Point matching problem seeks the optimal correspondences between two sets of points via minimizing the dissimilarities of the corresponded features. The features are widely represented by a graph model consisting of nodes and edges, where each node represents one key point and each edge describes the pair-wise relations between its end nodes. The edges are typically measured depending on the Euclidian distances between their end nodes, which is, however, not suitable for objects with non-rigid deformations. In this paper, we notice that all the key points are spanning on a manifold which is the surface of the target object. The distance measurement on a manifold, geodesic distance, is robust under non-rigid deformations. Hence, we first estimate the manifold depending on the key points and concisely represent the estimation by a graph model called the Geodesic Graph Model (GGM). Then, we calculate the distance measurement on GGM, which is called the geodesic-like distance, to approximate the geodesic distance. The geodesic-like distance can better tackle non-rigid deformations. To further improve the robustness of the geodesic-like distance, a weight setting process and a discretization process are proposed. The discretization process produces the geodesic-like features for the point matching problem. We conduct multiple experiments over widely used datasets and demonstrate the effectiveness of our method. & 2016 Elsevier B.V. All rights reserved.
Keywords: Point matching Non-rigid deformation Geodesic distance
1. Introduction It is a standard method to represent an image by extracting a set of key points from the image. Matching key points of two images is an important and fundamental problem in the field of computer vision. The application scope of matching key points broadly includes object recognition, 3D reconstruction, motion detection and so on. In order to improve the matching precision, the spacial relations among key points are exploited by representing each set of key points with a graph model. Each node of the graph represents a key point and each edge of the graph represents the spacial relations between its two end nodes. Typically, the weights on edges are assigned according to the Euclidian distances among points, which is unsuitable for non-rigid deformations. For example, two key points are far from each other in one image while their counterparts can be deceptively close in the other image when the target object deforms non-rigidly. The target object is the object from whose surface the key points are extracted. To tackle the point matching problem under non-rigid deformations, we notice that the surface of the target object is a n
Corresponding author. E-mail address:
[email protected] (H. Qiao).
manifold. As a distance measurement on a manifold, the geodesic distance is able to preserve the intrinsic geometry of the manifold and be robust to non-rigid deformations as demonstrated in [30]. This measurement can be valuable to handle non-rigid deformations between sets of points in the point matching problem. To apply the geodesic distance, the manifold, which is the surface of the target object, should be estimated firstly. In this paper, we propose to estimate the manifold depending on the set of key points under the assumption that all the key points are extracted from a unique and connected surface of the target object. We use the “manifold” and the “surface” interchangeably. We evaluate the probabilities of points to be on the manifold according to their distances to key points. Points with high probabilities belong to the estimation of the manifold. The resulted estimation of the manifold enable us to calculate the geodesic distance between key points. However, for computational efficiency, we represent the estimation by a graph model which is called the Geodesic Graph Model (GGM). In GGM, every node represents one key point in the point set. The geodesic distance between key points on the estimation is approximated by the geodesic-like distance defined on the GGM. The geodesic-like distance is designed to imitate the geodesic distance. We calculate geodesic-like distances as suggested in [30].
http://dx.doi.org/10.1016/j.neucom.2016.08.092 0925-2312/& 2016 Elsevier B.V. All rights reserved.
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
2
For two neighboring key points, their geodesic-like distance is well approximated by their Euclidean distance. For two faraway key points, their geodesic-like distance is approximated by adding up a sequence of short edges between neighboring key points. Similar to the geodesic distance, the geodesic-like distance is robust to non-rigid deformations of the manifold. To further improve the robustness of the geodesic-like distances, we discretize their lengths. We represent the distribution of lengths of the geodesic-like distances by histograms. Then the sequence number of a histogram is a feature of all the lengths in the histogram. This feature is called the geodesic-like feature. In the meanwhile, we calculate the reliabilities of lengths of geodesic-like distances according to how well the geodesic-like distances approximate their corresponding geodesic distances. The Geodesic-like features and the reliabilities are exploited to tackle the point matching problem. In order to evaluate our algorithm, we conduct several experiments on various datasets. Experimental results show the effectiveness of the proposed algorithm. In summary, the main contributions of this paper are threefold: 1) To the best of our knowledge, we are the first to estimate the surface of the target object depending on the set of extracted key points. Under the assumption that all the key points are on the surface of the object, every point in the image is assigned a probability to be on the surface of the target object according to its distances to key points. 2) Based on our graph model GGM, we propose the geodesic-like distance as a new distance measurement between key points. Compared with conventional distance measurements, the geodesic-like distance is robust to non-rigid deformations of the target object. 3) To increase the robustness of our algorithm, we further propose to estimate the reliabilities of lengths of geodesic-like distances and come up with the geodesic-like features by exploiting a discretization process. The reliabilities and the geodesic-like features are applied in the point matching tasks to demonstrate their effectiveness. The rest of this paper proceeds as follows: Section 2 discusses some related works. Section 3 details the procedure of building the GGM. Section 4 describes the method to obtain the geodesic-like features. Section 5 formulates the point matching as an energy minimization task. Section 6 reports the experimental results. The final section states our conclusions.
2. Related work Point matching methods are applied in many situations. Pan et al. propose to establish the point correspondence between a 3D reference face and an input 3D face [24]. Sahbi et al. propose to recognize different logos via a point matching process [27]. Due to the wide application of the point matching problem, myriad studies dedicated to this problem [9,14]. The algorithms tackling this problem can be roughly categorized into two types: algorithms for rigid deformation matching problem and algorithms for non-rigid deformation matching problem [8]. Algorithms for rigid deformation matching problem demand the deformation type of the target object to be rigid. In spite of their strict assumptions, these algorithms can solve lots of matching problems. They are well studied and widely applied for their relative simplicity and robustness [7]. The RANSAC algorithm [12,25] samples the matching result to estimate the parameters of a predefined transformation model. The ICP algorithm [33]
iteratively revises the transformation to minimize the distance from one point set to the other point set. However, they have difficulties in handling objects with non-rigid deformations. In order to handle non-rigid deformations, algorithms for nonrigid deformation matching problem are extensively studied. Chui and Rangarajan propose the TPS-RPM [8] which uses the thinplate spline [4] to parameterize the non-rigid deformations. Schnabel et al. propose the free-form deformations based on multi-level B-splines with multi-resolution optimization [28]. These methods focus on modeling the deformations between point sets. In the meanwhile, another approach focuses on modeling the geometric structure of each point set [10,16]. Tsin and Kanade cast the matching problem as finding the maximum kernel correlation configuration of two point sets, where the kernel correlation is defined as a function of the entropy of the point set [32]. Myronenko and Song represent the template points by a gaussian mixture model such that the matching problem is tackled as a probability density estimation problem [22]. Jian and Vemuri propose to represent the input point sets using Gaussian mixture models such that a statistical discrepancy measure between the two corresponding mixtures is minimized [15]. Zhou and Torre propose to represent each point set by a graph model and handle non-rigid point matching problems from a graph matching [17] point of view [13,36]. Scott and Nowak enforced an order preserving constraint in a contour matching method to regularize the matching process [29]. Some researchers [18,34] note that the point set is a manifold [1,26,30] embedded in an image. The geometric structure of a point set can be described by methods which describe the geometric relations among data points on a manifold. Inspired by the LLE algorithm [26] which is a manifold learning algorithm, Li et al. propose an object matching method [18] based on the fact that one point can be linearly reconstructed by its neighbor points. Their method keeps the reconstruction weights to describe the geometric relations among points. However, this method is not suitable for non-rigid deformations since the reconstruction weights of each point change largely under non-rigid deformations. Zheng and Doermann propose the PLNS algorithm [34] which has a similar idea with another famous manifold learning algorithm LPP [23]. Their method matches two points if the correspondence of one point's neighbor is a neighbor of its correspondence. This neighborhood relationship is much more robust during non-rigid deformations. However, this method discards all the geometric information between pairs of points that are far from each other, which abandons valuable information and limits the matching precision. The achievements of the above mentioned algorithms show that it is fruitful to apply the methods which capture the geometric structure on a manifold to the point matching problem. The algorithm proposed in this paper is inspired by the Isomap algorithm [30] which is another popular manifold learning algorithm. Isomap represents the data points on a manifold with a graph model. This algorithm describes the geometric structure of data points depending on the geodesic distances between points. Since the geodesic distance is insensitive to non-rigid deformations, this distance measurement is a good choice for representing the geometric structure between key points in the matching problem. Elad and Kimmel propose an algorithm [11] which takes advantage of the geodesic distances and is used in the 3D object matching problem. This algorithm is a typical work using the geodesic distance to describe the spatial relations among key points. To exploit the essence of the geodesic distance on 2D images, some studies come up with new measurements. Ling's inner distance method [19,20] for shape classification is the most famous
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
one. Their method has a variety of applications [3]. However, the inner distance is calculated depending on the contour of the shape in the image. When there is no contour or it is too vague to locate the position of the contour, the use of this algorithm is limited. In this paper, we regard the 2D surface of the target object as a manifold. Then, we propose to estimate the surface of the target object. Depending on the estimated surface, a new distance measurement is applied, which imitates the geodesic distance defined on a manifold. A preliminary version of this paper is to appear in Proceedings of the 2015 IEEE International Conference on Mechatronics and Automation (ICMA 2015).
3. Geodesic graph model In this section, we detail the procedures of estimating the surface of the target object which is regarded as a manifold. After that, we introduce the Geodesic Graph Model (GGM) which is a concise representation of the estimation of the manifold. 3.1. Estimation of manifold Let I = {p1, p2, … , pm } be an image which contains a surface of a target object. pi , i = 1, 2, … , m, are points in the image. In this image, we assume T = {p1, p2, … , pn }, n ≤ m, be the surface of the target object. K = {p1, p2, … , pk }, k ≤ n , are key points extracted from the surface. We estimate T depending on K and denote the estimated surface by T ′. T ′ is obtained by two steps. Firstly, we estimate T as points in the image with high probabilities to be on the manifold. Secondly, we extend the estimation to be a unique and connected one. In the first step, we follow [15] to present the point set by a Gaussian mixture model. Specifically, we apply a Gaussian mixture model M to estimate the probabilities of other points to be in T. M includes k Gaussian models whose means locate at pi , i ≤ k respectively. All the Gaussian models are assumed to have the same variance. Hence, given the point set K, the probability of a point pj ∈ I , to be in T is K
∑
P (pj |M ) =
cN (pj ; pi , Σ ),
(1)
i=1
where c = is the normalization factor, Σ is the variance of the Gaussian model. Depending on this formulation, points near densely located key points have high probabilities. However, the key points are often extracted from corners or edges of the surface of the target object. There may be many key points extracted from a corner while only a few from the middle of the surface. Hence, the density of key points should not affect the probability of other points. In order to address this issue, we calculate the upper bound of the probabilities of all the points. We notice that 1 K
P (pj |M ) ≤ K max cN (pj ; pi , Σ ) = maxN (pj ; pi , Σ ). i
i
(2)
Hence, we calculate a upper bound of P (pj |M ) by:
P (pj ) ≡ maxN (pj ; pi , Σ ), i
(3)
which is only determined by the distance from pj to the nearest
key point pi . If P (pj ) is lower than a threshold δ, pj is not included in T ′. After the first step, the resulted estimation of T may be separated into several parts. However, the surface of the object should be a unique and connected one. Hence, in the second step, we extend the obtained estimation with some links such that the
3
separate parts can be connected by these links. In this step, we demand that all the points in the extended links should be in T with as high as possible probabilities. In other words, we connect two separate parts by a link e where the points on e have the highest average upper bound of the probabilities to be in T:
e = argmax e
∫e
P¯ (pi ) dpi L
s . t . pi ∈ e ,
(4)
where L is the length of the link e. In practice, we may not need to calculate the upper bound of the probability of every point. Particularly, let the variance ⎡σ2 o ⎤ Σ = ⎢ o1 2 ⎥ of Gaussian models have the same variance on difσ2 ⎦ ⎣ ferent dimensions, i.e. σ1 = σ 2. Then, if there are only two separate parts, we can link them by the straight line linking the closest points of the two parts according to Eqs. (3) and (4). When there are more than two parts, we can iteratively link the closest two parts until a unique and connected estimation T ′ is obtained. Actually, we can efficiently obtain T ′ by two steps based on the analysis above. Firstly, we draw a circle at every key point with radius ε, where ε = ‖pi − pj ‖ and N (pj ; pi , Σ ) = δ , and δ is a positive threshold. All points within the circle are included in T ′. Secondly, we build the minimal spanning tree of the complete graph of the key points. All points on edges of the tree are added into T ′. As shown in Fig. 1, we obtain the final estimation of the manifold by combing the selected points with two steps. 3.2. Geodesic graph model Based on T ′, we can calculate the geodesic distance between every pair of key points by Dijkstra algorithm or A star algorithm. Both of the algorithms run in time O (kn′2 ) where n′ is the number of all points in T ′. For computational efficiency, we propose to approximate the geodesic distance between key points by a new distance measurement which is called the geodesic-like distance. To this end, we represent T ′ by a graph model which is called the Geodesic Graph Model (GGM). The GGM consists of nodes and edges. Each node represents one key point. The edges of GGM are built by two steps under the assumption that σ1 = σ 2. Firstly, we link every pair of nodes whose Euclidian distance is smaller than 2ε . Secondly, we build the minimal spanning tree of the complete graph of all the nodes. And we link every pair of nodes who are linked by an edge on the minimal spanning tree. After these two steps, we obtain the GGM. All edges of GGM are on the estimation of the manifold T ′. Intuitively, GGM is the “skeleton” of T ′. The procedure of building the GGM runs in time O (k2), which is described in Algorithm 1. Algorithm 1. The procedure of building the geodesic graph model Input: The coordinations of all key points K. Output: The geodesic graph model of the points K. 1 Calculate the Euclidian distances between every pair of key points to build the complete graph. 2 Remove all the edges longer than a threshold 2ε in the complete graph to obtain a new graph. 3 Build the minimal spanning tree of the complete graph. 4 Combine the new graph and the minimal spanning tree together by adding all their edges into a single graph model. The resulted graph model is the GGM of the points K. 3.3. Geodesic-like distance Based on GGM, we are able to define the geodesic-like distance now. The geodesic-like distance between two nodes is the shortest
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
4
Fig. 1. (a) Indicates the locations of key points. (b) Shows the selected points in the first step. (c) Is the minimal spanning tree of the complete graph of all key points. (d) Is the estimation of the manifold which is a combination of the results in (b) and (c). (e) Represents the estimation of the manifold by a graph model which is called the geodesic graph model.
path connecting them on the GGM. The geodesic-like distances between all pairs of key points can be calculated by Dijkstra algorithm in time O (k 3). Since k⪡n′, the geodesic-like distance can be calculated much more efficiently than the geodesic distance on T ′ (which is O (kn′2 )). However, the geodesic-like distance is only an approximation of the geodesic distance. Let dT ′ (pi , pj ) be the length of the geodesic distance between key points pi ∈ K , and pj ∈ K , on the estimated manifold T ′, dG (pi , pj ) be the length of the geodesic-like distance between pi and pj on the GGM. We have
⎧ ⎪ dG (pi , pj ) ≥ dT ′ (pi , pj ) ⎨ ⎪ ⎩ dG (pi , pj ) < dT ′ (pi , pj ) + 2ε (B (pi , pj ) + 1),
(5)
where B (pi , pj ) is the in-between nodes on the shortest path connecting pi and pj on the GGM (see Appendix A for details). According to inequalities (5), B (pi , pj ) shows how precisely dT ′ (pi , pj ) is approximated by dG (pi , pj ). Consequently, we propose to set the reliabilities of the geodesic-like distances according to the number of in-between nodes:
w (pi , pj ) = ρ−B (p i, p j) ,
(6)
where w (pi , pj ) is the reliability of the dG (pi , pj ), ρ is a positive number controling the decreasing speed of the reliability. 4. Geodesic-like feature In this section, the geodesic-like distances are exploited to produce geodesic-like features which are used for matching.
Note that the geodesic-like distance is not invariant to scaling. In the meanwhile, because of the infection of noise or object deformations, the geodesic-like distance between two key points in a point set is generally different from that of their counterparts in the other point set. These problems make the geodesic-like distances unsuitable for matching. To address these problems, we introduce a discretization process which exploits the geodesic-like distances to produce the geodesic-like features. The geodesic-like features are invariant to scaling. And they are more robust to noise than the geodesic-like distances. The discretization process is introduced as follows. We denote the distribution of the lengths of all geodesic-like distances on GGM by histograms. Then, we substitute the length of a geodesic-like distance by the sequence number of the corresponding histogram. To be specific, let s and l be the length of the shortest geodesiclike distance and the longest one on a GGM respectively. Then the interval [s, l] is divided equally into J histograms. The sequence number of each histogram is regarded as the geodesic-like feature of the geodesic-like distances which locate in this histogram. This J (d −s ) discretization process can be formalized as f = ⎡⎢ (l G−s) ⎤⎥, where dG is the length of a geodesic-like distance and f is the corresponding geodesic-like feature of dG. The geodesic-like features are invariant to scaling since the geodesic-like features always span the range [1, J ]. In the meanwhile, the geodesic-like features are robust to noise. Intuitively, if the length of a geodesic-like distance changes a little, the length should still belongs to the same histogram as the original one does with a high probability. We calculate the
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
5
G1 = { K1, E1} and G2 = { K2, E2 } respectively. K1 = ⎡⎣ p1, p2, …pk1⎤⎦ and K2 = ⎡⎣ q1, q2, …qk2⎤⎦ are sets of nodes. E1 and E2 are sets of edges. Hence, matching two sets of points is converted to matching the nodes of two GGMs. The matching problem is to find the correspondences among nodes of G1 and G2 while the dissimilarities of corresponded nodes are minimized. Assuming k1 ≤ k2, we add k2 − k1 dummy points {pk1+ 1, pk1+ 1, … , pk2 } into K1. Let a binary variable matrix
Fig. 2. Discretization process: the length dG is substituted by the sequence number of the histogram.
probability with which the histogram number of one length stays the same in spite of the infection of noise. We demonstrate that the probability only depends on the ratio between the range of the histogram and the variance of the noise, as is shown below. All lengths of geodesic-like distances are in the interval [s, l]. We assume that their original lengths conform to a uniform distribution in [s, l] while infected by Gaussian noise. Without loss of generality, we further assume that the Gaussian distribution has 0 mean which can be denoted as N 0, σ 2 . As shown in Fig. 2, we choose a interval [h1, h2 ] which contains a length dG of a geodesiclike distance. Then, the distribution of dG conforms to a Gaussian distribution N μ, σ 2 whose mean value μ conforms to a uniform distribution U (h1, h2 ). The probability that dG belongs to the interval [h1, h2 ] can be formalized as:
(
(
)
)
d2
Δ μ +Δ − G ∼ 1 1 P ( h1, h2 , σ ) ≡ · e 2σ 2 ddG dμ , 2Δ 2π σ −Δ μ −Δ (7) ∼ h2 −h1 where P (h1, h2, σ ) is the probablity, Δ = 2 . Using r = Δ/σ , the function above can be rewritten as:
∫ ∫
∼ P ( h1, h2 , σ ) =
r 2 2π Δ2
Δ
μ +Δ
∫−Δ ∫μ −Δ
e
−
r 2dG2 2Δ2
d dG d μ .
(8)
With the help of Taylor's theory, the function above can be rewritten as (see Appendix B for details):
∼ P (r ) =
⎞ 1 ⎛ 4r 8r 3 64/3r 5 − + − ⋯⎟ . ⎜ 3 5 2 π⎝ 2 ⎠ 3( 2 ) 10 ( 2 )
(9)
∼ The probability P (r ) is an increasing function of the ratio r. Note that the variance of the noise s is generally small. If Δ is much larger than s (r = Δ/σ ), the probability that most noise infected lengths stay in the same histogram as they originally do will be big enough. For ∼ ∼ example, when r = 1, P (r ) ≈ 0.61 and when r = 2, P (r ) ≈ 0.80. Hence, the discretization process makes the geodesic-like feature more stable than the continuous length of the geodesic-like distance.
5. Matching problem formulation Depending on the geodesic-like features, we are able to match sets of key points. The matching problem is cast as an energy minimization task in our algorithm. In this section, the energy function is introduced first. The optimization scheme of this function is described after that.
X = {0, 1}k2× k2 represents the matching result. To be specific, Xij = 1 denotes that the i-th node of G1 and the j-th node of G2 are corresponded, and otherwise Xij = 0. We use the widely applied one-to-one constraint which makes each row of X contain exactly one 1, meaning every node in K1 must be matched to exact one node in K2. Hence, the matching problem is formulated as a quadratic assignment problem:
min xT H x x
s . t . 1T X = 1T , X T 1 = 1, X ∈ {0, 1}k 2 × k 2 .
(10)
where x is a vector formed by concatenating column vectors of X, 1 is a column vector whose all elements are 1, H is a k2 k2 × k2 k2 symmetrical matrix. Let Hij; ab be the entry of H at the ((i − 1) k2 + j )-th row and ((a − 1) k2 + b)-th column. Hij; ab measures the cost if we match pi with qj and match pa with qb at the same time. To be specific, Hij; ab is calculated as:
⎧ λD p , p , if i = a, j = b u i j ⎪ ⎪ Hij; ab = ⎨ Dp p , p , q , q , if i ≠ a, j ≠ b i a j b ⎪ ⎪ ⎩ 0, otherwise .
(
)
(
)
(11)
where Du calculates the dissimilarities of unary features, Dp calculates the dissimilarities of pair-wise features, λ is a positive weight controlling the relative importance of the unary features and pair-wise features. A detailed description of the matrix H is available in [35]. For the unary features, many kinds of local features can be applied such as SIFT, SURF [21]. The distance between two features is defined according to the chosen feature descriptor. In this paper, the shape context [2] is applied as our local feature, which describes the distribution of all remaining points from the view of a reference point. we apply the geodesic-like features as the pair-wise features. The global cost of a pair of correspondences can be calculated as:
(
)
Dp pi , pa , qj , q b = − e−w (p i, p a ) w (qj, qb )
f (p i, p a )− f (qj, qb )
,
(12)
where w is the reliability of geodesic-like distance (see Section 3.3) and f is the geodesic-like feature. The matching problem (10) is a quadratic assignment problem which is NP-complete [31]. Brute methods such as the branch and bound method [5], whose worst and average complexities are exponential, are used to search the solutions. These methods are only available for small scale problems. To speed up the solving process, researchers turn to an approximate solution by applying approximation methods. In our experiments, we exploit the IPFP algorithm [17] as our optimization scheme. The IPFP algorithm solves the corresponding problem and provides a solution which indicates the correspondences. In summary, the framework of our algorithm is illustrated in Fig. 3.
5.1. Energy function
6. Experimental results
Each key point in an image is represented by a node of the GGM. Let two sets of points be represented by two GGMs,
In order to make a complete performance evaluation of our algorithm, we test our algorithm on various datasets and compare
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
6
Fig. 3. The framework of our algorithm. Based on locations of key points, we build the geodesic graph models. Then the discretization process produces the geodesic-like features. And the reliabilities of geodesic-like distances are estimated. The features and their corresponding reliabilities are input into an energy optimization scheme to obtain the matching result.
it with other related methods. Since the way to construct the graph model plays a central roll in our algorithm, we compare our algorithm with graph matching algorithms where the graph models are constructed by conventional methods. The compared methods include the Delaunay triangulation (Del), the k nearest neighbors (Knei) and the ϵ neighbors (Enei). The Del triangulates the point set such that no point is inside the circumcircle of any triangle. In the meanwhile, the Del maximizes the minimum angle of all the angles of the triangles. The Knei links each point with its k nearest points. The Enei links every pair of points whose distance is smaller than ϵ. On these graph models, the distances between points are calculated as the lengths of the shortest path connecting points. The distances are used in the matching problem as features. Besides, all the methods apply the IPFP [17] to solve the quadratic assignment problem. All the methods are evaluated according to their matching precisions. The matching precision is defined as the ratio between the number of correct correspondences of inliers and the total number of inliers. 6.1. Tools dataset In our first experiment, we evaluate our algorithm on the tools dataset [6]. This dataset includes images of 7 kinds of tools. Each kind of tool has 5 images. Each tool deforms by rotating around its joint.
We match every pair of images of the same tool. To quantitatively evaluate the performance of our algorithm, we manually select 18-30 key points along the silhouette of the shape in each image and label their corresponding ground truth. Fig. 4 shows instances of the points labeled on different images. The matching result is summarized in Table 1 where the first row indicates the kinds of tools and other rows show the matching precisions. In general, the geodesic-like features are robust to articulated deformations. The key points selected from surfaces of articulated objects can be matched by our algorithm precisely. However, if the key points are too sparse, the estimation of the manifold may be different from the true surface of the target object. Consequently, the GGM of the estimated manifold may fail to provide reliable geodesic-like features. For example, we eliminate 6 points from the middle of the hilt of each knife shape. Two resulted GGMs are shown in Fig. 5. In the left image, a point pa in the blade is linked to another point pb in the tail by a dog-leg path through an in-between point pc . While in the right image, because of large deformation, pa is linked to pb directly out of the surface of the object without any inbetween points. Hence, the geodesic-like feature f (pa , pb ) in the left image is quite different from that in the right image. As a result, the average precision of matching images of the knife decreases from 88.7% to 68.8%. (In this case, the precisions of other graph models, i.e., Del, Knei, and Enei, decrease to 57.4%, 65.9%, and 65.9% respectively.).
Fig. 4. The first row shows where the key points are manually labeled in our experiment. The second row shows the graph models established by linking every pair of points whose distance is smaller than 2ε . The third row shows the minimal spanning trees. The fourth row shows the GGMs of the shapes of tools. The last row shows the indexes of different tools.
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎ Table 1 Matching precisions on the tools dataset (unit: percent).
7
Table 2 Matching precisions on the Chinese character deformation sub-dataset (unit: percent).
Tool indexes
1
2
3
4
5
6
7
Del Knei Enei Ours
87.9 75.3 82.6 87.4
77.2 69.6 76.0 83.6
87.9 87.4 87.4 88.7
59.7 66.4 57.3 83.3
44.3 27.9 37.5 82.9
42.1 50.3 44.1 89.0
56.1 55.5 54.5 82.1
Deformation level
0.02
0.035
0.05
0.065
0.08
Del Knei Enei PLNS Our algorithm
0.937 0.931 0.941 0.941 0.962
0.529 0.630 0.594 0.550 0.634
0.285 0.331 0.290 0.361 0.400
0.145 0.185 0.159 0.209 0.211
0.093 0.132 0.117 0.182 0.147
Table 3 Matching precisions on the fish deformation sub-dataset (unit: percent).
Fig. 5. Two GGMs on different knife images. (a) pa and pb are linked by a dog-leg path through a in-between node pc . (b) pa and pb are linked directly.
Fig. 6. A matching instance in the Chinese deformation sub-dataset, where the point sets deform non-rigidly. The correspondences obtained by our algorithm are indicated by green lines.
This problem is in analogy to the “short circuit error” in manifold learning algorithms. To addressed this problem, more key points should be extracted from the surface of the target object. When more key points are extracted, the estimation of the manifold would be more precise. And the geodesic-like features would be more robust to articulated deformations. 6.2. Chinese character and fish dataset In our second experiment, we test our algorithm on the Chinese character and fish dataset [8]. This dataset includes two subsets, i.e, a Chinese character deformation sub-dataset and a fish deformation for sub-dataset. Each subset includes 500 pairs of images with non-rigid deformations (see Fig. 6 for instance). Every 100 pairs of images bear the same level of non-rigid deformations. The level of non-rigid deformations is quantified following [8]. 6.2.1. Empirical comparison Here, we also compare our algorithm with the PLNS algorithm1 [34] which tackles the non-rigid point matching problem using an idea proposed in a manifold learning algorithm [23]. PLNS iteratively matches two point sets and warps one set towards the other such that the two point sets are registered. For a fair comparison, the PLNS algorithm runs for one iteration, i.e., the algorithm only matches the point sets for once and does not warp one point set to the other. As shown in Tables 2 and 3, this experiment demonstrates the advantage of our algorithm in representing the spatial relations 1 We use the code of the PLNS algorithm which is implemented by its authors and apply the default parameters.
Deformation level
0.02
0.035
0.05
0.065
0.08
Del Knei Enei PLNS Our algorithm
0.946 0.994 0.878 0.880 0.998
0.787 0.844 0.566 0.482 0.873
0.493 0.529 0.338 0.365 0.601
0.381 0.426 0.284 0.250 0.497
0.279 0.290 0.198 0.193 0.356
between sets of key points with non-rigid deformations. Conventional graph models can build a relatively stable graph while the set of points deforms rigidly. However, their performances degenerate while the set of points deforms non-rigidly. That is because the spatial relations represented by these graph models are based on conventional distance measurements, such as the Euclidian distance which is very sensitive to the changes caused by non-rigid deformations. In the meanwhile, our algorithm constantly outperforms PLNS. A possible explanation is that PLNS only focuses on neighboring structures. It abandons all geometric relations between far away points. In contrast, our algorithm exploits the geometric relations between every pair of points. 6.2.2. Parameter setting Note that, the parameter ε plays a central roll in our algorithm. In this paper, we set ε = αdm where dm is the mean Euclidian distance between key points and α is a positive ratio. We show how our algorithm behaves when we vary α. As shown in Fig. 7, neither a too large α nor a too small one obtains the best matching precision in this experiment. In our algorithm, the parameter α is set to balance the trade-off between the ability of the GGM to capture non-rigid deformations and the robustness of the GGM. Specifically, by decreasing α, we can vary the GGM from the complete graph of all key points to the minimal spanning tree of the complete graph. α represents our priori on the level of non-rigid deformations. A small α enables the resulted GGM to capture small deformations of the point set. However, the structure of the resulted GGM is easy to be changed. In contrast, a large α leads to a more robust GGM. Empirically, we set α = 0.6 when the deformation of the point set is mainly non-rigid and otherwise α = 3. 6.3. CMU house and hotel dataset In our third experiment, we test our algorithm on the CMU house and hotel dataset [31]. This dataset includes two sub-datasets, i.e., the house sub-dataset and the hotel sub-dataset. The house sub-dataset consists of 111 sequential frames of a 3D rigid house model. Similarly, the hotel sub-dataset consists of 100 frames of a 3D hotel model. Every frame in this dataset includes 30 key points. The deformation between frames are resulted from the change of view point. Fig. 8 shows a pair of matched images in the house sub-dataset. We create frame pairs by using two frames separated by a specific number of in-between frames. The separation between frames increases from 10 to 90 with step size 10. The matching
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
8
Fig. 7. The matching error rates with different α.
algorithm. The number of outliers added into each image increases from 0 to 10 with step size 2. Note that our algorithm is proposed with the assumption that all the key points are extracted from the surface of the target object. When the assumption is not hold, the performance of our algorithm degenerates, as shown in Fig. 10. That is because the outliers change the structures of graph models. The pair-wise features between points on a graph model would be quite different from those of their counterparts on the other graph model. Interestingly, our algorithm using the geodesic-like features outperforms all other algorithms using conventional graph models. This may owes to the weight setting process in our algorithm. Specifically, the weight setting process enables our algorithm to pay more attention to the pathes that link nodes with less in-between nodes. For example, the pathes linking nodes directly (without any in-between nodes) would have the greatest influence on the matching result. In contrast, a path linking two nodes has at lest one in-between node if it goes through any outliers. Hence, the pathes going through outliers will have less influence on the matching result than the pathes who link inliers directly. Consequently, in our algorithm, the weight setting process alleviates the influence of outliers on the matching process and enables our algorithm to outperform conventional algorithms.
7. Conclusion Fig. 8. Typical matching instance of our algorithm on the house dataset.
results under different separations between frames are shown in Fig. 9. As shown in the results, our algorithm achieves the best or the second best performance in this dataset. Hence, when sets of points deform rigidly, our algorithm has competitive performance compared with other algorithms, even though our algorithm is designed to be robust to non-rigid deformations. A possible explanation is that rigid deformations can be regarded as special cases of non-rigid deformations. 6.4. Pascal dataset In this experiment, we use a real image dataset [17] which includes 30 image pairs of cars and 20 image pairs of motorbikes that are selected from the PASCAL 2007. Each pair of images includes 15–52 manually labeled correspondences and several outliers. Using this dataset, we test how the outliers would affect our
In this paper, we apply a distance measurement in manifold learning algorithms, i.e., the geodesic distance, to the point matching problem. Specifically, we propose a new matching algorithm depending on the estimation of the surface of the target object. The esitmation is consicely represented by the GGM. The distances between nodes on GGM are exploited to produce features for matching. In addition, a reliability estimation process strengthens our algorithm. With multiple experiments, we demonstrate that our algorithm can provide robust features for the point matching problem. In experiments, our algorithm outperforms other methods while handling non-rigid deformations and performs competitively while handling rigid deformations. Though outliers make the precision of our algorithm degenerate, our algorithm can still perform better than graph matching methods which use conventional graph models. In our future research, we would like to build a mechanism to automatically set the parameters in our algorithm depending on the key points and the task.
Fig. 9. (a) Shows the matching precisions on the house sub-dataset; (b) shows the matching precisions on the hotel sub-dataset.
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
9
Fig. 10. (a) Shows the matching precisions on the car sub-dataset; (b) shows the matching precisions on the motorbike sub-dataset.
Acknowledgement This work is supported in part by the Strategic Priority Research Program, Chinese Academy of Sciences under Grant XDB02080003, in part by BMST under Grant D16110400140000 and Grant D161100001416001, in part by National Natrual Science Fundation of China under Grant 61522211 and Grant 61473275.
Appendix A. Proof of inequalities (5) In this appendix, we prove inequalities (5). We consider 2 points pi and pj . Note that dT ′ (pi , pj ) is the shortest path between pi and pj on the estimated manifold T ′. dG (pi , pj ) is the shortest path between pi and pj on the GGM. In the meanwhile, all edges of GGM are on the estimated manifold T ′. Hence, dT ′ (pi , pj ) ≤ dG (pi , pj ) is proved. Recall the steps to obtain the estimation of the manifold T ′. After step one, we obtain several separate parts (e.g., there are two parts in Fig. 1(b)). Hence, we can divide the path of dG (pi , pj ) into two kinds of segments. One kind of segments are the segments within each part. The other kind of segments are segments connecting different parts. Without regarding to the two end nodes, the length of segments within each part is shorter than 2εb (pi , pj ), where b (pi , pj ) is the number of in-between points within the part. Since there are B (pi , pj ) nodes in the path of dG (pi , pj ), the in-between nodes would induce segments whose lengths are shorter than 2εB (pi , pj ) in total. For the two end nodes, they would induce 2ε in total to the length of dG (pi , pj ). In summary, the total length of segments within parts is shorter than 2ε (B (pi , pj ) + 1). For the segments connecting different parts, these segments are in common to dG (pi , pj ) and dT ′ (pi , pj ). Consequently, dG (pi , pj ) < dT ′ (pi , pj ) + 2ε (B (pi , pj ) + 1). Then, inequalities (5) is proved.
Δ
r 2 2π Δ2
μ +Δ
∫−Δ ∫μ −Δ
1 μ= 2 πΔ
e
r 2dG2
−
d dG d
2Δ2
r ( u +Δ)
Δ
2
∫−Δ ∫r ( u−Δ2 Δ)
e − x dx du . (13)
2Δ
We use the Taylor formula to the integrand and get:
1 2 πΔ
=
r ( u +Δ)
Δ
∫−Δ ∫r ( u−Δ2 Δ)
2
e − x dx du (14)
2Δ
r ( u +Δ)
⎛ ⎞ 1 ⎜ 1 − x2 + x 4 − ⋯⎟ dxdu ⎝ ⎠ 2
1 2 πΔ
∫−Δ ∫r ( u−Δ2 Δ)
1 2 πΔ
1 5 x − ⋯⎟ ∫−Δ ⎜⎝ x − 13 x3 + 10 ⎠
Δ
2Δ
(15)
r ( u +Δ)
=
Δ
⎛
2Δ
⎞
du r ( u −Δ) 2Δ
=
1 2 πΔ +
=
⎛ ⎝
(
)
2r 3 3u2Δ + Δ3
−
3 ( 2 Δ)3
(
) + ⋯⎞⎟ du
2r 5 5u4 Δ + 10u2Δ3 + Δ5 10 ( 2 Δ)5
⎟ ⎠
(17)
⎛ 2r 3 u3Δ + uΔ3 1 ⎜ 2r Δu − ⎜ 2 πΔ ⎝ 2Δ 3 ( 2 Δ)3
(
+
=
Δ
∫−Δ ⎜⎜ 22rΔΔ
(16)
(
)
)
2r 5 u5Δ + 10/3u3Δ3 + uΔ5 10 ( 2 Δ)5
⎞ + ⋯⎟⎟ ⎠
Δ
−Δ
⎞ 1 ⎛ 4r 8r 3 64/3r 5 − + + ⋯⎟ ⎜ 3 5 2 π⎝ 2 ⎠ 3( 2 ) 10 ( 2 )
(18)
(19)
Appendix B. Proof of equation (9) In this appendix, we present the details to obtain Eq. (9) from Eq. (8). Let r = Δ/σ and x =
rdG 2Δ
, then
∼ =P (r ).
(20)
Hence, Eq. (9) is deduced from Eq. (8).
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i
10
D. Qian et al. / Neurocomputing ∎ (∎∎∎∎) ∎∎∎–∎∎∎
References [1] M. Belkin, P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (2003) 1373–1396. [2] S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 509–522. [3] S. Biswas, G. Aggarwal, R. Chellappa, Efficient indexing for articulation invariant shape matching and retrieval, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1–8. [4] F.L. Bookstein, Principal warps thin-plate splines and the decomposition of deformations, IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989) 567–585. [5] T.M. Breuel, A comparison of search strategies for geometric branch and bound algorithms, in: Eur. Conf. Comput. Vis., Springer, 2002, pp. 837–850. [6] A.M. Bronstein, M.M. Bronstein, A.M. Bruckstein, R. Kimmel, Analysis of twodimensional non-rigid shapes, Int. J. Comput. Vis. 78 (2008) 67–88. [7] L.G. Brown, A survey of image registration techniques, ACM Comput. Surv. (CSUR) 24 (1992) 325–376. [8] H. Chui, A. Rangarajan, A new algorithm for non-rigid point matching, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., volume 2, pp. 44–51. [9] D. Conte, P. Foggia, C. Sansone, M. Vento, Thirty years of graph matching in pattern recognition, Int. J. Pattern Recogn. 18 (2004) 265–298. [10] O. Duchenne, F. Bach, I.S. Kweon, J. Ponce, A tensor-based algorithm for highorder graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 33 (2011) 2383–2395. [11] A. Elad, R. Kimmel, On bending invariant signatures for surfaces, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 1285–1295. [12] M.A. Fischler, R.C. Bolles, Random sample consensus a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (1981) 381–395. [13] S. Gold, A. Rangarajan, A graduated assignment algorithm for graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1996) 377–388. [14] M. Izadi, P. Saeedi, Robust weighted graph transformation matching for rigid and nonrigid image registration, IEEE Trans. Image Process. 21 (2012) 4369–4382. [15] B. Jian, B.C. Vemuri, Robust point set registration using gaussian mixture models, IEEE Trans. Pattern Anal. Mach. Intell. 33 (2011) 1633–1645. [16] H. Jiang, M.S. Drew, Z.N. Li, Matching by linear programming and successive convexification, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 959–975. [17] M. Leordeanu, M. Hebert, R. Sukthankar, An integer projected fixed point method for graph matching and map inference, in: Proc. Conf. Neural Information Processing Systems, pp. 1114–1122. [18] H. Li, X. Huang, L. He, Object matching using a locally affine invariant and linear programming techniques, IEEE Trans. Pattern Anal. Mach. Intell. (2013) 411–424. [19] H. Ling, D.W. Jacobs, Using the inner-distance for classification of articulated shapes, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 2, pp. 719–726. [20] H. Ling, D.W. Jacobs, Shape classification using the inner-distance, IEEE Trans. Pattern Anal. Mach. Intell. 29 (2007) 286–299. [21] K. Mikolajczyk, C. Schmid, A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005) 1615–1630. [22] A. Myronenko, X. Song, Point set registration coherent point drift, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 2262–2275. [23] X. Niyogi, Locality preserving projections, in: Proc. Conf. Neural Information Processing Systems, vol. 16, p. 153. [24] G. Pan, X. Zhang, Y. Wang, Z. Hu, X. Zheng, Z. Wu, Establishing point correspondence of 3d faces via sparse facial deformable model, IEEE Trans. Image Process 22 (2013) 4170–4181. [25] R. Raguram, J.M. Frahm, M. Pollefeys, A comparative analysis of ransac techniques leading to adaptive real-time random sample consensus, in: Proceedings of Eur. Conf. Comput. Vis., Springer, 2008, pp. 500–513. [26] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326. [27] H. Sahbi, L. Ballan, G. Serra, A. Del Bimbo, Context-dependent logo matching and recognition, IEEE Trans. Image Process 22 (2013) 1018–1031. [28] J.A. Schnabel, D. Rueckert, M. Quist, J.M. Blackall, A.D. Castellano-Smith, T. Hartkens, G.P. Penney, W.A. Hall, H. Liu, C.L. Truwit, et al., A generic framework for non-rigid registration based on non-uniform multi-level free-form deformations, in: Medical Image Computing and Computer-Assisted Intervention, pp. 573–581.
[29] C. Scott, R. Nowak, Robust contour matching via the order-preserving assignment problem, IEEE Trans. Image Process 15 (2006) 1831–1838. [30] J.B. Tenenbaum, V. De Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323. [31] L. Torresani, V. Kolmogorov, C. Rother, A dual decomposition approach to feature correspondence, IEEE Trans. Pattern Anal. Mach. Intell. (2013) 259–271. [32] Y. Tsin, T. Kanade, A correlation-based approach to robust point set registration, in: Eur. Conf. Comput. Vis., Springer, 2004, pp. 558–569. [33] Z. Zhang, Iterative point matching for registration of free-form curves and surfaces, Int. J. Comput. Vis. 13 (1994) 119–152. [34] Y. Zheng, D. Doermann, Robust point matching for nonrigid shapes by preserving local neighborhood structures, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 643–649. [35] F. Zhou, F. De la Torre, Factorized graph matching, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 127–134. [36] F. Zhou, F. De la Torre, Deformable graph matching, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., IEEE, pp. 2922–2929.
Deheng Qian received the B.S. degree in computer science and technology from the Huazhong University of Science and Technology (HUST) in 2011 as an Outstanding Graduates. He is currently working toward the Ph.D. degree in the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interests broadly include pattern recognition and machine learning.
Tianshi Chen received the B.S. degree in mathematics from the Special Class for the Gifted Young, University of Science and Technology of China (USTC), Hefei, China, in 2005, and the Ph.D. degree in computer science from the Department of Computer Science and Technology, USTC, in 2010. He is currently an Associate Professor with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. His current research interests include computer architecture and computational intelligence, and have published more than 40 papers in the related fields.
Hong Qiao received the B. Eng. degree in hydraulics and control, the M. Eng. degree in robotics from Xian Jiaotong University, Xian, China, the M.Phil. degree in robotics control from the Industrial Control Center, University of Strathclyde, Strathclyde, U.K., and the Ph. D. degree in robotics and artifcial intelligence from De Montfort University, Leicester, U.K., in 1995. She is currently a Professor with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. Her current research interests include information- based strategy investigation, robotics and intelligent agents, animation, machine learning, and pattern recognition. She is currently an Associate Editor of the IEEE Transactions on Cybernetics, and the IEEE Transactions on Automation Science and Engineering.
Please cite this article as: D. Qian, et al., Geodesic-like features for point matching, Neurocomputing (2016), http://dx.doi.org/10.1016/j. neucom.2016.08.092i