Pattern Recognition 38 (2005) 1079–1085
A bottom-up algorithm for finding principal curves with applications to image skeletonization

Xiabi Liu, Yunde Jia*

Department of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081, PR China

Received 29 April 2004; received in revised form 29 November 2004; accepted 29 November 2004

*Corresponding author. Tel.: +86 10 689 40955; fax: +86 10 689 45011. E-mail addresses: [email protected] (X. Liu), [email protected] (Y. Jia).
Abstract

This paper proposes a new method for finding principal curves from data sets. Motivated by the problem of fitting highly curved and self-intersecting curves, we present a bottom-up strategy to construct a graph, called a principal graph, for representing a principal curve. The method initializes a set of vertices based on the principal oriented points introduced by Delicado, and then constructs the principal graph from these vertices through a two-layer iteration process. In the inner iteration, a kernel smoother is used to smooth the positions of the vertices. In the outer iteration, the principal graph is spanned by a minimum spanning tree and is modified by detecting closed regions and intersectional regions; new vertices are then inserted into some edges of the principal graph. We tested the algorithm on simulated data sets and applied it to image skeletonization. Experimental results show the effectiveness of the proposed algorithm.
© 2005 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Principal curves; Principal oriented points; Image skeletonization; Kernel smoother; Minimum spanning tree
1. Introduction

Principal component analysis is widely used in dimension reduction and feature extraction. Principal curves are nonlinear generalizations of principal components and have received significant attention since their introduction by Hastie and Stuetzle [1]. Considerable work has been reported on applications of principal curves, such as shape detection and object identification [2,3], gradient analysis in ecology [4], image skeletonization [5–7], feature extraction and pattern classification [8,9], speech recognition [10], and forecasting [11].

There have been several different definitions of principal curves. The earliest one, by Hastie and Stuetzle [1], emphasizes the self-consistency property of principal curves, which means that each point of a principal curve is the average of all data projecting there. Tibshirani [12] gave his definition in terms of a mixture probability model in which a distribution is decomposed into a latent variable distribution on a curve and a conditional distribution given the latent variable value. Kégl et al. [6] defined a principal curve as a curve that minimizes the expected squared distance from data to their projections on the curve over a class of curves with bounded length. Sandilya and Kulkarni [13] provided a similar definition, but they constrained total turn instead of length. Delicado [14] introduced the notion of principal oriented points (POPs) and made principal curves visit only POPs.

Based on these definitions, a few methods for finding principal curves from data sets have been proposed. Hastie and Stuetzle [1] proposed alternating between projecting data onto the curve and estimating conditional expectations at the projections by a scatterplot smoother or a spline smoother. Banfield and Raftery [2] modified the Hastie–Stuetzle
method and used the projection residuals of the data, instead of the data themselves, to estimate conditional expectations, reducing both bias and variance. Tibshirani [12] used the EM algorithm to maximize the log-likelihood of the observations implied by his mixture model under a Gaussian assumption. Verbeek et al. [15] proposed a k-segments algorithm which incrementally combines local line segments into a polygonal line to achieve an objective similar to Tibshirani's. Kégl et al. [6] presented the polygonal line algorithm, which starts with an initial polygonal line, adds a new vertex at each iteration, and updates the positions of all vertices so that the value of a penalized distance function is minimized. Singh et al. [5] used the batch formulation of the self-organizing map (SOM) to obtain principal curves. Delicado [14] found principal oriented points one by one and linked them in order to estimate principal curves. Other related techniques, such as the generative topographic mapping and growing cell structures, can also be used to find approximations to principal curves [15].

For highly curved or self-intersecting curves, such as spiral-shaped curves [6,15], existing methods do not work well. Verbeek et al. [15] attempted to solve this problem by combining line segments, optimized to minimize the total squared distance of all points to their closest segments, into a polygonal line. The first principal component of the data is often used as the initial estimate of the principal curve when prior knowledge is lacking. Unfortunately, it is a poor initialization for a highly curved or self-intersecting curve, so it is necessary to consider the local features of a principal curve from the beginning. In this paper, we present a bottom-up strategy to construct a graph (called a principal graph, similar to that of Kégl et al. [6]) for representing a principal curve. Instead of starting with a simple topology such as the first principal component and then increasing its complexity iteratively, we directly span a sufficiently complex topology and then refine it iteratively. In our algorithm, POPs in local areas are found as initial candidate points on a principal curve; we then take these points as a set of vertices and perform a two-layer iteration process on them to construct a principal graph.

This paper is organized as follows. Section 2 introduces the definitions of principal curves and principal oriented points. Section 3 describes the bottom-up algorithm in detail. Section 4 discusses test results on simulated data sets and applications to image skeletonization. We conclude the paper with a discussion in Section 5.
Fig. 1. Illustration of principal curves and principal oriented points.
2. Principal curves and principal oriented points

A principal curve always has the self-consistency property [1]. Let $X$ denote a random vector in $R^d$, and let $f(\lambda)$ denote a smooth (infinitely differentiable) curve in $R^d$ parameterized by $\lambda \in R^1$. The projection index $\lambda_f : R^d \to R^1$ is defined as

$\lambda_f(X) = \sup\{\lambda : \|X - f(\lambda)\| = \inf_{\tau} \|X - f(\tau)\|\}$.   (1)

The curve $f(\lambda)$ is self-consistent if

$f(\lambda) = E(X \mid \lambda_f(X) = \lambda)$   (2)

for almost all $\lambda$. Intuitively, self-consistency means that a principal curve passes through the "middle" of a distribution, and each of its points is the average (under the distribution) of all points that project there, as illustrated in Fig. 1.

Under this notion of self-consistency, we cannot know whether a point is self-consistent unless we know the whole curve to which it belongs. Delicado [14] therefore discussed the self-consistency of a single point and established the definition of principal oriented points (POPs), based on the following property of the first principal component of a normal distribution: the projection of the normal random variable onto the hyperplane orthogonal to the first principal component has the lowest total variance among its projections onto any hyperplane. Let $b \in S^{d-1} = \{w \in R^d : \|w\| = 1\}$, and let $H(X, b)$ be the hyperplane orthogonal to $b$ passing through $X$: $H(X, b) = \{Y \in R^d \mid (Y - X)^t b = 0\}$. Let $\mu(X, b)$ and $\phi(X, b)$ be the conditional expectation and the total variance, respectively, of the random variable restricted to $H(X, b)$: $\mu(X, b) = E(Y \mid Y \in H(X, b))$ and $\phi(X, b) = TV(Y \mid Y \in H(X, b))$. The $b$ achieving the infimum of $\phi(X, b)$ is defined as the principal direction of $X$ and is denoted $b^*(X)$: $b^*(X) = \arg\min_{b \in S^{d-1}} \phi(X, b)$. The corresponding conditional expectation is

$\mu^*(X) = \mu(X, b^*(X))$.   (3)

Fixed points of $\mu^*(X)$ are defined as principal oriented points; their set is denoted $\Gamma = \{Y \in R^d \mid Y = \mu^*(Y)\}$, as shown in Fig. 1. Delicado [14,16] proposed an algorithm to find a corresponding POP starting from an arbitrary point in the data set. Let $\tilde{\mu}^*(X)$ be the sample version of the function $\mu^*(X)$ in Eq. (3), and let $X_0$ be an arbitrary point in the data set.
Starting from $X_0$, $\tilde{\mu}^*(X)$ is iterated, with $X_k = \tilde{\mu}^*(X_{k-1})$ for $k \geq 1$, until convergence or until a prefixed maximum number of iterations is reached. If convergence is attained, the last $X_k$ is accepted as a POP.
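As an illustration, here is a minimal Python sketch of this fixed-point iteration. The paper relies on Delicado's exact estimator of $\tilde{\mu}^*$; in the sketch it is approximated by a kernel-weighted local PCA, and the `bandwidth` parameter and tolerances are our assumptions, not values from the paper.

```python
import numpy as np

def sample_mu_star(x, data, bandwidth):
    """Approximate sample version of mu*(X): take the principal direction
    b of a kernel-weighted local covariance around x, then average the
    data points lying close to the hyperplane H(x, b). (Delicado's
    estimator instead minimizes the total variance phi(X, b) over all
    unit directions b; local PCA is a cheap stand-in.)"""
    diffs = data - x
    w = np.exp(-0.5 * np.sum(diffs ** 2, axis=1) / bandwidth ** 2)
    cov = (diffs * w[:, None]).T @ diffs / w.sum()
    _, eigvecs = np.linalg.eigh(cov)        # eigenvectors, ascending eigenvalues
    b = eigvecs[:, -1]                      # local first principal direction
    near = np.abs(diffs @ b) < bandwidth    # points near the hyperplane H(x, b)
    if not near.any():
        return x                            # degenerate neighborhood: stay put
    return data[near].mean(axis=0)

def find_pop(x0, data, bandwidth, tol=1e-4, max_iter=50):
    """Fixed-point iteration X_k = mu~*(X_{k-1}); the last iterate is
    accepted as a POP only if the iteration converges."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = sample_mu_star(x, data, bandwidth)
        if np.linalg.norm(x_new - x) < tol:
            return x_new                    # converged: accept as a POP
        x = x_new
    return None                             # no convergence: reject
```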
3. The bottom-up algorithm

In order to find a principal curve from a set of data points $X_n = \{x_1, \ldots, x_n\}$, we first find POPs from $X_n$ to form the set of initial candidate points $Y_m = \{y_1, \ldots, y_m\}$. POPs are computed by starting Delicado's algorithm from every point in $X_n$ in turn. If the Euclidean distance between a new POP and all points already in $Y_m$ is larger than the bandwidth of the local area, this POP is included in $Y_m$. Starting with $Y_m$, a principal graph is constructed to represent the principal curve through a two-layer iteration process. In the inner iteration, we use a kernel smoother to smooth the positions of the vertices in $Y_m$. In the outer iteration, the principal graph is spanned by computing the minimum spanning tree (MST) on $Y_m$ and is modified by detecting closed regions and intersectional regions; new vertices are then inserted into some edges of the principal graph. The flow chart of our algorithm is given in Fig. 2, and a sketch of the initialization step follows the figure.
Fig. 2. The flow chart of our algorithm: find an initial set of vertices; in the inner iteration, alternate Voronoi partition and kernel smoothing until convergence; in the outer iteration, perform MST computation and closure and intersection detection, and insert new vertices, until convergence.
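As a concrete illustration of the initialization box of Fig. 2, the following sketch collects POPs started from every data point and keeps a POP only if it is farther than the local bandwidth from every vertex accepted so far. It reuses the hypothetical `find_pop` helper from the sketch in Section 2, so it is an assumption-laden sketch rather than the paper's exact procedure.

```python
import numpy as np

def initialize_vertices(data, bandwidth):
    """Initial candidate set Y_m: run the POP search from every point in
    X_n and apply the minimum-distance test described in Section 3."""
    vertices = []
    for x in data:
        pop = find_pop(x, data, bandwidth)   # sketch from Section 2
        if pop is None:
            continue                         # iteration did not converge
        if all(np.linalg.norm(pop - y) > bandwidth for y in vertices):
            vertices.append(pop)
    return np.array(vertices)
```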
3.1. Vertices smoothing

The vertices smoothing process consists of two steps. In the first step, all data points are partitioned into $m$ Voronoi regions $V_1, \ldots, V_m$ by finding their closest vertices in $Y_m$:

$V_i = \{x \in X_n \mid i = \arg\min_j \|x - y_j\|\}$.   (4)
In the second step, every vertex in $Y_m$ is updated by its conditional expectation, which is estimated by the kernel smoother

$y_i = \frac{\sum_{j=1}^{n} x_j K(\|y_t - y_i\|)}{\sum_{j=1}^{n} K(\|y_t - y_i\|)}, \quad x_j \in V_t$,   (5)

where $K(\cdot)$ is a monotonically decreasing kernel function; in our work, we use the Gaussian kernel. These two steps alternate until the relative change in the average squared distance of all data points to their closest vertices in $Y_m$ falls below a threshold. Let $D_k^2$ denote the average squared distance after the $k$th iteration,

$D_k^2 = \frac{1}{n} \sum_{j=1}^{n} \|x_j - y_t\|^2, \quad x_j \in V_t$;   (6)

the relative change in the average squared distance is $DC = |D_k^2 - D_{k-1}^2| / D_{k-1}^2$.
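A vectorized sketch of this inner iteration follows, under the same Gaussian-kernel assumption; the tolerance and iteration cap are illustrative choices of ours.

```python
import numpy as np

def smooth_vertices(data, vertices, bandwidth, tol=1e-3, max_iter=100):
    """Inner iteration of Section 3.1: alternate the Voronoi partition of
    Eq. (4) with the kernel-smoother update of Eq. (5) until the relative
    change DC of the average squared distance (Eq. (6)) drops below tol."""
    prev = None
    for _ in range(max_iter):
        # Eq. (4): index t[j] of the closest vertex of each data point x_j
        d2 = np.sum((data[:, None, :] - vertices[None, :, :]) ** 2, axis=2)
        t = np.argmin(d2, axis=1)
        # Eq. (6): average squared distance under this partition
        Dk2 = np.mean(d2[np.arange(len(data)), t])
        if prev is not None and abs(Dk2 - prev) / prev < tol:
            break                              # relative change DC below tol
        prev = Dk2
        # Eq. (5): each x_j contributes to y_i with Gaussian weight
        # K(||y_t - y_i||), where y_t is x_j's closest vertex
        dv = np.linalg.norm(vertices[:, None, :] - vertices[None, :, :], axis=2)
        K = np.exp(-0.5 * (dv / bandwidth) ** 2)
        w = K[:, t]                            # shape (m, n)
        vertices = (w @ data) / w.sum(axis=1, keepdims=True)
    return vertices
```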
After the inner iteration, we check the distances between vertices in $Y_m$. If the distance between two vertices is less than a minimum distance, one of the two vertices is randomly selected and kept, and the other is deleted.

The bandwidth of the kernel smoother and the minimum distance between two vertices are two crucial parameters in the vertices smoothing process. If they are too small, the resulting principal graph will zig-zag. If they are too large, the resulting principal graph cannot fit the data well. We first give them larger values and decrease them gradually as the number of outer iterations grows. In our experiments, this strategy works well.

3.2. Principal graph construction and new vertices insertion

We first construct a fully connected weighted graph on $Y_m$, in which the weight of every edge is the Euclidean distance between its two vertices; we then obtain a basic principal graph by using Prim's algorithm to compute the minimum spanning tree (MST) of this fully connected weighted graph.

Although the MST is an effective way to span complex shapes, it is unable to represent closed topological properties. Here we follow the work of Singh et al. [5] to detect closed regions and modify the principal graph accordingly. We obtain the Delaunay triangulation of the data set. Two vertices in the principal graph are connected if they satisfy the following three conditions: (1) there is an edge between them in the Delaunay triangulation; (2) there is no edge between them in the principal graph; and (3) the number of edges in the cycle that results from connecting them in the principal graph is larger than a threshold.

Another imperfection of the MST is that an intersectional vertex often becomes a short path, as illustrated in Fig. 3. We present the following method to detect and rectify this. Let $S$ be a short path in which the number of edges is below a threshold, let $y_s$ and $y_e$ be its starting vertex and end vertex, respectively, let $V_S$ be the set of vertices in $S$, and let $T_s$ and $T_e$ be the sets of adjacent vertices of $y_s$ and $y_e$, respectively. Let $T'_s = T_s - V_S$ and $T'_e = T_e - V_S$. If $\mathrm{card}(T'_s) > 1$ and $\mathrm{card}(T'_e) > 1$, we form the Cartesian product $T'_s \times V_S \times T'_e$. For every element $(t_s, s, t_e)$ in $T'_s \times V_S \times T'_e$, a temporary graph is constructed by deleting the edges $(t_s, y_s)$ and $(t_e, y_e)$ from the principal graph and adding the edges $(t_s, s)$ and $(t_e, s)$ to it. For all these temporary graphs and the principal graph, the average squared distances of all data points to the graph are computed using Eq. (10) (given in Section 3.3). If the minimum average squared distance over all the temporary graphs is less than that of the principal graph, the temporary graph attaining the minimum is accepted as the new principal graph.

Fig. 3. Illustration of the imperfection at intersectional regions in MST.

After constructing the principal graph, we compute the average weight of all edges in the principal graph; we then select the midpoints of all edges whose weights are larger than the average weight as new vertices and insert them into $Y_m$.

3.3. Convergence examination

Let $G$ be the principal graph, $G = (V, E)$, where $V = \{v_1, \ldots, v_m\}$ is the set of vertices and $E = \{(v_{i1}, v_{j1}), \ldots, (v_{ik}, v_{jk})\} = \{e_1, \ldots, e_k\}$ is the set of edges. Let $d^2(x_i, v_l)$, $d^2(x_i, e_t)$ and $d^2(x_i, G)$ denote the squared distance from a data point $x_i$ to a vertex $v_l$, an edge $e_t$ and the graph $G$, respectively:

$d^2(x_i, v_l) = \|x_i - v_l\|^2$,   (7)

$d^2(x_i, e_t) = \min_{\lambda} \|x_i - [\lambda v_{it} + (1 - \lambda) v_{jt}]\|^2, \quad 0 < \lambda < 1$,   (8)

$d^2(x_i, G) = \min\left\{\min_l d^2(x_i, v_l),\; \min_t d^2(x_i, e_t)\right\}$;   (9)

then the average squared distance of all points to the graph $G$ is

$d^2(X_n, G) = \frac{1}{n} \sum_{i=1}^{n} d^2(x_i, G)$.   (10)
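A sketch of Eqs. (7)-(10), together with the MST step of Section 3.2, is given below. SciPy's MST routine stands in for Prim's algorithm, and the helper names are our own; the outer-loop convergence test described next is then a comparison of successive values of the last function.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_edges(vertices):
    """Basic principal graph (Section 3.2): MST of the fully connected
    graph on the vertices, with Euclidean edge weights."""
    mst = minimum_spanning_tree(cdist(vertices, vertices)).tocoo()
    return list(zip(mst.row, mst.col))

def sq_dist_to_edge(x, a, b):
    """Eq. (8): squared distance from x to the segment joining a and b,
    with the minimizing lambda clipped to [0, 1]."""
    d = b - a
    lam = np.clip(np.dot(x - a, d) / np.dot(d, d), 0.0, 1.0)
    return float(np.sum((x - (a + lam * d)) ** 2))

def avg_sq_dist_to_graph(data, vertices, edges):
    """Eq. (10): mean over the data of d^2(x_i, G), where Eq. (9) takes
    the smaller of the nearest-vertex (Eq. (7)) and nearest-edge
    (Eq. (8)) squared distances."""
    total = 0.0
    for x in data:
        d_v = np.min(np.sum((vertices - x) ** 2, axis=1))        # Eq. (7)
        d_e = min(sq_dist_to_edge(x, vertices[i], vertices[j])
                  for i, j in edges)                             # Eq. (8)
        total += min(d_v, d_e)                                   # Eq. (9)
    return total / len(data)
```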
If the relative change $DC(X_n, G) = |d^2(X_n, G_k) - d^2(X_n, G_{k-1})| / d^2(X_n, G_{k-1})$ in $d^2(X_n, G)$ is below a threshold, the algorithm has converged.

4. Experimental results

We conducted two types of experiments on two-dimensional data sets. The first experiment tested the capability of the proposed algorithm for finding principal curves from data distributions using simulated data sets; the second tested the effectiveness of the proposed algorithm in image skeletonization, using images of handwritten characters and objects.

4.1. Simulated data sets

We generated data sets along curves by the commonly used additive noise model [6]

$X' = X + e$,   (11)
where $X$ is a data set uniformly distributed on a smooth curve (called the generating curve), $e = (e_1, e_2)$ is bivariate additive noise independent of $X$, and $X'$ is the generated data set. We tested the proposed algorithm on variously shaped curves: line, circle, half-circle, S-shaped, etc. In all tests, zero-mean bivariate uncorrelated Gaussian noise with various variances was used. The results are promising. To stress the algorithm's effectiveness for highly curved and self-intersecting curves, we selected experimental results on the spiral-shaped curve and the rope-shaped curve for illustration in Fig. 4.
Fig. 4. Results on generating curves. (a) Spiral-shaped, $E(e_i^2) = 0.16$, $i = 1, 2$, 260 data points. (b) Rope-shaped, $E(e_i^2) = 0.04$, $i = 1, 2$, 200 data points.
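For reference, data sets like the spiral of Fig. 4(a) can be generated from the model of Eq. (11). The parameterization, loop count and seed below are illustrative choices of ours, and sampling is uniform in the curve parameter rather than exactly uniform in arc length.

```python
import numpy as np

def spiral_data(n_points=260, loops=3.0, noise_var=0.16, seed=0):
    """X' = X + e (Eq. (11)): X on an Archimedean spiral r = a*t,
    e zero-mean uncorrelated bivariate Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 2.0 * np.pi * loops, size=n_points)
    r = 0.5 * t                             # spiral radius grows with t
    X = np.column_stack((r * np.cos(t), r * np.sin(t)))
    e = rng.normal(scale=np.sqrt(noise_var), size=X.shape)
    return X + e
```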
Kégl et al. [6] reported the gradual degradation of their polygonal line (PL) algorithm on spiral-shaped generating curves of increasing length. The PL algorithm performed well on spirals with one and a half and with two loops, but failed when the number of loops reached three. The Hastie–Stuetzle algorithm behaved less satisfactorily in the same experiments. Delicado and Huerta [16] also compared the performance of their algorithm, the PL algorithm and the Hastie–Stuetzle algorithm on the spiral with two loops and concluded that their algorithm and the PL algorithm behave quite similarly, and that both perform better than the Hastie–Stuetzle algorithm. Verbeek et al. [15] illustrated the failure of different algorithms, and further demonstrated the result of their k-segments algorithm on the spiral with two and a half loops. The resulting polygonal line using twelve line segments (the maximum number in their experiment) recovered the basic shape of the spiral, but did not fit the curve well. In our experiment, we used a spiral with more loops than all of the above experiments, and the result was almost indistinguishable from the generating spiral-shaped curve.
Fig. 5. Skeletonization results of handwritten digits.
4.2. Image skeletonization

The self-consistency property of principal curves is apparently similar to the equidistance property of the medial axis of shapes. If the foreground pixels of a shape are represented by a two-dimensional data set, the skeleton of the shape can be approximated by the principal curve of this data set. Principal curves have been applied to image skeletonization by Singh et al. [5] and Kégl et al. [6,7]. In this experiment, we likewise used the proposed algorithm, unchanged, to find skeletons of images. The performance of the algorithm was tested on three sets of images. The first consists of images of isolated handwritten digits from the MNIST Database, which is a subset of the NIST Database [17]. The second consists of images of isolated handwritten English letters and handwritten Chinese characters captured using a graphics tablet.
Fig. 6. Skeletonization results of handwritten English letters and Chinese characters.
The third consists of bilevel images of objects, obtained from the original images by bilevel thresholding. Some results for these three sets of images are shown in Figs. 5–7.
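As a small illustration of the pixel-to-data-set representation described above (the helper name and the row-flip convention are our own), the foreground pixels of a bilevel image can be turned into the two-dimensional data set $X_n$ that the algorithm consumes:

```python
import numpy as np

def pixels_to_points(binary_image):
    """Represent the foreground (nonzero) pixels of a bilevel image as a
    two-dimensional data set whose principal curve approximates the
    skeleton of the shape."""
    rows, cols = np.nonzero(binary_image)
    # use (column, flipped row) so that y increases upwards
    y = binary_image.shape[0] - 1 - rows
    return np.column_stack((cols, y)).astype(float)
```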
Fig. 7. Skeletonization results of objects.
The PL algorithm is effective for characters that do not contain loops or crossings, but for those that do, it needs to be extended [7]. Our unchanged algorithm was found to be effective for all handwritten characters, whether or not they contain loops or crossings.
5. Discussion

In this paper, principal curves are found from data sets by finding corresponding graphs called principal graphs. A set of vertices is first initialized based on principal oriented points, and a principal graph is subsequently constructed from these vertices, a process that involves kernel smoothing, minimum spanning tree computation, modification of closed regions and intersectional regions, and insertion of new vertices. The final objective of the algorithm is to minimize the average squared distance of all data points to the principal graph while keeping the result smooth. This goal is pursued by alternating between the vertices smoothing and the graph construction iteratively. At present we cannot prove that the procedure converges, or guarantee that the criterion decreases at each step; however, we have not observed convergence problems in extensive tests. Furthermore, it is possible to replace this procedure with the direct optimization of an explicit objective function.

Although the principal graph itself is not a curve, we can transform it into a corresponding curve if necessary by searching for a Hamiltonian path or Hamiltonian cycle in the principal graph. In this way, we can also smooth the result further with a spline smoother.

The main difference between the proposed method and other methods is that we use a bottom-up strategy. Instead
of starting from a simple topology such as the first principal component, as existing algorithms do, and then increasing its complexity iteratively, we directly span a sufficiently complex topology and then refine it iteratively. The advantage of this strategy is that local features are respected from the beginning, while global information is incorporated in the subsequent steps, so that highly curved and self-intersecting curves can be recovered properly. This was the motivation for the bottom-up strategy, and the experimental results validate its effectiveness.

We used the POP computation proposed by Delicado, but our algorithm for finding principal curves is totally different from Delicado's. In Delicado's algorithm, a principal curve is formed by connecting POPs in order. These POPs are computed one by one and are unchangeable afterwards, which means that the global self-consistency property of the resulting principal curve is partly ignored. Delicado's algorithm is therefore effective in many cases, but it performs unsatisfactorily in some situations, such as at the intersectional regions of self-intersecting curves.

It should be noted that our algorithm is modular: the vertex initialization need not be performed by POP computation alone, but can use other routines. For example, traditional thinning methods can be used to obtain the initial set of vertices in image skeletonization applications.
6. Concluding summary

Principal curves are nonlinear generalizations of principal components. A few methods for finding principal curves from data sets have been proposed in past work, but these methods did not work well for highly curved and self-intersecting curves.
In this paper, we have presented a bottom-up algorithm to tackle this problem. In our algorithm, principal curves are found from data sets by finding corresponding graphs called principal graphs. We first initialize a set of vertices based on principal oriented points, and then construct the principal graph from these vertices through a two-layer iteration process. In the inner iteration, a kernel smoother is used to smooth the positions of the vertices. In the outer iteration, the principal graph is spanned by a minimum spanning tree and is modified by detecting closed regions and intersectional regions; new vertices are then inserted into some edges of the principal graph. The proposed algorithm has been tested on simulated data sets generated along smooth curves by the additive noise model, and has also been applied directly to image skeletonization and tested on images of handwritten characters and objects. Experimental results show the effectiveness of the proposed algorithm.
Acknowledgements

This work was partially supported by Grant No. 60473049 from the Chinese National Science Foundation. The authors thank Pedro Delicado for providing his source code for finding POPs.
References

[1] T. Hastie, W. Stuetzle, Principal curves, J. Am. Stat. Assoc. 84 (1989) 502–516.
[2] J.D. Banfield, A.E. Raftery, Ice floe identification in satellite images using mathematical morphology and clustering about principal curves, J. Am. Stat. Assoc. 87 (1992) 7–16.
[3] D.C. Stanford, A.E. Raftery, Finding curvilinear features in spatial point patterns: principal curve clustering with noise, IEEE Trans. Pattern Anal. Mach. Intell. 22 (6) (2000) 601–609.
[4] G. De'ath, Principal curves: a new technique for indirect and direct gradient analysis, Ecology 80 (7) (1999) 2237–2253.
[5] R. Singh, V. Cherkassky, N. Papanikolopoulos, Self-organizing maps for the skeletonization of sparse shapes, IEEE Trans. Neural Networks 11 (1) (2000) 241–248.
[6] B. Kégl, A. Krzyzak, T. Linder, K. Zeger, Learning and design of principal curves, IEEE Trans. Pattern Anal. Mach. Intell. 22 (3) (2000) 281–297.
[7] B. Kégl, A. Krzyzak, Piecewise linear skeletonization using principal curves, IEEE Trans. Pattern Anal. Mach. Intell. 24 (1) (2002) 59–74.
[8] K.Y. Chang, J. Ghosh, Principal curves classifier—a nonlinear approach to pattern recognition, Proceedings of the IEEE International Joint Conference on Neural Networks, Anchorage, Alaska, USA, May 4–9, 1998, pp. 695–700.
[9] K.Y. Chang, J. Ghosh, Principal curves for nonlinear feature extraction and classification, Proceedings of SPIE: Applications of Artificial Neural Networks in Image Processing III, San Jose, CA, USA, January 26, 1998, pp. 120–129.
[10] K. Reinhard, M. Niranjan, Parametric subspace modeling of speech transitions, Speech Commun. 27 (1999) 19–42.
[11] Y. Han, E. Corchado, C. Fyfe, Forecasting using twinned principal curves and twinned self-organising maps, Neurocomputing 57 (2004) 37–47.
[12] R. Tibshirani, Principal curves revisited, Stat. Comput. 2 (1992) 183–190.
[13] S. Sandilya, S.R. Kulkarni, Principal curves with bounded turn, IEEE Trans. Inform. Theory 48 (10) (2002) 2789–2793.
[14] P. Delicado, Another look at principal curves and surfaces, J. Multivariate Anal. 77 (2001) 84–116.
[15] J.J. Verbeek, N. Vlassis, B. Kröse, A k-segments algorithm for finding principal curves, Pattern Recognition Lett. 23 (2002) 1009–1017.
[16] P. Delicado, M. Huerta, Principal curves of oriented points: theoretical and computational improvements, Comput. Stat. 18 (2) (2003) 293–315.
[17] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324.
About the Author—XIABI LIU received his bachelor's degree in computer application from Huazhong University of Science and Technology in 1998. He is currently a Ph.D. student at Beijing Institute of Technology. His main research interests include handwriting recognition, computer vision and artificial intelligence.

About the Author—Dr. YUNDE JIA is a professor in the Department of Computer Science and currently a vice chair of the School of Information Science and Technology at Beijing Institute of Technology. He earned his bachelor's, master's and Ph.D. degrees in mechatronics from Beijing Institute of Technology in 1983, 1986 and 2000, respectively. He founded the Vision and Intelligent Systems Lab at the university in 1997, just after working as a visiting researcher at Carnegie Mellon University for two years. His research interests include computer vision, media computing, bioinformatics, and multi-modal human-computer interaction.