Pattern Recognition Letters 16 (1995) 557-563
An algorithm for polygonal approximation based on iterative point elimination *

Arie Pikaz a,*,1, Its'hak Dinstein b,2

a Computer Science Department, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel
b Electrical and Computer Engineering, Ben Gurion University, Beer Sheva 84105, Israel

Received 15 September 1993; revised 3 January 1995
Abstract
A simple and fast algorithm for polygonal approximation of digital curves is proposed. The algorithm is based on a greedy iterative elimination of a point with the currently minimal error value. The error criterion is defined such that the elimination of a curve point requires the update of the error associated only with its two neighbors. The use of a heap data structure yields a worst-case complexity of O(n log n). The algorithm is independent of the starting point.

Keywords: Pattern recognition; Shape analysis; Digital curves; Polygonal approximation; Heap data structure
1. Introduction
Boundary curves play an important role in shape representation and analysis. Contours are digital curves that can be represented in a number of ways. Polygonal approximation is a simple and convenient way to represent digital curves. Its advantages over other curve representation schemes, like higher-order splines and other functional representations, are its computational simplicity, coding efficiency, good preservation of local properties, and the low complexity of feature extraction.

* This work was supported in part by the Paul Ivanier Center for Robotics and Production Automation, Ben Gurion University of the Negev, Beer Sheva, 84105, Israel.
* Corresponding author. Email: [email protected]
1 Arie Pikaz was with the Computer Science Department, Ben Gurion University of the Negev.
2 Email: [email protected]
One aspect of polygonal approximation of boundary curves deals with the preservation of shape perception. Attneave (1954) showed that corners of boundary curves contain most of the information of the shape. A polygon constructed by connecting neighboring corner points with straight lines usually enables the recognition of the shape by a human observer. This claim motivated a number of corner detection algorithms; (Teh and Chin, 1989) contains a good review of such algorithms. Another aspect of polygonal approximation of boundary curves deals with approximation theory. The polygonal approximation error is measured by the mean square error or by the uniform (maximum) error. The aim of polygonal approximation is either to obtain an error smaller than a specified threshold with a minimal number of vertices, or to obtain a number of vertices less than a specified threshold with a minimal error. The vertices can be a subset of the digital curve
points, or they can be points that do not belong to the curve.

There are three basic algorithmic approaches to polygonal approximation of a digital curve yielding polygonal vertices which are subsets of the curve points. The approaches are Merge, Split, and Split-and-Merge (see Pavlidis, 1980, Section 7.7). This stems from the fact that polygonal approximation is actually a segmentation of the curve points. The simple Merge algorithms are those based on a linear scan of the digital curve, where at each curve point a decision is made either to merge the point with the current straight line or to start a new line and define the previous point as a vertex. In that category reside the algorithm of Sklansky et al. (1972), which finds a minimal-perimeter polygon, the more recent version of Sklansky and Gonzalez (1980), the algorithm of Tomek (1974), that of Williams (1978), which is based on the cone-intersection method (as Sklansky and Gonzalez), and that of Kurozumi and Davis (1982), which finds a minimax approximation. The main disadvantage of the linear scan approach is that it does not take global considerations into account. It also suffers from overshooting. Among the more advanced Merge algorithms that use global considerations are the one by Montanari (1970), which iteratively reduces the number of vertices and converges to a local minimum, and the one by Dunham (1986), which finds a polygon with a minimal number of vertices using dynamic programming.

Split algorithms are based on recursive partitioning of the curve at points which are the farthest from a given chord connecting two vertices; see for example the algorithm in (Ramer, 1972) and in (Duda and Hart, 1973, p. 338). The main disadvantage of this approach is the dependency on the starting point. It also suffers from stressing outliers. In the Split-and-Merge category is the algorithm by Pavlidis and Horowitz (1974), which is based on split and merge iterations leading to a local minimum of the number of vertices, and local improvement of the vertex locations. For reviews of these techniques see (Kurozumi and Davis, 1982; Dunham, 1986).
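To make the Split scheme concrete, the following Python sketch implements the recursive farthest-point partitioning described above, in the spirit of (Ramer, 1972). It is an illustration only, not the algorithm proposed in this paper; the function name `split_approx` and the tolerance parameter `eps` are ours.

```python
def split_approx(points, eps):
    # Recursive farthest-point ("Split") approximation, illustration only.
    def farthest(lo, hi):
        (x0, y0), (x1, y1) = points[lo], points[hi]
        base2 = (x1 - x0) ** 2 + (y1 - y0) ** 2 or 1   # guard against a zero-length chord
        best_i, best_d2 = None, 0.0
        for i in range(lo + 1, hi):
            cross = (x1 - x0) * (points[i][1] - y0) - (y1 - y0) * (points[i][0] - x0)
            d2 = cross * cross / base2                 # squared distance to the chord
            if d2 > best_d2:
                best_i, best_d2 = i, d2
        return best_i, best_d2

    def rec(lo, hi):
        i, d2 = farthest(lo, hi)
        if i is None or d2 <= eps * eps:
            return [points[lo]]                        # the chord is good enough here
        return rec(lo, i) + rec(i, hi)                 # split at the farthest point

    return rec(0, len(points) - 1) + [points[-1]]
```

Note that the recursion is anchored at the two endpoints of the curve, which is exactly why this scheme depends on the starting point, as remarked above.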
1.1. The proposed approach

A common approach in the above algorithms is to choose curve points to be vertices of the polygonal approximation. This can be done in the opposite direction, by choosing at each stage a curve point that will not be a vertex. A reasonable choice is a point whose elimination will cause minimal "damage", by some given error criterion. Such a greedy approach may not result in an optimal approximation, but it will converge to a solution close to an optimum because of the small incremental steps of the algorithm. Two types of approximation errors are considered. One is the area between the polygonal approximation and the original curve. The second is the maximal distance between them. The approximation error caused by the elimination of a single point depends on that point and its two closest neighbors. The algorithm can yield a polygonal approximation with either a maximal error less than a prespecified threshold, or a prespecified number of vertices. The computational complexity is O(n + k log n), where k is the number of eliminated points and n is the number of points in the original digital curve (k < n). The algorithm has an aspect of global considerations (in contrast to linear scan algorithms) and it is independent of the starting point (in contrast to the Split algorithm). The internal order of point elimination in one section of the curve is independent of other sections. Global aspects actually determine the order of elimination between sections. The algorithm was first introduced in (Pikaz, 1992).

1.2. Relevance to previous work

Regarding the above three schemes (Merge, Split and Split-and-Merge), our algorithm should be classified as a Merge scheme, since at each iteration two edges are merged into one. Within this context, the algorithms of Leu and Chen (1988) and of Wu and Leou (1993) should be mentioned. Leu and Chen presented an algorithm for polygonal approximation based on replacing pairs or triplets of adjacent edges by their corresponding chord. The replacements were performed iteratively. At each iteration, all the segments (composed of pairs or triplets of consecutive edges) distant from their chords by no more than a prespecified ε were replaced by their chords, under the condition that the error values corresponding to the neighboring segments are greater. The procedure is halted when there are no more candidates for replacement. Boxer et al. (1993) improved Leu and
Chen's algorithm, by enforcing the approximation to be distant from the input curve by no more than a prespecified value, according to the Hausdorff metric. They presented an example of input for which the output of Leu and Chen is arbitrarily distant (according to the Hausdorff metric) from the input curve. This technique of Leu and Chen may also be regarded as a vertex elimination procedure. Our algorithm is more flexible and simpler, since it handles one vertex at a time. The flexibility is incorporated by the fact that with our algorithm the number of output vertices can be controlled, as opposed to Leu and Chen's. Wu and Leou's technique is similar. They replaced at each stage a pair of edges with its corresponding chord. The error is measured as a signed area (the sign is determined according to the desired output, i.e. bounded polygon, bounding polygon or neither), instead of the maximum distance.

1.3. The structure of this paper

Section 2 of this paper presents the algorithm and its computational complexity. A theoretical discussion is presented in Section 3. Section 4 consists of experimental results, and the summary and conclusions in Section 5 conclude the paper.

2. The iterative point elimination algorithm

A heap data structure is used in the point elimination algorithm. A heap is a binary tree in which the key of each node is less than or equal to the keys of its sons. The heap data structure is used for the construction of priority queues, where the top element is the minimal one. A heap consisting of n nodes can be constructed in O(n) time. A "delete min" operation takes O(log n) time, and so does a key update operation. For a detailed discussion of the heap data structure and related operations, see (Aho et al., 1974, p. 87).

Consider a heap in which the elements are points on a digital curve Γ. The key associated with each heap element is the error generated by the elimination of the respective point from the polygonal approximation of Γ. Two possible types of errors are considered:

(a) The error at point $p_i$ is the area of the triangle defined by the points $p_{i-1}$, $p_i$, $p_{i+1}$.
Let the points be ordered counterclockwise. Then the area of the triangle is given by
$$s_i = \tfrac{1}{2}\left[\,x_i(y_{i+1}-y_{i-1}) + x_{i+1}(y_{i-1}-y_i) + x_{i-1}(y_i-y_{i+1})\,\right].$$
Notice that $s_i$ is computed by integer arithmetic (the constant factor 1/2 is irrelevant when comparing errors). This expression may be obtained using Green's theorem, which relates line integrals to the area enclosed by the respective curve. (The expression can be generalized to the area enclosed by a simple polygon.)

(b) The error at point $p_i$ is the height of the triangle defined by the points $p_{i-1}$, $p_i$, $p_{i+1}$. Let the points be ordered counterclockwise. Then the height $h_i$ of the triangle satisfies
$$h_i^2 = \frac{\left[\,x_i(y_{i+1}-y_{i-1}) + x_{i+1}(y_{i-1}-y_i) + x_{i-1}(y_i-y_{i+1})\,\right]^2}{(x_{i+1}-x_{i-1})^2 + (y_{i+1}-y_{i-1})^2}.$$
Here, again, the squared height is obtained by integer arithmetic. The numerator is the square of twice the triangle area, and the denominator is the squared length of its base.
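For concreteness, here is a small Python sketch of the two error computations; it is an illustration only and the function names `area_error` and `squared_height_error` are ours.

```python
def area_error(p_prev, p, p_next):
    # Twice the triangle area; integer for integer coordinates.  The factor 1/2
    # is dropped since only the ordering of the error values matters.
    (x0, y0), (x1, y1), (x2, y2) = p_prev, p, p_next
    return abs(x1 * (y2 - y0) + x2 * (y0 - y1) + x0 * (y1 - y2))

def squared_height_error(p_prev, p, p_next):
    # Squared distance from p to the chord joining p_prev and p_next.
    (x0, y0), (x1, y1), (x2, y2) = p_prev, p, p_next
    num = (x1 * (y2 - y0) + x2 * (y0 - y1) + x0 * (y1 - y2)) ** 2
    den = (x2 - x0) ** 2 + (y2 - y0) ** 2
    return num / den   # compare num_a * den_b with num_b * den_a to stay in integers
```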
The error computation requires a fixed number of operations per point. The elimination of a point requires the update of the error values of only its two immediate neighbors. The algorithm stopping rule is based either on the number of dropped points or on the current error value. It should be noticed that the error value will not necessarily increase monotonically, in particular as long as the curve has not been smoothed. At the beginning, the errors are expected to fluctuate around small values. During the iterative process several levels of error are expected, with possibly discernible jumps. A practical way to choose a threshold for the error value is to use an initial run of the algorithm in order to choose an adequate level of error.

Pseudocode and complexity

Procedure IterativePointElimination(L: list of the points of the curve Γ)
begin
  BuildHeap from list L, and attach to each point its error value.   {O(n) operations}
  Let k be the number of vertices to be removed.  {or: let ε be the error threshold}
  for i := 1 to k do  {or: repeat until the error of p is greater than the given ε}
    DeleteMin(Heap, p)                                               {O(log n) operations}
    Compute the error values of p's two neighbors, say q and r.      {O(1) operations}
    Remove p from the points list.                                   {O(1) operations}
    Update(Heap, q)                                                  {O(log n) operations}
    Update(Heap, r)                                                  {O(log n) operations}
  endfor
end
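Summing the costs annotated in the pseudocode (a restatement of the bound quoted in Section 1.1, not an additional result): the heap construction costs $O(n)$, and each of the $k$ iterations performs one DeleteMin, two Updates and $O(1)$ bookkeeping, so
$$T(n,k) = O(n) + k\bigl(3\,O(\log n) + O(1)\bigr) = O(n + k\log n),$$
which is $O(n\log n)$ in the worst case, since $k < n$.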
Fig. 1. A convex curve, and the triangles defining the errors associated with the elimination of point $p_{i-1}$, before and after the elimination of point $p_i$.
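The procedure above can be mirrored by a short Python sketch, given here as an illustration under our own naming; it is not the authors' code. Since Python's heapq module has no in-place key update, the sketch uses lazy deletion with per-point version counters, which keeps the same O(n + k log n) asymptotic behavior. The area criterion is used; the height criterion would only change the error function. Endpoints of an open curve are never eliminated here, a simplification of our own.

```python
import heapq

def area_error(points, i_prev, i, i_next):
    # Twice the triangle area spanned by three curve points (integer arithmetic).
    (x0, y0), (x1, y1), (x2, y2) = points[i_prev], points[i], points[i_next]
    return abs(x1 * (y2 - y0) + x2 * (y0 - y1) + x0 * (y1 - y2))

def eliminate_points(points, k):
    """Greedily eliminate k interior points, always removing the point whose
    elimination currently causes the smallest (area) error."""
    n = len(points)
    prev_ = list(range(-1, n - 1))       # doubly linked list over point indices
    next_ = list(range(1, n + 1))
    alive = [True] * n
    version = [0] * n                    # for lazy deletion of stale heap entries

    def err(i):
        return area_error(points, prev_[i], i, next_[i])

    heap = [(err(i), i, 0) for i in range(1, n - 1)]   # interior points only
    heapq.heapify(heap)                  # O(n) heap construction

    removed = 0
    while removed < k and heap:
        e, i, ver = heapq.heappop(heap)                # O(log n)
        if not alive[i] or ver != version[i]:
            continue                                   # stale entry, skip it
        alive[i] = False                               # eliminate point i ...
        p, q = prev_[i], next_[i]
        next_[p], prev_[q] = q, p                      # ... and splice it out
        removed += 1
        for j in (p, q):                               # only the two neighbors change
            if 0 < j < n - 1:
                version[j] += 1
                heapq.heappush(heap, (err(j), j, version[j]))   # O(log n)

    return [pt for pt, keep in zip(points, alive) if keep]
```

The error-threshold stopping rule of the pseudocode maps directly onto checking the popped key `e` against ε before eliminating the point.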
3. Discussion

The following proposition ensures a controlled elimination of points.

Proposition 1. The elimination of a point from a convex (concave) polygonal curve causes an increase of its neighbors' errors. This holds both for the area error and for the height error.

Proof. Consider, without loss of generality, a convex polygonal curve, as shown in Fig. 1. The error generated when point $p_{i-1}$ is eliminated is either the area or the height of the triangle $p_{i-2}, p_{i-1}, p_i$. If point $p_i$ is eliminated, the possible error caused by eliminating $p_{i-1}$ becomes the area or the height of the triangle $p_{i-2}, p_{i-1}, p_{i+1}$. The base $(p_{i-2}, p_{i+1})$ is longer than the base $(p_{i-2}, p_i)$, and the height of triangle $p_{i-2}, p_{i-1}, p_{i+1}$ is longer than that of triangle $p_{i-2}, p_{i-1}, p_i$. A similar argument applies to the point $p_{i+1}$. □
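As a small numeric illustration of Proposition 1 (ours, with hypothetical coordinates, not part of the original paper), the snippet below evaluates the area error of $p_{i-1}$ on a convex chain before and after removing $p_i$:

```python
def area_error(p_prev, p, p_next):
    # Triangle area spanned by three points.
    (x0, y0), (x1, y1), (x2, y2) = p_prev, p, p_next
    return abs(x1 * (y2 - y0) + x2 * (y0 - y1) + x0 * (y1 - y2)) / 2.0

# A convex chain p_{i-2}, p_{i-1}, p_i, p_{i+1} (illustrative coordinates).
p_im2, p_im1, p_i, p_ip1 = (0, 0), (2, 3), (5, 4), (8, 3)

before = area_error(p_im2, p_im1, p_i)    # triangle (p_{i-2}, p_{i-1}, p_i)      -> 3.5
after = area_error(p_im2, p_im1, p_ip1)   # triangle (p_{i-2}, p_{i-1}, p_{i+1})  -> 9.0
print(before, after)                      # the error of p_{i-1} grows after p_i is removed
```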
The proposed algorithm is based on a greedy approach, and therefore it is only sub-optimal. The following proposition enlightens this point.

Proposition 2. Given as input a convex (concave) polygonal curve with n vertices, denoted by Γ. Let $p_i$ be a point with the minimal error value (according to one of the above error criteria). Consider the set of polygonal curves obtained from Γ by eliminating $k < n$ vertices, and let $\{\mathrm{Opt}_k(\Gamma)\}$ be the subset which contains the polygonal approximations such that the total area (or the maximal distance) between them and the original curve is minimal. Then there exists an $\mathrm{Opt}_k(\Gamma)$ for which one of the following holds: (a) the point $p_i$ is not included in it; (b) both $p_{i-1}$ and $p_{i+1}$ are not included in it.

Proof. If there is an $\mathrm{Opt}_k(\Gamma)$ in which $p_i$ is not included, we are done. Suppose then that $p_i$ is included in each and every curve of $\{\mathrm{Opt}_k(\Gamma)\}$. We show that both its neighbors, $p_{i-1}$ and $p_{i+1}$, are then not included in any of those curves. Consider the two possibilities:
1. Both $p_{i-1}$ and $p_{i+1}$ are included in an $\mathrm{Opt}_k(\Gamma)$. Then the elimination of $p_i$ and the re-insertion of one of the previously eliminated points does not increase the error. This contradicts the assumption.
2. Only one neighbor, say $p_{i+1}$, is included in an $\mathrm{Opt}_k(\Gamma)$. Then eliminating $p_i$ and re-inserting $p_{i-1}$ does not increase the error. This contradicts the assumption. □

The proposition can be summarized by the following recursive formula:
$$\{\mathrm{Opt}_k(\Gamma)\}\;\cap\;\Bigl[\{\mathrm{Opt}_{k-1}(\Gamma - \{p_i\})\}\;\cup\;\{\mathrm{Opt}_{k-2}(\Gamma - \{p_{i-1}, p_{i+1}\})\}\Bigr]\;\neq\;\emptyset,$$
where by $\{\mathrm{Opt}_k(\Gamma)\}$ we refer to the set of all $\mathrm{Opt}_k(\Gamma)$. That is, an $\mathrm{Opt}_k(\Gamma)$ can be obtained in at least one of the following ways: by eliminating $k-1$ vertices from $\Gamma - \{p_i\}$ (as an $\mathrm{Opt}_{k-1}(\Gamma - \{p_i\})$), or by eliminating $k-2$ vertices from $\Gamma - \{p_{i-1}, p_{i+1}\}$ (as an $\mathrm{Opt}_{k-2}(\Gamma - \{p_{i-1}, p_{i+1}\})$). Our greedy algorithm always searches for an $\mathrm{Opt}_k(\Gamma)$ in the set $\{\mathrm{Opt}_{k-1}(\Gamma - \{p_i\})\}$. This formula has no practical meaning, since following it requires exponential complexity: the computational complexity satisfies the difference equation $C_n = C_{n-1} + C_{n-2}$, where $C_n$ is the required complexity for an input with $n$ vertices. This is the well-known
Fibonacci formula, and its solution, with boundary conditions $C_1 = 0$ and $C_2 = 1$, is
$$C_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{n-1} - \left(\frac{1-\sqrt{5}}{2}\right)^{n-1}\right].$$
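For completeness, this closed form follows from the standard characteristic-equation argument (a routine step spelled out here, not in the original text):
$$C_n = C_{n-1} + C_{n-2} \;\Longrightarrow\; x^2 = x + 1 \;\Longrightarrow\; x = \frac{1 \pm \sqrt{5}}{2},$$
and fitting the general solution $C_n = a\,x_+^{\,n} + b\,x_-^{\,n}$ to $C_1 = 0$, $C_2 = 1$ gives the coefficients above; the cost therefore grows as $\Theta\bigl(((1+\sqrt{5})/2)^n\bigr)$.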
Notice that, for a simple convex closed polygon, an optimal approximation with $k$ vertices under the area error criterion is equivalent to a $k$-gon with maximal area which is bounded by (contained in) the original polygon (an $n$-gon, with $n > k$).

3.1. On the error criteria

Two error criteria are considered for the evaluation of the polygonal approximation. The first criterion is the area between the polygonal approximation and the original curve. The second one is the maximal distance between the approximation and the source. The first criterion is more suitable when smoothing is an important objective of the approximation: since the area of the triangle defined by a peak point and its two neighbor points is relatively small, peak points are likely to be eliminated under the first criterion. The second criterion is better when shape preservation is important, because of its tendency to preserve corners. Its output is similar to that of the Split algorithm mentioned in the introduction. For curve sections with a constant curvature sign, the order of elimination is irrelevant: the area between the polygonal approximation and the input curve, when two of the curve points are removed, is equal to the sum of the areas generated by the removal of each one of them. This is not true for the maximal distance error criterion.
3.2. On the uniqueness of the polygonal approximation

One of the problems of iterative polygonal approximation algorithms is the dependency on the selection of the starting point. The proposed algorithm is based on local error considerations, and therefore the dependence on the starting point selection is negligible. The order by which points are eliminated in one neighborhood is independent of other neighborhoods. There may be a problem when a number of points in the same neighborhood have the same minimal error. An arbitrary selection of the elimination order may then result in a slightly different polygonal approximation. Consider an algorithm modification in which all the points having the same minimal error are eliminated simultaneously. The following proposition holds under this modification.

Proposition 3. Given a digital curve and an error threshold, suppose that at each iteration, as long as the minimal error is smaller than the threshold, all the points with the minimal error are eliminated simultaneously. Then the obtained polygonal approximation is unique.

Proof. The algorithm defines a unique operation for each iteration. Using induction on the iteration index proves the proposition. □

The above-mentioned modification is practically unnecessary. Formal independence of the starting point does not ensure that a small change in the input, caused by noise, will not result in a significantly different approximation. It is more important to ensure that the approximation has the same geometrical meaning as the input. We relate to this problem in Section 4, together with the experimental results.

4. Experimental results

The key shown in Fig. 2 is used in the first experiment. The digital curve representing the key's contour consists of 435 points. The polygonal approximation of the digital curve based on the area error contains 22 points. The elimination of any one of these 22 points would have caused an abrupt change in the total error. In the figure, the input and output curves are shown overlapped.

Fig. 2. The key's input and output contours, overlapped - experiment 1.
Fig. 3. The key’s contour - experiment 2.
The curve from Fig. 2 was rotated by 90° and scaled by a factor of 0.5. The transformed curve was composed of 435 points, and its polygonal approximation, containing 23 points, is presented in Fig. 3. The total area errors are 72 for the experiment of Fig. 2 and 15 for the experiment of Fig. 3. There is a ratio of four between the sizes of the respective digital curves; therefore the area errors are at the same level.

Fig. 4 shows a synthetic digital curve generated with cubic splines, consisting of 735 points. Fig. 5 shows three polygonal approximations of the curve depicted in Fig. 4, obtained according to the maximal-distance error criterion. Each approximation contains only 10 points. In order to demonstrate the stability of the approximation, random noise with a peak-to-peak amplitude of two pixels was added to the input curve (the noise was uniformly distributed along the normal to the curve at each point). Fig. 5(a) shows the polygonal approximations with and without random noise added to the input curve. Fig. 5(b) presents the approximation obtained after a 90° rotation and scaling by a factor of 0.5.
Fig. 4. A synthetic digital curve.
Fig. 5. (a) Polygonal approximations, overlapped, with and without noise. (b) Polygonal approximation after 90° rotation and 50% scaling of the original curve. (The approximation was inversely transformed in order to be comparable to Fig. 5(a).)
Fig. 6. Approximation of the curve from Fig. 4, with 20 vertices. The vertices are marked by circles.
The approximation was inversely transformed for presentation. The three approximations preserve the high-curvature points of the digital curve. Finally, Fig. 6 shows a polygonal approximation of the curve of Fig. 4. The approximation consists of 20 points (the number of vertices was the stopping rule) and was computed using the maximal distance as the error criterion for point elimination. The polygonal approximation is very close to the digital curve even though it has only 20 vertices.
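As an illustration of the fixed-number-of-vertices stopping rule used in this experiment, the eliminate_points sketch from Section 2 could be driven as follows; the synthetic convex arc is our own stand-in data, not the curve of Fig. 4.

```python
# Assumes the eliminate_points sketch from Section 2 is in scope.
curve = [(t, (t * t) // 50) for t in range(200)]   # a simple convex digital arc (stand-in data)

target_vertices = 20
approx = eliminate_points(curve, k=len(curve) - target_vertices)
print(len(approx))                                 # -> 20
```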
5. Summary and conclusions

An algorithm for polygonal approximation is proposed. Instead of iteratively choosing the approximation vertices, it iteratively eliminates those points whose elimination causes minimal "damage". This enables the definition of a local error criterion. With a heap data structure this leads to a simple algorithm with a worst-case complexity of O(n log n). The resulting algorithm is independent of the starting point, preserves local features, and retains global aspects.

References

Aho, A.V., J.E. Hopcroft and J.D. Ullman (1974). The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA.
Attneave, F. (1954). Some informational aspects of visual perception. Psychological Review 61 (3), 183-193.
Boxer, L., C.-S. Chang, R. Miller and A. Rau-Chaplin (1993). Polygonal approximation by boundary reduction. Pattern Recognition Lett. 14, 111-119.
Duda, R.O. and P.E. Hart (1973). Pattern Classification and Scene Analysis. Wiley Interscience, New York.
Dunham, J.G. (1986). Optimum piecewise linear approximation of planar curves. IEEE Trans. Pattern Anal. Mach. Intell. 8 (1), 67-75.
Kurozumi, Y. and W.A. Davis (1982). Polygonal approximation by the minimax method. Computer Graphics and Image Processing 19, 248-264.
Leu, L.G. and L. Chen (1988). Polygonal approximation of 2-D shapes through boundary merging. Pattern Recognition Lett. 7, 231-238.
Montanari, U. (1970). A note on minimal length polygonal approximation to a digitized contour. Comm. ACM 13 (1), 41-47.
Pavlidis, T. (1980). Structural Pattern Recognition. Springer, New York.
Pavlidis, T. and S.L. Horowitz (1974). Segmentation of plane curves. IEEE Trans. Comput. 23 (8), 860-870.
Pikaz, A. (1992). Recognition and Processing of Planar Digital Curves. M.Sc. thesis, Ben Gurion University of the Negev, Beer Sheva, Israel, July 1992 (in Hebrew).
Ramer, U. (1972). An iterative procedure for polygonal approximation of plane closed curves. Computer Graphics and Image Processing 1, 244-256.
Sklansky, J., R.L. Chazin and B.J. Hansen (1972). Minimum-perimeter polygons of digitized silhouettes. IEEE Trans. Comput. 21 (3), 260-268.
Sklansky, J. and V. Gonzalez (1980). Fast polygonal approximation of digitized curves. Pattern Recognition 12, 327-331.
Teh, C.H. and R.T. Chin (1989). On the detection of dominant points on digital curves. IEEE Trans. Pattern Anal. Mach. Intell. 11 (8), 859-872.
Tomek, I. (1974). Two algorithms for piecewise-linear continuous approximation of functions of one variable. IEEE Trans. Comput. 23, 445-448.
Williams, C.M. (1978). An efficient algorithm for the piecewise linear approximation of planar curves. Computer Graphics and Image Processing 8, 286-293.
Wu, J.-S. and J.-J. Leou (1993). New polygonal approximation schemes for object shape representation. Pattern Recognition 26 (4), 471-484.