Pattern Recognition Letters 20 (1999) 617-634
www.elsevier.nl/locate/patrec
A boundary concavity code to support dominant point detection
Terence M. Cronin
CECOM RDEC Intelligence and Information Warfare Directorate, AMSEL-RD-IW-TP, Building 600, Ft. Monmouth, NJ 07703-5211, USA
Received 24 July 1998; received in revised form 9 February 1999
Abstract
A symbolic algorithm is described to detect dominant points of a simple closed boundary. First, a concavity code is constructed from the Freeman chain code to classify the degree of concavity or convexity of boundary coordinates. Then, dominant points are extracted by discarding shallow curvature sequences of the concavity code, by appealing to a technique called error budgeting. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Dominant points; Curve segmentation; Convexity; Shape representation; Polygonal approximation
1. Introduction

In a landmark paper written shortly after the inception of information theory, experiments in visual perception indicated that most of the information in a contour is located at points of high curvature, or at points where curvature differs considerably from a straight line (Attneave, 1954). During the intervening years, much effort has gone into automating the task of extracting points of relatively high curvature, which have come to be known in the literature as dominant points. Unfortunately, there is no formal definition of dominant points, because there is no formal definition of curvature for a discrete digital boundary (although there is a definition for continuous curves). There are numerous scientific applications that
1 Tel.: 732-427-6507; e-mail: [email protected]. army.mil
require detection of the dominant points of a digital boundary, including curve segmentation, object shape encoding, data compression and pattern classification. The lack of a formal definition notwithstanding, methodology has been developed over the years to evaluate dominant point detection algorithms. When connected in order, the set of dominant points detected by an algorithm represents a polygonal approximation of the original boundary data. Quantitatively, an algorithm is evaluated via error analysis of the polygonal approximation. Error is computed by calculating the distance between each of the original boundary points and the nearest edge of the polygonal approximation. Both the maximal single error and the integral squared error (ISE) are computed as quantitative measures. For objectivity, the error measures may be compared against those for an optimal polygonal approximation (e.g., Perez and Vidal, 1994), or judged for overall merit (Rosin, 1997). Another
0167-8655/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0167-8655(99)00025-2
evaluation method, rarely used because it is considered more subjective than the others, involves polling a panel of human subjects to select dominant points of a curve, and comparing the consensus to algorithmic results. The latter is the method originally used by Attneave (1954), although at that time there were no algorithms against which the results of perceptual experiments could be compared. This paper is organized as follows. Section 2 briefly describes the evolution of previous work in dominant point detection. Section 3 introduces a new data structure, called the boundary concavity code, developed for dominant point detection and other applications. Section 4 describes how the concavity code may be leveraged to create an error budget for discarding low curvature sequences of coordinates from a boundary. Section 5 discusses the logic flow of the new algorithm. Section 6 tabulates experimental results applied to three boundaries commonly used as benchmarks by the dominant point detection community. Section 7 summarizes the work and suggests directions for future research.

2. Previous work on dominant point detection

This section highlights general trends in dominant point detection. Surveys that are more comprehensive may be found in (Teh and Chin, 1989; Ansari and Delp, 1991; Ray and Ray, 1992; Li, 1995; Zhu and Chirlian, 1995; Cornic, 1997). Early attempts at dominant point detection were parametric in nature. Representative of this work are Rosenfeld and Johnston (1973), who characterized curvature at each boundary point by finding the local maxima of the k-cosine function, and Freeman and Davis (1977), who used the boundary chain code to extract local maxima as dominant points. However, it became clear that simple parametric curvature-seeking techniques were not effective for boundaries containing similar features at different scale. If a parameter was set too high, then certain dominant points were overlooked; if too low, non-dominant points were admitted.
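As a concrete illustration of the parametric style, the Rosenfeld-Johnston k-cosine measures curvature at a boundary point from its k-th neighbours on either side. The following is a minimal sketch (the function name and the straight-run demonstration are illustrative, not taken from the paper):

```python
import math

def k_cosine(points, i, k):
    """k-cosine at closed-boundary point i: cosine of the angle between
    the vectors from points[i] to its k-th predecessor and k-th successor
    (indices wrap).  Values near +1 indicate a sharp corner; values near
    -1 indicate a straight run."""
    n = len(points)
    xi, yi = points[i]
    ax, ay = points[(i - k) % n][0] - xi, points[(i - k) % n][1] - yi
    bx, by = points[(i + k) % n][0] - xi, points[(i + k) % n][1] - yi
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))

# On a straight horizontal run the k-cosine is -1 (a straight angle):
run = [(x, 0) for x in range(7)]
print(k_cosine(run, 3, 2))  # -1.0
```

The scale sensitivity criticized in the text comes from the single parameter k: a fixed k cannot simultaneously resolve small and large features.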
The first non-parametric algorithm for dominant point detection is attributed to Sankar and Sharma (1978), who computed global curvature as a function of local curvature of a boundary point together with its predecessor and successor points. Although this algorithm was deemed more effective than parametric techniques, it was not capable of handling cases involving similar features of disparate scale. As dominant point detection technology evolved, it became clear that it was important to search, sometimes extensively, both fore and aft of a boundary point to determine relative curvature there. Teh and Chin (1989) emphasized the need to find precisely a region of support for each boundary point, and that the region of support actually is more important than the determination of discrete curvature measures used by previous algorithms. The region of support as developed for the Teh-Chin algorithm was symmetric about a boundary point. Ansari and Delp (1991) and Ansari and Huang (1991) combined the concept of region of support with Gaussian filtering to suppress boundary noise. In the latter paper, a simple algorithm that produces no error was implemented, by discarding only points with zero local curvature. Ray and Ray (1992) extended the Teh-Chin algorithm to feature an asymmetric region of support, with a kl-cosine function tailored to the endpoints of the asymmetric interval. Pei and Lin (1992) exploited scale-space filtering to characterize dominant points. Held et al. (1994) hierarchically labeled a contour at various levels of curvature. Zhu and Chirlian (1995) developed the concept of critical level, to discard points from a set satisfying signed difference criteria. Cornic (1997) searched for left and right limits of support for a boundary point, and then performed logical operations to determine how often they appear as limits for other points of the boundary; limits appearing most frequently are likely to be dominant points.
It should be emphasized that the algorithms of Teh and Chin (1989), Ansari and Huang (1991), Ray and Ray (1992) and Cornic (1997) impose a constraint not adopted by the other algorithms. The constraint is that two consecutive vertices of the original polygon may not be considered dominant points. The motivation is a concern that two consecutive points must be somehow dependent upon each other and therefore indicate a region of
redundancy. This restriction frequently leads to greater integral squared error than algorithms not subscribing to the constraint. When performing error analysis of dominant point algorithms, the discrepancy must be taken into account when comparing results.

3. A new approach to boundary classification to support dominant point detection

In this section, a new algorithm is described to classify the points of a digital boundary, to develop a data structure useful for dominant point detection. The classification algorithm labels each boundary point as being convex, concave, or neither. The data structure produced by the classification algorithm is called the boundary concavity code. The algorithm is designed for closed digital boundaries. By closed, it is meant that the boundary's first and last coordinates are identical, and that there are no discontinuities in the boundary. To recognize convexities and concavities, the classification algorithm requires that a boundary be oriented in a clockwise direction. Clockwise orientation may be achieved with the following logic. Traverse the boundary to locate the topmost coordinate T, the rightmost coordinate R, the bottommost coordinate B and the leftmost coordinate L. Remember the order in which T, R, B and L were discovered. If the order was TRBL, RBLT, BLTR or LTRB, then the boundary is oriented clockwise. Otherwise, the boundary is counterclockwise, and its coordinates and chain code should be reversed, to achieve clockwise compliance. Note that this algorithm is purely symbolic. If one of the following two ambiguous cases is detected: {T = R and B = L} OR {T = L and B = R}, an appeal to the literature (O'Rourke, 1998) may be made for one of the older, less symbolic orientation algorithms. Once the boundary is clockwise compliant, the concavity code is constructed from the Freeman chain code (Freeman and Davis, 1977), illustrated in Fig. 1. Every coordinate of a closed boundary corresponds to a pair of adjacent chain codes.
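The symbolic orientation test described above can be sketched as follows. This is a hedged sketch: the function name is illustrative, a y-increases-upward coordinate convention is assumed, and the ambiguous tie cases that the text defers to the literature are not handled here:

```python
def is_clockwise(boundary):
    """Symbolic orientation test: record the order in which the topmost
    (T), rightmost (R), bottommost (B) and leftmost (L) coordinates are
    first discovered while traversing the closed boundary, and accept
    any cyclic rotation of the order T, R, B, L.  Assumes y increases
    upward; the ambiguous cases (e.g. T = R) are not resolved here."""
    xs = [p[0] for p in boundary]
    ys = [p[1] for p in boundary]
    firsts = {
        "T": ys.index(max(ys)),  # topmost, first discovery
        "R": xs.index(max(xs)),  # rightmost
        "B": ys.index(min(ys)),  # bottommost
        "L": xs.index(min(xs)),  # leftmost
    }
    order = "".join(sorted(firsts, key=firsts.get))
    return order in ("TRBL", "RBLT", "BLTR", "LTRB")

# A small clockwise square, traversed from the middle of its top edge:
square = [(1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0), (0, 1), (0, 2), (1, 2)]
print(is_clockwise(square))        # True
print(is_clockwise(square[::-1]))  # False (counterclockwise traversal)
```

If the test returns False, the text prescribes reversing the coordinate list and its chain code before constructing the concavity code.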
Fig. 1. Freeman chain code.

The first, called the left chain code, describes direction of a boundary coordinate from its predecessor, and the second, the right chain code, direction to its successor. Analysis of adjacent chain code elements for a clockwise oriented boundary may be used to find boundary maxima and minima. A local maximum, called a convexity, is a coordinate where the boundary is convex outward with respect to the boundary interior. Similarly, a local minimum, called a concavity, is a coordinate where the boundary is concave inward. If the coordinate is neither convex nor concave, then it forms a straight angle with its predecessor and successor, and is called a run point. In summary, there are three broad classification categories for a boundary point: it may be a convexity, concavity, or part of a run. At a finer level of detail, a convex or concave coordinate is refined into eight categories, represented by two subscripts. The first subscript encodes the type of angle that the coordinate forms with its predecessor and successor coordinates: zero degree (z), acute (a), right (r) or obtuse (o). The second subscript encodes the direction of the coordinate's left chain code: straight (`s', for a vertical or horizontal chain code) or diagonal (`d', for a diagonal chain code). For example, a convex coordinate with diagonal left chain code is encoded xzd if it forms a zero degree angle, xad if it forms an acute angle, xrd if it forms a right angle, and xod if it forms an obtuse angle. Similarly, a concave coordinate with diagonal code is encoded, respectively, vzd, vad, vrd and vod. A run coordinate
Table 1 Local angle at coordinate, relative curvature and concavity code

Angle formed by boundary coordinate   Discrete angle   Relative         Concavity code:        Concavity code:
with predecessor and successor        at coordinate    curvature        left chain lies on     left chain lies on
                                                                        diagonal segment       straight segment
Run (straight angle)                  pi               None             rd                     rs
Convex zero angle                     0                Extremely high   xzd                    xzs
Convex acute angle                    pi/4             Very high        xad                    xas
Convex right angle                    pi/2             High             xrd                    xrs
Convex obtuse angle                   3pi/4            Moderate         xod                    xos
Concave zero angle                    -0               Extremely high   vzd                    vzs
Concave acute angle                   -pi/4            Very high        vad                    vas
Concave right angle                   -pi/2            High             vrd                    vrs
Concave obtuse angle                  -3pi/4           Moderate         vod                    vos
is encoded with a single subscript, as rs or rd, dependent upon straight or diagonal left chain code. The concavity code variables x, v and r are designed to suggest association, respectively, with the words convex, concave and run. The eighteen boundary behaviors, angles formed, relative curvature, and concavity codes are summarized in Table 1. Examples of the eighteen types of angles are illustrated in Fig. 2. The concavity code of a boundary coordinate as a function of its left and
right chain codes is summarized in Table 2. The left chain code appears at the leftmost column and the right chain code across the top. As an example, the chain code pair `23' translates to concavity code `vos', indicating an obtuse concavity with straight left chain code. Some of the table entries, notably the zero angle codes, appear with an asterisk (*). Zero angle codes result from undersampling of data during analog to digital conversion, when a boundary feature having area in the real world degenerates to a linear feature in the digitized world. These cases occur when a boundary coordinate forms an angle of zero degrees with its predecessor and successor coordinates. The resultant ambiguity makes it impossible to resolve concavity or convexity unless backtracking is performed to locate a previous chain code unequal to the coordinate's left chain code. Thus, resolution of the zero degree
Table 2 Conversion of chain code pairs into concavity code for a clockwise boundary
Fig. 2. Eighteen possible vertex angles formed by clockwise digital boundary.
L/R   0      1      2      3      4      5      6      7
0     rs     vos    vrs    vas    xzs*   xas    xrs    xos
1     xod    rd     vod    vrd    vad    xzd*   xad    xrd
2     xrs    xos    rs     vos    vrs    vas    xzs*   xas
3     xad    xrd    xod    rd     vod    vrd    vad    xzd*
4     vzs*   xas    xrs    xos    rs     vos    vrs    vas
5     vad    vzd*   xad    xrd    xod    rd     vod    vrd
6     vrs    vas    vzs*   xas    xrs    xos    rs     vos
7     vod    vrd    vad    vzd*   xad    xrd    xod    rd
Note: Items with an asterisk require an additional chain code to resolve.
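Table 2 can be used directly as a lookup table. The sketch below transcribes it row by row (rows indexed by the left chain code, columns by the right chain code); the function name is illustrative, and the starred zero-angle entries, which the paper resolves by backtracking over earlier chain codes, are simply returned as-is:

```python
# Table 2 as a nested lookup: TABLE2[left][right] -> concavity code.
TABLE2 = [
    # R = 0    1      2      3      4      5      6      7
    ["rs",  "vos", "vrs", "vas", "xzs", "xas", "xrs", "xos"],  # L = 0
    ["xod", "rd",  "vod", "vrd", "vad", "xzd", "xad", "xrd"],  # L = 1
    ["xrs", "xos", "rs",  "vos", "vrs", "vas", "xzs", "xas"],  # L = 2
    ["xad", "xrd", "xod", "rd",  "vod", "vrd", "vad", "xzd"],  # L = 3
    ["vzs", "xas", "xrs", "xos", "rs",  "vos", "vrs", "vas"],  # L = 4
    ["vad", "vzd", "xad", "xrd", "xod", "rd",  "vod", "vrd"],  # L = 5
    ["vrs", "vas", "vzs", "xas", "xrs", "xos", "rs",  "vos"],  # L = 6
    ["vod", "vrd", "vad", "vzd", "xad", "xrd", "xod", "rd"],   # L = 7
]

def concavity_code(chain):
    """Translate a clockwise Freeman chain code (cyclic, one element per
    boundary edge) into the boundary concavity code, one symbol per
    coordinate.  The left chain code of coordinate i is chain[i-1] and
    its right chain code is chain[i]; zero-angle codes are returned
    unresolved."""
    return [TABLE2[chain[i - 1]][chain[i]] for i in range(len(chain))]

# The worked example from the text: the chain code pair "23" -> vos.
print(TABLE2[2][3])  # vos
```

Because each coordinate needs only one table lookup, this pass over the chain code runs in linear time, consistent with the O[n] cost stated in the text.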
Fig. 3. Chromosome boundary with start point circled.
Fig. 4. Semicircles boundary with start point circled.
entries of Table 2 requires an additional Freeman chain code, and hence the asterisk. Chain and concavity codes are illustrated in Appendix A for some well-known boundaries used as benchmarks by the dominant point community. The boundaries are called chromosome (Fig. 3), semicircles (Fig. 4) and leaf (Fig. 5). Chain codes are cited from Ray and Ray (1992), and concavity codes are derived from Table 2. In each figure, the start points for the chain and concavity codes are circled. The direction of traversal is clockwise. To generate the boundary concavity code requires O[n] processing time, where n is the number of boundary coordinates. This step is accomplished with a preprocessing pass over the boundary to ensure clockwise orientation, a second pass to create the Freeman chain code, and a third pass, over the chain code, to obtain the concavity code. Each pass is achieved in linear time.

4. Using the concavity code to create an error budget

To utilize the concavity code for dominant point detection, a concept called error budgeting is introduced. Error budgeting is a technique that associates a specific sequence of the concavity code with the average integral squared error introduced, should it be decided to discard the sequence. Suppose that two points of a boundary tentatively are considered dominant points. Between the two points may be a sequence of other points, exhibiting a specific concavity code sequence. Certain sequences, specifically those containing obtuse angle vertices and run vertices, produce less ISE when discarded than sequences containing zero, acute or right angle vertices. This is because in general, sequences of obtuse and run vertices exhibit smaller curvature than sequences containing other types of vertices. It is desirable to associate a specific sequence with the ISE should it be discarded, and this is the purpose of the error budget. The error budget is sorted on ascending average ISE per non-run point discarded. It is assumed that a discarded sequence is flanked on each side by a dominant point, and that the flanking points are not discarded along with the sequence. The error budget is a different concept than the codebook used by the vector quantization community, for example, Choo and
Freeman (1992). A codebook is typically sorted on the most frequently occurring instance of an event, whereas the error budget is sorted on the average ISE introduced should an event (specific sequence) be discarded.

Fig. 5. Leaf boundary with start point circled.

An error budget for sequences of length five or less is shown in Table 3. Currently, the error budget is constrained to contain sequences consisting only of run vertices and alternating obtuse vertices. The constraint is intended to encourage discarding shallow curvature sequences from the boundary, and to limit the length of the table to a manageable size. If the table were extended to accommodate every possible sequence of length five or less, then an upper bound on table size is 19^5 = 2,476,099 entries. This number is derived by realizing there are 18 distinct concavity codes or a null string possible at each position of the sequence. Of course, many such permutations are not admissible; e.g., it is not possible for a straight run and diagonal run to be consecutive. To circumvent the performance issues that would be introduced by adoption of an unconstrained error budget, it was decided to limit admission only to specific sequences. To qualify as an entry in column 1 of Table 3, a sequence is required to pass a series of four tests:
1. The sequence must be of length five or less.
2. The sequence may contain only obtuse and run vertices.
3. The sequence must contain at least one obtuse vertex.
4. If a sequence contains multiple obtuse vertices, then the vertices must alternate between convex and concave. For example, xos rd vod is a legitimate sequence, but xos rd xod is not.
To explain the derivation of the errors tabulated in the error budget, consider the sequence in the ninth row of Table 3. The candidate sequence to be discarded is the four-tuple `rd vod xos rd' (row 9, column 1), and the length of the sequence is four (column 2). Without loss of generality, the sequence may be registered at the origin of a Cartesian coordinate system, with orientation shown in Fig. 6. The endpoints (0,0) and (4,5), flanking the sequence, although not part of the sequence, are tentatively considered dominant points. In between are coordinates (1,1), (2,2), (2,3) and (3,4) with respective concavity codes rd, vod, xos and rd. To obtain the error measures entered in row 9, columns 3-5 of Table 3, the Euclidean distance is computed from each of the four central points of Fig. 6 to the line segment having end points (0,0) and (4,5).
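The row-9 derivation just described is easy to reproduce. A short sketch (the helper name is illustrative; the chord is treated as an infinite line, which is adequate for these short, shallow sequences) confirms the tabulated figures:

```python
import math

def point_segment_errors(seq_points, a, b):
    """Distance of each interior point of a candidate sequence from the
    chord joining the two tentative dominant points a and b, as used to
    fill the maximum-error and ISE columns of the error budget."""
    (ax, ay), (bx, by) = a, b
    length = math.hypot(bx - ax, by - ay)
    dists = [abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length
             for (px, py) in seq_points]
    max_err = max(dists)
    ise = sum(d * d for d in dists)  # integral squared error
    return max_err, ise

# Row 9 of Table 3: sequence rd vod xos rd, interior points of Fig. 6.
interior = [(1, 1), (2, 2), (2, 3), (3, 4)]
max_err, ise = point_segment_errors(interior, (0, 0), (4, 5))
print(round(max_err, 4), round(ise, 4))  # 0.3123 0.2439
print(ise / 2)                           # ISE per non-run point, 0.1220 in the table
```

The two non-run points of the sequence (vod and xos) give the divisor of two for the last column.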
The points furthest from the line segment are coordinates (2,2) and (2,3), each at a distance of 0.3123. This number, the maximum error measure for the sequence, is entered in column 3. The sum of the squared distances of the four points to the line segment is 0.2439, the ISE measure entered in column 4. Column 5 contains the ISE divided by the number of non-run points in the sequence (in this case two), which is 0.1220. The error measures for other rows of the table are computed similarly. Note that the table is sorted in ascending order, using ISE per non-run point (column 5) as the sort key. The motivation for sorting the table is to assure the algorithm minimizes average ISE when discarding coordinates from the boundary, since the table is processed sequentially during dominant point detection. Sequences of coordinates are discarded only if their removal produces smaller average ISE than sequences appearing later in the table. The technical soundness of the approach is corroborated by comparative analysis (Section 5), in which the new method results in less ISE than any dominant point algorithm previously developed, and compares quite favorably with optimal solutions.

5. Dominant point detection with the new algorithm

With error budget in hand, it is straightforward to detect dominant points of a boundary. The set of dominant points is initialized to be the set of all non-run vertices of a boundary (which happens to be the simple solution to dominant point detection proposed by Ansari and Huang (1991)). This policy is in keeping with the spirit of Attneave (1954), who argued that dominant points exhibit relatively high curvature (conversely, run points exhibit zero local curvature). The policy is controversial because it may cause the algorithm to produce larger ISE than algorithms not subscribing to the constraint. The issue is further discussed below, where the new algorithm is compared to an unconstrained optimal algorithm. After dominant points are initialized to be the non-run vertices of a boundary, the error budget of Table 3 is processed sequentially, by searching the boundary concavity code for a sequence listed in column 1 of the table.
If the sequence is found, and neither flanking point has yet been discarded, then the sequence is discarded from the boundary, and the number of remaining dominant points is recorded. Note that the same sequence may appear multiple times in the boundary, and may be discarded multiple times, with a cumulative reduction in the number of dominant points. As the algorithm proceeds, more sequences are discarded from the boundary, with each discard producing commensurately more aggregated ISE. The fact that the error budget is sorted on average ISE per non-run point assures that aggregate ISE remains small as each row of the error budget is processed. More formally, the algorithm design is as follows:
1. Discard all points of type run from the boundary, to obtain an initial set of dominant points D0. Initialize the error budget index i to 1.
2. Refer to row i of the error budget shown in Table 3. Exhaustively, search the boundary concavity code for the sequence contained in column 1. If the sequence is found, and flanking points have not been discarded on previous iterations, then discard the sequence.
3. Output the set of dominant points Di remaining after row i is processed.
4. Increment index i. If i exceeds the number of rows in the error budget, return. Otherwise, loop back to step 2.

5.1. Discussion

First, observe that with the current implementation, coordinates are retained as dominant points if their concavity code is of type zero, acute or right angle. This is because the error budget currently discards only coordinates with no curvature or moderate curvature, defined in Table 1, respectively, as run and obtuse vertices. Furthermore, inspection of the error budget will reveal that if a sequence contains multiple obtuse vertices, then the vertices must alternate between concave and convex, with possible run vertices allowed between them. Currently, there are 75 sequences listed in the error budget. Short sequences discarded on early iterations may be discarded again during later iterations, because a longer sequence with larger integral error may overlap and subsume the shorter sequence. Future work may expand the error budget to include sequences longer than five vertices. In addition, it is possible to extend the
Table 3 Error budget to discard obtuse concavity code sequences of length five or less Concavity code sequence to be discarded
Number k of non-run points in sequence
Maximum error
Integral squared error (ISE)
Average ISE per non-run point (ISE/k)
Average ISE ranking (row number)
vod xos ; xod vos vod xos vod xos ; xod vos xod vos vod xos rd vod xos ; xod vos rd xod vos vos xod vos xod ; xos vod xos vod vos xod ; xos vod vos xod rs vos xod ; xos vod rs xos vod rd xod vos ; rd vod xos vod xos rd ; xod vos rd rd vod xos rd ; rd xod vos rd vos xod vos ; xos vod xos vod xos vod ; xod vos xod rd vod xos vod xos ; rd xod vos xod vos vos rd xod vos ; xos rd vod xos vod xos rd vod ; xod vos rd xod vod rs xos vod xos ; xod rs vos xod vos vos rd xod vos xod ; xos rd vod xos vod rd rd vod xos rd ; rd rd xod vos rd vod rd xos vod ; xod rd vos xod rs vos xod vos xod ; rs xos vod xos vod vos xod vos xod rs ; xos vod xos vod rs rs vos xod ; rs xos vod vos xod rs ; xos vod rs vos rd rd xod vos ; xos rd rd vod xos rd rd vod xos ; rd rd xod vos rs vos xod rs ; rs xos vod rs vos ; vod ; xos ; xod vos rd xod ; xos rd vod vod rs xos ; xod rs vos vos rd xod rs vos ; xos rd vod rs xos vod rs xos rd vod ; xod rs vos rd xod vod rs rs xos vod ; xod rs rs vos xod rs rs vos xod rs ; rs rs xos vod rs rd rd rd vod xos ; rd rd rd xod vos vos rd xod vos rd ; xos rd vod xos rd rs rs vos xod ; rs rs xos vod vos rd rd xod ; xos rd rd vod rd vod xos vod ; rd xod vos xod vod rs xos vod rs ; xod rs vos xod rs rs vos xod vos ; rs xos vod xos vod rs rs xos ; xod rs rs vos rd vod rs xos ; rd xod rs vos vod rs xos rd ; xod rs vos rd rd vod ; rd xod rd vod rs xos rd ; rd xod rs vos rd vos rd rd rd xod ; xos rd rd rd vod rs rs rs vos xod ; rs rs rs xos vod vos rd xod rs ; xos rd vod rs rs vos rd xod ; rs xos rd vod rs vos rd xod vos ; rs xos rd vod xos rd vod rs xos vod ; rd xod rs vos xod rd rd vod xos vod ; rd rd xod vos xod
2 4 4 4 2 4 2 2 2 3 3 4 3 3 4 4 2 3 4 4 2 2 3 2 2 1 2 2 3 3 3 2 2 3 2 2 3 3 3 2 2 2 1 2 2 2 2 2 3 3 3
0.2774 0.3430 0.2774 0.3714 0.3162 0.3162 0.4000 0.4000 0.3123 0.4472 0.4472 0.5547 0.5145 0.5145 0.4472 0.4472 0.3841 0.5571 0.6325 0.6325 0.4851 0.4851 0.5547 0.4685 0.3922 0.4472 0.4472 0.4472 0.4472 0.4472 0.6325 0.4932 0.5121 0.5547 0.5883 0.5145 0.6860 0.6325 0.7428 0.5571 0.6860 0.6860 0.5547 0.5547 0.5547 0.6576 0.7428 0.7428 0.8944 0.8944 0.8321
0.1538 0.2941 0.3077 0.3448 0.2000 0.4000 0.2400 0.2400 0.2439 0.4000 0.4000 0.5385 0.4412 0.4412 0.6000 0.6000 0.3115 0.5172 0.7000 0.7000 0.3529 0.3529 0.5385 0.3659 0.3846 0.2000 0.4000 0.4000 0.6000 0.6000 0.7000 0.5135 0.5082 0.7692 0.5769 0.5882 0.8824 1.0000 1.0345 0.6897 0.7353 0.7353 0.3846 0.7692 0.7692 0.8378 0.8621 0.8621 1.4000 1.4000 1.4615
0.0077 0.0735 0.0770 0.0862 0.1000 0.1000 0.1200 0.1200 0.1220 0.1333 0.1333 0.1346 0.1471 0.1471 0.1500 0.1500 0.1558 0.1724 0.1750 0.1750 0.1765 0.1765 0.1795 0.1830 0.1923 0.2000 0.2000 0.2000 0.2000 0.2000 0.2333 0.2368 0.2541 0.2640 0.2885 0.2941 0.2941 0.3333 0.3447 0.3449 0.3677 0.3765 0.3846 0.3846 0.3846 0.4190 0.4311 0.4311 0.4667 0.4667 0.4872
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
Table 3 (Continued) Concavity code sequence to be discarded
Number k of non-run points in sequence
Maximum error
Integral squared error (ISE)
Average ISE per non-run point (ISE/k)
Average ISE ranking (row number)
rs vos ; rs xos rs vos rd xod rs ; rs xos rd vod rs vod rs rs rs xos ; xod rs rs rs vos rd rd vod ; rd rd xod rd rd vod rs xos ; rd rd xod rs vos rs rs vos xod vos ; rs rs xos vod xos rs vos rd rd xod ; rs xos rd rd vod vod rs rs xos rd ; xod rs rs vos rd rd vod rs rs xos ; rd xod rs rs vos vos rd rd xod rs ; xos rd rd vod rs rd rd rd vod ; rd rd rd xod rs vos xod vos rd ; rs xos vod xos rd rd vod xos vod rs ; rd xod vos xod rs rs rs vos rd xod ; rs rs xos rd vod rs rs vos ; rs rs xos rd rd rd rd vod ; rd rd rd rd xod rs rs rs vos ; rs rs rs xos rs rs rs rs vos ; rs rs rs rs xos rd rd vod rs rd rd xod rs rs rs vos rd ; rs rs xos rd rd rd rd vod rs ; rd rd rd xod rs rs rs rs vos rd ; rs rs rs xos rd rs rs vos rd rd ; rs rs xos rd rd rd rd vod rs rs ; rd rd xod rs rs
1 2 2 1 2 3 2 2 2 2 1 3 3 2 1 1 1 1 1 1 1 1 1 1
0.6325 0.6325 0.6325 0.6000 0.8321 0.9487 0.8944 0.8944 0.8944 0.8944 0.6247 0.8944 0.8944 0.9487 0.7276 0.6402 0.7845 0.8220 1.0290 1.1142 1.1094 1.2649 1.3416 1.3416
0.5000 1.0000 1.0000 0.5600 1.2308 1.9000 1.4000 1.4000 1.4000 1.4000 0.7317 2.2000 2.2000 1.6000 0.8235 0.9016 1.1538 1.4865 1.9118 2.2414 2.6153 3.4000 3.8000 3.8000
0.5000 0.5000 0.5000 0.5600 0.6154 0.6333 0.7000 0.7000 0.7000 0.7000 0.7317 0.7333 0.7333 0.8000 0.8235 0.9016 1.1538 1.4865 1.9118 2.2414 2.6153 3.4000 3.8000 3.8000
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
Fig. 6. Derivation of maximum error for row 9 of Table 3.
error budget by including vertices of type zero, acute and right angle. An obvious research issue is efficient matching of an error budget sequence with the boundary concavity code. One promising strategy is to generate an index into a boundary, to precompile locations at which each of the 18 types of boundary vertices occur - this technique facilitates efficient string registration and matching.
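The four-step design above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the function name and list-based boundary representation are assumptions, run flanks are treated as already discarded, and the exhaustive scan per budget row mirrors the quadratic string matching discussed in the text:

```python
def detect_dominant_points(codes, budget):
    """Sketch of the four-step design.  codes is the cyclic concavity
    code (one symbol per boundary point); budget is a list of concavity
    code sequences, already sorted on ascending average ISE per non-run
    point.  Returns indices of the surviving (dominant) points."""
    n = len(codes)
    is_run = [c.startswith("r") for c in codes]
    discarded = list(is_run)          # step 1: run points are never dominant
    for seq in budget:                # steps 2-4: budget rows in sorted order
        for start in range(n):
            idx = [(start + j) % n for j in range(len(seq))]
            if [codes[i] for i in idx] != seq:
                continue
            left, right = (start - 1) % n, (idx[-1] + 1) % n
            # discard only if neither flanking vertex was discarded by an
            # earlier match (the paper's flanking criterion)
            if not discarded[left] and not discarded[right]:
                for i in idx:
                    discarded[i] = True
    return [i for i in range(n) if not discarded[i]]

# Toy boundary: the obtuse pair at positions 1-2 is budgeted away,
# the run point at position 4 goes in step 1.
codes = ["xrs", "xod", "vos", "xrs", "rs", "xrs"]
print(detect_dominant_points(codes, [["xod", "vos"]]))  # [0, 3, 5]
```

Because the budget is sorted on average ISE per non-run point, each accepted match is the cheapest discard still available, which is the property the text relies on to keep aggregate ISE small.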
A detailed example is shown in Table 4, illustrating the use of the error budget to detect dominant points for the chromosome boundary. The number of dominant points is initialized to 37, the number of non-run points in the boundary. When the first row of the error budget (Table 3) is processed, the sequence `xod vos' is found at positions 35 and 36 of the boundary. The flanking vertices (points 34 and 37) are non-run points not discarded on previous iterations. Hence, the sequence satisfies the flanking criterion and is discarded, contributing 0.1538 to the total ISE, thereby reducing the number of dominant points to 35. Next, rows 2-4 of the error budget are processed, but the respective sequences are not present in the boundary. Then, three sequences listed in row 5 of the error budget are found, respectively, at boundary positions 4 and 5, positions 21 and 22, and positions 36 and 37. The first two sequences satisfy the flanking criterion, reducing the number of dominant points, respectively, to 33 and 31. However, the third sequence does not meet the criterion,
Table 4 Audit trail of the new algorithm's extraction of dominant points for the chromosome Error budget (Table 3) row number processed
Sequence found in chromosome boundary (flanked by non-run points)
Location of sequence in boundary
Is either flanking vertex discarded?
Number of new non-run vertices discarded
Number of dominant points after discard
ISE of sequence (from Table 3 look-up)
Total ISE to date (overlapped sequence ISEs are subtracted)
Start 1 5 5 5 7 7 7 8 11 13 14 16 19 21 21 22 26 26 26 27
- xod vos vos xod xos vod vos xod rd xod vos rd vod xos rd vod xos xod vos rd xod vos xod xos rd vod xos vod xos rd vod xos rd vod xos vod rs vos xod vos xod rs vos xod rs vos xod xos vod rs xos xod xos vos rd xod
- 35-36 4-5 21-22 36-37 2-4 19-21 56-58 28-30 35-37 18-21 28-31 18-22 33-37 26-28 33-35 57-59 7 39 54 9-11
- No No No Yes Yes Yes No No No Yes No No No Yes Yes Yes No No No No
- 2 2 2 0 0 0 2 2 1 0 1 2 1 0 0 0 1 1 1 2
37 35 33 31 31 31 31 29 27 26 26 25 23 22 22 22 22 21 20 19 17
- 0.1538 0.2000 0.2000 N/A N/A N/A 0.2400 0.2400 0.4000 N/A 0.4412 0.6000 0.7000 N/A N/A N/A 0.2000 0.2000 0.2000 0.4000
0.0000 0.1538 0.3538 0.5538 0.5538 0.5538 0.5538 0.7938 1.0338 1.2800 1.2800 1.4812 1.8812 2.1812 2.1812 2.1812 2.1812 2.3812 2.5812 2.7812 3.1812
since the ¯anking vertex at position 35 was previously discarded, when the ®rst row of the error budget was processed. Hence, the third sequence is not discarded at this time. Continuing in this fashion, it may be seen from Table 4 that additional sequences are located and discarded from the boundary when processing rows 7, 8, 11, 14, 16, 19, 26 and 27 of the error budget. Conversely, sequences are located but retained when processing rows 7, 13, 21 and 22. Concavity code sequences are discarded and Table 4 is extended until 17 dominant points remain, although this number is arbitrary, chosen to facilitate comparison with other algorithm results, tabulated in Section 6. In general, Table 4 is extended by processing beyond row 27 of the error budget, until the sequence in the last row of the error budget is checked for inclusion in the boundary. In the next section, it will be seen that the total ISE generated by the new algorithm for the 17 dominant points is smaller than that gen-
erated by previous heuristic algorithms, and is competitive with optimal solutions. To detect dominant points using the concavity code and error budget requires O[n3 ] processing time, where n is the length of the boundary. A single pass over the boundary is required to initialize the set of dominant points, by discarding run vertices. Other passes are required to discard sequences of points according to the error budget in Table 3. Each iteration of string matching consumes O[n2 ] time. When integrated over the entire error budget, the processing time is O[n3 ]. The algorithm is symbolic. No ¯oating-point operations are required at execution time, since the error budget is computed during a preprocessing step. It is instructive to compare the computational complexity of the new approach to an algorithm producing an optimal polygonal approximation. Perez and Vidal (1994) developed a dynamic programming algorithm that ®nds the minimum integral squared error in O[n3 ] time, from a start point
given a priori. However, in general, the start point may be any of the n vertices of a boundary, and hence the algorithm requires O[n⁴] time to find the optimal solution. The Perez–Vidal algorithm must compute Euclidean distances during execution time, potentially requiring many floating-point operations.

6. Experimental results

Dominant points detected by the new algorithm for the chromosome, semicircles and leaf boundaries are shown, respectively, in Figs. 7–9. To
Fig. 7. Dominant points (black) detected for chromosome boundary, M = 17.

Fig. 8. Dominant points (black) detected for semicircles boundary, M = 30.

Fig. 9. Dominant points (black) detected for leaf boundary, M = 28.
facilitate comparative evaluation, the number of dominant points detected (respectively, 17, 30 and 28) was selected to be the same as that found by other algorithms known to have run on the same boundaries. A comparison of the new algorithm with previous work on the chromosome, semicircles and leaf boundaries is shown in Tables 5–7. For the new algorithm, the total integral squared error (rightmost column of each table) is calculated by summing the individual ISE contributed by each sequence of Table 3 that is discarded from the boundary. To provide a measure of objectivity, the results are compared to two versions of a dynamic programming algorithm (Perez and Vidal, 1994) known to produce an optimal polygonal approximation. The first optimal approach, entitled Perez–Vidal, version one, does not allow two
Table 5
Comparative results for chromosome shape, with N = 60 boundary points. Columns: permit two consecutive vertices to be dominant points?; permit zero curvature vertex to be dominant point?; number of dominant points M; maximum error; integral squared error (ISE).

Author (date of publication)                | Consecutive? | Zero curvature? | M  | Max. error | ISE
Rosenfeld and Johnston (1973)               | Yes          | No              |  8 | 1.540      | 21.940
Freeman and Davis (1977)                    | Yes          | No              |  8 | 1.510      | 22.560
Sankar and Sharma (1978)                    | Yes          | No              | 12 | 2.030      | 28.890
Teh and Chin (1989)                         | No           | No              | 16 | 0.710      | 5.910
Ansari and Huang (1991)                     | No           | No              | 16 | 2.000      | 20.250
Ansari and Huang (1991), simple             | Yes          | No              | 37 | 0.000      | 0.000
Ray and Ray (1992)                          | No           | No              | 18 | 0.650      | 4.810
Zhu and Chirlian (1995)                     | Yes          | No              | 16 | 0.707      | 4.677
Cornic (1997)                               | No           | No              | 17 | 0.860      | 5.540
Cronin (1995)                               | Yes          | No              | 17 | 0.633      | 3.181
Perez and Vidal (1994), optimal version one | No           | Yes             | 17 | 0.515      | 3.193
Perez and Vidal (1994), optimal version two | Yes          | Yes             | 17 | 0.515      | 3.130
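The two error measures reported in Tables 5–7 can be computed directly from a boundary and a candidate set of dominant points. The sketch below (with hypothetical helper names; it is not the author's implementation) measures the distance from every boundary point to the nearest edge of the polygonal approximation:

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def approximation_errors(boundary, dominant_idx):
    """Return (maximum error, integral squared error) of the closed polygonal
    approximation whose vertices are boundary[i] for i in dominant_idx."""
    n, m = len(boundary), len(dominant_idx)
    max_err, ise = 0.0, 0.0
    for k in range(m):
        i, j = dominant_idx[k], dominant_idx[(k + 1) % m]
        a, b = boundary[i], boundary[j]
        # Walk the boundary points strictly between the two dominant points.
        t = (i + 1) % n
        while t != j:
            d = point_segment_distance(boundary[t], a, b)
            max_err = max(max_err, d)
            ise += d * d
            t = (t + 1) % n
    return max_err, ise
```

Feeding the 60 chromosome coordinates and the 17 detected indices to `approximation_errors` would be expected to reproduce the corresponding maximum-error and ISE entries of Table 5, up to the rounding shown.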
Table 6
Comparative results for semicircles shape, with N = 102 boundary points. Columns: permit two consecutive vertices to be dominant points?; permit zero curvature vertex to be dominant point?; number of dominant points M; maximum error; integral squared error (ISE).

Author (date of publication)                | Consecutive? | Zero curvature? | M  | Max. error | ISE
Rosenfeld and Johnston (1973)               | Yes          | No              | 30 | 0.740      | 8.850
Freeman and Davis (1977)                    | Yes          | No              | 19 | 1.410      | 23.310
Sankar and Sharma (1978)                    | Yes          | No              | 10 | 8.000      | 769.53
Teh and Chin (1989)                         | No           | No              | 22 | 1.000      | 20.610
Ansari and Huang (1991)                     | No           | No              | 28 | 1.260      | 17.830
Ansari and Huang (1991), simple             | Yes          | No              | 52 | 0.000      | 0.000
Ray and Ray (1992)                          | No           | No              | 27 | 0.880      | 11.500
Zhu and Chirlian (1995)                     | Yes          | No              | 30 | 0.633      | 4.295
Cornic (1997)                               | No           | No              | 30 | 1.210      | 8.380
Cronin (1995)                               | Yes          | No              | 30 | 0.485      | 2.907
Perez and Vidal (1994), optimal version one | No           | Yes             | 30 | 0.707      | 3.740
Perez and Vidal (1994), optimal version two | Yes          | Yes             | 30 | 0.485      | 2.643
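The optimal rows of these tables come from the Perez–Vidal dynamic programming scheme discussed in the text. The following is a compact sketch of the underlying recurrence for a start point fixed at the first vertex, hence O[n³] rather than O[n⁴]; the function names are illustrative, and this is not the published implementation:

```python
def chord_ise(pts, i, j):
    """Sum of squared distances from the points strictly between pts[i] and
    pts[j] to the chord joining them (j may equal len(pts), meaning closure
    back to pts[0])."""
    n = len(pts)
    (ax, ay), (bx, by) = pts[i], pts[j % n]
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    total = 0.0
    for k in range(i + 1, j):
        px, py = pts[k % n]
        if seg2 == 0:
            total += (px - ax) ** 2 + (py - ay) ** 2
        else:
            t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
            total += (px - (ax + t * dx)) ** 2 + (py - (ay + t * dy)) ** 2
    return total

def optimal_polygon(pts, m):
    """Minimum-ISE closed polygonal approximation with m vertices, the first
    fixed at pts[0].  best[j][k] is the least ISE over pts[0..j] with the k-th
    chosen vertex at j; reaching j = n closes the polygon at pts[0].
    Chord errors are recomputed here for clarity; caching all chord_ise(i, j)
    values recovers the O[n^3] bound cited for a fixed start point."""
    n = len(pts)
    INF = float("inf")
    best = [[INF] * (m + 2) for _ in range(n + 1)]
    prev = [[-1] * (m + 2) for _ in range(n + 1)]
    best[0][1] = 0.0
    for j in range(1, n + 1):
        for k in range(2, m + 2):
            for i in range(j):
                if best[i][k - 1] < INF:
                    c = best[i][k - 1] + chord_ise(pts, i, j)
                    if c < best[j][k]:
                        best[j][k], prev[j][k] = c, i
    # Backtrack from the closure state to recover the chosen vertex indices.
    idx, j, k = [], n, m + 1
    while j > 0:
        j, k = prev[j][k], k - 1
        idx.append(j)
    return sorted(idx), best[n][m + 1]
```

The recurrence best[j][k] = minᵢ best[i][k−1] + chord error(i, j) is evaluated for every boundary prefix; trying all n possible start points, as the text notes, multiplies the cost by n.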
Table 7
Comparative results for leaf shape, with N = 120 boundary points. Columns: permit two consecutive vertices to be dominant points?; permit zero curvature vertex to be dominant point?; number of dominant points M; maximum error; integral squared error (ISE).

Author (date of publication)                | Consecutive? | Zero curvature? | M       | Max. error | ISE
Rosenfeld and Johnston (1973)               | Yes          | No              | 17      | 1.760      | 43.42
Freeman and Davis (1977)                    | Yes          | No              | 17      | 1.720      | 45.27
Sankar and Sharma (1978)                    | Yes          | No              | 20      | 3.480      | 71.15
Teh and Chin (1989)                         | No           | No              | 29      | 0.990      | 14.960
Ansari and Huang (1991)                     | No           | No              | 30      | 2.130      | 25.570
Ansari and Huang (1991), simple             | Yes          | No              | 57      | 0.000      | 0.000
Ray and Ray (1992)                          | No           | No              | 32      | 0.990      | 14.180
Zhu and Chirlian (1995)                     | Yes          | No              | No data | No data    | No data
Cornic (1997)                               | No           | No              | 28      | No data    | 19.880
Cronin (1995)                               | Yes          | No              | 28      | 0.743      | 7.298
Perez and Vidal (1994), optimal version one | No           | Yes             | 28      | 0.743      | 7.495
Perez and Vidal (1994), optimal version two | Yes          | Yes             | 28      | 0.743      | 6.795
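All of the boundaries compared above are stored as Freeman chain codes (listed in Appendix A). Decoding a chain code into coordinates is a one-pass operation; a minimal sketch, assuming the standard 8-direction convention (0 = east, directions numbered counterclockwise):

```python
# Standard 8-direction Freeman moves: 0 = east, numbered counterclockwise.
MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def decode_chain(chain, start=(0, 0)):
    """Return the list of boundary coordinates visited by a Freeman chain code."""
    x, y = start
    pts = [(x, y)]
    for d in chain:
        dx, dy = MOVES[d]
        x, y = x + dx, y + dy
        pts.append((x, y))
    return pts
```

Decoding a closed boundary returns to its start point, which is a useful sanity check on a transcribed code.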
Fig. 10. Optimal solution (consecutive DPs not allowed) for chromosome, M = 17.
dominant points to be consecutive. Figs. 10–12 illustrate the three solutions found by version one of the optimal algorithm. The second optimal approach, entitled Perez–Vidal, version two, is more general, in that it allows any number of dominant points to be consecutive. Figs. 13–15 illustrate the
Fig. 11. Optimal solution (consecutive DPs not allowed) for semicircles, M = 30.
three solutions produced by version two of the optimal algorithm.

Fig. 12. Optimal solution (consecutive DPs not allowed) for leaf, M = 28.

Fig. 13. Optimal solution (consecutive DPs allowed) for chromosome, M = 17.

Fig. 14. Optimal solution (consecutive DPs allowed) for semicircles, M = 30.

Fig. 15. Optimal solution (consecutive DPs allowed) for leaf, M = 28.

An important difference exists between the optimal algorithm and the new algorithm. Both implementations of the optimal algorithm allow run points (having zero local curvature) to be dominant points, whereas the new method described in this paper does not. This disparity may cause the new algorithm to exhibit higher ISE than the optimal algorithm. Recall that Table 4 of the preceding section illustrates dominant point detection by the new algorithm for the chromosome boundary. Consider the 17 dominant points found by the new algorithm, illustrated in Fig. 7. The start point (point 1) is circled, and the direction of traversal is clockwise to point 60. Table 5 indicates that the ISE is smaller than that generated by previous algorithms for the same number of dominant points, and is frequently smaller than that of previous algorithms having the advantage of more dominant points. The total ISE for the new algorithm is smaller than that produced by the optimal solution not allowing two consecutive vertices to be dominant points, but is larger than that of the optimal solution allowing consecutive dominant points. The reason that the new algorithm's
ISE is larger than that produced by version two of the optimal algorithm is that the latter allows run vertices (vertices 16 and 33) to be dominant points. For the semicircles boundary, the ISE is smaller than that of previous algorithms using the same number of dominant points. It is also smaller than the optimal solution for algorithms not allowing two consecutive vertices to be dominant points. Again, version two of the optimal algorithm exhibits smaller ISE than the new algorithm, because it permits run vertices (points 47 and 57) to be dominant points. For the leaf boundary, the algorithm produces smaller ISE than that produced by all previous algorithms except the optimal algorithm allowing two consecutive vertices to be dominant points. Of the three sample boundaries, the leaf shape is the only one to exhibit a zero angle point, namely vertex number 34 (located at the tip of the leftmost of the top three lobes). The leaf also exhibits three coordinates that are visited more than once: vertex number 33 has the same coordinates as vertex 35, vertex 93 as vertex 114, and vertex 94 as vertex 113. Although these pairs share identical coordinates, the respective chain and concavity codes are different.

It is instructive to note that the adoption of constraints by dominant point algorithms may lead to larger ISE than for algorithms that do not adopt the constraints. First, as noted by Cornic (1997), the policy of not allowing two dominant points to be consecutive may contribute to larger ISE. In all three boundary results, it is apparent that this constraint adversely affects version one of the optimal algorithm when compared to version two. Second, the policy of not allowing run points (with local curvature of zero) to be dominant points may contribute to larger ISE. To wit, in the three boundary examples, this constraint adversely affects the new algorithm when compared to version two of the optimal algorithm, but not when compared to version one.
Given these discrepancies, it may be conjectured that, on average, adoption of the first constraint affects ISE more adversely than adoption of the second. This is an issue for future research.
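Both constraints turn on how boundary vertices are classified. The paper builds its concavity code from the Freeman chain code; the simplified sketch below captures only the convex/concave/run trichotomy from the local turn between successive chain directions. It assumes a counterclockwise-traversed boundary and does not reproduce the full concavity-code alphabet of Appendix A:

```python
def classify_vertices(chain):
    """Label each vertex of a closed boundary, given its 8-direction Freeman
    chain code, as 'convex', 'concave' or 'run' (zero local curvature).
    Assumes counterclockwise traversal; chain[-1] supplies the wraparound
    incoming edge at the start vertex."""
    labels = []
    for i in range(len(chain)):
        # Turn from the incoming to the outgoing edge, in 45-degree units,
        # mapped into the range -4..3.
        turn = (chain[i] - chain[i - 1] + 4) % 8 - 4
        if turn == 0:
            labels.append('run')        # collinear: zero local curvature
        elif turn > 0:
            labels.append('convex')     # left turn on a counterclockwise boundary
        else:
            labels.append('concave')    # right turn
    return labels

def convexity_runs(labels):
    """Collapse the per-vertex labels into maximal (label, length) runs.
    (On a closed boundary the first and last runs may wrap around the start.)"""
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1] = (lab, runs[-1][1] + 1)
        else:
            runs.append((lab, 1))
    return runs
```

Maximal non-run groups of these labels correspond to the convex and concave arcs that the terrain application below parses into spurs and draws.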
6.1. Application of the concavity code to a problem in terrain classification

Although the concavity code is useful for dominant point detection, the primary motivation for developing the code was not boundary compression. Instead, it was desired to solve a problem in terrain classification, using contour maps
(Cronin, 1995). Given a contour representing a cross section of a mountain, the problem was to segment the contour into convex sections and concave sections, respectively corresponding to spurs and draws in the terrain. To solve the problem, the boundary concavity codes for a set of topographical contours were generated and syntactically parsed to locate non-trivial spurs and draws. An example output is shown in Fig. 16.

Fig. 16. Terrain classification from concavity code (spurs white and draws black).

7. Conclusion

A symbolic algorithm has been introduced to detect dominant points of a closed digital boundary. For boundaries used as benchmarks by the dominant point community, the algorithm generates solutions with smaller integral squared error than solutions produced by previous algorithms. The new algorithm exploits two data structures called the boundary concavity code and the error budget. The boundary concavity code classifies every coordinate of a boundary as a convexity, concavity or run point. The error budget is used to discard shallow curvature sequences of vertices
from the boundary. In addition to discarding run points, which are redundant in an information-theoretic sense, the new algorithm discards sequences of obtuse vertices. An experimental comparison with other heuristic methods indicates that the symbolic technique produces smaller integral error, while avoiding numeric computations. When compared to optimal methods, the new algorithm exhibits smaller computational complexity and produces favorable ISE, commensurate with the constraint of not allowing points with zero local curvature to be dominant points. In addition to the data compression application inherent to dominant point technology, the concavity code has been applied to a terrain classification application. For a topographical contour, concavities correspond to ravines or draws, and convexities to ridges or spurs. Other researchers working in the area of dominant point detection may find the boundary concavity code and error budget to be useful for the development of asymmetric support regions.

Acknowledgements

Thanks to an anonymous referee for suggesting a comparison of the new algorithm to the optimal solution based on the Perez–Vidal dynamic programming algorithm. Thanks to a second referee for requesting additional material explaining the derivation of the error budget and the technical validity of the new algorithm. Finally, grateful acknowledgment is paid to Paul Rosin, who provided an implementation of the Perez–Vidal algorithm.

Appendix A. Chain and concavity codes for the three boundaries

Table 8 presents the chain and concavity codes for the three boundaries.

Table 8
Chain and concavity codes for the three boundaries

Chain code for the chromosome boundary:
55454 32011 60010 10765
01111 55455
12112 55555
12006 55431
65655 12122

Concavity code for the chromosome boundary:
xos xod xrs vos rd xod vos rd rd rd vas rd xod vos xod xod vrs rs vos xod vos xod xos xod xos rd rd xod vos rd
rd vod xos rd vod rd rd rd rd rd
xos vod xrs rs xrs rd rd xod xos xrd
rs xos vod xos rd rd vod xos vod rs

Chain code for the semicircles boundary:
00007 00777 56454 43436 22254 54434 11111 00100
77766 66656 23221 00
76666 55454 21322
66665 44434 22222
76766 33232 21221

Concavity code for the semicircles boundary:
vod rs xos rd rd rs rs rs rs xos xos vod xrs vos xod rs xos vod xos vad rs rs rs vas xod vos xod rs xos vod rd rd rd rd rd xod rs vos xod rs
rd rd rd xod rs rs rs rs xos vod xrs vos xod rs xos rs rs
vos xod rs rs rs xos rd xod vos xod vod xos vrd xod rs
rs rs rs rs xos rs rs rs xos vod rs rs rs rs rs
vrd xod vos xod rs xos rd xod vos xod rs xos vod rs xos

Chain code for the leaf boundary:
33333 32307 22267 77222 00100 56656 67666 66666
00003 12766 55001 64222
32323 61111 10665 22222
07000 16665 65655 22232
03323 66550 55566 24434

Concavity code for the leaf boundary:
rd xod vos xad xos xos rd rd rd rd xod rs rs xzs vos rd rd vad rs rs rs rs vos xod rs xas vod rs xos vod rs vos xod rs rs rs rs rs rs rs
vod rs rs rs vas xos vod xas xod rs xos rd vad rs vos rs xrs xrs rs rs
rd xod vos xod vos rs vas rd rd rd rd xod xrs rs xos rs rs rs rs rs
xad xos vod rs rs rd xad rs rs xos vod xos vod xos rd rs rs rs vos xod
rs vas rd xod vos vod rs xos rd vad rd rd rd vod rs rs xrs rs xos vod

References

Ansari, N., Delp, E., 1991. On detecting dominant points. Pattern Recognition 24 (5), 441–451.
Ansari, N., Huang, K., 1991. Non-parametric dominant point detection. Pattern Recognition 24 (9), 849–862.
Attneave, F., 1954. Some informational aspects of visual perception. Psychological Review 61 (3), 183–193.
Choo, C., Freeman, H., 1992. An efficient technique for compressing chain-coded line drawing images. In: Conference Record of the 26th Asilomar Conference on Signals, Systems and Computers, pp. 717–720.
Cornic, P., 1997. Another look at the dominant point detection of digital curves. Pattern Recognition Letters 18 (1), 13–25.
Cronin, T., 1995. Automated reasoning with contour maps. Computers & Geosciences 21 (5), 609–618.
Freeman, H., Davis, L., 1977. A corner-finding algorithm for chain-coded curves. IEEE Transactions on Computers 26, 297–303.
Held, A., Abe, K., Arcelli, C., 1994. Towards a hierarchical contour description via dominant point detection. IEEE Transactions on Systems, Man, and Cybernetics 24 (6), 942–949.
Li, Z., 1995. An examination of algorithms for detection of critical points on digital cartographic lines. The Cartographic Journal 32 (12), 121–125.
O'Rourke, J., 1998. Private communication.
Pei, S., Lin, C., 1992. The detection of dominant points on digital curves by scale-space filtering. Pattern Recognition 25, 1307–1314.
Perez, J.-C., Vidal, E., 1994. Optimum polygonal approximation of digitized curves. Pattern Recognition Letters 15 (8), 743–750.
Ray, B., Ray, K., 1992. An algorithm for detection of dominant points and polygonal approximation of digitized curves. Pattern Recognition Letters 13 (12), 849–856.
Rosenfeld, A., Johnston, E., 1973. Angle detection on digital curves. IEEE Transactions on Computers 22, 875–878.
Rosin, P.L., 1997. Techniques for assessing polygonal approximations of curves. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (6), 659–666.
Sankar, P., Sharma, C., 1978. A parallel procedure for the detection of dominant points on a digital curve. Computer Graphics and Image Processing 7, 403–412.
Teh, C., Chin, R., 1989. On the detection of dominant points on digital curves. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (8), 859–872.
Zhu, P., Chirlian, P., 1995. On critical point detection of digital shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (8), 737–748.