Computer Vision and Image Understanding Vol. 76, No. 3, December, pp. 267–277, 1999. Article ID cviu.1999.0807, available online at http://www.idealibrary.com
General Ribbon-Based Thinning Algorithms for Stylus-Generated Images Elyse H. Milun and Deborah K. W. Walters Department of Computer Science and Engineering, University at Buffalo, Buffalo, New York 14260
and Yiming Li Technology Group, Pershing, 19 Vreedland, Florham Park, New Jersey 07932 Received September 7, 1999; accepted September 27, 1999
Thinning algorithms for stylus-generated images are presented which are based on the general ribbon model of stylus-generated images. These algorithms have several advantages over existing thinning algorithms, including the existence of a formal specification of the desired output of a thinning algorithm; the preservation of image features which have been shown to be the most perceptually significant for the human perception of stylus-generated images; and the ability to deal easily with images which contain both stroke and blob objects. © 1999 Academic Press
1. INTRODUCTION
Hand-drawn images created with some form of stylus have long been used as a means of communication, and much of the information is derived from the path(s) used in the drawing process. The study of these images is an important part of computer vision, as there are many applications where the analysis of such images is required [7, 8, 11, 12, 22–24, 27]. Researchers have used thinning algorithms in an attempt to recover the path from a stylus-generated image [1, 2, 6, 38]. However, researchers have found that the thinning results in practical applications are often so poor [13, 14] that the technique has sometimes been dropped in favor of other encoding methods [20]. One of the main problems with thinning algorithms is that there is no formal definition of what the operation is or what it should produce, though a general agreement on the requirements is discussed in [21]. Each thinning algorithm produces a slightly different thinned image. This makes comparison of the different algorithms extremely subjective, because comparison is based solely on what “looks good.” Haralick proposed one means of analyzing thinning algorithms based on ribbons [15]. A second problem with existing thinning algorithms is that they do not preserve geometric information. In general, thinning algorithms have problems with the unattached ends of a
path, with regions of high curvature, and with regions where a path or a portion of a path intersects. Figure 1 shows examples of these problems. Unfortunately, these are the parts of the image that convey the most perceptual information [37]. For the most part, thinning algorithms do not attempt to maintain perceptual information, largely because it is not one of the stated criteria of thinning algorithms. However, the performance of many computer vision algorithms that require thinning as an early stage of processing could be improved through the use of a thinning algorithm that preserved perceptually relevant features [36].
FIG. 1. Sample images that exhibit (a) unattached ends of a path, (b) regions of high curvature, and (c) regions where a path or a portion of a path intersects. The areas are inside the highlighted squares.
There is a third problem which is specific to the analysis of stylus-generated images: the thinning algorithm must be able to distinguish between stroke objects (elongated parts) and blob objects (nonelongated parts). Blob objects are those objects in which the stylus has been used to indicate a region by filling it in, whereas a stroke object represents an elongated region. Since it is assumed that the information in blob objects resides in their boundaries, only stroke objects need to be, or should be, thinned. Previous work has concentrated on decomposing a shape into such parts based on erosion and dilation [29, 31]. This paper presents another approach to overcoming these problems.
2. CRITERIA FOR THE DESIRED OUTPUT OF THINNING
There are several possible criteria for the desired output of thinning algorithms:
1. Information Preservation: thinning algorithms should not lose information, especially geometric information;
2. Perceptual Validity: the results should agree with human perception, not because the goal is to model human perception, but rather because in human–computer communication via stylus-generated images, the computer interpretation should agree with the human interpretation for accurate communication;
3. Selectability: the algorithm should be able to operate on images containing both stroke objects and blob objects and to automatically select only stroke objects for thinning;
4. Reconstructability: thinning algorithms should produce output from which the input can be reconstructed;
5. Minimality: the output should not contain arbitrary information; and
6. Formal Specification: it should be possible to formally specify the desired output of a thinning algorithm.
The importance of each criterion depends on the particular application. For example, if the purpose of the algorithm is compression, then reconstructability would be essential, though perceptual validity may be unnecessary. If, instead, the goal is human–computer communication, then reconstructability may not be important, but perceptual validity would be required. The continuous thinning algorithm presented in this paper satisfies all six of these criteria. For both the continuous and the discrete algorithms, selectability is assumed to have been resolved before the image is input to the algorithm. A discussion on separating stroke objects from blob objects is presented in [26].
1077-3142/99 $35.00. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved.
3. GENERAL RIBBONS
To satisfy the formal specification criterion, it is first necessary to be able to formally specify the input to a thinning algorithm. Since the interest here is in stylus-generated images, the first step is the development of a formal model for such images. General ribbons were developed as a model for stylus-generated images [26, 37] and can be used to specify the class of input to a thinning algorithm. A general ribbon, R, is created by the dilation, denoted ⊕, of a given spine, S, by a generator, G, so that R = S ⊕ G. Figure 2a shows a generator, a spine, and the resulting ribbon. General ribbons may be used to model stylus-generated images by assuming that the generator is the cross section of a stylus and that the spine is the path of the stylus. In some cases, such as images generated using a computer drawing program, this model is exact. In other cases, such as images generated using a pencil on paper, the model is only an approximation. The analysis of general ribbons is based on the fact that the ribbons are created via dilation of a spine with a generator. For some simple stroke objects, the spine can be recovered from the ribbon by using dilation’s opposite operation, erosion, denoted ⊖ [17, 18, 25, 33]. Although erosion does not always recover the spine, the eroded ribbon, defined as E = R ⊖ G, will be used in the computation of the spine. It has been shown that algorithms based on the general ribbon model can be used to distinguish between stroke and blob input and to remove blob input from an image [26]. Therefore, it will be assumed that the input to the thinning algorithms consists only of stroke objects.
4. CONTINUOUS THINNING ALGORITHM
4.1. Is the Spine the Desired Output of a Thinning Algorithm?
Given the general ribbon model, it may seem that the desired output of thinning should be the spine, as the spine is assumed to contain the information in stroke objects. For general ribbons with no regions of high curvature and no intersections, erosion alone will yield a unique spine. However, there is no unique spine when ribbons self-intersect or intersect one another: for a given R and a given G, there can be multiple spines that produce R. For example, consider the ribbon shown in Fig. 2b. The figure shows three different spines, each of which could produce the ribbon when dilated with G. R and G alone do not necessarily contain enough information to compute a unique S; therefore S cannot be the desired output of a thinning algorithm.
4.2. Is the Intersection of All Possible Spines the Desired Output?
Since a given ribbon could result from a number of different spines, it might be argued that the desired output of a thinning algorithm should be what is common among the various spines. By the minimality criterion, the output of a thinning algorithm should not contain arbitrary information. Thus the desired output should contain only information that is common to all possible spines of a given ribbon. One way to obtain the common information is to take the intersection of all possible spines of a ribbon.
DEFINITION 1. I is the intersection spine of R if $I = \bigcap_i S_i$, where $S_i \oplus G = R$.
Could this intersection spine be the desired output of a thinning algorithm? It satisfies the criterion of minimality since it includes only points which are in all possible spines. However, it does not satisfy the criteria of reconstructability or perceptual validity, as illustrated in Fig. 2c. The figure shows a ribbon, the generator that was used to create it, and a possible spine, S1, which could have generated the ribbon; note that S1 contains a gap equal to the horizontal width of the generator. Note also that spines with a single gap of the same size anywhere (except at the corners) would also generate the ribbon. Therefore the intersection spine of the ribbon consists of just the four dots shown as I in the figure, but the four-dot spine does not agree with the human perception that the ribbon was generated by a square. In addition, the intersection spine clearly does not satisfy the reconstructability criterion: dilating the four dots by G yields only four copies of the generator, not the ribbon.
FIG. 2. (a) Dilating the spine with the generator yields the general ribbon. (b) A ribbon and a number of spines that could have been used to generate it with the given generator. (c) Examples of possible definitions of a common spine; the intersection spine I contains only four points.
4.3. Accidental Alignment and the Desired Output
The problems with the intersection spine occur with ribbons that contain spines which could be interpreted as being accidentally aligned. For example, Fig. 3a shows a ribbon and three possible spines which would all yield the same ribbon with the given square generator. All but the first spine could be interpreted as being accidentally aligned, in the sense that there is no indication from the boundary of the ribbon that the spines have endpoints in the region.
Figure 3b shows ribbons formed from the spines of Fig. 3a using a circle generator, and illustrates the type of information in the ribbon boundary which indicates the presence of spine endpoints.
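The non-uniqueness of the spine (Sections 4.1–4.3) can be reproduced on a pixel grid. The following is only an illustrative sketch, not the authors’ implementation: it assumes scipy.ndimage for dilation and erosion, uses a 5 × 5 square and a radius-2 digital disk as stand-ins for the square and circle generators of Fig. 3, and introduces hypothetical helpers (`square_generator`, `disk_generator`, `ribbon`, `horizontal_spine`). With the square generator, a straight spine and the same spine with a 4-pixel gap dilate to identical ribbons (an accidental alignment). With the disk, the gap leaves a notch in the ribbon boundary. Eroding the square-generated ribbon returns only the unbroken spine, that is, one of the candidate spines.

```python
import numpy as np
from scipy import ndimage

def square_generator(r):
    """(2r+1) x (2r+1) square stylus cross section; origin at the centre pixel."""
    return np.ones((2 * r + 1, 2 * r + 1), dtype=bool)

def disk_generator(r):
    """Digital disk of radius r; origin at the centre pixel."""
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r

def ribbon(spine, generator):
    """R = S (+) G. Both generators used here are symmetric, so the reflection
    convention of the structuring element does not affect the result."""
    return ndimage.binary_dilation(spine, structure=generator)

def horizontal_spine(shape, row, cols):
    """Hypothetical helper: a spine made of the given pixels on a single row."""
    spine = np.zeros(shape, dtype=bool)
    spine[row, list(cols)] = True
    return spine

shape, r = (21, 40), 2
full = horizontal_spine(shape, 10, range(5, 31))
# Remove 2r = 4 pixels; a (2r+1)-pixel-wide square generator still covers the hole.
gapped = horizontal_spine(shape, 10, [c for c in range(5, 31) if not 15 <= c <= 18])

sq, disk = square_generator(r), disk_generator(r)
same_square = np.array_equal(ribbon(full, sq), ribbon(gapped, sq))
same_disk = np.array_equal(ribbon(full, disk), ribbon(gapped, disk))
# Erosion E = R (-) G of the square-generated ribbon gives back the unbroken spine.
eroded = ndimage.binary_erosion(ribbon(gapped, sq), structure=sq)
print(same_square, same_disk, np.array_equal(eroded, full))  # True False True
```

In the discrete setting the hidden gap must be at most 2r pixels wide, one pixel narrower than the generator; in the continuous setting of Fig. 2c a gap equal to the generator width is still covered.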
This type of accidental alignment can occur when the generator contains straight edges which are parallel to sections of a spine, and it is formally defined as follows.
DEFINITION 2. Given a ribbon, R, B is the ribbon boundary of R if ∀b ∈ B, b is a boundary point of R.
DEFINITION 3. Two spines, S1 and S2, are said to be nonaccidentally aligned if and only if for all p1 ∈ S1, p2 ∈ S2, and the line L connecting p1 and p2, if there is a line segment, L_B ⊆ B, such that:
1. G(p1) ∩ L_B ≠ ∅ and G(p2) ∩ L_B ≠ ∅;
2. G(p1) ∪ G(p2) ⊇ L_B;
3. L is parallel to L_B;
4. if L ⊆ P then L ⊆ (S1 ∪ S2).
Note that S1 and S2 can be the same spine. This definition produces spines which satisfy the human interpretation of being nonaccidentally aligned.
4.4. Desired Output of the Continuous Thinning Algorithm
The desired output of the continuous general ribbon-thinning algorithm will consist of the intersection of all possible nonaccidental spines for a given ribbon, and will be referred to as the common spine, C.
DEFINITION 4. $C = \bigcap_\alpha S_\alpha$, where $\{S_\alpha\}$ is a set of spines which are pairwise nonaccidentally aligned.
4.5. Definition of the Potential Spine and Ribbon Boundary Points
The potential spine, P, is defined as the boundary of the eroded ribbon, E. Only the ribbon boundary, B, and the potential spine, P, are needed to compute the common spine. By classifying ribbon boundary points into one of two possible classes and potential spine points into one of three possible classes, an algorithm for computing the common spine can be developed. The following definitions form the basis of the necessary classes:
DEFINITION 5. G′(c) is an inverted generator if G′(c) is the subset of the plane formed by rotating G(c) by π radians about the origin c.
DEFINITION 6. Given b ∈ B and p ∈ P, b and p are associated if (G′(b) ∩ P) contains p.
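On a pixel grid, the inverted generator of Definition 5 is simply the set of negated offsets, and Definition 6 becomes a membership test. The sketch below is an illustration under stated assumptions, not the paper’s code: generators are stored as sets of integer (row, col) offsets from the origin c, the small triangular generator is made up for the example, and `invert`, `place`, and `associated` are hypothetical names. It also checks that p ∈ G′(b) holds exactly when b ∈ G(p), which is why the discrete Definition 13 can phrase association as b ∈ (p ⊕ G) ∩ B.

```python
from itertools import product

# A small asymmetric "triangular" generator, as (row, col) offsets from its origin c.
TRIANGLE = {(0, -1), (0, 0), (0, 1), (-1, 0)}

def invert(gen):
    """Def. 5: G'(c) is G(c) rotated by pi about c, i.e. every offset negated."""
    return {(-dr, -dc) for dr, dc in gen}

def place(gen, c):
    """G(c): the generator's offsets translated so that its origin sits at c."""
    return {(c[0] + dr, c[1] + dc) for dr, dc in gen}

def associated(b, p, gen, P):
    """Def. 6: b and p are associated if p lies in G'(b) and in P."""
    return p in P and p in place(invert(gen), b)

# Rotating an asymmetric generator by pi changes it ...
assert invert(TRIANGLE) != TRIANGLE
# ... but p in G'(b) holds exactly when b in G(p), for every pair of grid points.
grid = list(product(range(-3, 4), repeat=2))
assert all((p in place(invert(TRIANGLE), b)) == (b in place(TRIANGLE, p))
           for b in grid for p in grid)

P = {(0, c) for c in range(-2, 3)}   # a short horizontal potential spine
b = (-1, 1)                          # a ribbon boundary point above it
print(sorted(p for p in P if associated(b, p, TRIANGLE, P)))  # [(0, 1)]
```

Here b is associated with a single potential spine point, so under Definition 7 it would be a unique ribbon boundary point.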
The points in the ribbon boundary can be classified in terms of their association with potential spine points as follows: DEFINITION 7. A ribbon boundary point, b, is unique if it is associated with only one p ∈ P. DEFINITION 8. A ribbon boundary point, b, is multiple if it is associated with more than one p ∈ P. Potential spine points can be classified as belonging to one of the following three sets:
FIG. 3. Accidental alignment.
DEFINITION 9. U = { p ∈ P | p associated with at least one unique point of B}.
DEFINITION 10. A = {p ∈ P | p ∉ U and p associated with at least two adjacent points in B}.
DEFINITION 11. N = {p ∈ P | p ∉ U and p associated with only nonadjacent, isolated points}.
4.6. How to Compute the Common Spine
The continuous general ribbon-thinning algorithm is based on the following theorem, which states simply that the output of such an algorithm should consist of the set of unique points of P, U, and the set of adjacent points in P, A. It is assumed that the generator is convex and constant (i.e., it does not change size or shape during the image generation process).
THEOREM 1. C = U ∪ A.
The proof of this theorem is constructed as follows.
DEFINITION 12.
BG = {g ∈ G | g is a boundary point of G}.
LEMMA 1. Assume G is a convex generator with a piecewise smooth boundary and p1, p2 are in BG. If there is a point p on the line p1p2 with p in BG, then p1p2 is contained in BG.
LEMMA 2. Assume G is a convex generator with a piecewise smooth boundary and p1, p2 are in BG. If p1 and p2 are not on a line segment of BG, then the tangent vectors of BG at p1 and at p2 are not equal.
Proof. Let L1 and L2 be the tangent lines of BG at p1 and p2, respectively. There are three possible relative positions of L1 and L2: (1) L1 and L2 intersect in a single point; (2) L1 and L2 coincide; (3) L1 and L2 are parallel and do not intersect. In situations (1) and (3), Lemma 2 holds. Because G is convex, all of G lies on one side of L1 and on one side of L2. Hence, if L1 and L2 coincide, the line p1p2 must be part of BG, which contradicts our assumption. □
LEMMA 3. Let S be a spine of an image with boundary B. For any q ∈ P and any point p ∈ G(q) ∩ B, either p is an isolated point of G(q) ∩ B, or there is a piece C(p) of the image boundary containing p such that C(p) ⊂ G(q) ∩ B.
Proof. Denote the boundary of G(q) by B(G(q)). Suppose Lemma 3 is not true. Then in any right (or left) neighborhood of p on B there are two points u ∈ B(G(q)) and v ∈ B − B(G(q)). Hence there are two points u1 and u2 in B(G(q)) ∩ B such that none of the points between u1 and u2 on B(G(q)) are on B, and the distance between u1 and u2 along B(G(q)) is arbitrarily small. Since the generator is convex, the points inside the image bounded by B and the piece of B(G(q)) from u1 to u2 cannot be generated. Therefore the lemma holds. □
LEMMA 4. Let p be an arbitrary point of P. If G(p) ∩ B contains a piece of B which is not a line segment, then p is in U.
Proof. Since the generator is convex with a piecewise smooth boundary, by Lemma 2 the piece of B can only be generated by G(p). Hence p is in U. □
THEOREM 2.
P = U ∪ A ∪ N.
Proof. This theorem can be derived by combining Lemmas 2 and 3. □
THEOREM 3. U ⊂ C.
Currently there is no conclusive proof of Theorem 3, although no counterexamples have been found.
THEOREM 4.
A ⊂ C.
Proof. Assume p ∈ A and the line segment L_B ⊂ G(p) ∩ B. Since p is not in U, if the generator is moved along L_B, it remains inside the image. Hence there is a maximal line segment L_P ⊂ P which contains the point p and is parallel to L_B. For any set S of nonaccidentally aligned spines, there are two points p1 and p2 which lie on the closure of L_P and are in S. The line p1p2 lies in P, so by the definition of nonaccidental alignment it is contained in S. Therefore we conclude that p is in C. □
THEOREM 5.
N ∩ C = ∅.
Proof. Since the potential spine P is a nonaccidentally aligned spine which generates the image, we can claim that for any point p ∈ N, P − {p} is still a nonaccidentally aligned spine which reconstructs the image. This can be seen from the fact that if there were two different points p1 and p2 (p_i ≠ p, i = 1, 2) and a line segment L_B on the image boundary such that (G(p1) ∩ B) ∪ (G(p2) ∩ B) contains L_B and the line p1p2 crosses the point p, then p would have to be a point in L, which means p is not in N. □
The continuous general ribbon-thinning algorithm, in conjunction with the blob–stroke segmentation algorithm, satisfies all of the criteria for thinning algorithms stated in Section 2. The results of the thinning algorithm are consistent over multiple orientations, provided that the ribbon and the generator are rotated by the same angle. The algorithm maintains all geometric information because it preserves all spines that are pairwise nonaccidentally aligned. Perceptual validity is maintained in the same manner. Selectability is provided by the blob–stroke segmentation algorithm. Reconstructability is maintained because the combination of the set of nonaccidentally aligned spines and the generator is all that is needed to recreate the original ribbon. While the resulting spine may not be minimal, it is the minimal spine that satisfies the other criteria. Finally, the continuous general ribbon-thinning algorithm has a formal specification of the desired output. The one limitation of the general ribbon-thinning algorithm is that it is restricted to convex generators with a piecewise smooth boundary.
4.7. Discrete Implementation of Continuous Algorithm
The definitions and theorems are all based on a continuous Euclidean space, while any practical implementation will have to deal with discrete space. Fortunately, the mathematical operations of dilation and erosion, which play such a critical role
in the algorithms, have well-understood discrete definitions and implementations. The problem lies in the requirement of a convex generator with a piecewise smooth boundary. In continuous space, circles and ellipses are convex with a piecewise smooth boundary. In discrete space, these shapes contain concavities. It is not possible to have a discrete shape which is convex and has a piecewise smooth boundary. Thus the discrete algorithm can only approximate the continuous one and does not necessarily satisfy all of the stated criteria.
5. DISCRETE THINNING ALGORITHM
5.1. Algorithm Description
As the stylus moves along its path, only a small part of the stylus comes in contact with the boundary of the resulting ribbon; the exceptions are endpoints, corners, and other areas of high curvature. Some points on the ribbon boundary could only have been created if the stylus was positioned at a specific point. These boundary points are also generated by specific points on the boundary of the generator. It is the combination of the unique ribbon boundary points and the corresponding points on the generator that yields the most information when trying to rebuild the path of the stylus. The possible spine points lie on the boundary of the eroded ribbon. The points in this set that can be considered perceptually relevant are computed by traversing along the ribbon boundary from unique boundary points while also traversing along the potential spine from the associated unique potential spine points. If an identical connected path exists, the ribbon boundary points must have been generated by the same point on the stylus, and the traversed potential spine points are added to the computed spine.
5.2. Additional Definitions
The following additional definitions are used in the development of the discrete general ribbon-thinning algorithm.
DEFINITION 13.
Given b ∈ B and p ∈ P, b and p are associated if b ∈ ((p ⊕ G) ∩ B).
DEFINITION 14. A ribbon boundary point, b, is unique if it is associated with only one p ∈ P.
DEFINITION 15. The set of unique potential spine points, U, is defined as:
U = {p ∈ P | p is associated with at least one unique point of B}.
DEFINITION 16. The set of multiple potential spine points, M, is defined as:
M = {p ∈ P | p is not associated with any unique points}, so that M ∩ U = ∅.
DEFINITION 17. The neighborhood of a point, p, consists of the 8 points that surround p.
DEFINITION 18. A point p is a boundary point if p ∈ P and the neighborhood of p contains an interior point of E − P.
DEFINITION 19. The set of line points, Lp, is defined as Lp = {p ∈ P | p is not a boundary point}.
DEFINITION 20. A point p is a bifurcation point if p is a boundary point and the neighborhood of p contains a line point.
DEFINITION 21. The set of bifurcation points, Bi, is defined as Bi = {p ∈ P | p is a bifurcation point}.
DEFINITION 22. The set of edge points, Ep, is defined as Ep = {p ∈ P | p ∉ Lp and p ∉ Bi}.
5.3. The General Ribbon Algorithm
For a given pixel p, the pixels that make up its 3 × 3 window are the neighbors of p; they are labeled 1, …, 8, starting at the right 4-connected neighbor and numbering counterclockwise. The pixels N_1(p), N_2(p), …, N_8(p) are the 8-connected neighbors of p and are collectively denoted by N(p). The basic thinning algorithm is as follows:
1. Compute E, P, B, Lp, Ep, Bi, and ∂G.
2. Compute the set of unique boundary points, Bu; the sets of unique line points, ULp, unique edge points, UEp, and unique bifurcation points, UBi; and the sets of multiple line points, MLp, multiple edge points, MEp, and multiple bifurcation points, MBi.
3. For each point u such that u ∈ U and u ∉ UBi:
(a) Compute UB(u) = (u ⊕ ∂G) ∩ B.
(b) Mark all points GM(u) = {gm ∈ ∂G | (u ⊕ gm) ∈ Bu}.
(c) Mark all points MA(u) = {ma ∈ Bu | ma ∈ (u ⊕ GM)}.
(d) Mark all points MA(U − u) = {ma ∈ Bu | ma ∈ ((U − u) ⊕ GM)}.
(e) For all points b ∈ MA(u):
i. For all neighbors i = 1, …, 8 of b, determine whether N_i(b) ∈ B such that N_i(b) ∉ MA(U − u) and N_i(u) ∉ Bi.
ii. If (i) is true, AddedPoints = Traverse(b, u, i, UB(u)).
4. Resulting Spine = U ∪ MLp ∪ AddedPoints.
The Traverse algorithm is as follows. For a given boundary point location, b, its neighbor position, i, and a point on the potential spine, p:
1. If N_i(p) ∈ (Ep ∪ Lp ∪ Bi), add N_i(p) to AddedPoints.
2. If N_i(p) ∈ Bi, add 1 to a counter, bi_counter, that keeps track of the number of points in Bi that have been traversed:
(a) if bi_counter > 1, then all points added since the last Bi point are removed;
(b) if bi_counter ≤ 1, determine whether ((N_i(p) ⊕ ∂G) ∩ B) − UB(u) = ∅; if false, continue, otherwise stop the traversal here.
3. Set b = N_i(b) and, for all neighbors i = 1, …, 8 of b, determine whether N_i(b) ∈ B and N_i(b) ∉ MA(U − u).
4. If (3) is true, then recursively traverse from the current N_i(b).
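The point classes in Sections 5.2–5.3 map directly onto array operations. The sketch below is not the authors’ code and stops at steps 1–2 of the algorithm (it does not implement Traverse or the marking in step 3). It assumes NumPy and scipy.ndimage, a symmetric generator with odd side length whose origin is its centre pixel, 8-connected inner boundaries for B and P, and hypothetical helper names (`points`, `inner_boundary`, `touches`, `classify`). The neighbor offsets follow the N_1, …, N_8 ordering described above, with the row index growing downward.

```python
import numpy as np
from scipy import ndimage

# N_1..N_8: right 4-neighbour first, then counterclockwise (row index grows downward).
NEIGHBOR_OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
                    (0, -1), (1, -1), (1, 0), (1, 1)]
EIGHT = np.ones((3, 3), dtype=bool)

def points(mask):
    """Set of (row, col) tuples of the True pixels in a boolean image."""
    return {(int(r), int(c)) for r, c in zip(*np.nonzero(mask))}

def inner_boundary(mask):
    """Pixels of mask that have at least one 8-neighbour outside mask."""
    return mask & ~ndimage.binary_erosion(mask, structure=EIGHT)

def touches(mask, p):
    """True if at least one 8-neighbour N_i(p) of p is set in mask."""
    h, w = mask.shape
    for dr, dc in NEIGHBOR_OFFSETS:
        r, c = p[0] + dr, p[1] + dc
        if 0 <= r < h and 0 <= c < w and mask[r, c]:
            return True
    return False

def classify(R, G):
    """Sketch of Defs. 13-22 for a ribbon R and a symmetric, odd-sized generator G."""
    E = ndimage.binary_erosion(R, structure=G)   # eroded ribbon E = R (-) G
    P = inner_boundary(E)                        # potential spine
    B = points(inner_boundary(R))                # ribbon boundary
    cy, cx = G.shape[0] // 2, G.shape[1] // 2    # generator origin = array centre
    offsets = [(r - cy, c - cx) for r, c in points(G)]
    assoc = {}                                   # b -> {p : b in (p (+) G) and B}, Def. 13
    for p in points(P):
        for dr, dc in offsets:
            b = (p[0] + dr, p[1] + dc)
            if b in B:
                assoc.setdefault(b, set()).add(p)
    unique_b = {b for b, ps in assoc.items() if len(ps) == 1}       # Def. 14
    U = set().union(*(assoc[b] for b in unique_b))                  # Def. 15
    M = points(P) - U                                               # Def. 16
    interior = E & ~P                                               # E - P
    boundary_pts = {p for p in points(P) if touches(interior, p)}   # Def. 18
    Lp = points(P) - boundary_pts                                   # Def. 19
    lp_mask = np.zeros_like(P)
    for p in Lp:
        lp_mask[p] = True
    Bi = {p for p in boundary_pts if touches(lp_mask, p)}           # Defs. 20-21
    Ep = boundary_pts - Bi                                          # Def. 22
    return E, P, unique_b, U, M, Lp, Bi, Ep

# A straight stroke dilated by a 5 x 5 square generator.
G = np.ones((5, 5), dtype=bool)
spine = np.zeros((21, 40), dtype=bool)
spine[10, 5:31] = True
R = ndimage.binary_dilation(spine, structure=G)
E, P, unique_b, U, M, Lp, Bi, Ep = classify(R, G)
print(sorted(U))  # [(10, 5), (10, 30)]
```

On this straight stroke only the two end pixels of the spine are unique potential spine points, in line with the observation in Section 5.1 that unique boundary points arise at endpoints, corners, and regions of high curvature. Because the eroded ribbon is one pixel thick here, E − P is empty, so every potential spine point is a line point and Bi and Ep are empty; edge and bifurcation points only appear where E is more than one pixel thick.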
5.4. Perceptual Validity
An experiment was performed to determine the perceptual relevance of certain features of sample spines. The hypothesis of the experiment is that humans, when given a choice between two possible spines, will choose the one that exhibits simplicity and closure over the one that does not. The experiment consisted of showing subjects a number of images and then asking them which of two paths they felt was used to create each image. In some cases, a spine offered as a choice could not have produced the ribbon with the given stylus; such images were used as controls. Five subjects participated in this experiment, and each subject was shown 39 images. The images were displayed in four possible orientations: the original orientation, the image rotated by 90°, the image rotated by 270°, and the image with its pixel values flipped in both the x and y directions. Each instance of the image was shown in two different positions on the screen, for a total of 312 displays (39 × 4 × 2). Each subject was seated in front of a computer, using a chin rest to position the head so that the eyes were centered on the computer screen. The lighting level was fixed to allow for dark adaptation. The subjects were shown an on-screen introduction, with a short practice session before the experiment began. After each ribbon was displayed, a masking pattern was shown to prevent the subject from retaining a retinal image of the ribbon. After the masking pattern, the pair of spines associated with the ribbon was displayed side by side, in the same orientation as the ribbon. The placement of the spines was randomized, so different instances of the same image did not necessarily show the spines in the same order. The subjects had to choose the spine that they felt was more appropriate (a two-alternative forced choice), using a mouse as a response button.
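The stimulus count and the choice of orientations can be checked in a few lines of NumPy. This is an illustrative sketch rather than the experiment software: it assumes NumPy’s counterclockwise `np.rot90` convention (the text does not state the rotation direction) and a hypothetical `orientations` helper. It confirms 39 × 4 × 2 = 312 displays for a subject who sees all 39 images, and shows that flipping an image in both x and y is the same as rotating it by 180°, so the four orientations are exactly the four multiples of 90°.

```python
import numpy as np

def orientations(img):
    """Hypothetical helper: the four presentation orientations used in Section 5.4."""
    return {
        "original": img,
        "rot90": np.rot90(img, 1),              # counterclockwise in NumPy's convention
        "rot270": np.rot90(img, 3),
        "flip_xy": np.flip(img, axis=(0, 1)),   # flipped in both x and y
    }

N_IMAGES, N_ORIENTATIONS, N_POSITIONS = 39, 4, 2
total_displays = N_IMAGES * N_ORIENTATIONS * N_POSITIONS

img = np.arange(12).reshape(3, 4)
views = orientations(img)
# Flipping both axes equals a 180-degree rotation.
print(total_displays, np.array_equal(views["flip_xy"], np.rot90(img, 2)))  # 312 True
```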
The results of the study suggest that simplicity and closure, which together create good continuation, are the two characteristics of spines that carry the most perceptual relevance. When simplicity and closure were exhibited by competing choices, simplicity was chosen more frequently than closure, though closure was also chosen. Two postprocessing stages were added to the general ribbon algorithm to bring its results into closer agreement with human perception by increasing simplicity and providing more closure.
5.5. Postprocessing 1
The base algorithm requires postprocessing to remove sections of the computed spine from AddedPoints. These sections are similar to sections of ∂G and satisfy the relation ∀p ∈ section: ((p ⊕ ∂G) ∩ B) ∩ ((ULp−sub ⊕ ∂G) ∩ B) ≠ ∅, where the points in ULp−sub are endpoints or lie at a distance of less than 3 from E − P. Figure 4 shows an example that was created with the given triangular generator and a spine that exhibits self-intersection, curvature, and corners. The resulting thinned image without postprocessing contains a triangular section in the area of the self-intersection. After postprocessing 1, the triangular section is removed; what remains is a single point, which was the apex of the triangular section. Once the points in ULp−sub have been selected, postprocessing is performed as follows. For each u ∈ ULp−sub:
1. Compute C = u ⊕ ∂G.
2. For each p ∈ AddedPoints, if (p ⊕ ∂G) ∩ C ≠ ∅, set AddedPoints = AddedPoints − {p}.
FIG. 4. A triangular generator, the spine, the resulting ribbon, the computed spine without postprocessing, and the computed spine with postprocessing. The removed section of the computed spine approximately matches the boundary of the generator.
5.6. Postprocessing 2
To increase the perceptual validity of the resulting spine, some line segments need to be added within the area of (Ep ∪ (E − P)). The role of the additional line segments is to connect the lines in AddedPoints that enter the area of (Ep ∪ (E − P)), yielding a spine with a better sense of closure and continuation. The additional lines also increase the reconstructability of the resulting spine. To generate the additional line segments, the line information of the current state of the spine is computed using Rosin and West’s [32] method of line extraction. Each line is then checked to see whether it is connected to or included in another line. A line segment is considered to be extendible if it is not a point, if it is not completely included in another line, or if at least one of its endpoints is not connected to another line. The extendible lines are selected if they are adjacent to a section of (E − P) that has more than two line segments adjacent to it; such sections are typical of spine segments that have intersecting paths. After the lines are selected, the extension process occurs in four stages. In all stages, when a line is added to the resulting image, Bresenham’s line-drawing algorithm is used to determine the placement of points on the line. The first stage looks at the selected lines and determines whether there is another line, marked or unmarked, that has a similar slope, where
1. the line that connects the two lines also has the same approximate slope;
2. the respective endpoints of the two lines are both unconnected; and
3. the ratio of the number of points of the connecting line that are in E to the number that are not in E is greater than 75%.
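Condition 3 (the “75% rule”) and the Bresenham rasterization it relies on can be sketched as follows. This is an assumption-laden illustration, not the authors’ code: points are (x, y) integer tuples, E is given as a set of pixels, and `bresenham` and `passes_75_rule` are hypothetical names. The rule is implemented literally as worded, i.e., the number of connecting-line points inside E divided by the number outside E must exceed 0.75; if the intended quantity is the fraction of the line’s points that lie in E, the test should instead be `inside / len(line) > 0.75`.

```python
def bresenham(p0, p1):
    """Integer points on the segment p0 -> p1, endpoints included (all octants)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    line = []
    while True:
        line.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return line
        e2 = 2 * err
        if e2 >= dy:   # step in x
            err += dy
            x0 += sx
        if e2 <= dx:   # step in y
            err += dx
            y0 += sy

def passes_75_rule(line, E, threshold=0.75):
    """Condition 3, read literally: (#line points in E) / (#line points not in E) > 0.75.
    A connecting line lying entirely inside E always passes."""
    inside = sum(1 for p in line if p in E)
    outside = len(line) - inside
    return outside == 0 or inside / outside > threshold

# A connecting line from (0, 0) to (7, 0); E covers the first six of its eight pixels.
E = {(x, 0) for x in range(6)}
link = bresenham((0, 0), (7, 0))
print(passes_75_rule(link, E))  # True (6 inside / 2 outside = 3.0)
```

If the line passes, only the points of `link` that lie in E would be copied to the spare image, as the stages below describe.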
FIG. 5. The test spines and generators for the comparison process.
Condition 3 is used because most people do not draw a line with the same point placement as Bresenham’s algorithm. If all the conditions are met, the points of the new line that are in E are added to a spare image and the respective endpoints are marked as connected. The second stage looks at the selected lines that remain unconnected. From each unconnected endpoint, a line is extended until it intersects another point on the current spine that does not belong to the starting line. If the extended line satisfies the 75% rule, the points of the drawn line that are in E are added to the spare image from stage 1, and the respective endpoint is marked as connected. The third stage looks at the unconnected selected lines and determines whether there are two selected lines that would intersect when both are extended. The intersection point closest to the appropriate endpoint is saved. The last stage looks at the lines that have an intersection point from stage 3. The lines connecting the appropriate endpoints to their respective intersection points are added to the spare image if they satisfy the 75% rule. When the stages are completed, the points in the spare image are added to the computed spine and a second pass is performed. The goal of the second pass is to add lines that connect isolated points to the rest of the spine. The isolated points may be the result of the first postprocessing technique from Section 5.5. The second pass begins by computing the line information of the new spine and then checking whether the lines are connected to or included in other lines. Next, the isolated points are marked and checked against all of the line segments to see whether
1. the line connecting the line segment to the isolated point has the same approximate slope as the line segment, and
2. the connecting line satisfies the 75% rule.
If the conditions are met, the points of the connecting line that are in E are added to the thinned spine.
5.7. Comparison with Other Algorithms
In general, existing thinning algorithms do not preserve geometric information. The algorithms have problems with unattached ends of a path, regions of high curvature, and regions where a path or a portion of a path intersects. A comparison was performed between the discrete general ribbon-thinning algorithm, the medial axis transform (MAT), and Arcelli’s thinning algorithm [4]. Arcelli’s algorithm was chosen because it is a typical sequential iterative thinning algorithm and it appears to be the best existing thinning algorithm for preserving perceptual information. MAT was chosen as an example of a nonsequential algorithm. The comparison tested each algorithm’s ability to maintain the problem areas of the spine. Figures 5a–5i show the set of test spines that were used. Each spine was dilated by the generators shown in Fig. 5(1–3) to produce a series of ribbons. Spines (a)–(g) were created to test each algorithm’s performance in the problem areas mentioned above. Spines (h) and (i) were created by someone not associated with the research. The comparison examined the line information of the results of each thinning algorithm and how it compared to the line information of the original spine. The comparison considered line structure, endpoints, slope information, and the number of lines that were not in the original spine. Slope information refers to the number of slopes that were maintained and the number that were added to the thinned image. Line information refers to the number of extraneous lines that were not part of the original structure of the spine. Table 1 shows the average results over all of the ribbons for each algorithm. The results of the Arcelli and MAT algorithms depend on the generator that was used to create the ribbon. Although the results of the discrete general ribbon-thinning algorithm differ when different generators are used, the basic structure remains intact. When a generator that is symmetric with respect to 90° rotations, such as a circle or a square, is used, the results of Arcelli’s algorithm are often similar to the original spine.
The main structure of the resulting MAT skeleton is also similar to the original spine, except that extra lines are added at the corners, at areas where paths join, and at the unattached ends of paths. If the generator is not symmetric with respect to 90° rotations, such as a triangle, the results of Arcelli and MAT can differ drastically from the
TABLE 1 The Average Results of the Comparison for Each Generator: Percentage of Endpoints Maintained, Percentage of Slopes Maintained, Number of Slopes That Were Not in the Original Spine, and Number of Additional Lines Generator
% of endpts maintained
% of slopes maintained
# of slopes not in tested spines
1. 2. 3.
95.6% 96.0% 93.4%
1. 2. 3.
29.7% 29.1% 2.7%
Arcelli 96.7% 83.0% 71.6%
1. 2. 3.
31.5% 24.0% 1.7%
90.4% 80.3% 46.4%
Discrete thinning algorithm 94.2% 4 95.4% 2 95.4% 6
# of add. lines not in tested spines
4 3 4
15 21 46
12 12 38
38 51 62
58 73 70
MAT
original spine. The discrete general ribbon-thinning algorithm does not exhibit this problem. Figure 6 shows a spine with corners, an intersection, curved regions, and line ends. The Arcelli results fail to maintain straight lines in the region corresponding to the intersection, and the corners are rounded. The MAT results fail to maintain any line structure in the regions corresponding to the intersection and the corner, and extra lines are added in the curved region. The results of the General Ribbon algorithm are consistent with the structure of the original spine.
FIG. 6. Results of the thinning algorithms for a given spine and three generators.
Figure 7 contains three images that display accidental alignment, corners, and line ends. The first two images are identical except for a line connecting the bottom section of the image. The Arcelli results fail to maintain the corners and the general line structure of the step section of the first two images, and for the third image they fail to maintain the two separate lines of the original spine. The same is true for the MAT results. The results of the General Ribbon algorithm are more consistent with the structure of the original spine and follow it more closely wherever possible.
FIG. 7. Results of the thinning algorithms for three spines with three generators.
The three algorithms were also compared on images from the UB Stylus Image Test Bank, a collection of arbitrary stylus-generated images that can be used to test stylus-generated image algorithms. This test bank is part of the Web-based Image Database for Benchmarking Image Retrieval Systems [19], which is being developed at UB for testing image retrieval algorithms. Figures 8–10 show the results of the three algorithms on images that do not have a high sampling rate. Each figure shows the spine, the ribbon, and the results of the three algorithms. In Fig. 8, the Arcelli algorithm at first glance appears to give the best result. The MAT results for this image exhibit divergence in the areas that correspond to unattached ends of paths. The MAT results also round the corners and, in addition, add extra lines to maintain reconstructability, as best seen in the “D” in Fig. 8. The Arcelli result is cleaner in that it is connected and smooth, but it also rounds the corners. The results of the general ribbon algorithm for these images are not as clean, but they bear a greater resemblance to the actual spine. The extra lines in the “S” of SHADOW are caused by the accidental alignment of the bottom portion of the “S” with its shadow.
FIG. 8. Test image for comparison.
For Fig. 9, a cube, the Arcelli algorithm is unable to maintain the corners of the cube, which is unfortunate, since these are the most perceptually relevant areas for humans [37]. The MAT algorithm is unable to maintain the structure of the cube at its corners and at areas of line intersection. The inability of these algorithms to maintain corners and intersections can affect subsequent algorithms that look for these perceptually relevant features. The general ribbon algorithm is able to maintain the geometric information in both of these problem areas. Figure 10 depicts a pool table with two cues that intersect at acute angles with each other and with the pool table. The Arcelli and MAT algorithms are unable to maintain the line structure where the two pool cues cross each other and intersect the pool table; both add lines that are not consistent with the original line structure of the spine. The general ribbon algorithm is able to maintain the perceptually relevant geometric information in these problem regions.
Another difference between the general ribbon algorithm and other skeletonization algorithms is its ability to compute the spine for an image created with a disjoint generator. This ability is based on the use of the generator and the erosion process. The other skeletonization algorithms do not take the generator into account; they treat the image as a single entity to be thinned.
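To make the role of the generator concrete, the following Python sketch treats a binary image as a set of pixel coordinates, builds a ribbon by dilating a spine with a generator (as was done for the test ribbons of Fig. 5), erodes it with the same generator, and measures the Type I and Type II reconstruction errors defined in Section 5.8. It uses textbook morphological dilation and erosion, not the authors’ erosion process. The function names, the set-of-points representation, the reading of “incorrect on points” as pixels that are on in the reconstruction but off in the original, and the clipping of the reconstruction to a width by height frame are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates of an "on" point


def dilate(spine: Iterable[Point], generator: Iterable[Point]) -> Set[Point]:
    """Minkowski sum: place a copy of the generator at every spine point."""
    gen = list(generator)
    return {(x + gx, y + gy) for (x, y) in spine for (gx, gy) in gen}


def erode(ribbon: Set[Point], generator: Iterable[Point]) -> Set[Point]:
    """Points p such that the generator placed at p lies entirely inside the ribbon."""
    gen = list(generator)
    if not gen:
        raise ValueError("generator must contain at least one point")
    gx0, gy0 = gen[0]
    # Any valid p must satisfy p + gen[0] in ribbon, so these are the only candidates.
    candidates = {(x - gx0, y - gy0) for (x, y) in ribbon}
    return {
        (px, py)
        for (px, py) in candidates
        if all((px + gx, py + gy) in ribbon for (gx, gy) in gen)
    }


def reconstruction_errors(original: Set[Point], reconstructed: Set[Point],
                          width: int, height: int) -> Tuple[float, float]:
    """Type I / Type II ratios following the wording of Section 5.8 (interpretation assumed)."""
    frame = {(x, y) for x in range(width) for y in range(height)}
    recon = reconstructed & frame   # assumption: points stamped outside the image are dropped
    false_on = recon - original     # on in the reconstruction, off in the original
    false_off = original - recon    # off in the reconstruction, on in the original
    n_on = len(original)
    n_off = width * height - n_on
    type1 = len(false_on) / n_on if n_on else 0.0
    type2 = len(false_off) / n_off if n_off else 0.0
    return type1, type2


# Hypothetical example: a disjoint generator made of two pixels four columns apart.
generator = [(0, 0), (4, 0)]
spine = [(0, 1), (1, 1), (2, 1)]
ribbon = dilate(spine, generator)  # two separate three-pixel runs

if __name__ == "__main__":
    recovered = erode(ribbon, generator)
    print(sorted(ribbon))
    print(sorted(recovered))  # the spine again
    print(reconstruction_errors(ribbon, dilate(recovered, generator), 10, 5))
    spurred = spine + [(2, 2)]  # spine with a one-pixel spur
    print(reconstruction_errors(ribbon, dilate(spurred, generator), 10, 5))
```

With the two-pixel disjoint generator, the ribbon consists of two unconnected runs, yet erosion by the same generator recovers the spine; a thinner that ignores the generator would see only two unrelated segments. Dilating an eroded set yields a morphological opening, which never turns on a pixel outside the ribbon, so in this sketch a nonzero Type I error can only come from extra spine points such as the spur.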
FIG. 9. Test image for comparison.
FIG. 10. Test image for comparison.
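Table 1 reports, among other measures, the percentage of spine endpoints maintained by each algorithm. The paper does not spell out when an endpoint counts as “maintained,” so the following Python sketch makes two assumptions: an endpoint is an on-pixel with exactly one 8-connected on-neighbor, and a spine endpoint is maintained if some endpoint of the thinned image lies within Chebyshev distance `tol` of it. The function names and the `tol` parameter are hypothetical; this illustrates the kind of measure involved and is not the evaluation code behind Table 1.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates of an "on" point

# Offsets of the eight neighbors of a pixel.
NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def endpoints(pixels: Iterable[Point]) -> Set[Point]:
    """On-pixels with exactly one 8-connected on-neighbor (isolated pixels are excluded)."""
    on = set(pixels)
    return {
        (x, y)
        for (x, y) in on
        if sum((x + dx, y + dy) in on for (dx, dy) in NEIGHBORS) == 1
    }


def percent_endpoints_maintained(spine: Iterable[Point], thinned: Iterable[Point],
                                 tol: int = 1) -> float:
    """Percentage of spine endpoints with a thinned endpoint within Chebyshev distance tol."""
    spine_ends = endpoints(spine)
    thin_ends = endpoints(thinned)
    if not spine_ends:
        return 100.0
    matched = sum(
        any(max(abs(sx - tx), abs(sy - ty)) <= tol for (tx, ty) in thin_ends)
        for (sx, sy) in spine_ends
    )
    return 100.0 * matched / len(spine_ends)


# Hypothetical L-shaped spine, and a thinned result that is two pixels short at the
# (0, 0) end, one pixel short at the (4, 4) end, and cuts the corner diagonally.
spine = [(x, 0) for x in range(5)] + [(4, y) for y in range(1, 5)]
thinned = [(2, 0), (3, 0), (4, 1), (4, 2), (4, 3)]

if __name__ == "__main__":
    print(sorted(endpoints(spine)))
    print(percent_endpoints_maintained(spine, thinned))  # tol = 1
    print(percent_endpoints_maintained(spine, thinned, tol=2))
```

With `tol=1` only the (4, 4) endpoint is matched, giving 50%; loosening the tolerance to 2 also matches the shortened (0, 0) end, giving 100%. The sensitivity to the matching rule is one reason such percentages are only comparable when every algorithm is scored the same way.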
5.8. Discussion of the Discrete General Ribbon Algorithm
The discrete general ribbon algorithm, in combination with the stroke/blob segmentation algorithm, satisfies most of the criteria described in Section 2. Geometric information is maintained in the algorithm’s postprocessing stages, and perceptual validity is maintained in the same manner. Selectability is provided by the blob–stroke segmentation algorithm. The discrete general ribbon algorithm is not guaranteed to create a reconstructible spine, though the differences are generally minimal. For Figs. 6 and 7, the original ribbons can be fully reconstructed from the thinned image and the generator. For Figs. 8–10, the Type I errors (the ratio of incorrect “on” points to “on” points in the original image) are 0.00046, 0.00016, and 0.00008, respectively. The Type II error (the ratio of incorrect “off” points to “off” points in the original image) is 0.00000 for each of these figures. The original ribbon and the reconstructed ribbon generally differ by only a couple of points. The differences are due in part to the use of Bresenham’s line drawing algorithm in the second postprocessing stage. Another source of difference is the sampling rate used during the path or image generation process; increasing the sampling rate may also increase the reconstructability of the image. Whether reconstructability is necessary depends on the intended use of the computed spine: if only the line structure of the spine is of interest, reconstructability may not be required. The discrete general ribbon algorithm exhibits a side effect that limits the minimality of the thinned image: the addition of extraneous points, which can yield small spurs in the resulting spine. These points can be removed in the first postprocessing stage, but at the cost of removing points that are necessary for perceptual validity. An example of an added spur can be seen in Fig. 10c, a drawing of a pool table; the spur is on the right side of the image at the end of the lower pool cue. While the continuous general ribbon algorithm has a formal specification of the desired output, the discrete general ribbon algorithm does not.
6. SIGNIFICANCE OF GENERAL RIBBON THINNING ALGORITHMS
The General Ribbon Thinning Algorithms have three specific advantages. The first two relate to the general problems with thinning algorithms identified by Lam et al. [21]: the lack of a formal specification of the correct output, and the failure to preserve the geometric properties of the input. The continuous General Ribbon Thinning Algorithm has a formal specification of the correct output, and both the continuous and discrete algorithms are able to preserve the geometric properties of the input. The third advantage is the ability to handle intersections of stroke objects with stroke objects, blob objects with blob objects, and even stroke objects with blob objects. This is accomplished by
the use of algorithms based on the General Ribbon model that can distinguish between stroke and blob input and can remove blob input from an image [26].
REFERENCES
1. I. S. I. Abuhaiba, M. J. J. Holt, and S. Datta, Processing of binary images of handwritten text documents, Pattern Recognition 29(7), 1996, 1161–1177.
2. I. S. I. Abuhaiba, S. A. Mahmoud, and R. J. Green, Recognition of handwritten cursive Arabic characters, IEEE Trans. Pattern Anal. Mach. Intell. 16(6), 1994, 664–672.
3. C. Arcelli, L. Cordella, and S. Levialdi, Parallel thinning of binary pictures, Electron. Lett. 11, 1975, 148–149.
4. C. Arcelli and G. Sanniti di Baja, On the sequential approach to medial line transformation, IEEE Trans. Systems Man Cybernet. SMC-8(2), 1978, 139–144.
5. O. Baruch, Line thinning by line following, Pattern Recognition Lett. 8, 1988, 271–276.
6. G. Boccignone, A. Chianese, L. P. Cordella, and A. Marcelli, Recovering dynamic information from static handwriting, Pattern Recognition 26(3), 1993, 409–418.
7. I. Chakravarty, A generalized line and junction labeling scheme with applications to scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 1, 1979, 202–205.
8. H. I. Choi, S. W. Choi, H. P. Moon, and N. S. Wee, New algorithm for medial axis transform of plane domain, Graph. Models Image Process. 59(6), 1997, 463–483.
9. N. Chuei, T. Y. Zhang, and C. Y. Suen, New algorithms for thinning binary images and Chinese characters, Comput. Process. Chinese Oriental Lang. 2, 1986, 169–179.
10. E. R. Davis and A. P. N. Plummer, A new method for the compression of binary picture data, in Proceedings 5th Int. Conf. Pattern Recognition, pp. 1150–1152, 1980.
11. Y. Ding and T. Y. Young, Complete shape from imperfect contour: A rule-based approach, Comput. Vision Image Understand. 70, 1998, 197–211.
12. M. Ejiri, T. Miyatake, S. Kakumoto, and H. Matsushiam, Automatic recognition of design drawings and maps, in Proceedings 7th ICPR, Vol. 2, pp. 1296–1305, 1984.
13. V. Govindaraju, Locating human faces in photographs, Int. J. Comput. Vision 19(2), 1996, 129–146.
14. V. Govindaraju and S. Srihari, Separating handwritten text from non-textual interference, in From Pixels to Features II (J. Simon and S. Impedovo, Eds.), pp. 17–28, Elsevier, Amsterdam, 1992.
15. R. M. Haralick, Performance characterization in image analysis: Thinning, a case in point, Pattern Recognition Lett. 13, 1993, 5–12.
16. C. J. Hilditch, Comparison of thinning algorithms on a parallel processor, Image Vision Comput. 1, 1983, 115–132.
17. B. K. Jang and R. T. Chin, Analysis of thinning algorithms using mathematical morphology, IEEE Trans. Pattern Anal. Mach. Intell. 12, 1990, 541–551.
18. L. Ji and J. Piper, Fast homotopy-preserving skeletons using mathematical morphology, IEEE Trans. Pattern Anal. Mach. Intell. 14, 1992, 653–664.
19. C. Jorgensen, D. K. W. Walters, A. Zhang, and R. K. Srihari, Creating a Web-based image database for benchmarking image retrieval systems, in Human Vision and Electronic Imaging IV (B. E. Rogowitz and T. N. Pappas, Eds.), Vol. 3644, pp. 534–541, SPIE, 1999.
20. G. Kim and V. Govindaraju, A lexicon-driven approach to handwritten word recognition for real-time applications, IEEE Trans. Pattern Anal. Mach. Intell. 19(4), 1997.
21. L. Lam, C. Y. Suen, and S. W. Lee, Thinning methodologies: A comprehensive survey, IEEE Trans. Pattern Anal. Mach. Intell. 14, 1992, 869–885.
22. S. H. Lee, R. M. Haralick, and M. C. Zhang, Understanding objects with curved surfaces from a single perspective view of boundaries, Artif. Intell. 26, 1985, 145–169.
23. J. Malik, Interpreting line drawings of curved objects, Int. J. Comput. Vision 1, 1987, 73–103.
24. A.-R. Mansouri, A. S. Malowany, and M. D. Levine, Line detection in digital pictures: A hypothesis prediction/verification paradigm, Comput. Vision Graphics Image Process. 40, 1987, 95–114.
25. P. A. Maragos and R. W. Schafer, Morphological skeleton representation and coding of binary images, IEEE Trans. Acoust. Speech Signal Process. 34, 1986, 1228–1244.
26. E. H. Milun, D. K. W. Walters, Y. Li, and B. Antanacio, General ribbons: A model for stylus-generated images, Comput. Vision Image Understand. 76, 1999, 259–266.
27. R. Nevatia and K. R. Babu, Linear feature extraction and description, Comput. Graphics Image Process. 13, 1980, 257–269.
28. T. Pavlidis, A vectorizer and feature extractor for document recognition, Comput. Vision Graphics Image Process. 35, 1986, 111–127.
29. A. Rosenfeld, Picture Processing by Computer, pp. 143–146, Academic Press, San Diego, 1969.
30. A. Rosenfeld and L. S. Davis, A note on thinning, IEEE Trans. Systems Man Cybernet. 25, 1976, 226–228.
31. A. Rosenfeld and A. C. Kak, Digital Picture Processing, 2nd ed., pp. 266–267, Academic Press, San Diego, 1982.
32. P. L. Rosin and G. A. W. West, Segmenting curves into elliptic arcs and straight lines, in ICCV, Vol. 90, pp. 75–78, 1990.
33. J. Serra, Image Analysis and Mathematical Morphology, Academic Press, New York, 1982.
34. R. W. Smith, Computer processing of line images: A survey, Pattern Recognition 20, 1987, 7–15.
35. S. Suzuki and K. Abe, Binary picture thinning by an iterative parallel two-subcycle operation, Pattern Recognition 10, 1987, 297–307.
36. D. Walters, K. Ganapathy, and F. vanHuet, An orientation-based representation for contour analysis, in Spatial Vision in Humans and Machines (L. Harris and M. Jenkin, Eds.), Cambridge Univ. Press, Cambridge, UK, 1992.
37. D. K. W. Walters, Selection and use of image primitives for general-purpose computer vision algorithms, Comput. Vision Graphics Image Process. 37(3), 1987, 261–298.
38. A. B. Wang, K. C. Fan, and J. S. Huang, Recognition of handwritten Chinese characters by modified relaxation methods, Image Vision Comput. 12(8), 1994, 509–522.