Pattern Recognition 38 (2005) 673 – 689 www.elsevier.com/locate/patcog
A new shape decomposition scheme for graph-based representation
Duck Hoon Kim (a), Il Dong Yun (b,∗), Sang Uk Lee (a)
(a) School of Electrical Engineering and Computer Science, Seoul National University, Seoul 151-742, Republic of Korea
(b) School of Electronics and Information Engineering, Hankuk University of Foreign Studies, Yongin 449-791, Republic of Korea
Received 9 February 2004; received in revised form 4 October 2004; accepted 4 October 2004
Abstract
Nowadays, the part-based representation of a given shape plays a significant role in shape-related applications, such as those involving content-based retrieval, object recognition, and so on. In this paper, to represent both 2-D and 3-D shapes as a relational structure, i.e. a graph, a new shape decomposition scheme, which recursively performs constrained morphological decomposition (CMD), is proposed. The CMD method adopts the use of the opening operation with the ball-shaped structuring element, and weighted convexity to select the optimal decomposition. For the sake of providing a compact representation, the merging criterion is applied using the weighted convexity difference. Therefore, the proposed scheme uses the split-and-merge approach. Finally, we present experimental results for various, modified 2-D shapes, as well as 3-D shapes represented by triangular meshes. Based on the experimental results, it is believed that the decomposition of a given shape coincides with that based on human insight for both 2-D and 3-D shapes, and also provides robustness to scaling, rotation, noise, shape deformation, and occlusion.
© 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Shape decomposition; Part-based representation; Morphological operation; Weighted convexity; Split-and-merge approach
1. Introduction
Shape analysis is a fundamental problem in image processing and computer vision, since shape features provide a powerful clue to the identity and functionality of an object. More specifically, shape usually contains perceptual information, and thus human beings recognize particular objects mainly from their shapes. This distinguishes shape from other visual features, such as color, texture, or motion, and many applications, such as content-based retrieval systems and object recognition systems, are likely to use shape features via a
∗ Corresponding author. Tel.: +82 31 330 4260; fax: +82 31 330 4120. E-mail addresses:
[email protected] (D.H. Kim),
[email protected] (I.D. Yun),
[email protected] (S.U. Lee).
descriptor [1]. Recently, the supply and demand for 3-D objects have increased significantly with the technological development of computer vision and computer graphics [2]. In addition, the part-based representation, which decomposes a given shape into canonically meaningful parts, is increasingly being used as a perceptual shape descriptor [3] due to the growing demand for multimedia retrieval with human insight [4]. Therefore, it is important to develop part-based representations for both 2-D and 3-D shapes, since this allows relational structures, i.e. graphs, to be generated directly for further shape-related applications. Note that the main applications of part-based representation include content-based retrieval, metamorphosis, simplification, and so on [3].
So far, various part-based representations, which can be converted into graphs, have been developed, and they are categorized into morphology-based decomposition [5–8] and partition-based decomposition [9,10]. Morphology-based decomposition has been developed based on the morphological skeleton transform [11] and morphological shape decomposition [12]. Morphology-based decomposition includes a well-developed mathematical structure, which can be directly applied to 3-D shapes by voxelization. However, it is sensitive to scaling, rotation, and noise, since most structuring elements (SEs) cannot approximate their target shapes well in a continuous domain. Partition-based decomposition implies that a given shape is divided into proper connected components by connecting points on the shape’s boundary, and this category comprises the neck-and-limb-based scheme [9] and the convexity-based scheme [10]. The neck-and-limb-based scheme is derived from a conceptual framework that is guided by constraints associated with recognition, and is mainly motivated by psychophysics. Compared with morphology-based decomposition, this scheme is invariant to scaling and rotation, as well as being robust to noise. However, its performance is mainly influenced by curve evolution and the detection of convex and concave arcs, which are not simple tasks for either 2-D or 3-D shapes. In addition, this scheme involves heavy computational complexity due to the need to compute each part in reaction–diffusion space. The convexity-based scheme, which decomposes a given shape in order to maximize the weighted convexity, provides a simple and efficient means of creating an intuitive description. However, a means of specifying the number of parts to be decomposed is required, and the extension from the contours used for 2-D shapes to the surfaces used for 3-D shapes is still an ongoing research topic.
Fig. 1. The block diagram of the proposed scheme.
0031-3203/$30.00 © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2004.10.003
In this paper, a new shape decomposition scheme is proposed in order to provide a part-based representation for both 2-D and 3-D shapes. The proposed scheme stands on two psychological rationales: one is that a human being recognizes an object by its part structure [13], and the other is that the parts are generally defined as being either convex or nearly convex shapes [9,10,14]. To convert these rationales into a practical application, the proposed scheme combines the advantages of morphology-based decomposition, such as simple and intuitive operations, with those of partition-based decomposition, such as the perceptually valid measure, convexity [10]. Specifically, the proposed scheme recursively performs the constrained morphological decomposition (CMD) based on the opening operation with ball-shaped SEs and weighted convexity. Note that the parts to be decomposed are rendered convex or nearly convex by using the ball-shaped SE, since it is convex itself and rotation invariant. Then a merging criterion employing the weighted convexity difference, which determines whether adjacent parts are merged or not, is adopted for the sake of providing a compact representation. These procedures are summarized in the block diagram in Fig. 1. More specifically, the proposed scheme consists of three stages: the initial decomposition stage (IDS), the recursive decomposition stage (RDS), and the iterative merging stage (IMS). Compared with existing part-based representations, the proposed scheme can provide a more compact representation by virtue of the IMS, with the notion of convexity also being adopted consistently in the IDS and RDS. This alleviates one problem that arises in morphology-based
decomposition, i.e. there is no proper measure for the successive merging procedures. In particular, the use of the ball-shaped SE in the proposed scheme dispenses with the need to select the proper type of SE, which is the other problem that arises in morphology-based decomposition, as described in Ref. [7]. In addition, the number of parts can be automatically determined by performing the morphological operation iteratively, and therefore it is not necessary to select the points to be connected on the boundary of a given shape. In this way, the proposed scheme overcomes the shortcomings associated with partition-based decomposition, such as those described in Ref. [10]. From the experimental results for various modified 2-D shapes, as well as some 3-D shapes, the proposed scheme is found to be robust to scaling, rotation, noise, shape deformation, and occlusion, by virtue of the ball-shaped SE, and it can be directly applied to 3-D shapes via the process of voxelization [15]. Moreover, it can decompose arbitrary shapes into canonically meaningful parts, with only three fixed thresholds for both 2-D and 3-D shapes. This implies that, to the best of our knowledge, the proposed scheme represents the first unified decomposition method for both 2-D and 3-D shapes.
This paper is organized as follows: We begin in Section 2 with a preliminary explanation of the mathematical concepts of morphology and convexity. In Section 3, we describe the constrained morphological decomposition, which is then performed recursively with the split criterion in Section 4. Then the merging criterion required for providing the compact representation is presented in Section 5, and Section 6 presents various experimental results for both 2-D and 3-D shapes. Finally, Section 7 concludes this paper.
2. Preliminaries
Assume that a given shape is represented by a binary image that covers the boundary and interior of the shape. In this context, a binary image is an array of “0”s and “1”s, where “0” represents the exterior, while “1” represents the boundary and interior. Note that 2-D and 3-D binary images are composed of pixels and voxels, respectively. In this section, the mathematical concepts of morphology and convexity, adopted in the CMD, are described.
2.1. Basic definitions of mathematical morphology
The language of mathematical morphology is set theory. Therefore, mathematical morphology offers a unified and powerful approach to numerous problems in image processing and computer vision [16]. More specifically, the effects of basic morphological operations, which can be implemented very effectively by parallel processing, provide simple and intuitive interpretations, using geometric terms of the shape, size, and location [7]. For two images, M and S, and a point, u, the translation of M by u is defined as
follows:
\[ (M)_u = \{ m + u \mid m \in M \}. \tag{1} \]
Then there are two basic morphological operations, the dilation of M by S and the erosion of M by S, which are defined as follows, respectively:
\[ M \oplus S = \bigcup_{s \in S} (M)_s, \tag{2} \]
\[ M \ominus S = \bigcap_{s \in S} (M)_{-s}. \tag{3} \]
Next, there exist two fundamental operations based on the dilation and erosion, i.e. the opening of M by S (M ◦ S) and the closing of M by S (M • S), which are defined as follows, respectively:
\[ M \circ S = (M \ominus S) \oplus S, \tag{4} \]
\[ M \bullet S = (M \oplus S) \ominus S. \tag{5} \]
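The set-theoretic definitions in Eqs. (1)–(5) can be illustrated with a minimal Python sketch. It assumes that a binary image is modelled as a set of integer (row, column) points whose value is "1"; the function names and the toy shape (a 5 × 5 square with a thin spur, opened by a 3 × 3 square SE) are hypothetical and chosen only for illustration, not taken from the paper.

```python
# Minimal sketch of Eqs. (1)-(5); a binary image is a set of (row, col) points.
# All names and the toy data below are illustrative assumptions.

def translate(M, u):
    """Eq. (1): (M)_u = {m + u | m in M}."""
    return {(m[0] + u[0], m[1] + u[1]) for m in M}

def dilate(M, S):
    """Eq. (2): union of the translations of M by every s in S."""
    out = set()
    for s in S:
        out |= translate(M, s)
    return out

def erode(M, S):
    """Eq. (3): intersection of the translations of M by -s for every s in S."""
    out = None
    for s in S:
        shifted = translate(M, (-s[0], -s[1]))
        out = shifted if out is None else out & shifted
    return out if out is not None else set()

def opening(M, S):
    """Eq. (4): erosion followed by dilation."""
    return dilate(erode(M, S), S)

def closing(M, S):
    """Eq. (5): dilation followed by erosion."""
    return erode(dilate(M, S), S)

# Toy example: a 5x5 square with a two-pixel spur, and a 3x3 square SE.
square = {(r, c) for r in range(5) for c in range(5)}
spur = {(2, 5), (2, 6)}
M = square | spur
S = {(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}

opened = opening(M, S)     # the spur is too thin to survive the opening
residue = M - opened       # what the opening removed
```

In this toy case the opening keeps the square and the residue M − M ◦ S is exactly the spur, which mirrors how the scheme later separates a shape into what survives the opening and what does not.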
2.2. Basic definitions of convexity
M is deemed to be convex if and only if the line segment, $\overline{pq}$, is completely contained in M for any pair of points, p, q ∈ M. The convex hull of M, H(M), is the smallest convex set that contains M. In other words, it is the intersection of all convex sets that contain M [17]. The convexity of M, C(M), is defined as:
\[ C(M) = \frac{N(M)}{N(H(M))}, \tag{6} \]
where N(·) refers to the number of “1”s. Since the numerator is equal to or less than the denominator, C(·) is in the interval (0, 1] (1 for perfect convexity). In this paper, the process of maximizing the convexity, which has been widely used in the field of shape-related analysis [10], is the basic principle used to measure the properness of the decomposition result.
3. Constrained morphological decomposition
In this section, the notion of CMD, based on the mathematical concepts of morphology and convexity, is proposed. The CMD is composed of two steps: The first step consists of generating candidate decompositions for the next step, using the opening operation, as a function of the size of the ball-shaped SE. The second step is to select the best candidate as the output of the CMD based on the weighted convexity.
3.1. Generating parts using opening operation
The ball-shaped SE, S(k), is
\[ S(k) = \begin{cases} 1 & \text{if } \|u - o\| \le k, \\ 0 & \text{if } \|u - o\| > k, \end{cases} \tag{7} \]
Fig. 2. The example of the CMD: (a) the rabbit in the form of a 120 × 120 binary image, (b) M ◦ S(12), (c) M − M ◦ S(12), (d) PM (12) after extracting the connected components in four neighborhoods, (e) the result after absorbing the negligible parts using the residual criterion by tR = 2, and (f) the result after the flattening procedure.
where u and o refer to a point and the central point of the binary image, respectively, and ‖·‖ refers to the Euclidean distance. Note that the natural number, k, represents the radius of the ball-shaped SE. Let us assume that M is a binary image that describes a 2-D or 3-D shape, and M ◦ S(k) and M − M ◦ S(k) are derived from the opening operation with S(k). Fig. 2(a)–(c) show a rabbit in the form of a 120 × 120 binary image, M ◦ S(12), and M − M ◦ S(12), respectively. After the opening operation with S(k), the connected components (in 4-neighborhoods for 2-D or in 6-neighborhoods for 3-D) are extracted from M ◦ S(k) and M − M ◦ S(k). Then the set of binary images, PM (k) = {Mi (k)|i = 1, . . . , Ik }, is composed, where Mi (k), called a part, refers to a binary image and Ik is the number of parts. More specifically, the set of parts extracted from M ◦ S(k) and that extracted from M − M ◦ S(k) are called the body class and branch class, respectively, and they are used to form the residual criterion in the next paragraph and the split criterion in Section 4. Note that $\bigcup_{i=1}^{I_k} M_i(k)$ is equal to M, and any pair of binary images in PM (k) is disjoint. Fig. 2(d) shows PM (12) after extracting the connected components. Clearly, there exist negligible parts in the branch class, enclosed by a circle in Fig. 2(d), which should be absorbed for the sake of compactness. Then two problems arise from absorbing the negligible parts: One is how to determine whether a part is negligible, and the other is which parts absorb them. For the ith part in the branch class, Mi (k), the part in the body class, which shares the maximum number of adjacent elements (pixels in 2-D or voxels in 3-D) with Mi (k), is selected. To decide whether Mi (k) is to be
Fig. 3. An example of computing R(·) for the two parts in Fig. 2(d): In the case of the part at the top left, Mi1 (k), and the part at the bottom left, Mi2 (k), R(Mi1 (k)) = 1 and R(Mi2 (k)) = 3.4, respectively. Note that the slanted pixels in Mi1 (k) and Mi2 (k) belong to V(Mi1 (k)) and V(Mi2 (k)), respectively.
absorbed or not, the ratio
\[ R(M_i(k)) = \frac{N(M_i(k))}{N(V(M_i(k)))} \tag{8} \]
is introduced, where V(Mi (k)) is a binary image composed of the elements of Mi (k) that are adjacent to M − Mi (k). Since the numerator is equal to or greater than the denominator, R(·) lies in the interval [1, ∞). In other words, R(·) conveys the notion of height, i.e. the number of elements of Mi (k) divided by the number of its elements that are adjacent to other parts. Fig. 3 shows an example of computing R(·) for two parts enclosed by circles, which belong to the branch class
Fig. 4. Excerpts of the candidates for PM and its weighted convexity, Cw (PM (k)): (a) Cw (M) = 0.750, (b) Cw (PM (6)) = 0.836, (c) Cw (PM (9)) = 0.931, (d) Cw (PM (11)) = 0.935, (e) Cw (PM (12)) = 0.939, (f) Cw (PM (13)) = 0.918, (g) Cw (PM (26)) = 0.822, (h) Cw (PM (28)) = 0.822, and (i) Cw (PM (29)) = 0.829.
in Fig. 2(d). In the case of the part at the top left of Fig. 3, Mi1 (k), and the part at the bottom left of Fig. 3, Mi2 (k), R(Mi1 (k)) = N(Mi1 (k))/N(V(Mi1 (k))) = 1/1 = 1 and R(Mi2 (k)) = N(Mi2 (k))/N(V(Mi2 (k))) = 34/10 = 3.4, respectively. Note that the slanted pixels in Mi1 (k) and Mi2 (k) belong to V(Mi1 (k)) and V(Mi2 (k)), respectively. Finally, the residual criterion is
\[ R(M_i(k)) \le t_R, \tag{9} \]
where tR is a pre-specified threshold. When Mi (k) satisfies Eq. (9), i.e. the height of Mi (k) is equal to or less than tR , it is absorbed. Fig. 2(e) shows the result after applying the residual criterion to the parts in the branch class as determined by tR = 2. As shown in Fig. 2(e), the boundary between the parts is not straight, due to the opening operation with the ball-shaped SE. Sometimes, this yields unexpectedly peculiar parts when the recursive decomposition stage described in Section 4 is performed. Therefore, an additional procedure is required to flatten the boundary between the parts. When there exist two adjacent parts, Mi1 (k) in the branch class and Mi2 (k) in the body class, a convex hull can be generated
for the elements of Mi1 (k) that are adjacent to those of Mi2 (k). In the flattening procedure, Mi1 (k) then absorbs the elements of Mi2 (k) that are included in this convex hull. Fig. 2(f) shows the result after the flattening procedure. Note that a similar procedure to that described above is adopted in Ref. [7] to fill the undesirable cut caused by morphological operations. For the sake of convenience, we will use the same notations for parts, such as PM (k), Mi (k), and Ik , after absorbing the negligible parts and performing the flattening procedure. As an example, Fig. 2(f) becomes PM (12) = {Mi (12)|i = 1, . . . , I12 } with I12 = 6, where PM (12) implies that this set of binary images results from the opening operation with S(12) and the number of parts in PM (12), I12 , is equal to 6.
3.2. Automatic selection using weighted convexity
Note that the generation of parts for a given shape is performed for all possible values of k, k = 1, . . . , K, where K + 1 is the radius of the smallest ball-shaped SE for which the body class becomes empty. In this context, PM (k),
k = 1, . . . , K, are candidates for the output of the CMD. Now, it is necessary to define a measure, in order to choose the best candidate (k value). In this paper, the weighted convexity for PM (k), namely,
\[ C_w(P_M(k)) = \sum_{i=1}^{I_k} \frac{N(M_i(k))}{N(M)} C(M_i(k)), \tag{10} \]
is introduced as a decision measure. Then the output of the CMD for M, PM , is PM (kmax ), where
\[ k_{\max} = \arg\max_{k \in \{1, \ldots, K\}} C_w(P_M(k)). \tag{11} \]
In brief, PM yields the maximum weighted convexity among the candidates. More specifically, PM = {Mi |i = 1, . . . , I } where Mi refers to a part and I is the number of parts. Figs. 4 and 5 show excerpts of the candidates for the PM of the rabbit and a plot of Cw (PM (k)) versus k, respectively. In this case, PM (12) of Fig. 4(e) becomes PM since it yields the maximum weighted convexity, 0.939.
Fig. 5. A plot of Cw (PM (k)) versus k.
4. Recursive decomposition stage for more intuitive description
The CMD is performed once for a given shape in the first stage of the proposed scheme. Let us refer to it as the IDS. Although the output of the IDS is usually adequate for an intuitive description, it generally requires refinements, depending on the complexity of the input shape. For example, Fig. 6(a) shows the output of the IDS for the rabbit, and in this case it would be better if M3 , which contains two ears, were split. Therefore, in this paper, the CMD will be performed recursively, in order to provide a more intuitive description. In this context, the recursive CMD would be terminated when no more parts can be split. However, if this were the case, it is likely that too many meaningless parts would be generated from the CMD in the branch class since this class generally represents minor details. Therefore, a split criterion is introduced, especially for the branch class, to determine whether a part in the branch class is to be split or not.
4.1. Description of split criterion
Empirically, the additional decomposition of a part in the branch class is meaningful when the CMD of that part yields at least two parts that each have only one adjacent part. Let us refer to this observation as the split criterion. When a part in the branch class satisfies the split criterion, it is split. In order to provide a more intuitive description, the recursive CMD with the split criterion will be performed until no more parts can be split. Let us refer to this as the RDS, and denote the output of the RDS for M as $\hat{P}_M = \{\hat{M}_i \mid i = 1, \ldots, \hat{I}\}$, where $\hat{M}_i$ refers to a part and $\hat{I}$ is the number of parts. Now, let us formulate this notion rigorously. Assume that PM , the output of the CMD for M, has I parts (Mi , i = 1, . . . , I ) and PMi , the output of the CMD for Mi , has J parts ([Mi ]j , j = 1, . . . , J ). Then the split criterion for Mi in the branch class can be expressed mathematically as
\[ N(\{ j \mid N(A(P_M, [M_i]_j)) = 1 \}) > 1, \tag{12} \]
where N(·) is the number of elements in the set and A(PM , [Mi ]j ) refers to the set of the parts, which are adjacent to [Mi ]j , in (PM − {Mi })∪ PMi . When Mi satisfies Eq. (12), i.e. the CMD for Mi yields at least two parts that each have only one adjacent part, it is split.
4.2. Examples of the split criterion and RDS
Fig. 7 shows an example of applying the split criterion to M2 , M3 , and M5 in the branch class of Fig. 6(a). More specifically, the sets of parts in Fig. 7(a)–(c) correspond to (PM − {M2 }) ∪ PM2 , (PM − {M3 }) ∪ PM3 , and (PM − {M5 }) ∪ PM5 , respectively. Note that the parts enclosed by the circle in Fig. 7 refer to PMi , i.e. the additional decomposition of Mi , where N(PM2 ) = 2, N(PM3 ) = 3, and N(PM5 ) = 2. In the case of M2 , there are no parts having only one adjacent part since N(A(PM , [M2 ]1 )) = 2 and N(A(PM , [M2 ]2 )) = 2, i.e. N({j |N(A(PM , [M2 ]j )) = 1}) = 0. However, in the case of M3 , there are two parts having only one adjacent part since N(A(PM , [M3 ]1 )) = 3, N(A(PM , [M3 ]2 )) = 1, and N(A(PM , [M3 ]3 )) = 1, i.e. N({j |N(A(PM , [M3 ]j )) = 1}) = 2. Similarly, in the case of M5 , N({j |N(A(PM , [M5 ]j )) = 1}) = 1. Finally, according to Eq. (12), M3 is to be split, and M2 and M5 are not to be split. Fig. 6 presents the RDS procedure for the rabbit. Fig. 6(a) shows the output of the IDS for the rabbit, consisting of one part in the body class, M1 , and five parts in the branch
Fig. 6. An example of the RDS for the rabbit: (a) PM , (b) the result obtained after performing the CMD once for each part in PM , (c) the result obtained after performing the CMD once for M1 and M3 , and (d) the result obtained after performing the CMD once for [M1 ]1−2 and [M3 ]1–3 .
Fig. 7. An example of applying the split criterion to Fig. 6(a): (a) (PM −{M2 })∪ PM2 , (b) (PM −{M3 })∪ PM3 , and (c) (PM −{M5 })∪ PM5 .
class, M2–6 . Fig. 6(b) is the result obtained after performing the CMD once for each part in Fig. 6(a), where the CMD splits M1–3 and M5 but not M4 and M6 . Here, M1 belongs to the body class, and, as shown in Fig. 7, M3 satisfies Eq. (12) but M2 and M5 do not. Therefore, M1 and M3 are split, and M2 and M5 are not. Fig. 6(c) is the result obtained after performing the CMD once for M1 and M3 . More specifically, it consists of two parts in the body class, [M1 ]1 and [M3 ]1 , and seven parts in the branch class, M2 , M4–6 , [M1 ]2 , and [M3 ]2–3 . Note that M2 and M4–6 are free from additional decomposition since it has already been determined that they do not need to be split. Fig. 6(d) is the result obtained after performing the CMD once for [M1 ]1–2 and [M3 ]1–3 , where the CMD splits [M3 ]2–3 but not [M1 ]1–2 and [M3 ]1 . However, [M3 ]2–3 are not to be split since they do not satisfy Eq. (12). Therefore, Fig. 6(c) refers to the final result of the RDS, $\hat{P}_M$, for the rabbit.
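The split criterion of Eq. (12) can be sketched in Python as follows. The sketch assumes that the decomposition (PM − {Mi}) ∪ PMi is available as a 2-D label grid (0 for the exterior, positive integers for parts) and that adjacency is evaluated in 4-neighborhoods; the function names and the toy grid are hypothetical. The toy grid is built so that one branch reproduces the adjacency counts (3, 1, 1) reported above for M3, and another reproduces the counts (2, 2) reported for M2.

```python
# Minimal sketch of the split criterion, Eq. (12), on a 2-D label grid.
# Function names and the toy grid are illustrative assumptions.

def adjacent_labels(grid, label):
    """A(., .): labels of the parts that are 4-adjacent to the part `label`."""
    rows, cols = len(grid), len(grid[0])
    found = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != label:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    other = grid[rr][cc]
                    if other != 0 and other != label:
                        found.add(other)
    return found

def satisfies_split_criterion(grid, sub_labels):
    """Eq. (12): more than one sub-part has exactly one adjacent part."""
    single = [j for j in sub_labels if len(adjacent_labels(grid, j)) == 1]
    return len(single) > 1

# Toy decomposition: body 1; one branch re-decomposed into 2, 3, 4
# (a stem with two tips); another branch re-decomposed into 5 and 6.
grid = [
    [3, 0, 4, 0, 0],
    [3, 0, 4, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 5, 0],
    [1, 1, 1, 6, 0],
]

split_ears = satisfies_split_criterion(grid, [2, 3, 4])  # counts 3, 1, 1
split_pair = satisfies_split_criterion(grid, [5, 6])     # counts 2, 2
```

Under these assumptions the first branch is split and the second is not, matching the outcomes described for M3 and M2 in Fig. 7.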
5. Iterative merging stage for compact representation
Although the output of the RDS is intuitive, it sometimes requires post-processing in order to provide a compact representation. For example, Fig. 8(a) and (c) show the output of the RDS for the natural and man-made objects, respectively. Since the RDS performs the recursive CMD in a greedy manner, it can be observed that there are over-decomposed parts from the point of view of compactness. Therefore, a final merging procedure is required after the RDS. In this paper, the weighted convexity difference (WCD) and merging criterion are introduced as decision measures for the merging procedure and for evaluating the suitability of merging parts, respectively, where the WCD refers to the change in the weighted convexity caused by merging the parts.
5.1. Description of WCD and merging criterion
Assume that $\hat{P}_M$, the output of the RDS for M, has $\hat{I}$ parts ($\hat{M}_i$, $i = 1, \ldots, \hat{I}$). Then the WCD is defined for two adjacent parts, $\hat{M}_{i_1}$ and $\hat{M}_{i_2}$ in $\hat{P}_M$, and is given by
\[ D(\hat{M}_{i_1}, \hat{M}_{i_2}) = C_w(\{\hat{M}_{i_1}, \hat{M}_{i_2}\}) - C(\hat{M}_{i_1} \cup \hat{M}_{i_2}). \tag{13} \]
Note that the WCD is the difference between the weighted convexity of the two parts, $\hat{M}_{i_1}$ and $\hat{M}_{i_2}$, and the convexity of the merged part, $\hat{M}_{i_1} \cup \hat{M}_{i_2}$. Then the merging criterion associated with the WCD is
\[ D(\hat{M}_{i_1}, \hat{M}_{i_2}) \le \min\left( t_L, \frac{t_G}{f} \right), \tag{14} \]
where $f = [N(\hat{M}_{i_1}) + N(\hat{M}_{i_2})]/N(M)$, and $t_L$ and $t_G$ refer to thresholds for the WCD and the WCD multiplied by f, respectively. In particular, $t_G$ implies the allowable convexity reduction when the two adjacent parts are merged by considering the weighted convexity of all parts in a given shape. In other words, satisfying Eq. (14) implies that $D(\hat{M}_{i_1}, \hat{M}_{i_2})$
Fig. 8. An example of the IMS: (a) the output of the RDS for the natural object when $C_w(\hat{P}_M) = 0.971$, (b) the output of the IMS for the natural object when $C_w(\tilde{P}_M) = 0.970$, (c) the output of the RDS for the man-made object when $C_w(\hat{P}_M) = 0.988$, and (d) the output of the IMS for the man-made object when $C_w(\tilde{P}_M) = 0.986$.
Fig. 9. An example of performing the IMS for Fig. 8(c).
is equal to or less than $t_L$, and also $fD(\hat{M}_{i_1}, \hat{M}_{i_2})$ is equal to or less than $t_G$. It is worth noting that $fD(\hat{M}_{i_1}, \hat{M}_{i_2})$ is equal to the difference between $C_w(\hat{P}_M)$ and $C_w(\hat{P}_M \cup \{\hat{M}_{i_1} \cup \hat{M}_{i_2}\} - \{\hat{M}_{i_1}, \hat{M}_{i_2}\})$, where $\hat{P}_M$ and $\hat{P}_M \cup \{\hat{M}_{i_1} \cup \hat{M}_{i_2}\} - \{\hat{M}_{i_1}, \hat{M}_{i_2}\}$ are the decomposition results before and after merging $\hat{M}_{i_1}$ and $\hat{M}_{i_2}$, respectively. Finally, the merging procedure is also performed iteratively until no more adjacent parts satisfy Eq. (14). In this context, it is applied to the two adjacent parts, $\hat{M}_{i_1}$ and $\hat{M}_{i_2}$, for which $fD(\hat{M}_{i_1}, \hat{M}_{i_2})$ is the minimum over all pairs of adjacent parts. Let us refer to it as the IMS, and denote the output of the IMS for M as $\tilde{P}_M = \{\tilde{M}_i \mid i = 1, \ldots, \tilde{I}\}$, where $\tilde{M}_i$ refers to a part and $\tilde{I}$ is the number of parts. Note that the IMS controls the trade-off, i.e. whether $\tilde{P}_M$ is to be the compact representation or that required to obtain the maximum weighted convexity.
5.2. Examples of the IMS with merging criterion
Fig. 8 shows the results of the IMS with $t_L = 0.03$ and $t_G = 0.005$ for the natural and man-made objects, where $t_L$ and $t_G$ are determined experimentally. To clarify the IMS procedure with the merging criterion, we present the sequential IMS procedures for Fig. 8(c) in Fig. 9. For the purpose of providing a coherent explanation, we use the same notation for the parts in Fig. 9 as in Fig. 8(c). In addition, Tables 1 and 2 present detailed information for the IMS for the natural and man-made objects, respectively. Note that each row in Tables 1 and 2 refers to the information obtained during each procedure when the two adjacent parts, $\hat{M}_{i_1}$ and $\hat{M}_{i_2}$, for which $fD(\hat{M}_{i_1}, \hat{M}_{i_2})$ is a minimum, are selected to be merged.
Table 1. One step of the IMS for the natural object ($f = [N(\hat{M}_{i_1}) + N(\hat{M}_{i_2})]/N(M)$)
Procedure                 $D(\hat{M}_{i_1}, \hat{M}_{i_2})$    $fD(\hat{M}_{i_1}, \hat{M}_{i_2})$
From Fig. 8(a) to (b)     0.009                                0.001
No change                 0.013                                0.009
Table 2. Four steps of the IMS for the man-made object ($f = [N(\hat{M}_{i_1}) + N(\hat{M}_{i_2})]/N(M)$)
Procedure                 $(i_1, i_2)$ pair    $D(\hat{M}_{i_1}, \hat{M}_{i_2})$    $fD(\hat{M}_{i_1}, \hat{M}_{i_2})$
From Fig. 8(c) to 9(a)    (1,7)                −0.002                               −0.001
From Fig. 9(a) to (b)     (1,6)                −0.001                               −0.001
From Fig. 9(b) to (c)     (1,5)                0.002                                0.002
From Fig. 9(c) to 8(d)    (1,4)                0.002                                0.002
No change                 (1,2)                0.077                                0.073
In the case of the natural object, the two adjacent parts enclosed by a circle in Fig. 8(a) are merged since they satisfy Eq. (14), i.e. $0.009 \le t_L$ and $0.001 \le t_G$, as presented in the first row of Table 1. Then the merging procedure is terminated, since the part enclosed by a circle in Fig. 8(b) and its adjacent part do not satisfy Eq. (14), i.e. $0.013 \le t_L$ but $0.009 > t_G$, as presented in the second row of Table 1. This implies that
Fig. 10. Experimental results for the nine 2-D shapes in Ref. [7].
the part enclosed by the circle in Fig. 8(b) is not merged when considering the trade-off between compactness and maximality. Similarly, in the case of the man-made object, the merging procedure proceeds as presented in Table 2. Note that the merging procedure from Fig. 8(c) to Fig. 9(a) increases the weighted convexity since $fD(\hat{M}_1, \hat{M}_7)$ in Fig. 8(c) is negative, i.e. −0.001. Similarly, the merging procedure from Fig. 9(a) to (b) increases the weighted convexity. Finally, Fig. 8(b) and (d) are the outputs of the IMS for Fig. 8(a) and (c), respectively. More specifically, the nine parts in Fig. 8(a) become eight parts in Fig. 8(b) with a reduction of the weighted convexity by 0.001, and the seven parts in Fig. 8(c) become three parts in Fig. 8(d) with a reduction of the weighted convexity by 0.002.
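The measures used by the merging stage can be sketched in Python as follows. The sketch assumes 2-D parts stored as sets of integer grid points, and it assumes that N(H(M)) may be approximated by counting the grid points inside or on the convex hull of those points; the helper names and the two toy shapes are hypothetical, while the default thresholds are the values tL = 0.03 and tG = 0.005 used in the experiments. Only the test of Eq. (14) for one pair of parts is shown, not the full iterative loop.

```python
# Minimal sketch of Eqs. (6), (10), (13), and (14) for 2-D parts.
# Parts are sets of integer grid points; names and toy data are assumptions.

def cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(points)
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_size(part):
    """N(H(M)), assumed to be the number of grid points inside or on the hull."""
    hull = convex_hull(part)
    n = len(hull)
    xs = [p[0] for p in part]
    ys = [p[1] for p in part]
    return sum(
        all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0 for i in range(n))
        for x in range(min(xs), max(xs) + 1)
        for y in range(min(ys), max(ys) + 1)
    )

def convexity(part):
    """Eq. (6): C(M) = N(M) / N(H(M))."""
    return len(part) / hull_size(part)

def weighted_convexity(parts):
    """Eq. (10) for disjoint parts, normalized by their total size."""
    total = sum(len(p) for p in parts)
    return sum(len(p) * convexity(p) for p in parts) / total

def wcd(a, b):
    """Eq. (13): weighted convexity of {a, b} minus convexity of a | b."""
    return weighted_convexity([a, b]) - convexity(a | b)

def satisfies_merging_criterion(a, b, shape_size, t_l=0.03, t_g=0.005):
    """Eq. (14): D <= min(t_L, t_G / f), with f = [N(a) + N(b)] / N(M)."""
    f = (len(a) + len(b)) / shape_size
    return wcd(a, b) <= min(t_l, t_g / f)

def block(r0, r1, c0, c1):
    """Axis-aligned block of grid points (toy helper)."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

left, right = block(0, 4, 0, 3), block(0, 4, 3, 6)  # two halves of a rectangle
bar, foot = block(0, 6, 0, 2), block(4, 6, 2, 6)    # two arms of an L-shape

merge_rectangle = satisfies_merging_criterion(left, right, shape_size=24)
merge_l_shape = satisfies_merging_criterion(bar, foot, shape_size=20)
```

In this toy setting the two halves of the rectangle are merged because the WCD is zero, whereas the two arms of the L-shape are kept apart because merging them would lower the convexity well beyond both thresholds.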
6. Experimental results
In this section, we present experimental results for various 2-D shapes given in Refs. [7,9,18]. We also examine the robustness of the proposed scheme against scaling, rotation, noise, shape deformation, and occlusion with modified shapes. The methodology and experimental results of 3-D shape decomposition are also presented. Although the CMD is performed recursively, the proposed scheme can
yield the decomposition result in a few iterations by virtue of the split criterion. In addition, it is worth noting that the proposed scheme worked well in all of the experiments, despite the fact that the thresholds for the residual criterion and merging criterion were always set to fixed values of tR = 2, tG = 0.005, and tL = 0.03, respectively. All of the experiments were conducted on an Intel Pentium IV 2.4 GHz with 1024 MB memory and with Microsoft Windows 2000 Professional.
6.1. Experimental results for 2-D shape
A quantitative evaluation for the part-based representation is very difficult; thus, a qualitative one is considered for various 2-D shapes, as described in Ref. [9], where a qualitative assessment of the perceived parts by a majority of 14 subjects is used to evaluate the decomposition performance. In this context, the purpose of our experiments consists of presenting the subjective performance of the proposed scheme and verifying its robustness against various modifications. To perform a subjective evaluation, we applied the proposed scheme to the 2-D shapes given in Refs. [7,9,18]. Fig. 10 shows the decomposition results for nine 2-D shapes, provided in the form of 40 × 40 binary images, and Table 3 presents the execution time in
Table 3. The elapsed time (in seconds) of the proposed scheme for the nine shapes in Fig. 10
Fig. 10      (a)     (b)     (c)     (d)     (e)     (f)     (g)     (h)     (i)
IDS + RDS    0.031   0.047   0.047   0.031   0.078   0.047   0.031   0.062   0.047
IMS          0.000   0.016   0.000   0.000   0.000   0.000   0.016   0.000   0.000
Total        0.031   0.063   0.047   0.031   0.078   0.047   0.047   0.062   0.047
Fig. 11. Experimental results for the twelve 2-D shapes in Ref. [9].
Table 4. The elapsed time (in seconds) of the proposed scheme for the 12 shapes in Fig. 11
Fig. 11      (a)     (b)     (c)     (d)     (e)     (f)     (g)     (h)     (i)     (j)     (k)     (l)
IDS + RDS    0.828   0.071   1.359   0.782   0.781   0.782   1.578   0.968   0.718   1.156   2.531   2.328
IMS          0.031   0.125   0.094   0.046   0.078   0.109   0.094   0.016   0.047   0.281   0.062   0.063
Total        0.859   0.796   1.453   0.828   0.859   0.891   1.672   0.984   0.765   1.437   2.593   2.391
Fig. 12. Experimental results for the twelve 2-D shapes in Ref. [18].
Fig. 13. Experimental results for scaled 2-D shapes: (a) the scaled shape of Fig. 11(l) with ratio 0.5; (b) the scaled shape of Fig. 11(l) with ratio 1.5; (c) the scaled shape of Fig. 11(c) with ratio 0.5; and (d) the scaled shape of Fig. 11(c) with ratio 1.5.
Fig. 14. Experimental results for rotated 2-D shapes: (a) the rotated shape of Fig. 11(l) with 9° rotation; (b) the rotated shape of Fig. 11(l) with 45° rotation; (c) the rotated shape of Fig. 11(c) with 9° rotation; and (d) the rotated shape of Fig. 11(c) with 45° rotation.
Fig. 15. Experimental results for noise-corrupted 2-D shapes: (a) the shape of Fig. 11(l) with 5% noise; (b) the shape of Fig. 11(l) with 10% noise; (c) the shape of Fig. 11(c) with 5% noise; and (d) the shape of Fig. 11(c) with 10% noise.
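The noise-corrupted shapes of Fig. 15 are obtained by flipping pixels on the shape boundary, and background pixels adjacent to it, with a fixed probability. The following Python fragment is a minimal sketch of such a noise model and is not the implementation of Ref. [7]: the function name add_boundary_noise, the list-of-rows image layout, the treatment of positions outside the frame as background, and the use of 4-connectivity to decide which pixels lie on or next to the boundary are all assumptions made for illustration.

```python
import random


def add_boundary_noise(image, prob, rng=None):
    """Flip boundary pixels and adjacent exterior pixels with probability prob.

    image is a list of rows of 0/1 values; a new image is returned and the
    input is left unchanged.
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Positions outside the frame count as background (assumption).
        return image[r][c] if 0 <= r < height and 0 <= c < width else 0

    noisy = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            # 4-connected neighbourhood (assumption; 8-connectivity also works).
            neighbours = (pixel(r - 1, c), pixel(r + 1, c),
                          pixel(r, c - 1), pixel(r, c + 1))
            on_boundary = image[r][c] == 1 and 0 in neighbours
            beside_shape = image[r][c] == 0 and 1 in neighbours
            if (on_boundary or beside_shape) and rng.random() < prob:
                noisy[r][c] = 1 - image[r][c]
    return noisy


# A 3 x 3 square centred in a 7 x 7 frame, corrupted at the two noise levels.
square = [[1 if 2 <= r <= 4 and 2 <= c <= 4 else 0 for c in range(7)]
          for r in range(7)]
noisy_5 = add_boundary_noise(square, 0.05, random.Random(0))
noisy_10 = add_boundary_noise(square, 0.10, random.Random(0))
```

Candidate pixels are identified on the original image before any flip is applied, so the outcome does not depend on the scan order. Interior pixels and background pixels far from the shape are never altered, which keeps the perturbation confined to the contour.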
seconds. Note that the proposed scheme yields reasonable decomposition results, similar to those of Ref. [7], and that its execution time is short (under 0.1 s per shape in Table 3). In particular, the number of parts is significantly reduced compared with Ref. [7], by virtue of the IMS using the merging criterion. On the other hand, Fig. 10(b) has two redundant parts in the upper region of the truck. Indeed, the ability of the proposed scheme to extract exact polygon-shaped parts is somewhat unsatisfactory, owing to the characteristics of the ball-shaped SE.

Fig. 11 shows the decomposition results for twelve 2-D shapes, provided in the form of 120 × 120 binary images, and Table 4 presents the execution time in seconds. Note that the decomposition results are observed to be in good agreement with the perceived parts shown in Ref. [9], and that the execution time of the proposed scheme remains short (at most about 2.6 s per shape in Table 4). In particular, the proposed scheme is able to extract the tail part for Fig. 11(i), (j), and (k), in contrast to Ref. [9]. On the other hand, Fig. 11(a) and (b) are somewhat over-decomposed, since the concavities in the boundary of a given shape are not considered. Fig. 12 shows the decomposition results for the twelve 2-D shapes given in Ref. [18]. Although there are a few over-decomposed cases, similar to those depicted in Fig. 11(a) and (b), most of the decomposition results seem to be intuitive.

To demonstrate its robustness, we applied the proposed scheme to the modified versions of the two 2-D shapes
Fig. 16. Experimental results for the deformed and occluded 2-D shapes in Ref. [19].
given in Ref. [9] and four 2-D shapes given in Ref. [19]. Figs. 13–15 show the decomposition results for the scaled, rotated, and noise-corrupted versions of the 2-D shapes of Fig. 11(l) and (c), respectively. In this experiment, the noise for the 2-D shape is generated as follows [7]: the boundary pixels and the exterior pixels adjacent to them each have a certain probability of undergoing a transition between “0” and “1”. Whether a given pixel is flipped is determined by drawing a random number, with transition probabilities of 5% and 10%, as shown in Fig. 15. All of the perturbed shapes in Figs. 13–15 yield the same decomposition results as those shown in Fig. 11(l) and (c). Fig. 16 shows the decomposition results for the deformed and occluded 2-D shapes given in Ref. [19]. More specifically, Fig. 16(b)–(d), (f)–(h), (j)–(l), and (n)–(p) are the deformed and occluded versions of Fig. 16(a), (e), (i), and (m), respectively. Note that all of the deformed and occluded shapes yielded decomposition results that are the same as, or reasonably close to, those shown in Fig. 16(a), (e), (i), and (m). From Figs. 13–16, it is believed that the proposed scheme is quite robust, not only to scaling, rotation, and noise, but also to shape deformation and occlusion.

6.2. Experimental results for 3-D shape

In general, the part-based representation of a 3-D shape, represented by triangular meshes with real coordinates, is not a simple task since these meshes have an irregular grid structure. In this paper, the process of voxelization [15], which is independent of the extent of the 3-D shape and can be obtained in any desired resolution, is employed to
Fig. 17. The 3-D shape decomposition procedure for the cow.
regularize the structure of the 3-D shape. The resulting 3-D binary image, composed of voxels, contains the 3-D shape, and thus the proposed scheme for 2-D shapes can be applied directly to 3-D shapes. Fig. 17 presents the procedure used for the 3-D shape decomposition of the cow. Fig. 17(a) and (b) show the cow represented by rendered meshes and voxels, respectively. Fig. 17(c)–(e) then show the outputs of the IDS, RDS, and IMS, respectively. Finally, Fig. 17(f) shows the graph-based representation, where each ellipsoidal node represents the volume and the variation along the principal axes of the corresponding part, and each edge is a line segment joining two connected parts. Note that the cow is decomposed into nine perceptual parts: a body, a head, two horns, a tail, and four legs. Fig. 18 shows the decomposition results and graph-based representations for various 3-D shapes, where the left, middle, and right columns refer to the 3-D shapes with rendered meshes, the decomposition results with rendered voxels, and the graph-based representations, respectively. Table 5 presents the detailed information for Figs. 17 and 18. Note that the graph-based representation, generated directly from the proposed scheme, seems to coincide with human insight. Therefore, the proposed scheme is believed to provide an alternative means of describing both 2-D and 3-D shapes.
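To illustrate how a labelled voxel decomposition such as Fig. 17(e) could be turned into a graph such as Fig. 17(f), the sketch below computes, for each part, its volume (voxel count) and the variances along its principal axes (the eigenvalues of the voxel covariance matrix), and links two parts whenever they share a voxel face. This is only a sketch under stated assumptions: the dict-based input, the names build_part_graph and symmetric_eigenvalues_3x3, the face-adjacency rule for edges, and the use of covariance eigenvalues as the measure of variation are illustrative choices and are not taken from the paper.

```python
import math
from collections import defaultdict


def symmetric_eigenvalues_3x3(a):
    """Eigenvalues, in descending order, of a symmetric 3 x 3 matrix.

    Uses the standard trigonometric closed form for symmetric matrices.
    """
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


def build_part_graph(labels):
    """labels maps a voxel (x, y, z) to a part id; returns (nodes, edges)."""
    voxels = defaultdict(list)
    for voxel, part in labels.items():
        voxels[part].append(voxel)

    nodes = {}
    for part, pts in voxels.items():
        n = len(pts)
        mean = [sum(p[k] for p in pts) / n for k in range(3)]
        cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pts) / n
                for j in range(3)] for i in range(3)]
        nodes[part] = {"volume": n, "centroid": tuple(mean),
                       "variances": symmetric_eigenvalues_3x3(cov)}

    # Two parts are linked if any of their voxels share a face (assumption).
    edges = set()
    for (x, y, z), part in labels.items():
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            other = labels.get((x + dx, y + dy, z + dz))
            if other is not None and other != part:
                edges.add((min(part, other), max(part, other)))
    return nodes, edges


# Toy example: a flat 4 x 2 "body" with a two-voxel "leg" attached.
labels = {(x, y, 0): "body" for x in range(4) for y in range(2)}
labels.update({(0, 2, 0): "leg", (0, 3, 0): "leg"})
nodes, edges = build_part_graph(labels)
```

In the toy example the body node has volume 8 and its two non-zero variances differ by a factor of five, reflecting its elongated footprint, while the single edge records that the leg is attached to the body. The square roots of the variances could serve as semi-axis lengths when drawing ellipsoidal nodes.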
7. Conclusion

In this paper, a new decomposition scheme for both 2-D and 3-D shapes was presented. To begin with, the constrained morphological decomposition (CMD) method,
Table 5
Model information and the decomposition results of the proposed scheme for various 3-D shapes

3-D shape           Vertices  Meshes  Resolution of 3-D binary image  Parts
Phone               338       672     69 × 22 × 20                    3
Pliers              272       520     117 × 73 × 9                    5
Welder              12391     24541   82 × 70 × 14                    8
Cow                 2904      5804    51 × 34 × 16                    9
Toydog              1944      3808    54 × 45 × 26                    10
Ant                 4346      8216    105 × 57 × 33                   11
Bicycle             2288      4484    77 × 56 × 50                    14
Bi-fuselage plane   1554      2752    69 × 130 × 22                   20
Helicopter          8198      2709    100 × 87 × 32                   20
based on the opening operation and weighted convexity, was proposed. The proposed scheme performs the CMD method once for a given shape, and then recursively for each part according to the split criterion. Finally, a merging criterion based on the change in weighted convexity, expressed as the weighted convexity difference, was adopted. From the experimental results, it was found that the proposed scheme yields an intuitive description and provides robustness to scaling, rotation, noise, shape deformation, and occlusion. Moreover, it was verified that the proposed scheme can be applied to both 2-D and 3-D shapes. In particular, the proposed scheme alleviates several problems in existing part-based representations, such as the choice of the type of structuring element (SE), the detection of convex and concave arcs, and the selection of the number of parts to be decomposed. Therefore, it can be concluded that the proposed scheme is suitable for the generation of part-based
Fig. 18. The decomposition results and graph-based representations for various 3-D shapes: (a) the phone, (b) the pliers, (c) the welder, (d) the toydog, (e) the ant, (f) the bicycle, (g) the bi-fuselage plane, and (h) the helicopter.
representations, and is in line with the current tendency to develop perceptual shape descriptors compatible with a representation based on human insight and to adopt graph matching techniques in shape-related applications such as content-based retrieval and object recognition. On the other hand, the proposed scheme uses opening operations with the ball-shaped SE, and thus its computational complexity could be higher than that of the existing morphology-based decomposition methods, even though the measured execution times are short, as shown in Tables 3 and 4. This problem can be alleviated using fast methods [20,21] or iterative methods that approximate the ball-shaped SE [7,8]. Also, the possibility of applying the proposed scheme to real images should be examined, in order to develop a practical application, such as a content-based image retrieval system. Most
decomposition methods can be applied to real images, i.e. gray or color images, using appropriate segmentation techniques, as described in Ref. [14]. In this context, we are currently considering one practical setting, a 3-D VRML model retrieval system. Figs. 17 and 18 show the graph-based representations, which are obtained directly from the proposed scheme and are quite easy to build and edit in the form of a query. Note that there is reasonable agreement between the representation based on human insight and the graph-based representation. Therefore, if suitable graph matching techniques were available for evaluating similarity, content-based retrieval for a 3-D VRML database with a new shape-retrieval functionality, called query-by-sketch, could readily be implemented using the developed graph-based representation.
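The cost concern raised in this section stems from the opening operation with the ball-shaped SE. As a minimal 2-D sketch of that operation (a disk-shaped SE on a list-of-rows binary image), opening is an erosion followed by a dilation, and a direct implementation visits every SE offset for every pixel, so its cost grows with the size of the SE. The helper names disk_offsets, erode, dilate, and opening are hypothetical, and the code follows the textbook definition rather than the authors' implementation or the fast methods of Refs. [20,21].

```python
def disk_offsets(radius):
    """Offsets (dr, dc) of a digital disk, which is symmetric about the origin."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def _is_foreground(image, r, c):
    # Pixels outside the frame are treated as background (assumption).
    return 0 <= r < len(image) and 0 <= c < len(image[0]) and image[r][c] == 1


def erode(image, offsets):
    """Keep a pixel only if the SE centred on it fits inside the shape."""
    return [[int(all(_is_foreground(image, r + dr, c + dc)
                     for dr, dc in offsets))
             for c in range(len(image[0]))] for r in range(len(image))]


def dilate(image, offsets):
    """Set a pixel if the symmetric SE centred on it touches the shape."""
    return [[int(any(_is_foreground(image, r + dr, c + dc)
                     for dr, dc in offsets))
             for c in range(len(image[0]))] for r in range(len(image))]


def opening(image, radius):
    """Morphological opening: erosion followed by dilation with the same SE."""
    offsets = disk_offsets(radius)
    return dilate(erode(image, offsets), offsets)


# A 5 x 5 square with a one-pixel-wide, two-pixel-long spur on its right side.
shape = [[0] * 9 for _ in range(9)]
for r in range(2, 7):
    for c in range(2, 7):
        shape[r][c] = 1
shape[4][7] = shape[4][8] = 1

opened = opening(shape, 1)
```

With radius 1 the opening rounds the four corners of the square and removes the tip of the spur, since the disk fits at neither location. Each output pixel costs one test per SE offset, and the number of offsets grows roughly with the square of the radius in 2-D and with its cube in 3-D.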
References

[1] S. Ullman, High Level Vision, MIT Press, Cambridge, MA, 1997.
[2] E. Paquet, M. Rioux, Nefertiti: a query by content system for three-dimensional model and image databases management, Image Vision Comput. 17 (2) (1999) 157–166.
[3] E. Zuckerberger, A. Tal, S. Shlafman, Polyhedral surface decomposition with applications, Comput. Graph. 26 (5) (2002) 733–743.
[4] Y. Rui, T.S. Huang, M. Ortega, S. Mehrotra, Relevance feedback: a power tool for interactive content-based image retrieval, IEEE Trans. Circuits Syst. Vid. Technol. 8 (5) (1998) 644–655.
[5] D. Wang, V. Haese-Coat, J. Ronsin, Shape decomposition and representation using recursive morphological operation, Pattern Recogn. 28 (11) (1995) 1783–1792.
[6] C. Arcelli, L. Serino, From discs to parts of visual form, Image Vision Comput. 15 (1) (1997) 1–10.
[7] J. Xu, Morphological decomposition of 2-D binary shapes into convex polygons: a heuristic algorithm, IEEE Trans. Image Process. 10 (1) (2001) 61–71.
[8] J. Xu, Efficient morphological shape representation with overlapping disk components, IEEE Trans. Image Process. 10 (9) (2001) 1346–1356.
[9] K. Siddiqi, B.B. Kimia, Parts of visual form: computational aspects, IEEE Trans. Pattern Anal. Mach. Intell. 17 (3) (1995) 239–251.
[10] P.L. Rosin, Shape partitioning by convexity, IEEE Trans. Syst. Man Cybern. A 30 (2) (2000) 202–210.
[11] P.A. Maragos, R.W. Schafer, Morphological skeleton representation and coding of binary images, IEEE Trans. Acoust. Speech Signal Process. 34 (5) (1986) 1228–1244.
[12] I. Pitas, A.N. Venetsanopoulos, Morphological shape decomposition, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1) (1990) 38–45.
[13] A.P. Pentland, Recognition by parts, Proceedings of International Conference on Computer Vision, London, England, June 1987, pp. 612–620.
[14] L.J. Latecki, R. Lakämper, Convexity rule for shape decomposition based on discrete contour evolution, Comput. Vision Image Understand. 73 (3) (1999) 441–454.
[15] D.H. Kim, I.D. Yun, S.U. Lee, Graph representation by medial axis transform for 3D image retrieval, Proceedings of SPIE on Three-Dimensional Image Capture and Applications IV, San Jose, USA, January 2001, pp. 223–230.
[16] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 1992.
[17] M. de Berg, M. van Kreveld, M. Overmars, O. Schwarzkopf, Computational Geometry: Algorithms and Applications, Springer, Berlin, 2000.
[18] Clip art CD-ROM, ART explosion 600,000 images, Nova Development Corporation, CA, USA.
[19] T.B. Sebastian, P.N. Klein, B.B. Kimia, Recognition of shapes by editing shock graphs, Proceedings of the International Conference on Computer Vision, vol. 1, Vancouver, Canada, July 2001, pp. 755–762.
[20] L. Vincent, Morphological transformations of binary images with arbitrary structuring elements, Signal Process. 22 (1) (1991) 3–23.
[21] N. Nikopoulos, I. Pitas, A fast implementation of 3-D binary morphological transformations, IEEE Trans. Image Process. 9 (2) (2000) 283–286.
About the Author—DUCK HOON KIM received the B.S. and M.S. degrees in Electrical Engineering and Computer Science from Seoul National University, Seoul, Korea, in 1998 and 2000, respectively, where he is currently working toward the Ph.D. degree in Electrical Engineering and Computer Science. His research interests are in the areas of computer vision, computer graphics, and multimedia applications, especially 2-D and 3-D object representation, retrieval, and recognition.

About the Author—IL DONG YUN received the B.S., M.S., and Ph.D. degrees from Seoul National University, Seoul, Korea, in 1989, 1991, and 1996, respectively. He is currently an associate professor of the School of Electronics and Control Engineering, Hankuk University of F S, Yongin, Korea. In 1996–1997, he was a senior researcher at Daewoo Electronics. His current research interests include object recognition, 3-D object modeling and retrieval, and medical imaging.

About the Author—SANG UK LEE received his B.S. degree from Seoul National University, Seoul, Korea, in 1973, M.S. degree from Iowa State University, Ames in 1976, and Ph.D. degree from University of Southern California, Los Angeles, in 1980, all in Electrical Engineering. From 1980 to 1981, he was with the General Electric Company, Lynchburg, VA, working on the development of the digital mobile radio. From 1981 to 1983, he was a Member of Technical Staff, M/A-COM Research Center, Rockville, MD. In 1983, he joined the Department of Control and Instrumentation Engineering at Seoul National University as an Assistant Professor, where he is now a Professor at the School of Electrical Engineering and Computer Science. Currently, he is also affiliated with the Automation and Systems Research Institute and the Institute of New Media and Communications at Seoul National University. His current research interests are in the areas of image and video signal processing, digital communication, and computer vision. He served as an Editor-in-Chief for the Transaction of the Korean Institute of Communication Science from 1994 to 1996. Dr. Lee is currently a Member of the Editorial Board of both the Journal of Visual Communication and Image Representation and the Journal of Applied Signal Processing, and an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology. He is a member of Phi Kappa Phi.