Farthest point distance: A new shape signature for Fourier descriptors


ARTICLE IN PRESS Signal Processing: Image Communication 24 (2009) 572–586


Akrem El-ghazal a,*, Otman Basir a, Saeid Belkasim b

a Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada N2L 3G1
b Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA


Abstract

Article history: Received 7 August 2008; received in revised form 15 February 2009; accepted 7 April 2009.

Shape description is an important task in content-based image retrieval (CBIR). A variety of techniques have been reported in the literature that aim to represent objects based on their shapes, each with its pros and cons. The Fourier descriptor (FD) is one such technique: simple, yet powerful, and offering attractive properties such as rotation, scale, and translation invariance. Shape signatures, which constitute an essential component of Fourier descriptors, reduce 2-D shapes to 1-D functions and hence facilitate the process of deriving invariant shape features using the Fourier transform. A good number of shape signatures have been reported in the literature; however, these signatures lack important shape information, such as corners, in their representations. This information plays a major role in distinguishing between different shapes. In this paper, we present the farthest point distance (FPD), a novel shape signature that includes corner information to enhance the performance of shape retrieval using Fourier descriptors. The signature is calculated at each point on a shape contour. It captures the distances between the different shape corners, points on which humans focus visual attention when classifying shapes. To reach a comprehensive conclusion about the merit of the proposed signature, it is compared against eight popular signatures using the well-known MPEG-7 database. Furthermore, the proposed signature is evaluated against standard boundary- and region-based techniques: the curvature scale space (CSS) and the Zernike moments (ZM). The FPD signature demonstrates superior overall performance compared with the other eight signatures and the two standard techniques.

Keywords: Fourier descriptors; image retrieval; shape signatures

1. Introduction

The ease and convenience of capturing and transmitting digital images between digital cameras and image databases is a contributing factor in the immense growth of image databases. These databases cover a wide range of

* Corresponding author.
E-mail addresses: [email protected] (A. El-ghazal), [email protected] (O. Basir), [email protected] (S. Belkasim).
doi:10.1016/j.image.2009.04.001

applications including military, environmental, astronomy, transportation, aviation, medical, and multimedia. The storage format of such image data is relatively standardized; however, the effective retrieval of images from such databases remains a significant challenge. Typically, images in a database are retrieved based on either textual information or content information. Early retrieval techniques were based on textual annotation of images. Images were first annotated with text, then searched based on their textual tags. However, text-based techniques have many limitations due to their reliance on


manual annotation, a process that is tedious and error-prone, especially for large data sets. Furthermore, the rich content typically found in images and the subjectivity of human perception make describing images in words a difficult, if not impossible, task. To overcome these difficulties, content-based image retrieval (CBIR) was proposed [1]. This approach relies on the visual cues of images, rather than textual annotations, to search for images, and therefore has the potential to respond to more specific user queries. CBIR techniques use visual contents such as color, texture, and shape to represent and index images. The increasing interest in using the shape features of objects for CBIR is not surprising, since shape is a more intrinsic property of objects than color or texture, and there is considerable evidence that natural objects are recognized primarily by their shape [2,3]. A survey of users on the cognitive aspects of image retrieval indicates that users are more interested in retrieval based on shape than on color or texture [4]. However, retrieval based on shape content remains more difficult than retrieval based on other visual features [2]. During the last decade, significant progress has been made in both the theoretical and practical aspects of shape-based image retrieval [5,6]. There are two main approaches to shape representation: the region-based approach and the boundary-based approach (also known as the contour-based approach). Region-based techniques often use moment descriptors to describe shapes. These descriptors include geometrical moments [7,8], Zernike moments (ZM) [9,10], pseudo-Zernike moments [11], Legendre moments [9], and Tchebichef moments [12]. Other notable region-based techniques include the generic Fourier descriptor (FD) [13], the compound image descriptor [14], the shape matrix [15], and the grid technique [16].
Although region-based approaches are global in nature and can be applied to generic shapes, they often involve intensive computation and fail to distinguish between objects that are similar [17]. In many applications the internal content of the shape is not as important as its boundary. Boundary-based techniques tend to be more efficient for handling shapes that are describable by their contours [17]. Many boundary-based techniques have been proposed in the literature, including Fourier descriptors [18–20], curvature scale space (CSS) [21–23], wavelet descriptors [24,25], contour displacement [26], chain codes [27], autoregressive models [28], the Delaunay triangulation technique [29], and multi-resolution polygonal shape descriptors [30]. Recently, dynamic programming (DP) has been adopted in order to achieve high accuracy rates using shape boundaries [31–35]. Even though DP-based techniques generally offer better performance than techniques that do not use DP, they are computationally expensive, making them impractical for large databases. Fourier descriptors have proven to be better than other boundary-based techniques in many applications [18–20,36,37]. The traditional FDs are based on applying


the Fourier transform to a shape signature. Many shape signatures have been used in Fourier descriptor techniques; however, the complex coordinates (CC) signature is the most frequently used in the literature. Recent work shows that in shape-based image retrieval, the radial distance (RD) signature outperforms the complex coordinates and other signatures [38]. In order to increase the ability of the Fourier-based technique to capture local features, Eichmann et al. [39] have used the short Fourier transform (SFT). The SFT is not suitable for image retrieval because matching with the SFT is computationally more expensive than with the traditional FDs. Invariance to affine transforms allows considerable robustness when shapes rotate in all three dimensions. Arbter et al. [40,41] have used a complex mathematical analysis to propose a set of normalized descriptors that are invariant under any affine transformation. Also, Oirrak et al. [42] have used one-dimensional Fourier series coefficients to derive affine-invariant descriptors. Zhang and Lu [38] have shown that even though the affine Fourier descriptors [40] were proposed to target affine shape distortion, they do not perform well on the standard affine-invariance retrieval set of the MPEG-7 database. This is because the affine Fourier descriptors are designed for polygonal shapes under affine transformation, not for non-rigid shapes [38]. Most Fourier-based techniques utilize the magnitude of the Fourier transform and ignore the phase information in order to achieve rotation invariance and make the descriptors independent of the starting point. However, Bartolini et al. [33] have described a technique in which the phase information is exploited. Kunttu et al. [20] have introduced a multiscale Fourier descriptor for shape-based image retrieval.
These descriptors are presented at multiple scales by combining the wavelet and Fourier transforms, which improves the shape retrieval accuracy of the traditional Fourier descriptors. Recently, El-ghazal et al. have described curvature-based Fourier descriptors (CBFD) for shape retrieval. The invariant descriptors of the CBFD technique are derived from the 2-D Fourier transform of the curvature-scale image obtained from the image contour [43]. In general, Fourier description is a promising boundary-based approach for shape-based image retrieval, as FDs are based on the well-known Fourier theory, making them easy to compute and simple to normalize and interpret. In addition, the computational efficiency and compactness of FDs make them well suited for online image retrieval. To derive the FDs of an image, the 2-D image is converted to a 1-D signature. Many signatures have been proposed in the literature [18,38,44]. The complex coordinates, the radial distance, and the triangular centroid area (TCA) are some notable signatures available to derive FDs. Fourier descriptors derived from different signatures can have significantly different effects on the retrieval result [38]. In this paper, we propose a novel signature, namely the farthest point distance (FPD), and compare it with other frequently used shape signatures. The paper is organized as follows: Section 2 gives a brief description of commonly used shape signatures.


Section 3 introduces the proposed signature. Section 4 explains how the Fourier transform along with a normalization scheme is applied to the shape signatures. Section 5 presents comparative studies to compare the proposed signature with eight commonly used shape signatures. In addition, the proposed signature is compared with two notable standard techniques: the curvature scale space and the Zernike moments. Conclusions derived from the study and suggestions for future work are presented in Section 6.

2. Shape signatures

A shape signature z(u) is a 1-D function representing 2-D areas or boundaries, usually describing a unique shape and capturing the perceptual features of the shape. Shape signatures are either real- or complex-valued. Brief descriptions of the most commonly used shape signatures are presented in the following sections. Moreover, the basic idea of each of the eight commonly used signatures is graphically depicted in Fig. 1.

2.1. Radial distance

The radial distance represents the distance between the boundary points (x(u), y(u)) and the centroid (x_c, y_c) of the shape [37,38]:

r(u) = \sqrt{(x(u) - x_c)^2 + (y(u) - y_c)^2}  (1)

Using the centroid (x_c, y_c) renders the signature invariant to translation. The centroid is computed as follows:

x_c = \frac{1}{N}\sum_{u=0}^{N-1} x(u), \quad y_c = \frac{1}{N}\sum_{u=0}^{N-1} y(u)  (2)
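As an illustration, Eqs. (1) and (2) can be sketched in a few lines of Python (an illustrative implementation, not from the paper; the boundary is assumed to be given as a list of (x, y) tuples):

```python
import math

def centroid(boundary):
    """Shape centroid (x_c, y_c) of Eq. (2)."""
    n = len(boundary)
    xc = sum(x for x, _ in boundary) / n
    yc = sum(y for _, y in boundary) / n
    return xc, yc

def radial_distance(boundary):
    """Radial distance signature r(u) of Eq. (1)."""
    xc, yc = centroid(boundary)
    return [math.hypot(x - xc, y - yc) for x, y in boundary]

# For a unit square sampled at its corners, every point is equidistant
# from the centroid (0.5, 0.5), so the signature is constant.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(radial_distance(square))  # four identical values, sqrt(0.5)
```

Because the centroid is subtracted, translating the boundary leaves the signature unchanged, which is the translation invariance noted above.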

Fig. 1. The basic idea of eight commonly used signatures: (a) the original shape; (b) radial distance, RD(u); (c) chord-length distance, CLD(u); (d) angular function, AF(u); (e) triangular centroid area, TCA(u); (f) triangular area representation, TAR(u); (g) complex coordinates, CC(u); (h) polar coordinates, PC(u); and (i) angular radial coordinates, ARC(u).


Other common names for this signature are the centroid distance and the radius vector. The basic idea of the radial distance signature is graphically depicted in Fig. 1b.

2.2. Chord-length distance (CLD)

The chord-length distance (CLD) is derived from the shape boundary without using any reference point. It is the distance between a boundary point a and another boundary point b such that ab is perpendicular to the tangent vector at a, as shown in Fig. 1c. If there are two candidates, the one whose chord lies within the shape is chosen [38].

2.3. Angular function (AF)

The angular function \varphi(u) represents changes in the directions of the shape boundary. These changes are important to the human visual system and can be used as a shape signature. The angular function at different points of a shape boundary is defined as

\varphi(u) = \arctan\left(\frac{y(u) - y(u-w)}{x(u) - x(u-w)}\right)  (3)

where w is a step of selected length. A normalized variant of the angular function is defined by Zahn and Roskies [18]. The basic idea of the angular function signature is graphically depicted in Fig. 1d.

2.4. Triangular centroid area

The triangular centroid area is the area of the triangle formed by two boundary points (x_1(u), y_1(u)), (x_2(u), y_2(u)) and the centroid of the object; it changes as the boundary points move along the contour. This area is used (see Fig. 1e) as a shape signature and can be calculated as follows [38]:

TCA(u) = \frac{1}{2}\left|x_1(u)\,y_2(u) - x_2(u)\,y_1(u)\right|  (4)

2.5. Triangular area representation (TAR)

The triangular area representation signature is computed by calculating the area formed by three points on the shape boundary [45]. The TAR signature differs from the triangular centroid area signature, which calculates the area formed by two boundary points and the centroid of the object. In the TAR signature, the signed area of the triangle formed by the three points P(u-s), P(u) and P(u+s), separated by an equal arc length s along the contour, is calculated as follows [45]:

TAR(u, s) = \frac{1}{2}\left[-p_x(u)\,p_y(u-s) + p_x(u+s)\,p_y(u-s) + p_x(u-s)\,p_y(u) - p_x(u+s)\,p_y(u) - p_x(u-s)\,p_y(u+s) + p_x(u)\,p_y(u+s)\right]  (5)

When the contour is traced in a counter-clockwise direction, convex, concave, and straight-line points yield positive, negative, and zero areas, respectively [45]. Fig. 1f depicts the three different types of area of a TAR signature.
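A hedged Python sketch of Eq. (5) (illustrative only, not the authors' code): the signed triangle area can be evaluated with the standard shoelace expression, which is algebraically identical to the expanded form above.

```python
def tar(p_prev, p_cur, p_next):
    """Signed area of the triangle P(u-s), P(u), P(u+s), Eq. (5).
    Positive for convex points, negative for concave points, and zero
    for collinear points when the contour is traced counter-clockwise."""
    (x1, y1), (x2, y2), (x3, y3) = p_prev, p_cur, p_next
    return 0.5 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

print(tar((0, 0), (1, 0), (1, 1)))   # convex (left turn): +0.5
print(tar((0, 0), (1, 0), (2, 0)))   # straight line: 0.0
print(tar((0, 0), (1, 0), (1, -1)))  # concave (right turn): -0.5
```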


2.6. Complex coordinates

The complex coordinates signature is formed by treating each coordinate pair (x(u), y(u)), u = 0, 1, ..., N-1, of the pixels on the boundary of a shape as a complex number [37,46]:

CC(u) = (x(u) - x_c) + j(y(u) - y_c)  (6)

The complex coordinates signature is translation invariant because the centroid is subtracted from the boundary coordinates of the shape. Another frequently used name for this signature is the position function. The basic idea of the complex coordinates signature is graphically depicted in Fig. 1g.

2.7. Polar coordinates (PC)

The polar coordinates signature is formed by combining the radial distance signature RD(u) and the polar angle signature \theta(u), as shown in Fig. 1h. The result of the combination is another complex-valued signature, named the polar coordinates signature [44]:

PC(u) = RD(u) + j\theta(u)  (7)

2.8. Angular radial coordinates (ARC)

The angular radial coordinates signature is similar to the polar coordinates signature; however, the angular function \varphi(u) is used instead of the polar angle [44]:

ARC(u) = RD(u) + j\varphi(u)  (8)

Fig. 1i depicts the basic idea of generating the above signature.

3. Farthest point distance signature

In this section, we present the farthest point distance, a novel technique that exploits differential properties of shapes, such as corner points and transition details. The farthest point distance is developed to overcome some of the shortcomings of existing signatures, such as ignoring the distances between corners. The value of the signature at a given point a is defined with respect to the point farthest from it, say b. The signature is calculated by adding the Euclidean distance between point a and the centroid c to that between the centroid c and the farthest point b. Assuming that the shape boundary coordinates (x(u), y(u)), u = 0, 1, ..., N-1, have been extracted, the FPD signature at boundary point (x(u), y(u)) is calculated as follows [47]:

FPD(u) = \sqrt{[x(u) - x_c]^2 + [y(u) - y_c]^2} + \sqrt{[x_{fp}(u) - x_c]^2 + [y_{fp}(u) - y_c]^2}  (9)

where (x_{fp}(u), y_{fp}(u)) is the farthest point from (x(u), y(u)), and (x_c, y_c) is the centroid of the shape. Fig. 2 depicts how the distance from point a to its farthest point b is calculated. This signature captures


distances between corners. Transition points and corners are elements of focus for the human visual system, and in most shape-matching techniques corner points play a major role. Fig. 3 depicts examples of FPD signatures for three randomly selected classes. It can be seen from the figure that shapes within the same class have similar FPD signatures.

Fig. 2. The basic concept of the farthest point distance (FPD) signature.

4. Generation of Fourier descriptors

The direct use of shape signatures in the spatial domain, as when applying dynamic programming [33,35], leads to a high matching cost because of the complex normalization required for rotation invariance. Therefore, the discrete Fourier transform (DFT) is used to simplify the matching stage in the retrieval process and to reduce the noise sensitivity of the signature. The Fourier descriptor is a powerful tool for shape analysis and has many applications [41,48–52]. The idea of the FD is to use the Fourier-transformed boundary as a shape feature [18,19,53]. The discrete Fourier transform of an arbitrary signature z(u) is defined as follows [46]:

a_n = \frac{1}{N}\sum_{u=0}^{N-1} z(u)\, e^{-j2\pi nu/N}, \quad n = 0, 1, \ldots, N-1  (10)

The coefficients a_n (n = 0, 1, ..., N-1) are called the Fourier descriptors of the shape, denoted FD_n. The Fourier descriptors are invariant to rotation, scale, and translation. Rotation invariance of the FDs is established by taking the magnitudes of the descriptors and ignoring the phase information. Scale invariance for real-valued signatures is achieved by dividing the magnitudes of the first half of the descriptors by the DC component (FD_0) [54]:

F = \left\{\frac{|FD_1|}{|FD_0|}, \frac{|FD_2|}{|FD_0|}, \ldots, \frac{|FD_{N/2}|}{|FD_0|}\right\}  (11)

The reason for choosing FD_0 as the scale-normalization factor is that it represents the average energy of the signature. Moreover, FD_0 is usually the largest coefficient, and consequently the range of the values of the normalized descriptors is [0, 1] [38]. For complex-valued signatures, the DC component depends only on the position of the shape, so it cannot be used to describe the shape. For scale normalization, the magnitudes of the other descriptors are divided by |FD_1| as follows [19]:

F = \left\{\frac{|FD_2|}{|FD_1|}, \frac{|FD_3|}{|FD_1|}, \ldots, \frac{|FD_{N-1}|}{|FD_1|}\right\}  (12)
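The descriptor pipeline of Eqs. (10)–(12) can be sketched as follows (an illustrative pure-Python DFT for clarity; a practical implementation would use an FFT):

```python
import cmath

def fourier_descriptors(z):
    """Eq. (10): a_n = (1/N) * sum_u z(u) * exp(-j*2*pi*n*u/N)."""
    big_n = len(z)
    return [sum(z[u] * cmath.exp(-2j * cmath.pi * n * u / big_n)
                for u in range(big_n)) / big_n
            for n in range(big_n)]

def normalize_real(fd):
    """Eq. (11): for real-valued signatures, |FD_1..FD_{N/2}| / |FD_0|."""
    return [abs(fd[n]) / abs(fd[0]) for n in range(1, len(fd) // 2 + 1)]

def normalize_complex(fd):
    """Eq. (12): for complex-valued signatures, |FD_2..FD_{N-1}| / |FD_1|."""
    return [abs(fd[n]) / abs(fd[1]) for n in range(2, len(fd))]
```

Taking magnitudes discards the phase, which is what provides the rotation and starting-point invariance discussed above.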

The FPD signature is translation invariant because it is obtained with respect to the centroid of the shape. Proving that all the other signatures are translation invariant is a straightforward process. A systematic review of various

Fig. 3. The FPD signatures for shapes from three randomly selected classes.
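Following the definition in Section 3, a minimal Python sketch of Eq. (9) (illustrative, not the authors' implementation; the brute-force farthest-point search is O(N^2)):

```python
import math

def fpd_signature(boundary):
    """Farthest point distance signature of Eq. (9)."""
    n = len(boundary)
    xc = sum(x for x, _ in boundary) / n
    yc = sum(y for _, y in boundary) / n
    sig = []
    for x, y in boundary:
        # Farthest boundary point from (x, y).
        xf, yf = max(boundary,
                     key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
        # Distance a -> centroid plus distance centroid -> farthest point.
        sig.append(math.hypot(x - xc, y - yc) + math.hypot(xf - xc, yf - yc))
    return sig

# For a unit square, the farthest point from each corner is the opposite
# corner, so every signature value is 2 * sqrt(0.5) = sqrt(2).
print(fpd_signature([(0, 0), (1, 0), (1, 1), (0, 1)]))
```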


signatures and their invariance properties is summarized in [55]. The similarity between two shapes indexed with M normalized Fourier descriptors is measured by the Euclidean distance D between the normalized Fourier descriptors F^q of the query image and the normalized Fourier descriptors F^d of an image from the database:

D(F^q, F^d) = \sqrt{\sum_{i=1}^{M} (f_i^q - f_i^d)^2}  (13)
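Eq. (13) amounts to a plain Euclidean distance over the M normalized descriptors; a one-line Python sketch:

```python
def fd_distance(f_query, f_db):
    """Euclidean distance between two normalized FD vectors, Eq. (13)."""
    return sum((a - b) ** 2 for a, b in zip(f_query, f_db)) ** 0.5

print(fd_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```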

5. Comparative study

To obtain a comprehensive comparison, we compare the proposed signature with eight commonly used signatures that are used to derive Fourier descriptors. Moreover, the proposed signature is compared with two notable standard techniques: the curvature scale space and the Zernike moments. These two techniques are used in our comparison because in the MPEG-7 standard, the curvature scale space descriptor and the Zernike moment descriptor have been adopted as the contour-based and region-based shape descriptors, respectively [56].

5.1. MPEG-7 databases

Due to the lack of a standard database, the evaluation of different shape signatures is not an easy task. Researchers in this field tend to develop their own databases, which are often limited in size, application scope, or both. The MPEG-7 developers have set up a database of reasonable size and generality [17]. It consists of three main sets: set A, set B, and set C. Set A consists of two subsets, A1 and A2, which are used to test invariance to scaling and rotation, respectively. Subset A1 includes 420 shapes: 70 primary shapes and 5 shapes derived from each primary shape with scale factors ranging from 0.1 to 2. Subset A2 includes 420 shapes: 70 primary shapes and 5 shapes generated by rotating


the primary shape with angles ranging from 9° to 150°. Sample shapes from set A of the MPEG-7 database are shown in Fig. 4. Set B consists of 1400 images that are classified into 70 classes, each class having 20 images. Set B is used to test similarity-based retrieval performance, and to test the shape descriptors for robustness to various arbitrary shape distortions, including rotation, scaling, arbitrary skew, stretching, defection, and indentation. Samples of shapes from set B of the MPEG-7 database are shown in Fig. 5. Set C consists of 200 affine-transformed Bream fish and 1100 unclassified marine fish. The 200 Bream fish are frames extracted from a short video clip of a swimming Bream fish. This set is used to test the shape descriptors for robustness to non-rigid object distortions. Usually, the first frame of the video is used as a query, and the number of Bream shapes in the top 200 retrieved shapes is counted. However, in our experiments all 200 Bream fish are designated as queries in order to obtain a comprehensive comparison. Samples of the images from this database are depicted in Fig. 6. In this paper, a noise database (set D) is created to test the performance of the proposed technique in the presence of noise. This set consists of 420 shapes: 70 primary shapes and 5 shapes derived from each primary shape by adding random Gaussian noise to the boundary of the primary shape. The signal-to-noise ratios for the distorted shapes are 40, 35, 30, 25, and 20 dB. Sample shapes from the proposed set D database are shown in Fig. 7.

5.2. Evaluation measure

To evaluate the performance of the different techniques with respect to image retrieval, a performance measure is required. The precision and recall measures are the most commonly used and are deemed appropriate for measuring retrieval performance on classified datasets. If A is the number of relevant retrieved shapes, B the total number of retrieved shapes, and C the


Fig. 4. A primary shape from set A of the MPEG-7 database and its rotated and scaled versions.


Fig. 5. Samples of shapes from set B of the MPEG-7 database.

Fig. 6. Samples of shapes from set C of the MPEG-7 database.

Fig. 7. Samples of shapes from set D of the proposed distorted database.

number of relevant shapes in the whole database, then the precision and recall are defined as A/B and A/C, respectively. Precision measures retrieval accuracy, whereas recall measures the capability to retrieve relevant items from the database [38]. The precision value at a specific recall is the average of the precision values of all the database shapes at that recall.
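The precision and recall definitions above translate directly to code (an illustrative sketch; the shape identifiers are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Precision = A/B and recall = A/C, where A is the number of
    relevant retrieved shapes, B the total number of retrieved shapes,
    and C the number of relevant shapes in the whole database."""
    a = len(set(retrieved) & set(relevant))
    return a / len(retrieved), a / len(relevant)

# Hypothetical example: 4 shapes retrieved, 3 relevant shapes in the database.
p, r = precision_recall(["s1", "s2", "s3", "s4"], ["s1", "s3", "s5"])
print(p, r)  # 0.5 and 2/3
```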

5.3. Farthest point distance signature performance comparison

5.3.1. Performance comparison of the FPD signature with other signatures

The proposed farthest point distance signature is compared with the most popular and best-performing


signatures [38]. These signatures include the radial distance, the triangular centroid area, the triangular area representation, the complex coordinates, the chord-length distance [38], the angular function, the polar coordinates [44], and the angular radial coordinates [44] signatures. Set B is selected to evaluate the performance of the proposed signature against the other signatures since it includes all possible situations of shape distortion and variability. All shapes in the following experiments are resampled to 128 points. Selecting a small number of points affects the retrieval performance of the proposed method as well as the other techniques; on the other hand, if the selected number of points is too large, the retrieval process requires more processing time and storage space. The 128-point sampling rate was carefully selected as the smallest rate that can be used without introducing sampling distortion. For consistency and fair comparison, the number of descriptors is limited to 63 for all methods. From Table 1, it can be seen that the proposed FPD signature's performance is the highest, whereas the AF signature's performance is the lowest. The RD and FPD signatures show comparable results in the case of low recall; however, in the case of high recall, the FPD performs better than the RD. This improvement is due to the tendency of the FPD to capture the farthest corners. The high performance of the FPD enables the retrieval of complex shapes as well as simple ones. The AF, PC, and ARC signatures do not perform as well as the FPD, RD, and TCA signatures because \varphi(u) and \theta(u), which are utilized by the AF, PC, and ARC signatures, are very sensitive to changes in the shape boundary. The CC and TAR signatures capture local information of the shape boundary, while the FPD, RD, and TCA signatures capture both local and global information.
The advantage of the TCA signature over the FPD and RD signatures is its robustness to affine transforms; however, the FPD and RD signatures outperform the TCA signature.

Table 1
The average precision for low and high recalls for the FPD and other signatures using set B.

Signature                       Low recall: average        High recall: average
                                precision (%) for          precision (%) for
                                recall rates <= 50%        recall rates > 50%
The proposed signature (FPD)    75.82                      42.13
RD                              75.69                      41.77
TCA                             73.40                      38.50
CC                              64.76                      22.59
PC                              64.40                      35.12
ARC                             58.93                      26.83
TAR                             58.70                      23.54
CLD                             57.80                      24.00
AF                              57.39                      27.88


5.3.2. Performance comparison of the FPD signature with the CSS and ZM techniques

In another set of experiments, the proposed farthest point distance signature is combined with four simple global descriptors (SGD): solidity (S), circularity (C), eccentricity (E), and aspect ratio (AR) [17]. The simple global descriptors enhance the ability of the proposed signature to capture global shape information. The distance D_FPD, obtained by Eq. (13) from the Fourier descriptors of the FPD signature, is directly added to the average distance D_SGD of the simple global descriptors. The total distance between a query image q and a database image d is expressed as follows:

D(q, d) = D_{FPD}(q, d) + D_{SGD}(q, d)  (14)

where

D_{SGD}(q, d) = \frac{D_S(q, d) + D_C(q, d) + D_E(q, d) + D_{AR}(q, d)}{4}  (15)

D_S(q, d) = |S_q - S_d| / max(S_d) is the solidity distance, D_C(q, d) = |C_q - C_d| / max(C_d) is the circularity distance, D_E(q, d) = |E_q - E_d| / max(E_d) is the eccentricity distance, and D_{AR}(q, d) = |AR_q - AR_d| / max(AR_d) is the aspect-ratio distance. The combined features are then compared with the Zernike moments [10] and the curvature scale space (in this case the CSS is combined with the four simple global descriptors to maintain consistency) [22]. Since the proposed signature and the CSS and ZM techniques utilize different criteria for normalizing shapes with respect to scale and rotation, all sets of the MPEG-7 database were used to obtain a comprehensive comparison. The number of features for the CSS is not constant, while the number of features for the proposed method and the Zernike moments technique can be specified in advance. However, selecting a small number of features will affect the retrieval performance of the proposed method and the Zernike moments technique. On the other hand, if the selected number of features is too large, the system will require more processing time and storage space. Since the proposed signature is real-valued and all image contours in the database are resampled to 128 points, the maximum number of Fourier descriptors derived from the proposed signature is 63. Zhang and Lu [38] have found that 10 Fourier descriptors are sufficient to describe a shape. Thus, in the first experiment, the performance of the first 14 features of the proposed technique (10 FDs plus the 4 simple global descriptors) is compared with the performance of the Zernike moments technique with 14 features, corresponding to Zernike moments up to the sixth order. Table 2 shows the number of features for Zernike moments at each order and the accumulated number of features up to each order. The recall-precision curves of retrieval (for set B) obtained by the proposed and Zernike moments techniques with 14 features are shown in Fig. 8. It is obvious from Fig. 8 that the performance of the Zernike moments technique is much lower than that of the proposed technique. The performance of the Zernike moments technique can be improved using a higher order of Zernike moments.
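The combined matching distance of Eqs. (14) and (15) above is a simple sum; a sketch (illustrative only, with the per-descriptor distances assumed to be pre-normalized as described above):

```python
def sgd_distance(d_s, d_c, d_e, d_ar):
    """Eq. (15): average of the four simple global descriptor distances
    (solidity, circularity, eccentricity, aspect ratio)."""
    return (d_s + d_c + d_e + d_ar) / 4.0

def total_distance(d_fpd, d_s, d_c, d_e, d_ar):
    """Eq. (14): FD distance plus the averaged SGD distance."""
    return d_fpd + sgd_distance(d_s, d_c, d_e, d_ar)

print(total_distance(1.0, 0.2, 0.4, 0.2, 0.4))  # 1.0 + 0.3 = 1.3
```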


Table 2
The number of features for Zernike moments at each order and the accumulated number of features up to each order.

Order (n)   Zernike moments of order n with repetition m (A_nm)   Moments in order n   Accumulated features up to order n
0           A_{0,0}                                               1                    Not counted (used for scale invariance)
1           A_{1,1}                                               1                    Not counted (used for translation invariance)
2           A_{2,0}, A_{2,2}                                      2                    2
3           A_{3,1}, A_{3,3}                                      2                    4
4           A_{4,0}, A_{4,2}, A_{4,4}                             3                    7
5           A_{5,1}, A_{5,3}, A_{5,5}                             3                    10
6           A_{6,0}, A_{6,2}, A_{6,4}, A_{6,6}                    4                    14
7           A_{7,1}, A_{7,3}, A_{7,5}, A_{7,7}                    4                    18
8           A_{8,0}, A_{8,2}, A_{8,4}, A_{8,6}, A_{8,8}           5                    23
9           A_{9,1}, A_{9,3}, A_{9,5}, A_{9,7}, A_{9,9}           5                    28

Fig. 8. Precision-recall curves of the proposed and ZM techniques with 14 features.

To select a number of features that provides a good compromise between retrieval performance and feature dimension, the average retrieval rate over the top 20 retrieved shapes of the MPEG-7 database (set B) is calculated for different numbers of features for the proposed and Zernike moments techniques. Fig. 9, which shows these average retrieval rates, reveals that the average retrieval rates of the proposed technique are higher than those of the Zernike moments technique for the same number of features. Moreover, it is interesting that there is no considerable improvement in the performance of the proposed technique after the first 15 features (including the four simple global descriptors). Fig. 9 also reveals that there is no considerable improvement in the performance of the ZM technique after 28 features. As a compromise between the number of features and the performance of the Zernike moments technique, the first 28 features (corresponding to the ninth order of Zernike moments) are used in the subsequent experiments.

The recall and precision curves for the four sets of the MPEG-7 database and the created noisy database (set D) are plotted in Figs. 10-14, and the average precision rates for low and high recalls are shown in Tables 3-7. In addition, screen shots for three queries from set B of the MPEG-7 database are given in Fig. 15, where the top-left shape of each screen shot is the query shape and a star symbol indicates an irrelevant shape. The proposed signature outperforms the CSS technique, as can be deduced from Figs. 10-14. Fig. 10 shows that the ZM technique and the proposed signature yield comparable results in the low-recall case; however, the proposed signature outperforms the ZM technique in the case of high recall with only 15 features. From Tables 1 and 3, we can see that the performance of the Fourier descriptors is improved by adding only four simple global descriptors.
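The evaluation measures used throughout this section can be made concrete with a short sketch. It assumes the usual definitions (the paper does not spell them out here): for a ranked list of retrieved shapes, precision is the fraction of retrieved shapes that are relevant, recall is the fraction of relevant shapes retrieved so far, and the top-20 retrieval rate is the fraction of a query's relevant shapes found among the first 20 results.

```python
import numpy as np

def precision_recall_curve(relevant_flags, n_relevant):
    """relevant_flags: booleans for each retrieved shape, in rank order.
    n_relevant: total number of relevant shapes in the database."""
    hits = np.cumsum(relevant_flags)                       # relevant so far
    precision = hits / np.arange(1, len(relevant_flags) + 1)
    recall = hits / n_relevant
    return recall, precision

def top_k_retrieval_rate(relevant_flags, n_relevant, k=20):
    # Fraction of the relevant shapes that appear in the top k results.
    return sum(relevant_flags[:k]) / n_relevant
```

Averaging such curves over all queries of a class, and then over all classes, yields curves of the kind plotted in Figs. 8-14.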


Fig. 9. The average retrieval rates at different numbers of features for the proposed and the ZM techniques.

Fig. 10. Precision-recall curves of the FPD+SGD, CSS+SGD, and ZM techniques using set B.

These simple global descriptors enhance the ability of the proposed signature to capture global shape information.
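The four simple global descriptors can be computed directly from the contour. The sketch below is a minimal illustration using common conventions that are assumptions about the paper's exact formulas: shoelace area over convex-hull area for solidity, 4*pi*area/perimeter^2 for circularity, eccentricity from the covariance of the boundary points, and the bounding-box aspect ratio.

```python
import numpy as np

def _hull(points):
    # Andrew's monotone chain convex hull; returns hull vertices in order.
    pts = sorted(map(tuple, points))
    def chain(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1])
                    - (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h
    return np.array(chain(pts)[:-1] + chain(pts[::-1])[:-1])

def _area(poly):
    # Shoelace formula for the area of a simple polygon.
    x, y = poly[:, 0], poly[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def simple_global_descriptors(contour):
    """Solidity, circularity, eccentricity, and aspect ratio of a closed
    (N, 2) contour."""
    area = _area(contour)
    perim = np.linalg.norm(np.roll(contour, -1, axis=0) - contour, axis=1).sum()
    circularity = 4.0 * np.pi * area / perim ** 2          # 1 for a circle
    lam = np.sort(np.linalg.eigvalsh(np.cov(contour.T)))   # principal spreads
    eccentricity = np.sqrt(1.0 - lam[0] / lam[1])
    w = contour[:, 0].max() - contour[:, 0].min()
    h = contour[:, 1].max() - contour[:, 1].min()
    aspect_ratio = max(w, h) / min(w, h)
    solidity = area / _area(_hull(contour))                # 1 if convex
    return solidity, circularity, eccentricity, aspect_ratio
```

A convex shape such as a square gives solidity 1, circularity pi/4, eccentricity 0, and aspect ratio 1, so each descriptor isolates a different global property.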

Performance under scale invariance is tested using subset A1, where the CSS gives the lowest accuracy in both high and low recall, as shown in Fig. 11, and the ZM gives the highest accuracy in high recall. The FPD technique gives a result comparable to that of the ZM technique in low recall.


Fig. 11. Precision-recall curves of the FPD+SGD, CSS+SGD, and ZM techniques using subset A1.

Fig. 12. Precision-recall curves of the FPD+SGD, CSS+SGD, and ZM techniques using subset A2.

The ZM technique gives the best result in the scale-invariance test because scale normalization is implicit in limiting the image to the unit circle.
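The unit-circle mapping matters because the Zernike basis is orthogonal only inside the unit disk. As a sketch of the main ingredient (the standard radial polynomial; the function name is ours), the moment A(n, m) projects the normalized image onto R(n, m)(r) exp(-j m theta) over that disk:

```python
from math import factorial

def radial_poly(n, m, r):
    """Zernike radial polynomial R_{n,m}(r); requires n >= m and n - m even."""
    return sum(
        (-1) ** s * factorial(n - s)
        / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        * r ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )
```

For example, R_{2,0}(r) = 2r^2 - 1, so the polynomial evaluates to 1.0 at the disk boundary r = 1; every R_{n,m} does, which is why the scale of the shape must be fixed before the projection.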

As shown in Fig. 12, the combined FPD signature and the ZM technique give almost perfect results in the rotation-invariance test, whereas the CSS gives the lowest results.


Fig. 13. Precision-recall curves of the FPD+SGD, CSS+SGD, and ZM techniques using set C.

Fig. 14. Precision-recall curves of the FPD+SGD, CSS+SGD, and ZM techniques using set D.

The CSS technique has compact features; however, its matching algorithm is very complex, and it fails to distinguish between objects within the same class that have different rotations.

In the case of the non-rigid object distortions database (set C), all three techniques give high accuracies in low recall, as shown in Fig. 13 and Table 6. In the case of high recall, however, the CSS has the lowest accuracy, while the FPD technique shows a result similar to that of the ZM technique.

In the case of noisy shapes, it is clear that the performance of each technique decreases as the signal-to-noise ratio is reduced, as shown in Fig. 14. The proposed technique gives the best performance in all cases because it uses a small number of Fourier descriptors (the first 11), which correspond to low frequencies, and ignores the higher-order Fourier descriptors, which are more sensitive to noise. From Fig. 15, it is noteworthy that not only does the proposed signature give better accuracies, but all the irrelevant retrieved shapes are ranked in the last row of each of the three examples.

To compare the computational efficiency of the three techniques, the processing time of the matching stage on set B of the MPEG-7 database is measured using the same processor and software for all three. Matlab (version 7.0), running on a Pentium IV 2.6 GHz PC with 1.5 GB of memory, is used as the testing platform. Table 8 shows the number of features and the average processing time per query in the matching stage for the proposed, ZM, and CSS techniques. The data in Table 8 reveal that the proposed technique has the smallest feature vector and the lowest processing time, whereas the CSS has the highest. The matching time of the CSS is high compared with the proposed and ZM techniques because many factors must be considered in order to align two peaks of the CSS features.

Table 3. The average precision rates of low and high recalls for the proposed, ZM and CSS techniques using set B.

Method                 | Average precision for recall rates ≤50% (%) | Average precision for recall rates >50% (%)
The proposed technique | 81.16 | 49.15
ZM technique           | 80.88 | 43.94
CSS technique          | 78.62 | 41.81

Table 4. The average precision rates of low and high recalls for the proposed, ZM and CSS techniques using subset A1.

Method                 | Average precision for recall rates ≤50% (%) | Average precision for recall rates >50% (%)
The proposed technique | 98.60 | 93.23
ZM technique           | 99.69 | 98.18
CSS technique          | 96.18 | 82.52

Table 5. The average precision rates of low and high recalls for the proposed, ZM and CSS techniques using subset A2.

Method                 | Average precision for recall rates ≤50% (%) | Average precision for recall rates >50% (%)
The proposed technique | 100   | 99.89
ZM technique           | 100   | 100
CSS technique          | 99.25 | 95.55

Table 6. The average precision rates of low and high recalls for the proposed, ZM and CSS techniques using set C.

Method                 | Average precision for recall rates ≤50% (%) | Average precision for recall rates >50% (%)
The proposed technique | 97.97 | 92.23
ZM technique           | 98.02 | 95.25
CSS technique          | 97.95 | 88.1

Table 7. The average precision rates of low and high recalls for the proposed, ZM and CSS techniques using set D.

Method                 | Average precision for recall rates ≤50% (%) | Average precision for recall rates >50% (%)
The proposed technique | 99.85 | 96.08
ZM technique           | 99.09 | 93.91
CSS technique          | 96.65 | 73.77

6. Conclusions

This paper presents a new shape signature for Fourier descriptors. The proposed signature is evaluated against several commonly used shape signatures, and its performance is examined through several experiments using standard databases. The experimental results demonstrate that the proposed signature and the radial distance signature yield comparable results; however, the proposed signature (FPD) performs better in the case of high recall. This improvement is due to the fact that the FPD signature tends to capture corner information for each object: corners are extreme shape points at which we naturally focus our visual attention. Furthermore, to enhance its ability to capture global shape information, the proposed signature is combined with four simple global descriptors. The performance of the signature is then compared with that of two commonly used techniques: the curvature scale space and the Zernike moments.


Fig. 15. Retrieval of fork, apple, and cow shapes from set B of the MPEG-7 database: (a) the proposed technique, (b) the ZM technique, and (c) the CSS technique.

Table 8. The average time required for each query in the matching stage using set B of the MPEG-7 database.

Method                 | Number of features                                     | Average time per query (s)
The proposed technique | 15                                                     | 0.0014040
ZM technique           | 28                                                     | 0.0017645
CSS technique          | Depends on the number of shape concavities (average 20) | 2.1640000

The results show that the proposed signature outperforms the curvature scale space in both high and low recall on all sets of the MPEG-7 database. The results also show that the proposed signature performs better than the Zernike moments for both low and high recall on the most challenging database (set B), while maintaining comparable results for low and high recall on the other sets. Moreover, the feature size used by the proposed signature (15 descriptors) is almost half that used by the Zernike moments technique (28 descriptors). This small feature size renders the proposed signature computationally more efficient for large databases. The computational complexity will be investigated further in a future publication.

To make our results more conclusive, eight popular shape signatures are comprehensively studied and evaluated against the proposed FPD signature. The proposed FPD signature is also combined with four simple global descriptors and compared with two standard region- and boundary-based techniques using standard databases. The proposed FPD technique satisfies the six principles set by MPEG-7: good retrieval accuracy, compact features, general application, low computational complexity, robust retrieval performance, and hierarchical coarse-to-fine representation. The overall results demonstrate the effectiveness of the farthest point distance signature for image retrieval applications.

References

[1] K. Hirata, T. Kato, Query by visual example - content-based image retrieval, in: Third International Conference on Extending Database Technology, 1992, pp. 56-71.
[2] D. Mumford, Mathematical theories of shape: do they model perception?, in: SPIE Conference on Geometric Methods in Computer Vision, vol. 1570, San Diego, CA, 1991, pp. 2-10.
[3] I. Biederman, Recognition-by-components: a theory of human image understanding, Psychological Review 94 (1987) 115-147.
[4] L. Schomaker, E. de Leau, L. Vuurpijl, Using pen-based outlines for object-based annotation and image-based queries, Visual Information and Information Systems (1999) 585-592.
[5] Z. Wang, Z. Chi, D. Feng, Shape based leaf image retrieval, in: IEE Proceedings - Vision, Image and Signal Processing, vol. 150, 2003, pp. 34-43.
[6] M.E. Celebi, Y.A. Aslandogan, A comparative study of three moment-based shape descriptors, in: International Conference on Information Technology: Coding and Computing, vol. 1, 2005, pp. 788-793.
[7] M. Hu, Visual pattern recognition by moment invariants, IRE Transactions on Information Theory IT-8 (1962) 115-147.
[8] J. Flusser, On the independence of rotation moment invariants, Pattern Recognition 33 (2000) 1405-1410.
[9] M. Teague, Image analysis via the general theory of moments, Journal of the Optical Society of America 70 (1980) 920-930.
[10] A. Khotanzad, Invariant image recognition by Zernike moments, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (1990) 489-497.
[11] S.O. Belkasim, M. Shridhar, M. Ahmadi, Pattern recognition with moment invariants: a comparative study and new results, Pattern Recognition 24 (1991) 1117-1138.
[12] R. Mukundan, S.H. Ong, P.A. Lee, Image analysis by Tchebichef moments, IEEE Transactions on Image Processing 10 (2001) 1357-1364.
[13] D. Zhang, G. Lu, Shape-based image retrieval using generic Fourier descriptor, Signal Processing: Image Communication 17 (2002) 825-848.
[14] S. Li, M.-C. Lee, Effective invariant features for shape-based image retrieval, Journal of the American Society for Information Science and Technology 56 (2005) 729-740.
[15] A. Goshtasby, Description and discrimination of planar shapes using shape matrices, IEEE Transactions on Pattern Analysis and Machine Intelligence 7 (1985) 738-743.
[16] G. Lu, A. Sajjanhar, Region-based shape representation and similarity measure suitable for content based image retrieval, Multimedia Systems 7 (1999) 165-174.
[17] F. Mokhtarian, M. Bober, Curvature Scale Space Representation: Theory, Application and MPEG-7 Standardization, first ed., Kluwer Academic Publishers, Dordrecht, 2003.
[18] C.T. Zahn, R.Z. Roskies, Fourier descriptors for plane closed curves, IEEE Transactions on Computers 21 (1972) 269-281.
[19] T.P. Wallace, P.A. Wintz, An efficient three-dimensional aircraft recognition algorithm using normalized Fourier descriptors, Computer Graphics and Image Processing 13 (1980) 99-126.
[20] I. Kunttu, L. Lepistö, J. Rauhamaa, A. Visa, Multiscale Fourier descriptors for defect image retrieval, Pattern Recognition Letters 27 (2006) 123-132.
[21] F. Mokhtarian, A. Mackworth, Scale-based description and recognition of planar curves and two-dimensional shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1986) 34-43.
[22] S. Abbasi, F. Mokhtarian, J. Kittler, Curvature scale space image in shape similarity retrieval, Multimedia Systems 7 (1999) 467-476.
[23] S. Abbasi, F. Mokhtarian, J. Kittler, Enhancing CSS-based shape retrieval for objects with shallow concavities, Image and Vision Computing 18 (2000) 199-211.
[24] G. Chuang, C. Kuo, Wavelet descriptor of planar curves: theory and applications, IEEE Transactions on Image Processing 5 (1996) 56-70.
[25] R.B. Yadav, N.K. Nishchal, A.K. Gupta, V.K. Rastogi, Retrieval and classification of shape-based objects using Fourier, generic Fourier, and wavelet-Fourier descriptors technique: a comparative study, Optics and Lasers in Engineering 45 (2007) 695-708.
[26] T. Adamek, N.E. O'Connor, A multiscale representation method for nonrigid shapes with a single closed contour, IEEE Transactions on Circuits and Systems for Video Technology 14 (2004) 742-753.
[27] S. Junding, W. Xiaosheng, Chain code distribution-based image retrieval, in: International Conference on Intelligent Information Hiding and Multimedia Signal Processing, China, 2006, pp. 139-142.
[28] S.R. Dubois, F.H. Glanz, An autoregressive model approach to two-dimensional shape classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1986) 55-65.
[29] Y. Tao, W.I. Grosky, Delaunay triangulation for image object indexing: a novel method for shape representation, in: Seventh SPIE Symposium on Storage and Retrieval for Image and Video Databases, San Jose, CA, 1999, pp. 631-642.
[30] E. Attalla, P. Siy, Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching, Pattern Recognition 38 (2005) 2229-2241.
[31] E.G.M. Petrakis, A. Diplaros, E. Milios, Matching and retrieval of distorted and occluded shapes using dynamic programming, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 1501-1516.
[32] N. Arica, F.T.Y. Vural, BAS: a perceptual shape descriptor based on the beam angle statistics, Pattern Recognition Letters 24 (2003) 1627-1639.
[33] I. Bartolini, P. Ciaccia, M. Patella, WARP: accurate retrieval of shapes using phase of Fourier descriptors and time warping distance, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 142-147.
[34] L.J. Latecki, R. Lakaemper, D. Wolter, Optimal partial shape similarity, Image and Vision Computing 23 (2005) 227-236.
[35] N. Alajlan, I.E. Rube, M.S. Kamel, G. Freeman, Shape retrieval using triangle-area representation and dynamic space warping, Pattern Recognition 40 (2007) 1911-1920.
[36] E. Persoon, K.S. Fu, Shape discrimination using Fourier descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1986) 388-397.
[37] D.S. Zhang, G. Lu, A comparative study of curvature scale space and Fourier descriptors, Journal of Visual Communication and Image Representation 14 (2003) 41-60.
[38] D.S. Zhang, G. Lu, Study and evaluation of different Fourier methods for image retrieval, Image and Vision Computing 23 (2005) 33-49.
[39] G. Eichmann, C. Lu, M. Jankowski, R. Tolimieri, Shape representation by Gabor expansion, in: Hybrid Image and Signal Processing II, vol. 1297, 1990, pp. 86-94.
[40] K. Arbter, Affine-invariant Fourier descriptors, in: From Pixels to Features, Elsevier Science, Amsterdam, The Netherlands, 1989.
[41] K. Arbter, W.E. Snyder, H. Burkhardt, G. Hirzinger, Application of affine-invariant Fourier descriptors to recognition of 3-D objects, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (1990) 640-647.
[42] A.E. Oirrak, M. Daoudi, D. Aboutajdin, Affine invariant descriptors using Fourier series, Pattern Recognition Letters 23 (2002) 1109-1118.
[43] A. El-ghazal, O. Basir, S. Belkasim, A novel curvature-based shape Fourier descriptor, in: 15th IEEE International Conference on Image Processing, 2008.
[44] I. Kunttu, L. Lepistö, Shape-based retrieval of industrial surface defects using angular radius Fourier descriptor, IET Image Processing 1 (2007) 231-236.
[45] I. El-Rube, N. Alajlan, M. Kamel, M. Ahmed, G. Freeman, MTAR: a robust 2D shape representation, International Journal of Image and Graphics 6 (2006) 421-443.
[46] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 2002.
[47] A. El-Ghazal, O. Basir, S. Belkasim, A new shape signature for Fourier descriptors, in: 14th IEEE International Conference on Image Processing, San Antonio, TX, USA, 2007, pp. 161-164.
[48] S. Derrode, M. Daoudi, F. Ghorbel, Invariant content-based image retrieval using a complete set of Fourier-Mellin descriptors, in: IEEE International Conference on Multimedia Computing and Systems, 1999, pp. 877-881.
[49] A.S. Aguado, M.E. Montiel, M.S. Nixon, Parameterising arbitrary shapes via Fourier descriptors for evidence-gathering extraction, Computer Vision and Image Understanding 69 (1998) 202-221.
[50] B.S. Reddy, B.N. Chatterji, An FFT-based technique for translation, rotation, and scale-invariant image registration, IEEE Transactions on Image Processing 5 (1996) 1266-1271.
[51] B. Pinkowski, Multiscale Fourier descriptors for classifying semivowels in spectrograms, Pattern Recognition 26 (1993) 1593-1602.
[52] G. Granlund, Fourier preprocessing for hand print character recognition, IEEE Transactions on Computers 21 (1972) 195-201.
[53] E. Persoon, K.S. Fu, Shape discrimination using Fourier descriptors, IEEE Transactions on Systems, Man, and Cybernetics 7 (1977) 170-179.
[54] H. Kauppinen, T. Seppänen, M. Pietikäinen, An experimental comparison of autoregressive and Fourier-based descriptors in 2D shape classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (1995) 201-207.
[55] V.V. Kindratenko, On using functions to describe the shape, Journal of Mathematical Imaging and Vision 18 (2003) 225-245.
[56] D. Zhang, G. Lu, Evaluation of MPEG-7 shape descriptors against other shape descriptors, Multimedia Systems 9 (2003) 15-30.