Pattern Recognition 46 (2013) 2202–2219
A robust static hand gesture recognition system using geometry based normalizations and Krawtchouk moments

S. Padam Priyal, Prabin Kumar Bora
Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, Assam, India
Article history: Received 5 August 2011; received in revised form 28 December 2012; accepted 30 January 2013; available online 8 February 2013.

Abstract
Static hand gesture recognition involves interpretation of hand shapes by a computer. This work addresses three main issues in developing a gesture interpretation system: (i) the separation of the hand from the forearm region, (ii) rotation normalization using the geometry of gestures and (iii) user and view independent gesture recognition. The gesture image comprising the hand and the forearm is detected through skin color detection and segmented to obtain a binary silhouette. A novel method based on the anthropometric measures of the hand is proposed for extracting the regions constituting the hand and the forearm. An efficient rotation normalization method that depends on the gesture geometry is devised for aligning the extracted hand. These normalized binary silhouettes are represented using Krawtchouk moment features and classified using a minimum distance classifier. The Krawtchouk features are found to be robust to viewpoint changes and capable of achieving good recognition with a small number of training samples. Hence, these features exhibit user independence. The developed gesture recognition system is robust to similarity transformations and perspective distortions, and is well suited for real-time implementation of gesture based applications.
Keywords: Hand extraction; Hand gesture; Krawtchouk moments; Minimum distance classifier; Rotation normalization; Skin color detection; View and user-independent recognition
1. Introduction

Human–computer interaction (HCI) is an important activity that forms an elementary unit of intelligence based automation. The most common HCI is based on the use of simple mechanical devices such as the mouse and the keyboard. Despite their familiarity, these devices inherently limit the speed and naturalness of the interaction between the human and the machine. The ultimate goal of HCI is to develop interactive computer systems that are non-obtrusive and emulate the ‘natural’ way of interaction among humans. The futuristic technologies in intelligent automation attempt to incorporate communication modalities like speech, handwriting and hand gestures into HCI. The development of hand gesture interfaces finds successful applications in sign-to-text translation systems, robotics, video/dance annotations, assistive systems, sign language communication, virtual reality and video based surveillance.

The hand gesture interfaces are based on the hand shape (static gesture) or the movement of the hand (dynamic gesture). The HCI interpretation of these gestures requires proper means by which the dynamic and/or static configurations of the hand could be
properly defined to the machine. Hence, computer vision techniques in which one or more cameras are used to capture the hand images have evolved. The methods based on these techniques are called vision based methods. The availability of fast computing and the advances in computer vision algorithms have led to the rapid growth in the development of vision based gestural interfaces. Many reported works on static hand gesture recognition have also focused on incorporating the dynamic characteristics of the hand. However, the level of complexity in recognizing the hand posture is comparatively high, and recovering the hand shape is difficult due to variations in size, rotation of the hand and the variation of the viewpoint with respect to the camera. The approaches to hand shape recognition are based on the 3D modeling of the hand or on 2D image models like the image contour and the silhouette. The computational cost in fully recovering the 3D hand/arm state is very high for real-time recognition and slight variations in the model parameters greatly affect the system performance [1]. By contrast, the processing of 2D image models involves low computational cost and high accuracy for a modest gesture vocabulary [1]. Thus, the 2D approaches are well suited for real-time processing. The general approach to vision based hand shape recognition is to extract a unique set of visual features and match them to a pre-defined representation of the hand gesture. Therefore, the important factor in developing the gesture recognition system is the accurate representation of the hand shapes. This step is usually known as feature extraction in pattern recognition.
The features are derived either from the spatial domain or from the transform domain representation of the hand shapes. The extracted features describing the hand gestures can be classified into two groups: contour based features and region based features. The contour based features correspond to the information derived only from the shape boundary. Common contour based methods used for shape description are Fourier descriptors, shape signatures, curvature scale space and chain code representations [2]. Hausdorff distance [3] and shape context [2] are correspondence based matching techniques in which the boundary points are the features representing the shape. The region based features are global descriptors derived by considering all the pixels constituting the shape region. The common region based methods include moment functions and moment invariants, shape matrices, convex hulls and medial axis transforms [2]. Similarly, spatial-domain measures like the Euclidean distance, the city-block distance and the image correlation are used for region based matching in which the pixels within the shape region are considered as features. The efficiency of these features is generally evaluated based on the compactness of the representation, the robustness to spatial transformations, the sensitivity to noise, the accuracy in classification, the computational complexity and the storage requirements [2]. In this context, the moment based representations are preferred mainly due to their compact representation, invariance properties and robustness to noise [4]. The moments also offer the advantages of reduced computational load and database storage requirements. Hence, the moment functions are among the robust features that are widely used for shape representation and find successful applications in the field of pattern recognition involving archiving and fast retrieval of images [5]. Recently, the discrete orthogonal moments like the Tchebichef moments and the Krawtchouk moments were introduced for image analysis [6,7]. It is shown that these moments provide higher approximation accuracy than the existing moment based representations and are potential features for pattern classification. Hence, in this work we propose the classification of static hand gestures using Krawtchouk moments as features. The objective of this work is to study the potential of the Krawtchouk moments in uniquely representing the hand shapes for gesture classification. Hence, the experiments are performed on a database consisting of gesture images that are normalized for similarity variations like scaling, translation and rotation. The performance of the Krawtchouk moments is compared with the geometric and the Zernike moments based recognition methods. The other main issues considered in developing the gesture recognition system are (i) the identification of the hand region and (ii) the normalization of rotation changes. The identification of the hand region involves separating the hand from the forearm. The lack of gesture information in the forearm makes it redundant and its presence increases the data size. In most of the previous works, the forearm region is excluded either by making the gesturers wear full-arm clothing or by limiting the extent of the forearm entering the scene during acquisition. However, such restrictions are not suitable in real-time applications. The orientation of the acquired gesture changes due to the angle made by the gesturer with respect to the camera and vice-versa.
This work concentrates on vision based static hand gesture recognition considering the afore-mentioned problems. In [8], the Krawtchouk moments are introduced as features for gesture classification. The performance of Krawtchouk moments is compared with that of a few other moments like the geometric, the Zernike and the Tchebichef moments. It is shown that the Krawtchouk moments based representation of hand shapes gives high recognition rates. The analysis is performed on hand regions that are manually extracted and corrected for rotation changes.
This work presents a detailed gesture recognition system that evaluates the performance of the Krawtchouk moment features on a database that consists of 4230 gesture samples. We propose novel methods based on the anthropometric measures to automatically identify the hand and its constituent regions. The geometry of the gesture is characterized in terms of the abducted fingers. This gesture geometry is used to normalize for the orientation changes. These proposed normalization techniques are robust to similarity and perspective distortions. The main contributions in this work are:
1. A rule based technique using the anthropometric measures of the hand is devised to identify the forearm and the hand regions.
2. A rotation normalization method based on the protruded/abducted fingers and the longest axis of the hand is devised.
3. A static hand gesture database consisting of 10 gesture classes and 4230 samples is constructed.
4. A study on the Krawtchouk moment features in comparison to the geometric and the Zernike moments for viewpoint and user invariant hand gesture recognition is performed.

The rest of the paper is organized as follows: Section 2 presents a summary of the related works in static hand gesture recognition. Section 3 gives the formulation for Krawtchouk and other considered moments. Section 4 provides an overview of the proposed gesture analysis system in detail. Experimental results are discussed in Section 5. Section 6 concludes the paper mentioning the scope for future work.
2. Summary of the related works The primary issues in hand gesture recognition are: (i) hand localization, (ii) scale and rotational invariance and (iii) viewpoint and person/user independence. Ong and Ranganath [9] presented a thorough review on hand gesture analysis along with the insight into problems associated with it. Earlier works assumed the gestures to be performed in a uniform background. This required a simple thresholding technique to obtain the hand silhouette. For a non-uniform background, skin color detection is the most popular and the general method for hand localization [1,10–14]. The skin color cues are combined with the motion cues [10,12,13] and the edge information [14] for improving the efficiency of hand detection. Segmented hand images are usually normalized for the size, orientation and illumination variations [15–17]. The features can be extracted directly from the intensity images or the binary silhouettes or the contours. In [18,19], the orientation histograms are derived from the intensity images. These histograms represent summarized information on the orientations of the hand image and are shown to be illumination invariant. They are however, rotation variant. Triesch et al. [20] have classified hand postures using elastic graph matching. The system is designed to efficiently identify gestures in a complex background. It is sensitive to geometric distortions that arise due to the variations in the hand anatomy and the viewpoint. The advantages of the matching procedure are its user independence and the robustness to complex environments. In [1,15], local linear embedding (LLE) is introduced to map the high dimensional data to a low dimensional space in such a way as to preserve the relationship between neighboring points. Each point in the low dimensional space is approximated by a linear combination of their neighbors. The approach is invariant to scale and translation but sensitive to rotation. Just et al. [21] used the
modified census transform to derive the hand features in both complex and uniform backgrounds. The system does not require a hand segmentation procedure. However, it has to be trained also with the gestures in a complex background, including small variations in scale and rotation. This increases the complexity of the training procedure and demands a larger training set. Amin and Yan [17] derived the Gabor features from the intensity hand images. The feature set is reduced in dimensionality using principal component analysis (PCA) and the classification is performed with fuzzy C-means clustering. The images are initially scale normalized by resizing them to a fixed size. The rotation correction is achieved by aligning the major axis of the forearm at 90° with respect to the horizontal axis of the image. Similarly, Huang et al. [22] have also employed Gabor-PCA features for representing the hand gestures. They estimate the orientation of the hand gestures using the Gabor filter responses. The estimated angle is used to correct the hand pose into an upright orientation [22]. The Gabor-PCA features are classified using the support vector machine (SVM) classifier. The geometric moment invariants derived from the binary hand silhouettes form the feature set in [23,24]. Direct hand features such as the number of protruded fingers and the distances and angles between the fingers and the palm region are used for gesture representation in [25–28]. Hu's moment invariants are used as a feature set in [29]. These methods are sensitive to variations in the hand anatomy and to viewpoint distortions. Chang et al. [30] compute the Zernike and the pseudo-Zernike moment features for representing the hand postures. They decompose the binary hand silhouette into the finger and the palm parts. The decomposition allows the features to be localized with respect to the palm and the finger regions. The Zernike and the pseudo-Zernike moment features are derived separately for both regions, with higher weights given to the finger features during recognition. Gu and Su [31] employed a multivariate piecewise linear decision algorithm to classify the Zernike moment features obtained from the hand postures. The system is trained to be user and viewpoint independent. Boundary-based representations include invariant features derived from the Fourier descriptors (FD), the localized contour sequences and the curvature scale space (CSS). The Fourier descriptors in terms of the discrete Fourier transform (DFT) are obtained from the complex representation of the closed boundary points. These descriptors are one of the efficient representations and are invariant to rotation. However, the matching algorithm is sensitive to variations in the starting point of the contour extraction [32,33]. In order to compensate for starting point variations, the contour is traced from a fixed point in [33]. The distance between the DFT coefficients of two different curves is computed using the modified Fourier descriptor (MFD) method. The recognition efficiency depends on the choice of the number of Fourier descriptors. Gupta and Ma [34] derive the localized contour sequences for representing the hand contours. The contour representation is sensitive to shifts in the starting point. Hence, the invariance is incorporated into the classification stage by determining the position of best match using the circular shift operation.
The CSS proposed in [35] is an efficient technique for extracting curvature features from an input contour at multiple scales. The CSS consists of large peaks that represent the concavities in the image contour and the method is easily made invariant to translation, scaling and rotation. Kopf et al. [36] and Chang [37] employed the CSS to capture the local features of the hand gesture. Since the human hand is highly flexible, the location of the largest peak in the CSS image will be unstable for the same hand postures, thus affecting the recognition. Also, similar hand contours are not well discriminated. Liang et al. [38] have proposed hand gesture recognition using the radiant projection transform and the Fourier transform.
The method is normalized for rotation, scale and translation variations. However, it is not suitable for gestures with almost the same boundary profiles. In [39,40], the static hand gestures are classified through Hausdorff distance matching that involves computing the point-wise correspondence between the boundary pixels of the images to be compared. Their experiments show that Hausdorff distance based matching provides good recognition efficiency. But the major drawback of Hausdorff distance based matching is its computational complexity. Dias et al. [41] present a system known as the open gestures recognition engine (O.G.R.E) for recognizing the hand postures in the Portuguese Sign Language. The histogram of the distances and the angles between the contour edges is used to derive a contour signature known as the pair-wise geometrical histogram. The classification is performed by comparing the pair-wise geometrical histograms representing the gestures. Kelly et al. [42] have derived features from the binary silhouette and the one dimensional boundary profile to represent the hand postures. The binary silhouette is represented using the Hu moments. The size functions are derived from the boundary profile to describe the hand shape. The dimensionality of the size functions is reduced using the PCA. They combine the Hu moments and the eigen space size functions to achieve user independent gesture recognition. From the above study, we infer that the boundary based representations fail in discriminating gestures with almost the same boundary profiles and are sensitive to boundary distortions. Therefore, for a visually distinct gesture vocabulary, the moment based approaches can be well explored. Also, processing the intensity image sequences is complex and it increases the computational load. Hence, binary silhouettes of the hand gestures are used for processing instead of the intensity images.
3. Theory of moments

The non-orthogonal and the orthogonal moments have been used to represent images in different applications including shape analysis and object recognition. The geometric moments are the most widely employed features for object recognition [9,43]. However, these moments are non-orthogonal and so reconstructing the image from the moment features is very intricate. It is also not possible to decipher the accuracy of such representations. In image analysis, Teague [44] introduced and derived the orthogonal moments with the orthogonal polynomials as the basis functions. The set of orthogonal moments has minimal information redundancy [44,45]. In this class, the Zernike moments defined in the polar domain are based on the continuous orthogonal polynomials and are rotation invariant [43]. For computation, the Zernike moments have to be approximated in the discrete domain and the discretization error increases for higher orders. Hence, the moments based on discrete orthogonal polynomials like the Tchebichef and the Krawtchouk polynomials have been proposed [44]. These moments are defined in the image coordinate space and do not involve any numerical approximation. A few studies have been reported on the accuracy of the Krawtchouk moments in image analysis [7,8,46,45]. Yap et al. [7] introduced the Krawtchouk moments for image representation and verified their performance on character recognition. From the experiments, they conclude that Krawtchouk moments perform better than geometric moments and the other orthogonal moments like the Zernike, the Legendre and the Tchebichef moments.

To derive the moments, consider a 2D image f(x, y) defined over a rectangular grid of size (N+1) × (M+1) with (x, y) ∈ {0, 1, ..., N} × {0, 1, ..., M}. In this work, we consider the following moments to represent f.

3.1. Geometric moments

The geometric moment of order (n+m) is defined as [43]

$$G_{nm} = \sum_{x=0}^{N} \sum_{y=0}^{M} f(x,y)\, x^n y^m \qquad (1)$$

where n ∈ {0, 1, ..., N} and m ∈ {0, 1, ..., M}. The geometric moments G_nm are the projections of the image f(x, y) on the 2D polynomial bases x^n y^m.

3.2. Zernike moments

These moments are defined on the polar coordinates (ρ, θ), such that 0 ≤ ρ ≤ 1 and 0 ≤ θ ≤ 2π. The complex Zernike polynomial V_nm(ρ, θ) of order n ≥ 0 and repetition m is defined as [43]

$$V_{nm}(\rho,\theta) = R_{nm}(\rho)\,\exp(jm\theta) \qquad (2)$$

For even values of n − |m| and |m| ≤ n, R_nm(ρ) is the real-valued radial polynomial given by

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s (n-s)!\, \rho^{\,n-2s}}{s!\,\big((n+|m|)/2 - s\big)!\,\big((n-|m|)/2 - s\big)!}$$

3.2.1. Image representation by Zernike polynomials

The Zernike polynomials are defined in the polar domain and hence, the image coordinate system (x, y) needs to be converted to (ρ, θ) coordinates. Let f(ρ, θ) define the image in the polar domain. Using the Zernike polynomials of the form V_nm(ρ, θ), the image f(ρ, θ) is approximated as [43]

$$f(\rho,\theta) \simeq \sum_{n=0}^{n_{\max}} \;\sum_{\substack{m \\ n-|m|\ \mathrm{even}}} Z_{nm}\, V_{nm}(\rho,\theta) \qquad (3)$$

The plots of the radial polynomials R_nm(ρ) for different orders n and repetitions m are given in Fig. 1(a). The 2D complex Zernike polynomials V_nm(ρ, θ) obtained for different values of n and m are shown in Fig. 1(b). The complex Zernike polynomials satisfy the orthogonality property

$$\int_{0}^{2\pi}\!\!\int_{0}^{1} V^{*}_{nm}(\rho,\theta)\, V_{lk}(\rho,\theta)\, \rho\, d\rho\, d\theta = \frac{\pi}{n+1}\, \delta[n-l]\,\delta[m-k]$$

where δ[·] is the Kronecker delta function. The Zernike moment Z_nm of order n is given by

$$Z_{nm} = \frac{n+1}{\pi} \int_{0}^{2\pi}\!\!\int_{0}^{1} V^{*}_{nm}(\rho,\theta)\, f(\rho,\theta)\, \rho\, d\rho\, d\theta \qquad (4)$$

The integration in (4) needs to be computed numerically. The magnitude of the Zernike moments Z_nm is invariant to rotation and hence, they are commonly used for rotation invariant gesture representation [30,31].

Fig. 1. (a) 1D Zernike radial polynomials R_nm(ρ). (b) 2D complex Zernike polynomials V_nm(ρ, θ) (real part).
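To make the moment definitions above concrete, the following Python sketch evaluates the geometric moments of Eq. (1) and a discrete approximation of the Zernike moments of Eq. (4). The mapping of the pixel grid onto the unit disc and the pixel-area normalisation used here are common choices assumed for illustration; they are not necessarily the exact discretisation employed by the authors.

```python
import numpy as np
from math import factorial

def geometric_moment(f, n, m):
    """Geometric moment G_nm of Eq. (1); the image is indexed as f[x, y]."""
    x = np.arange(f.shape[0], dtype=float).reshape(-1, 1)
    y = np.arange(f.shape[1], dtype=float).reshape(1, -1)
    return float(np.sum(f * (x ** n) * (y ** m)))

def zernike_radial(n, m, rho):
    """Real-valued radial polynomial R_nm(rho), for n - |m| even and |m| <= n."""
    m = abs(m)
    R = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R

def zernike_moment(f, n, m):
    """Discrete approximation of the Zernike moment Z_nm of Eq. (4).

    The image grid is mapped onto the unit disc and pixels falling outside
    the disc are ignored (one source of the discretisation error noted in
    the text)."""
    N, M = f.shape
    x = (2.0 * np.arange(N).reshape(-1, 1) - (N - 1)) / (N - 1)
    y = (2.0 * np.arange(M).reshape(1, -1) - (M - 1)) / (M - 1)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    V_conj = zernike_radial(n, m, rho) * np.exp(-1j * m * theta)   # conjugate basis
    dA = (2.0 / (N - 1)) * (2.0 / (M - 1))                          # pixel area on the unit square
    return (n + 1) / np.pi * np.sum(f[inside] * V_conj[inside]) * dA
```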
3.3. Krawtchouk moments

The Krawtchouk moments are discrete orthogonal moments derived from the Krawtchouk polynomials. The nth order Krawtchouk polynomial at a discrete point x, with 0 < p < 1 and q = 1 − p, is defined in terms of the hypergeometric function as [47]

$$K_n(x; p, N) = {}_2F_1\!\left(-n, -x; -N; \tfrac{1}{p}\right) \qquad (5)$$

By definition

$${}_2F_1(a, b; c; z) = \sum_{v=0}^{n} \frac{(a)_v (b)_v}{(c)_v}\,\frac{z^v}{v!} \quad \text{and} \quad (a)_v = a(a+1)\cdots(a+v-1)$$

Thus

$$K_0(x; p, N) = 1, \qquad K_1(x; p, N) = 1 - \frac{x}{Np}, \qquad K_2(x; p, N) = 1 - \frac{2x}{Np} + \frac{x(x-1)}{N(N-1)p^2}$$

and so on. The set of (N+1) Krawtchouk polynomials forms a complete orthogonal basis with a binomial weight function

$$w(x; p, N) = \binom{N}{x} p^x (1-p)^{N-x} \qquad (6)$$

The orthogonality property is given by [7,47]

$$\sum_{x=0}^{N} w(x; p, N)\, K_n(x; p, N)\, K_m(x; p, N) = (-1)^n \left(\frac{1-p}{p}\right)^{n} \frac{n!}{(-N)_n}\, \delta[n-m] \qquad (7)$$

where δ[·] is the Kronecker delta function. Assuming ρ(n; p, N) = (−1)^n ((1−p)/p)^n n!/(−N)_n, the weighted Krawtchouk polynomial for order n = 0, 1, ..., N is defined as

$$\bar{K}_n(x; p, N) = K_n(x; p, N)\, \sqrt{\frac{w(x; p, N)}{\rho(n; p, N)}} \qquad (8)$$

and hence, the orthogonality condition in (7) becomes

$$\sum_{x=0}^{N} \bar{K}_n(x; p, N)\, \bar{K}_m(x; p, N) = \delta[n-m]$$

Thus the weighted Krawtchouk polynomials form an orthonormal basis. The constant p can be considered as a translation parameter that shifts the support of the polynomial over the range of x. For p = 0.5 + Δp, the support of the weighted Krawtchouk polynomial is approximately shifted by NΔp [7]. The direction of shifting depends on the sign of Δp. The polynomial is shifted in the +x direction when Δp is positive and vice versa. The plots of the 1D weighted Krawtchouk polynomials of order n = 0, 1 and 2 for N = 64 and three values of p are shown in Fig. 2(a). The Krawtchouk polynomials are calculated recursively using the relations [47]

$$p(n-N-1)\,\bar{K}_n(x) = (x + 1 - 2p + 2pn - n - Np)\,\bar{K}_{n-1}(x) - (p-1)(n-1)\,\bar{K}_{n-2}(x) \qquad (9)$$

where

$$\bar{K}_0(x; p, N) = \sqrt{w(x; p, N)} \quad \text{and} \quad \bar{K}_1(x; p, N) = \left(1 - \frac{x}{pN}\right)\sqrt{\frac{w(x; p, N)}{\rho(1; p, N)}}$$

3.3.1. Image representation by Krawtchouk polynomials

The separability property can be used to obtain the 2D Krawtchouk bases and the image f(x, y) is approximated by the sum of weighted Krawtchouk polynomials as

$$f(x,y) \simeq \sum_{n=0}^{n_{\max}} \sum_{m=0}^{m_{\max}} Q_{nm}\, \bar{K}_n(x; p_1, N)\, \bar{K}_m(y; p_2, M) \qquad (10)$$

where K̄_n(x; p1, N) K̄_m(y; p2, M) is the (n+m)th order 2D weighted Krawtchouk basis with 0 < p1 < 1, 0 < p2 < 1 and the coefficient Q_nm is called the (n+m)th order Krawtchouk moment. Fig. 2(b) shows the plots of the 2D Krawtchouk bases for N = M = 104, p1 = p2 = 0.5 and different values of n and m. Using the orthogonality property, the Krawtchouk moments of order (n+m) are obtained as

$$Q_{nm} = \sum_{x=0}^{N} \sum_{y=0}^{M} \bar{K}_n(x; p_1, N)\, \bar{K}_m(y; p_2, M)\, f(x,y) \qquad (11)$$

The appropriate selection of p1 and p2 enables the extraction of the local features of an image at the region-of-interest (ROI). The parameter p1 can be tuned to shift the ROI horizontally and p2 shifts the ROI vertically. Like in the 1D case, the direction of shift depends on the signs of Δp1 and Δp2. Therefore, with the proper choice of p1 and p2, a subimage corresponding to the desired ROI can be represented by the Krawtchouk moments. This wavelet-like property gives the capability of ROI feature extraction to the Krawtchouk moments.

By comparing the plots of the 2D polynomial functions in Figs. 1(b) and 2(b), we can infer that the Zernike polynomials have wider supports. Therefore, the Zernike moments characterize the global shape features. The support of the Krawtchouk polynomials varies with the order. The lower order polynomials have compact supports and the higher orders have wider supports. Therefore, the lower order Krawtchouk moments capture the local features and the higher order moments represent the global characteristics. It can also be noted that the lower order Krawtchouk polynomials have relatively high spatial frequency characteristics and hence, the local variations corresponding to the edges are well defined at the lower orders. In the case of Zernike moments, the spatial frequency increases only with the order. Hence, it requires higher orders to represent the edges and lower orders to characterize the average shape information. The Krawtchouk moments obtained in (11) are used as features in the proposed feature-based hand gesture classification system. The geometric moments and the Zernike moments as given in (1) and (4) respectively are also used as features for comparative analysis.
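A minimal Python sketch of the weighted Krawtchouk basis and the moment computation of Eq. (11) is given below. The three-term recurrence is written directly for the weighted polynomials, which is algebraically equivalent to Eq. (9) up to the weighting factors; evaluating the binomial weight in log space is an implementation choice assumed here to avoid overflow for large N, not something prescribed by the paper.

```python
import numpy as np
from scipy.special import gammaln

def weighted_krawtchouk(N, p):
    """Orthonormal (weighted) Krawtchouk polynomials of Eq. (8).

    Returns an (N+1) x (N+1) array whose row n holds K̄_n(x; p, N)
    evaluated at x = 0, ..., N."""
    x = np.arange(N + 1, dtype=float)
    # Binomial weight w(x; p, N) of Eq. (6), formed via log-gamma.
    logw = (gammaln(N + 1) - gammaln(x + 1) - gammaln(N - x + 1)
            + x * np.log(p) + (N - x) * np.log(1 - p))
    K = np.zeros((N + 1, N + 1))
    K[0] = np.exp(0.5 * logw)                                    # K̄_0 = sqrt(w)
    K[1] = (1.0 - x / (p * N)) * K[0] * np.sqrt(p * N / (1 - p))  # K̄_1
    for n in range(1, N):
        a = np.sqrt(p * (N - n) / ((1 - p) * (n + 1)))
        b = np.sqrt(p * p * (N - n) * (N - n + 1) / ((1 - p) ** 2 * n * (n + 1)))
        K[n + 1] = (a * (N * p - 2 * n * p + n - x) * K[n]
                    - b * n * (1 - p) * K[n - 1]) / (p * (N - n))
    return K

def krawtchouk_moments(f, n_max, m_max, p1=0.5, p2=0.5):
    """Krawtchouk moments Q_nm of Eq. (11) for an image f indexed as f[x, y]."""
    N, M = f.shape[0] - 1, f.shape[1] - 1
    Kx = weighted_krawtchouk(N, p1)[: n_max + 1]
    Ky = weighted_krawtchouk(M, p2)[: m_max + 1]
    return Kx @ f @ Ky.T
```

For a 104 × 104 silhouette, `krawtchouk_moments(f, 40, 40)` would return the 41 × 41 moment array corresponding to the maximum order of 80 used later in Section 5.4.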
Fig. 2. (a) 1D weighted Krawtchouk polynomials for different values of p and N = 64. (b) 2D weighted Krawtchouk polynomials for p1 = p2 = 0.5 and N = M = 104.

4. Gesture recognition using Krawtchouk moments

The proposed gesture recognition system is developed by broadly dividing the procedure into three phases. They are: (1) hand detection and segmentation, (2) normalization and (3) feature extraction and classification. A description of these tasks is presented below. Fig. 3 shows a schematic representation of the proposed gesture recognition system.

4.1. Hand detection and segmentation

This phase detects and segments the hand data from the captured image. The hand regions are detected using the skin color pixels. The background is restricted such that the hand is the largest object with respect to the skin color.
Fig. 3. Schematic representation of the gesture recognition system.
Fig. 4. Results of hand segmentation using skin color detection. (a) Acquired images. (b) Skin color regions. (c) Segmented gesture images.
Teng et al. [1] have given a simple and effective method to detect skin color pixels by combining the features obtained from the YCbCr and the YIQ color spaces. The hue value θ is estimated from the Cb–Cr chromatic components by

$$\theta = \tan^{-1}\!\left(\frac{Cr}{Cb}\right) \qquad (12)$$

The in-phase color component I is calculated from the RGB components as

$$I = 0.596R - 0.274G - 0.322B \qquad (13)$$

Their experiments establish the ranges of θ and the in-phase color component I for Asian and European skin tones. The pixels are grouped as skin color pixels if 105° ≤ θ ≤ 150° and 30 ≤ I ≤ 100. Fig. 4(b) illustrates the skin color detection using this method for the hand gesture images shown in Fig. 4(a). The detection results in a binary image which may also contain other objects not belonging to the hand. Since the hand is assumed to be the largest skin color object, the other components are filtered out by comparing the areas of the detected binary objects. The resultant is subjected to a morphological closing operation with a disk-shaped structuring element in order to obtain a well defined segmented gesture image.
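The thresholding rule above can be sketched in a few lines of Python. The BT.601 constants for the chroma components and the zero-centring of Cb and Cr before the arctangent are assumptions made here so that the hue falls in the 105°–150° range quoted in the text; the paper itself only states θ = tan⁻¹(Cr/Cb).

```python
import numpy as np
from scipy import ndimage

def skin_mask(rgb):
    """Skin-colour rule of Eqs. (12)-(13); rgb is an H x W x 3 array in [0, 255]."""
    R, G, B = rgb[..., 0].astype(float), rgb[..., 1].astype(float), rgb[..., 2].astype(float)
    # Zero-centred chroma components (standard BT.601 weights, assumed here).
    Cb = -0.168736 * R - 0.331264 * G + 0.5 * B
    Cr = 0.5 * R - 0.418688 * G - 0.081312 * B
    theta = np.degrees(np.arctan2(Cr, Cb)) % 360.0        # hue angle of Eq. (12)
    I = 0.596 * R - 0.274 * G - 0.322 * B                 # in-phase component of Eq. (13)
    return (theta >= 105) & (theta <= 150) & (I >= 30) & (I <= 100)

def largest_skin_object(mask):
    """Keep the largest connected skin-colour component and close small gaps.

    The 5 x 5 square structuring element is a stand-in for the disk-shaped
    element mentioned in the text."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    largest = labels == (int(np.argmax(sizes)) + 1)
    return ndimage.binary_closing(largest, structure=np.ones((5, 5), dtype=bool))
```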
4.2. Normalization techniques

This is an essential phase in which the segmented image is normalized for any geometrical variations in order to obtain the desired hand gesture. The important factors to be compensated in this step are:

1. The presence of the forearm region.
2. The orientation of the object.

The recognition efficiency can be improved through proper normalization of the gesture image. Hence, a robust normalization method based on the gesture geometry is proposed for extracting the hand region and correcting the orientation.
4.2.1. Proposed method for rule based hand extraction

Consider a binary image f defined over a grid B of size (N+1) × (M+1). B is composed of two complementary regions R and R̄ representing the gesture (object) and the background respectively. Thus

$$R = \{(x,y) \mid (x,y) \in B \ \text{and} \ f(x,y) = 1\} \qquad (14)$$

and the complementary region R̄ is given by

$$\bar{R} = B \setminus R \qquad (15)$$

The boundary δR of the gesture region is defined by the set of pixels in R that are adjacent to at least one pixel in the region R̄. It is represented as

$$\delta R = \{(x,y) \mid (x,y) \in R \ \text{and} \ (x,y) \ \text{is adjacent to a pixel in} \ \bar{R}\} \qquad (16)$$

The gesture region R can be partitioned into three subregions. They are (a) R_fingers (fingers), (b) R_palm (palm) and (c) R_forearm (forearm). Hence

$$R = R_{fingers} \cup R_{palm} \cup R_{forearm} \qquad (17)$$

such that

$$R_{fingers} \cap R_{palm} = \emptyset, \qquad R_{fingers} \cap R_{forearm} = \emptyset, \qquad R_{palm} \cap R_{forearm} = \emptyset \qquad (18)$$

Fig. 5 illustrates these elementary regions comprising the gesture object R. Based on the anatomy, the palm and the forearm can be considered as continuous smooth regions. The forearm extends outside the palm and its width is less than that of the palm region. Conversely, the region containing the fingers is discontinuous under abduction. Also, the width of a finger is much smaller than that of the palm and the forearm. Therefore, the geometrical variations in the width and the continuity of these subregions in the gesture image are used as cues for detection.
Fig. 5. Pictorial representation of the regions composing the binary image f. R denotes the gesture region and R̄ denotes the background region.

(a) Computation of width

(1) The variation in the width along the longest axis of the gesture image is calculated from the distance map obtained using the Euclidean distance transform (EDT). The EDT gives the minimum distance of an object pixel to any pixel on the boundary set δR. The Euclidean distance between a boundary pixel (x_b, y_b) ∈ δR and an object pixel (x, y) ∈ R is defined as

$$d_{(x_b,y_b),(x,y)} = \sqrt{(x - x_b)^2 + (y - y_b)^2} \qquad (19)$$

The value of the EDT, D(x, y), for the object pixel (x, y) is computed as

$$D(x,y) = \min_{(x_b,y_b) \in \delta R} d_{(x_b,y_b),(x,y)} \qquad (20)$$

The values of D(x, y) at different (x, y) are used to detect the subregions of R.

(2) The straightforward implementation of the EDT defined through (19) and (20) is computationally expensive. Therefore, the conventional approach to fast EDT based on the Voronoi decomposition of the image proposed in [48] is employed. A study on several other algorithms proposed for reducing the computational complexity of the EDT is discussed in [49].

(b) Verification of region continuity

(3) The continuity of the subregions after detection is verified through connected component labeling preceded by morphological erosion. The erosion operation with a small structuring element is performed to disconnect the weakly connected object pixels. The structuring element considered is a disk operator with radius 3. The resultant is verified to be a continuous region if there is only one connected component. If there is more than one connected component, the detected region is verified as discontinuous.

The geometrical measurements along the finger regions vary with the users and they get altered due to geometric distortions. However, the measures across the palm and the forearm can be generalized and their ratios are robust to geometric distortions. The palm is an intactly acquired part that connects the fingers and the forearm. Since R_palm lies as an interface between R_fingers and R_forearm, the separation of the palm facilitates the straightforward detection of the other two regions. Hence, the anthropometry of the palm is utilized for detecting the regions in the gesture image.

4.2.1.1. Anthropometry based palm detection. The parameters of the hand considered for palm detection are the hand length, the palm length and the palm width, which are illustrated in Fig. 6(a). The anthropometric studies in [50–52] present the statistics of the above mentioned hand parameters. From these studies, we infer that the minimum value of the ratio of palm length (L_palm) to palm width (W_palm) is approximately 1.322 and its maximum value is 1.43. Similar observations were made from our photometric experiments. Fig. 6(b) gives the histogram of the different palm length to palm width ratios obtained through our experimentation. This ratio will be utilized to approximate the palm region as an ellipse. Considering all the variations of this ratio, we take

$$L_{palm} = 1.5\, W_{palm} \qquad (21)$$

Based on the geometry, we approximate the palm region R_palm as an elliptical region with

$$\text{Major axis length} = 1.5 \times \text{Minor axis length} \qquad (22)$$

Assuming a_palm as the semi-major axis length and b_palm as the semi-minor axis length, we can write

$$a_{palm} = \frac{L_{palm}}{2} \qquad (23)$$

$$b_{palm} = \frac{W_{palm}}{2} \qquad (24)$$

Therefore

$$a_{palm} = 1.5\, b_{palm} \qquad (25)$$

Using (25), it can be inferred that all the pixels constituting R_palm will lie within the ellipse of semi-major axis length a_palm. Therefore, the palm center and the value of a_palm have to be estimated for detecting the palm region.
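The width and continuity cues can be computed directly with standard image-processing primitives. The sketch below uses SciPy's exact Euclidean distance transform as a stand-in for the fast EDT of [48]; the disk radius of 3 for the erosion follows the text, while the helper names are illustrative.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Disk-shaped structuring element of the given radius."""
    x = np.arange(-radius, radius + 1).reshape(-1, 1)
    y = np.arange(-radius, radius + 1).reshape(1, -1)
    return (x ** 2 + y ** 2) <= radius ** 2

def edt(region):
    """Distance of every object pixel to the nearest background pixel,
    used here as the distance map D(x, y) of Eq. (20)."""
    return ndimage.distance_transform_edt(region)

def is_continuous(region, radius=3):
    """Continuity test of step (b): erode with a small disk to break weak
    connections, then count the connected components."""
    eroded = ndimage.binary_erosion(region, structure=disk(radius))
    _, n_components = ndimage.label(eroded)
    return n_components == 1
```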
Fig. 6. (a) Hand geometry. (b) Histogram of the experimental values of the palm length (L_palm) to palm width (W_palm) ratio calculated for 140 image samples taken from 23 persons.
Computing the palm center. Given that the boundary of R_palm is an ellipse, its center is known to have the maximum distance to the nearest boundary. Therefore, the center of R_palm is computed using the EDT in (20). The pixels (x, y) with EDT values D(x, y) greater than a threshold ζ are the points belonging to the neighborhood of the center of the palm. This neighborhood is defined as

$$C = \{(x,y) \in R \mid D(x,y) > \zeta\} \qquad (26)$$

The center (x_c, y_c) is defined as the palm centroid and given by

$$(x_c, y_c) = (\lfloor X \rceil, \lfloor Y \rceil) \qquad (27)$$

where

$$X = \frac{1}{|C|} \sum_{(x_i, y_i) \in C} x_i, \qquad Y = \frac{1}{|C|} \sum_{(x_i, y_i) \in C} y_i$$

|C| is the cardinality of C and ⌊·⌉ denotes rounding off to the nearest integer. The threshold ζ is selected as max(D(x, y)) − τ. The offset τ is considered to compensate for the inaccuracies due to viewing angles. For small values of τ, the centroid may not correspond to the exact palm center, and large values of τ will tend to deviate the centroid from the palm region. The optimal value of τ is experimentally chosen as 2.

Computing the semi-major axis length. From the geometry, it can be understood that the nearest boundary points from the palm centroid correspond to the end points of the minor axis. Hence, the EDT value at (x_c, y_c) is the length of the semi-minor axis and therefore

$$b_{palm} = D(x_c, y_c) \qquad (28)$$

From (25), it follows that the length of the semi-major axis can be given as

$$a_{palm} = 1.5\, D(x_c, y_c) \qquad (29)$$

Detecting the palm. In order to ensure proper detection of the palm, the finger regions (R_fingers) are sheared from the segmented object through the morphological opening operation. The structuring element is a disk with radius d_r empirically chosen as

$$d_r = \frac{b_{palm}}{1.5} \qquad (30)$$

The resultant is considered as the residual and will be referred to as the oddment. The oddment is generally composed of the palm region and may or may not contain the forearm. This implies A ⊆ R. Therefore, the oddment A can be defined as

$$A = R_{palm} \cup R_{forearm}$$

For R with no forearm region, R_forearm = ∅ and A = R_palm. R_palm is a part of A that is approximated as an elliptic region. Thus

$$R_{palm} = \left\{ (x_o, y_o) \;\middle|\; (x_o, y_o) \in A \ \text{and} \ \left(\frac{x_o - x_c}{a_{palm}}\right)^2 + \left(\frac{y_o - y_c}{b_{palm}}\right)^2 \le 1 \right\} \qquad (31)$$
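A compact sketch of this palm-detection step, using SciPy's distance transform and morphological opening, is given below. It reuses the `disk` helper from the previous sketch; treating the first image axis as x and taking the ellipse of Eq. (31) as axis-aligned are simplifications assumed here.

```python
import numpy as np
from scipy import ndimage

def detect_palm(R, tau=2.0):
    """Palm centroid and semi-axes of the elliptical palm model
    (Eqs. (26)-(29)); tau = 2 follows the experimentally chosen offset."""
    D = ndimage.distance_transform_edt(R)              # distance map D(x, y)
    C = np.argwhere(D > D.max() - tau)                 # neighbourhood of Eq. (26)
    xc, yc = np.rint(C.mean(axis=0)).astype(int)       # centroid of Eq. (27)
    b_palm = D[xc, yc]                                 # Eq. (28)
    return (xc, yc), 1.5 * b_palm, b_palm              # a_palm from Eq. (29)

def palm_region(R, xc, yc, a_palm, b_palm):
    """Elliptical palm region of Eq. (31), restricted to the oddment A
    obtained by opening R with a disk of radius b_palm / 1.5 (Eq. (30))."""
    A = ndimage.binary_opening(R, structure=disk(int(round(b_palm / 1.5))))
    x = np.arange(R.shape[0]).reshape(-1, 1)
    y = np.arange(R.shape[1]).reshape(1, -1)
    inside = ((x - xc) / a_palm) ** 2 + ((y - yc) / b_palm) ** 2 <= 1.0
    return A & inside
```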
Fig. 7. Illustration of the rule based region detection and separation of the hand from the gesture image f. The intensity of the background pixels is assigned as 0 and the object pixels are assigned the maximum intensity value 1.

4.2.1.2. Detection of forearm. The forearm is detected through the abstraction of the palm region R_palm from the gesture image R. The abstraction separates the forearm and the finger regions, such that R is modified as

$$\hat{R} = R \setminus R_{palm} = R_{fingers} \cup R_{forearm} \qquad (32)$$

As in the case of palm detection, the finger region is removed from R̂ through the morphological opening operation. The structuring element is a disk with its radius calculated from (30). The resultant is a forearm region and has the following characteristics:

1. The resultant R_forearm ⊆ A and the region enclosing R_forearm is continuous.
2. The width of the wrist crease is considered as the minimum width of the forearm region. From the anthropometric measures in [51], the minimum value of the ratio of the palm width to the wrist breadth is obtained as 1.29 and the maximum value is computed as 1.55. Using these statistics, the empirical value for the width of the forearm is chosen as

$$W_{forearm} > \frac{2\, b_{palm}}{1.29} \qquad (33)$$

4.2.1.3. Identifying the finger region. Having detected the palm and the forearm, the remaining section of the gesture image R will contain the finger region if it satisfies the following conditions:

- R_fingers ⊄ A.
- The region enclosing R_fingers is marked by an irregular boundary if more than one finger is abducted.
- The width of a finger (the maximum EDT value in this section) is much less than that of the palm and the forearm. Experimentally,

$$W_{finger} \le \frac{b_{palm}}{2} \qquad (34)$$
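The two width rules can be combined into a simple classifier for the connected components that remain after the palm is removed, as sketched below. Taking the width of a component as twice its maximum EDT value, and applying Eq. (34) directly to the EDT maximum for fingers, are interpretations assumed here for illustration.

```python
import numpy as np
from scipy import ndimage

def split_non_palm(R, palm, b_palm):
    """Label the components of R \\ R_palm as forearm or fingers using the
    width rules of Eqs. (33) and (34)."""
    labels, n = ndimage.label(R & ~palm)
    forearm = np.zeros_like(R, dtype=bool)
    fingers = np.zeros_like(R, dtype=bool)
    for k in range(1, n + 1):
        comp = labels == k
        half_width = ndimage.distance_transform_edt(comp).max()
        if 2.0 * half_width > 2.0 * b_palm / 1.29:        # Eq. (33): forearm width
            forearm |= comp
        elif half_width <= b_palm / 2.0:                  # Eq. (34): finger width
            fingers |= comp
    return forearm, fingers
```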
A procedural illustration of the proposed rule-based method for detecting the hand region from the input image is shown in Fig. 7. After detecting the hand region, the pixels belonging to the forearm R_forearm are removed from the gesture image.

4.2.2. Proposed approach to orientation correction

The orientation of the hand can be assumed to imply the orientation of the gesture. In the case of static gestures, the information is conveyed through the finger configurations. Since the human hand is highly flexible, it is natural that the orientation of the oddment might not be the orientation of the fingers. Hence, the major axis of the gesture is not sufficient to estimate the angular deviation that is caused by the fingers. Therefore, in order to align a gesture class uniformly, the orientation with respect to the abducted fingers is utilized. If the number of abducted fingers is less than 2, the orientation correction is achieved using the oddment.

4.2.2.1. Orientation correction using finger configuration. The normalization of rotation changes based on the finger configuration is achieved by detecting the tips of the abducted fingers. For this purpose, the boundary points (x_b, y_b) are ordered as a contiguous chain of coordinates using 8-connectivity. Any one of the boundary pixels that is not enclosed within the region containing the fingers is used as the starting point and the ordering is performed in the clockwise direction. Suppose that z is the length of the boundary measured by the number of pixels; then a distance curve g(z) is generated by computing the Euclidean distance between the palm centroid (x_c, y_c) and the boundary pixel (x_b, y_b) at z using (19). The curve g is smoothed using cubic-spline smoothing [53]. The resultant is a smooth curve consisting of peaks that correspond to the finger tips of the hand gesture. These peaks are detected by computing the first and the second order derivatives of g using finite difference approximations. Thus, g(z) is considered to be a peak if

$$|g'(z)| < \xi \quad \text{and} \quad g''(z) < 0 \qquad (35)$$

where ξ is the user defined minimum permissible difference. The finite difference approximations

$$g'(z) \simeq \frac{g(z+1) - g(z-1)}{2} \qquad (36)$$

$$g''(z) \simeq \frac{g(z+1) + g(z-1) - 2g(z)}{4} \qquad (37)$$

are used to implement (35). In some cases, a few peaks may correspond to the palm region. These points are easily eliminated by verifying their presence in the oddment A. The 2D coordinate positions of the detected peaks are utilized to find a representative peak for each abducted finger. The distance curve corresponding to a gesture and the detected finger tips are shown in Fig. 8.

Let L be the total number of detected peaks and g_i, i = 1, ..., L, define the position vectors of the detected points with respect to (x_c, y_c), indexed from left to right. These vectors g_i are referred to as the finger vectors and the central finger vector ĝ is computed from

$$\hat{g} = \begin{cases} g_{(L+1)/2} & \text{if } L \text{ is odd} \\[4pt] \dfrac{g_{L/2} + g_{L/2+1}}{2} & \text{otherwise} \end{cases} \qquad (38)$$

The gestures are assumed to be perfectly aligned if the vector ĝ is at 90° with respect to the horizontal axis of the image. Otherwise, the segmented gesture image is rotated so that ĝ is brought to 90°.
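A sketch of the peak detection and the central-finger-vector computation is given below. The spline smoothing factor, the tolerance `xi`, and the collapsing of contiguous detections into one representative peak per finger are illustrative choices; the paper does not fix these details, and the sign of the returned angle depends on the image coordinate convention.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def central_finger_angle(boundary, centroid, xi=0.05):
    """Finger-tip peaks of Eq. (35) on the distance curve g(z) and the
    central finger vector of Eq. (38).

    `boundary` is the clockwise-ordered array of boundary coordinates and
    `centroid` is the palm centroid (xc, yc)."""
    b = np.asarray(boundary, dtype=float)
    g = np.hypot(b[:, 0] - centroid[0], b[:, 1] - centroid[1])
    z = np.arange(len(g), dtype=float)
    g = UnivariateSpline(z, g, s=len(g))(z)                  # cubic-spline smoothing
    g1 = (np.roll(g, -1) - np.roll(g, 1)) / 2.0              # Eq. (36)
    g2 = (np.roll(g, -1) + np.roll(g, 1) - 2.0 * g) / 4.0    # Eq. (37)
    idx = np.where((np.abs(g1) < xi) & (g2 < 0))[0]          # Eq. (35)
    # Group contiguous detections and keep one representative peak per group;
    # peaks lying inside the oddment would additionally be discarded here.
    groups = np.split(idx, np.where(np.diff(idx) > 1)[0] + 1)
    peaks = [grp[np.argmax(g[grp])] for grp in groups if len(grp)]
    vecs = b[peaks] - np.asarray(centroid, dtype=float)      # finger vectors g_i
    L = len(vecs)
    if L == 0:
        return None                                          # fall back to the oddment
    if L % 2:                                                # Eq. (38)
        mid = vecs[(L - 1) // 2]
    else:
        mid = (vecs[L // 2 - 1] + vecs[L // 2]) / 2.0
    # Angle w.r.t. the horizontal image axis (rows grow downwards, hence the flip).
    return np.degrees(np.arctan2(-mid[0], mid[1]))
```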
4.2.2.2. Orientation correction using the oddment. The geometry of the oddment A is utilized to correct the orientation of the gestures with only one abducted finger and the gestures like the fist. The shape of the oddment can be well approximated by an ellipse and hence, the orientation of its major axis with respect to the horizontal axis of the image gives the approximate rotation angle of the hand gesture.
4.2.3. Normalization of scale and spatial translation

The scale of the rotation corrected gesture region is normalized and fixed to a pre-defined size through the nearest neighbor interpolation/down sampling technique. The spatial translation is corrected by shifting the palm centroid (x_c, y_c) to the center of the image.
Therefore, the resultant is the segmented gesture image that is normalized for transformations due to rotation, scaling and translation.

4.3. Feature extraction and classification

The normalized hand gesture data obtained are represented using the Krawtchouk moments calculated using (11). For comparative studies, the geometric and the Zernike moments defined through (1) and (4) respectively are employed as features for representing the normalized hand gestures. The features representing the gestures are classified using the minimum distance classifier. Consider z_s and z_t as the feature vectors of the test image and the target image (in the trained set) respectively. Then, the classification of z_s is done using the minimum distance classifier defined as

$$d_t(z_s, z_t) = \sum_{j=1}^{T} (z_{sj} - z_{tj})^2 \qquad (39)$$

$$\text{Match} = \arg\min_{t} (d_t)$$

where t is the index of signs in the trained set and T is the dimension of the feature vectors.
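The classification rule of Eq. (39) amounts to a nearest-neighbour search under the squared Euclidean distance. A minimal sketch, with the trained set held as a dictionary of label-to-feature-vector pairs (an arrangement assumed here), is:

```python
import numpy as np

def classify(z_s, trained):
    """Minimum distance classifier of Eq. (39).

    `trained` maps a gesture label to its stored feature vector; `z_s` is the
    feature vector of the test gesture (e.g. the moment array flattened to 1-D)."""
    dists = {label: float(np.sum((np.asarray(z_s) - np.asarray(z_t)) ** 2))
             for label, z_t in trained.items()}
    return min(dists, key=dists.get)
```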
5. Experimental results and discussion

The gestures in the database are captured using the RGB Frontech e-cam. The camera has a resolution of 1280 × 960 and is connected to an Intel Core 2 Duo processor with 2 GB RAM.

In our experiment, the segmentation overload is simplified by capturing the images under a uniform background. However, the foreground is cluttered with other objects and the hand is ensured to be the largest skin color object within the field-of-view (FOV). Except for the size, there were no restrictions imposed on the color and texture of the irrelevant cluttered objects. Also, the FOV was sufficiently large, enabling the users to perform gestures naturally without interfering with their gesturing styles. The gestures are captured by varying the viewpoint.

5.1. Determination of viewpoint

In the field of imaging, the viewpoint refers to the position of the camera with respect to the object of focus [54]. Therefore, in the context of our experiment, we define the viewing angle as the angle made by the camera with the longest axis of the hand. The optimal choice of the viewing angle or the viewpoint is determined by the amount of perspective distortion. The distortion is caused if the focal plane is not parallel to the object's surface and/or not in level with the center of the object. This means that the camera is not equidistant from all parts of the object [55]. Hence, the viewpoint is assumed to be optimum if the camera is placed parallel to the surface of the hand. For our experimental setup, the optimum viewing angle is determined to be 90°. Fig. 9(a) illustrates the setup for gesture acquisition from the optimal viewpoint and Fig. 9(b) illustrates the variation of the viewing angles with respect to the hand.

5.2. About the database

The database consists of two sets of gesture data. The first dataset is composed of gestures collected from a perfect viewpoint, which means that the angle of view is perpendicular to the imaging surface.
Fig. 8. Description of finger tip detection using the peaks in the distance curve; the highlighted vector denotes ĝ.
Fig. 9. (a) A schematic representation of the experimental setup at the optimal view angle. (b) Illustrations of the view angle variation between the camera and the object. The object is assumed to lie on the x–y plane and the camera is mounted along the z-axis. The view angle is measured with respect to the x–y plane.
Fig. 10. Gesture signs in the database.
The second dataset contains gestures captured by varying the view angles. The testing is performed in real time on the gestures collected under a controlled environment. The database is constructed by capturing 4230 gestures from 23 users. The data contain 10 gesture signs with 423 samples for each sign. The gesture signs taken for evaluation are shown in Fig. 10. The images are collected under three different scales, seven orientations and view angles of 45°, 90°, 135°, 225° and 315°. Of the 4230 images, 2260 gestures are taken at 90° and the remaining 1970 at varying view angles. We refer to the dataset taken at 90° as Dataset 1 and the remaining data as Dataset 2. Dataset 1 consists of gestures that vary due to the similarity transformations of rotation and scaling. Dataset 2 consists of gestures that are taken at different view angles and scales. Due to the viewing angles, the gestures undergo perspective distortions, and the view angle variation also imposes orientation changes. Thus, the gestures in Dataset 2 account for both perspective (view angle) and similarity (orientation and scale) distortions. Also, the gestures in Dataset 1 are collected cautiously such that there is no self-occlusion between the fingers. But while collecting the samples in Dataset 2, no precautions were taken to control self-occlusion, which might occur due to either the user's flexibility or the view angle variation. The partitioning of the database allows a detailed study of the efficiency of user and view independent gesture classification.
5.3. Gesture representation using orthogonal moments

The potential of the orthogonal moments as features for gesture representation is compared on the basis of their accuracy in reconstruction. Let f denote a normalized binary gesture image for which the Zernike and the Krawtchouk moments are derived using (4) and (11) respectively. Accordingly, let f̂ be the image reconstructed from the corresponding moments through (3) and (10). The image reconstructed from the moments is binarised through thresholding. The accuracy in reconstruction is measured using the mean square error (MSE) and the structural similarity (SSIM) index. The MSE between the images f and f̂ is computed as

$$\mathrm{MSE} = \frac{1}{(N+1)(M+1)} \sum_{x=0}^{N} \sum_{y=0}^{M} \big(f(x,y) - \hat{f}(x,y)\big)^2 \qquad (40)$$

The SSIM index between the images f and f̂ is computed block-wise by dividing the images into L blocks of size 11 × 11. For l ∈ {1, 2, ..., L}, the SSIM between the lth blocks of f and f̂ is evaluated as [56]

$$\mathrm{SSIM}(f,\hat{f})_l = \frac{(2\mu_f \mu_{\hat{f}} + c_1)(2\sigma_{f\hat{f}} + c_2)}{(\mu_f^2 + \mu_{\hat{f}}^2 + c_1)(\sigma_f^2 + \sigma_{\hat{f}}^2 + c_2)} \qquad (41)$$

where μ_f and μ_f̂ denote the mean intensities, σ_f² and σ_f̂² denote the variances and σ_ff̂ denotes the covariance. The constants c1 and c2 are included to avoid unstable results when the means and the variances are very close to zero. For our experiments, the constants are chosen as c1 = 0.01 and c2 = 0.03. The average of the block-wise similarity gives the SSIM index representing the overall image quality. The value of the SSIM index lies in [−1, 1] and a larger value means higher similarity between the compared images.

The analysis with respect to a gesture image is presented in Fig. 11 for illustrating the performance of the orthogonal moments in gesture representation. The normalized gesture image considered for analysis is shown in Fig. 11(a). A few examples of the corresponding images obtained through reconstruction from different numbers of Zernike and Krawtchouk moments are shown in Fig. 11(b) and (c) respectively. From these reconstructed images, it is observed that the perceptual similarity between the original and the reconstructed images is higher for the Krawtchouk moment based representation. The comparative plots of the MSE and the SSIM index values obtained for image reconstruction by varying the number of moments are shown in Fig. 11(d) and (e) respectively. The MSE value obtained for the Krawtchouk moment based representation is less than that for the Zernike moment based representation, and the corresponding SSIM index values show that the Krawtchouk moments have comparatively high similarity to the original image. It is also noted that the computation of the Zernike moments becomes numerically unstable at higher orders. Hence, the reconstruction error increases with the increase in the order. In the case of Krawtchouk moments, the reconstruction error decreases with the increase in the order. The Krawtchouk moments closely approximate the original image and the edges are better defined in the Krawtchouk based approach. This is expected, because the Krawtchouk polynomials (as shown in Fig. 2) are localized and have relatively high spatial frequency components. On the other hand, the Zernike polynomials (as shown in Fig. 1) are global functions with relatively low spatial frequency components. As a result, the Zernike moments are not capable of representing the local changes as efficiently as the Krawtchouk moments. From the results in Fig. 11 it can be inferred that the Krawtchouk moments offer better accuracy than the Zernike moments in gesture representation. Therefore, it suggests that the Krawtchouk moments are potential features for better gesture representation.
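The two quality measures are straightforward to implement; the sketch below follows Eqs. (40) and (41) with the constants c1 = 0.01 and c2 = 0.03 quoted in the text. The non-overlapping block layout and the unweighted block average are assumptions made here; SSIM is often computed instead with overlapping, Gaussian-weighted windows.

```python
import numpy as np

def mse(f, f_hat):
    """Mean square error of Eq. (40)."""
    return float(np.mean((f.astype(float) - f_hat.astype(float)) ** 2))

def ssim_index(f, f_hat, c1=0.01, c2=0.03, block=11):
    """Block-wise SSIM of Eq. (41), averaged over non-overlapping 11 x 11 blocks."""
    scores = []
    for i in range(0, f.shape[0] - block + 1, block):
        for j in range(0, f.shape[1] - block + 1, block):
            a = f[i:i + block, j:j + block].astype(float)
            b = f_hat[i:i + block, j:j + block].astype(float)
            mu_a, mu_b = a.mean(), b.mean()
            var_a, var_b = a.var(), b.var()
            cov = ((a - mu_a) * (b - mu_b)).mean()
            scores.append(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                          / ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))
    return float(np.mean(scores))
```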
Fig. 11. Illustration of performance evaluation of gesture representation using Zernike and Krawtchouk moments at various orders. (a) Original image. (b) Examples of images reconstructed from Zernike moments. N.M. denotes number of moments. (c) Examples of images reconstructed from Krawtchouk moments. N.M. denotes number of moments. (d) Comparative plot of MSE vs number of moments. (e) Comparative plot of SSIM index vs number of moments.
5.4. Gesture classification

The orders of the orthogonal moments are selected experimentally based on the accuracy in reconstruction, and the orders of the geometric moments are chosen based on the recognition performance. The maximum orders of the geometric moments, the Zernike moments and the Krawtchouk moments were fixed at 14 (n = 7 and m = 7), 30 and 80 (n = 40 and m = 40) respectively. The parameters p1 and p2 of the Krawtchouk polynomials are fixed at 0.5 each to ensure that the moments are emphasized with respect to the centroid of the object. The resolution of the image is fixed at 104 × 104 with the scale of the hand object normalized to 64 × 64 through the nearest neighbor interpolation/down sampling method. The experiments for analyzing the user independence and view invariance are as follows.
5.4.1. Verification of user independence

In gesture classification, user independence refers to the robustness to user variations, which include variations in the hand geometry and the flexibility of the fingers. In order to perform the experiment, the training and the testing samples are taken from Dataset 1. As stated earlier, the gestures in Dataset 1 are collected at the optimum view angle such that the gestures do not undergo perspective distortion. Therefore, the variations among the gesture samples in Dataset 1 are only due to the user variations. For this reason, the experiments for verifying the user invariance are performed using only the samples in Dataset 1. The user independence of the features is verified by varying the number of users considered while training. The number of users considered in forming the training set for experimentation is varied as 23, 15 and 7. Thus, the largest training dataset
consists of 230 gestures with 23 training samples per gesture sign. Some examples of the gestures contained in the training set are shown in Fig. 12. The classification is performed on 2030 testing samples that are collected from 23 users. The detailed scores of the gesture classification results in Table 1 with respect to the samples from Dataset 1 are given in Tables 2–4. In the case of geometric moments, the rate of misclassification increases vastly as the number of users in the training set decreases. Further, the decline in the classification accuracies indicates that the geometric moments provide the least user independence. The gesture-wise classification results of the geometric moments obtained for varying number of users in the training set are tabulated in Table 2(a)–(c). From these results, it is observed that most of the mismatch has occurred between the gestures that are geometrically close. For example, gesture 3 is mostly misclassified as gesture 2, gesture 1 is misidentified as gesture 7, gesture 2 is recognized as either gesture 1 or gesture 3 and gesture 7 is matched as gesture 2. It is also observed that in the case of geometric moments there is poor perceptual correspondence between the mismatched gestures. This is because the geometric moments are global features and they only represent the statistical attributes of a shape. The Zernike moments offer better classification rate than the geometric moments even as the number of users considered for training decreases. From the comprehensive scores of the classification results given in Table 3(a)–(c), it is understood that the accuracy of Zernike moments is mainly reduced due to the confusion among the gestures 1, 8 and 9. Since the Zernike polynomials are defined in the polar domain, the magnitude of the Zernike moments for shapes with almost similar boundary profile will also be approximately same. Hence, the misclassification in the case of Zernike moments occurred between the gestures that have almost similar boundary
Fig. 12. Examples of the training gestures taken from Dataset 1.
Table 1
Verification of user independence. Comparison of classification results obtained for varying numbers of users in the training set. The number of testing samples in Dataset 1 is 2030. % CC—percentage of correct classification.

No. of users in     No. of training     % CC for the testing samples from Dataset 1 based on
the training set    samples/gesture     Geometric moments    Zernike moments    Krawtchouk moments
23                  23                  82.07                90.89              95.42
15                  15                  78.28                89.75              95.12
7                   7                   72.66                86.65              93.15
From the samples shown in Fig. 12, it can be noted that gestures 1, 8 and 9 have almost the same boundary profile and hence are frequently mismatched. In the case of the Krawtchouk moments, the mismatches have occurred between gestures with coinciding regions. With respect to shape, some gesture signs in the dataset can be considered subsets of other signs in terms of the spatial distribution of their pixels. To show that, in the Krawtchouk moment based representation, the confusion has occurred between gestures with almost the same spatial distribution of pixels, a simple analysis is performed by comparing the spatial distributions of the boundary pixels; if the boundary pixels exhibit a high correspondence, so will the regions within the boundaries. Fig. 13 illustrates a few examples of the misclassifications in gesture classes 1 and 9. It can be verified that the spatial distribution of the pixels in the test gestures coincides strongly with the matches obtained through the Krawtchouk moments based classification. As per the results in Table 4(a)–(c), gestures 1, 6 and 9 are frequently misclassified as gesture 7. Some examples of these misclassifications, along with the corresponding training gestures, are shown in Fig. 14. Similarly, gesture 0 is a subset of all the gesture signs contained in the database; as a result, many gestures are mismatched with gesture 0. From the results in Table 1, it is confirmed that the Krawtchouk moment features represent the gestures more accurately than either the geometric or the Zernike moment based features. In particular, the performance of the Krawtchouk moments is consistent even for a small number of users in the training set, whereas for the geometric and the Zernike moments the classification rate of each gesture varies with the number of training samples. Therefore, it is concluded that, among the moment based representations, the Krawtchouk moments are the more robust features for user independent gesture recognition.
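One way to realize the boundary-pixel comparison described above is sketched below. This is an illustrative reconstruction, not the authors' exact procedure: the boundary of each normalized silhouette is obtained by subtracting its morphological erosion, and the correspondence score is the fraction of test-boundary pixels lying within a small tolerance of the reference boundary (the parameter tol is an assumption).

import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def boundary(silhouette):
    """Boundary pixels of a binary silhouette (object minus its erosion)."""
    sil = silhouette.astype(bool)
    return sil & ~binary_erosion(sil)

def boundary_correspondence(test_sil, ref_sil, tol=2.0):
    """Fraction of test boundary pixels lying within `tol` pixels of the
    reference boundary (both silhouettes already scale/rotation normalized)."""
    b_test, b_ref = boundary(test_sil), boundary(ref_sil)
    # Distance from every pixel to the nearest reference-boundary pixel.
    dist_to_ref = distance_transform_edt(~b_ref)
    return float((dist_to_ref[b_test] <= tol).mean())

# Usage sketch: for the cases in Fig. 13, the score is expected to be higher
# for the match obtained by the Krawtchouk moment classifier than for the
# perceptually assigned class, e.g.
#   boundary_correspondence(test_gesture, obtained_match) versus
#   boundary_correspondence(test_gesture, actual_class_sample).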
Table 2
Comprehensive scores of the overall classification results in Table 1 for the geometric moments, with different numbers of users in the training set and 203 testing samples/gesture taken from Dataset 1 (rows: input gesture I/P, columns: output gesture O/P).

(a) Confusion matrix for 23 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        175    0    1    8    1    0    0    1    8    9
1          0  184    0    0    0    0    0   19    0    0
2          0   18  163   13    4    2    0    1    1    1
3          4    1   53  126   12    0    0    1    0    6
4          1    1   12   18  160    5    2    0    3    1
5          0    0    0    0   13  179    7    0    0    4
6          0    1    0    1   10    5  156   27    0    3
7          0    1   17    2    3    0    3  172    5    0
8          9    3    3    5   13    0    2    1  167    0
9          0    0    1    4    0    0    2   12    0  184

(b) Confusion matrix for 15 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        171    5    1   12    3    0    0    1    7    3
1          0  185    0    0    0    0    1   17    0    0
2          0   18  157   19    6    0    0    1    1    1
3          5    1   65  108   19    0    0    0    0    5
4          1    1   15   15  158    5    3    0    2    3
5          0    0    0    0   22  165   13    0    0    3
6          0    5    0    0   16    6  150   23    1    2
7          0    1   10    2    3    0    5  175    7    0
8         11    2    4    2   22    0    3    1  158    0
9          3    2    1    8    0    0    2   25    0  162

(c) Confusion matrix for 7 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        168    4    1   14    4    0    0    1    8    3
1          0  177    0    0    0    0    1   25    0    0
2          0   26  135   29    8    0    0    2    1    2
3         19    0   54  100   20    0    1    1    0    8
4          1    2    2   19  159    5   10    0    2    3
5          0    2    0    0   22  155   21    0    0    3
6          0    1    0    2   30    1  129   38    1    1
7          0    1   15    2    4    0    4  171    6    0
8         13    4    3    2   34    0    4    0  143    0
9          2    2    2    9    4    0    5   41    0  138
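The per-gesture scores reported in Tables 2–4 are standard confusion matrices. A minimal sketch of how such a matrix and the % CC values are tallied from true and predicted gesture labels is given below (Python, with illustrative names); the worked number in the final comment is taken from Table 2(a).

import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes=10):
    """Rows: input (I/P) gesture class, columns: output (O/P) class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

def percent_correct(cm):
    """Overall percentage of correct classification (% CC)."""
    return 100.0 * np.trace(cm) / cm.sum()

# Example: the diagonal of Table 2(a) sums to 1666 over 2030 test samples,
# i.e. 100 * 1666 / 2030 = 82.07% CC, the value reported in Table 1.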
5.4.2. Verification of view invariance
View angle variations during gesture acquisition lead to perspective distortions and may sometimes cause self-occlusion; self-occlusion can also be due to the poor flexibility of the gestures. The study on view invariance verifies the robustness of the methods to the effects of viewpoint changes. To study the view invariance of the considered methods, the initial experiment is performed with the training set taken from Dataset 1, referred to as Training set-I. Since the gestures in Dataset 1 are taken at a view angle of 90°, the experiments also include the samples from Dataset 1 while testing. Therefore, the testing set consists of 3600 samples that
include 2030 samples from Dataset 1 and 1570 samples from Dataset 2. The classification results obtained using Training set-I are tabulated in Table 5, and the comprehensive gesture-wise classification scores are given in Table 6(a)–(c). From the results in Table 5, it is evident that, among the moment based representations, the Krawtchouk moments offer the best classification accuracy. It is known that perspective distortion affects the boundary profile and the geometric attributes of a shape. Hence, the geometric moments are insufficient for recognizing the gestures under view angle variation. Similarly, the Zernike moments are sensitive to boundary distortions, and as a result their performance is low for the gesture samples from Dataset 2. From the detailed scores in Table 6(b), it is observed that the maximum misclassification in the Zernike moments based method is again due to the confusion among gestures 1, 8 and 9.
Table 3
Comprehensive scores of the overall classification results in Table 1 for the Zernike moments, with varying numbers of users in the training set and 203 testing samples/gesture taken from Dataset 1 (rows: input gesture I/P, columns: output gesture O/P).

(a) Confusion matrix for 23 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        182    3    0    0    0    0    0    0   14    4
1          0  177    0    0    0    0    2   17    3    4
2          0    3  198    2    0    0    0    0    0    0
3          0    0    0  201    0    0    0    1    0    1
4          1    0    0    0  198    1    3    0    0    0
5          0    1    0    0    0  197    3    0    0    2
6          0    0    0    0    0    0  180   22    0    1
7          0    1    2    0    0    0    1  196    3    0
8          1   12    1    0    0    0    1   15  162   11
9          2   11    0    0    0    0    1    0   35  154

(b) Confusion matrix for 15 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        184    2    0    0    0    0    0    0   12    5
1          0  176    0    0    0    0    2   18    5    2
2          0    3  197    2    0    0    1    0    0    0
3          0    0    0  201    0    0    0    1    0    1
4          1    1    0    0  197    1    3    0    0    0
5          0    1    0    0    0  197    3    0    0    2
6          0    0    0    0    0    0  190   12    0    1
7          0    3    2    0    0    0    4  188    5    1
8          3   12    0    0    0    0    1   15  160   12
9          3   19    0    0    0    0    0    4   44  133

(c) Confusion matrix for 7 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        181    3    0    0    0    0    1    0   11    7
1          0  179    1    0    0    0    0   16    3    4
2          0    5  189    3    0    0    1    0    1    4
3          0    0    0  200    1    0    0    1    0    1
4          1    0    0    1  196    0    5    0    0    0
5          0    2    0    0    3  189    7    0    0    2
6          0    0    0    0    0    0  186   17    0    0
7          0    0    2    0    0    0    2  192    3    4
8          7   18    0    0    0    0    1   29  135   13
9          4   14    0    1    0    0    1   30   41  112

Table 4
Comprehensive scores of the overall classification results in Table 1 for the Krawtchouk moments, with different numbers of users in the training set and 203 testing samples/gesture taken from Dataset 1 (rows: input gesture I/P, columns: output gesture O/P).

(a) Confusion matrix for 23 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        198    2    0    0    0    0    0    1    2    0
1          0  186    0    0    0    0    0   16    0    1
2          0    2  201    0    0    0    0    0    0    0
3          2    0    0  200    1    0    0    0    0    0
4          4    0    0    1  198    0    0    0    0    0
5          0    0    0    0    0  194    3    0    0    6
6          0    0    0    0    1    0  175   25    0    2
7          0    2    1    0    0    0    1  193    6    0
8          2    2    1    0    0    0    0    1  197    0
9          5    0    0    0    0    0    0    3    0  195

(b) Confusion matrix for 15 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        197    2    0    0    0    0    0    1    3    0
1          0  185    2    0    0    0    0   15    0    1
2          0    1  202    0    0    0    0    0    0    0
3          2    0    0  200    0    0    1    0    0    0
4          4    0    0    1  198    0    0    0    0    0
5          0    0    0    0    0  193    4    0    0    6
6          0    0    0    0    1    0  178   22    0    2
7          0    4    1    0    0    0    3  189    6    0
8          3    2    1    0    1    0    0    1  195    0
9          4    0    0    2    0    0    0    3    0  194

(c) Confusion matrix for 7 training samples/gesture
I/P\O/P    0    1    2    3    4    5    6    7    8    9
0        194    4    0    0    0    0    0    0    5    0
1          0  186    2    0    0    0    0   15    0    0
2          0    5  198    0    0    0    0    0    0    0
3          3    1    0  188    2    0    1    3    5    0
4          4    0    0    0  199    0    0    0    0    0
5          0    0    0    0    0  194    5    0    0    4
6          0    0    0    0    2    0  158   42    0    1
7          0    2    2    0    0    0    3  195    1    0
8          4    0    5    0    1    0    2    2  189    0
9          2    0    0    5    0    0    0    6    0  190

Fig. 13. Examples of results from Krawtchouk moment based classification. The illustration is presented to show that the Krawtchouk moments depend on the similarity between the spatial distribution of the pixels within the gesture regions. The spatial correspondence between the gestures is analyzed based on the shape boundary. It can be observed that the maximum number of boundary pixels from the test sample coincide more with the obtained match rather than the actual match. (Panels show test gestures '1' and '9', the obtained match (gesture '7' in both cases), the actual match in the training set, and the comparison between the spatial distributions of the boundary points for the test data, the obtained match and the actual match.)
Fig. 14. Results from the experiment on user invariance: examples of testing samples that are misclassified by the Krawtchouk moments based method. The correspondence of the test gesture is observed to be higher with the mismatched gesture than with the trained gestures of the same class.
Table 5
Experimental validation of view invariance. Comparison of classification results obtained for Training set-I and Training set-II. The training sets include gestures collected from 23 users. The number of testing samples in Dataset 1 and Dataset 2 is 2030 and 1570 respectively. % CC—percentage of correct classification.

                        Training set-I                              Training set-II
Moment based            % CC for     % CC for     Overall          % CC for     % CC for     Overall
representation          Dataset 1    Dataset 2    % CC             Dataset 1    Dataset 2    % CC
Geometric moments       82.07        71.4         77.42            87.39        80.57        84.42
Zernike moments         90.89        75.48        84.17            94.83        90.32        92.86
Krawtchouk moments      95.42        86.88        91.69            97.73        95.92        96.94
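The overall scores in Table 5 appear to be the sample-weighted averages of the two per-dataset scores (2030 testing samples from Dataset 1 and 1570 from Dataset 2); for example, for the Krawtchouk moments with Training set-II,
\[
\text{Overall \%CC} = \frac{2030 \times 97.73 + 1570 \times 95.92}{2030 + 1570} \approx 96.94.
\]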
Similarly, gesture 7 is confused with gesture 2 and gesture 6 is misclassified as gesture 7. Among the moment based representations, the Krawtchouk moments have the highest recognition rate for the testing samples from Datasets 1 and 2. In particular, for Dataset 2 the improvement is almost 11% for Training set-I, which indicates that the Krawtchouk moments are robust to view angle variations. By comparing the gesture-wise classification results in Table 6(a)–(c), it is observed that the number of misclassifications is notably higher for almost all the gestures in Dataset 2. Samples of some of the gestures from Dataset 2 with higher misclassification rates are shown in Fig. 15. The recognition efficiency is reduced mainly due to the self-occlusion between the fingers and the boundary deviations. From Table 5, it should be noted that the classification accuracy is better for the testing samples from Dataset 1, because Training set-I is constructed using samples taken from Dataset 1. This indicates that the performance for Dataset 2 can be improved if the training set also includes samples taken at varied view angles.
Table 6
Confusion matrices for the classification results given in Table 5 for Training set-I, with 23 training samples/gesture sign and 360 testing samples/gesture sign: (a) detailed scores of the geometric moments, (b) detailed scores of the Zernike moments and (c) detailed scores of the Krawtchouk moments.
5.4.3. Improving view invariant recognition
In order to improve the view invariant classification rate, the experiments are repeated by including gestures taken at different view angles in the training set. The extended training set consists of 630 gesture samples collected from 23 users: 230 samples are taken from Dataset 1 and 400 samples from Dataset 2. We refer to this extended training set as Training set-II.
Table 7
Confusion matrices for the classification results given in Table 5 for Training set-II, with 40 training samples/gesture sign and 360 testing samples/gesture sign: (a) detailed scores of the geometric moments, (b) detailed scores of the Zernike moments and (c) detailed scores of the Krawtchouk moments.
The classification results are obtained for 3600 samples that contain 2030 samples from Dataset 1 and 1570 samples from Dataset 2. The results are consolidated in Table 5. As expected, the improvement in the recognition accuracies for Dataset 2 is desirably higher for Training set-II. The performance of the geometric and the Zernike moments has also improved notably, while the performance of the Krawtchouk moment features remains consistently superior to that of the other considered moments for both training sets. The comprehensive scores for the results obtained using Training set-I and Training set-II are given in Tables 6 and 7 respectively. From the gesture-wise classification results obtained for Training set-I, it is difficult to perceptually relate the misclassified gestures. However, including more samples from different viewpoints in the training set has improved the distinctness of the gestures. From the results for the geometric moments in Table 7(a), it is observed that most misclassifications have occurred for gestures 2, 3, 4 and 6. In the case of the Zernike moments, the results in Table 7(b) show that gestures 6, 8 and 9 have lower classification rates. As tabulated in Table 7(c), the maximum cases of misclassification for the Krawtchouk moments are due to gesture 6, which is most prevalently misclassified as gesture 7. As stated earlier, the Zernike polynomials are global functions, and hence the misclassifications in the case of the Zernike moments have occurred for gestures with almost similar boundary profiles. The Krawtchouk moments are region based features and are local functions whose support increases with the order of the polynomial; hence, as explained before, the confusion has occurred between gestures with almost similar spatial distributions of pixels. The plots in Fig. 16 illustrate the classification accuracy of each gesture sign at different view angles. From these plots, it is inferred that the maximum cases of misclassification have occurred at a view angle of 135°. Further, the results in Fig. 16 corroborate the observations in Table 7.
6. Conclusion
This paper has presented a gesture recognition system using geometry based normalizations and Krawtchouk moment features for classifying static hand gestures. The proposed system is robust to similarity transformations and projective variations. A rule based normalization method utilizing the anthropometry of the hand is formulated for separating the hand region from the forearm; the method also identifies the finger and the palm regions of the hand. An adaptive rotation normalization procedure based on the abducted fingers and the major axes of the hand is proposed. The 2D Krawtchouk moments are used to represent the segmented binary gesture image, and the classification is performed using a minimum distance classifier. The experiments are aimed towards
Fig. 15. Samples of the test gestures from Dataset 2 that have lower recognition accuracy with respect to all the methods.
Fig. 16. Illustration of gesture-wise classification results obtained at different view angles. The training is performed using Training set-II and the results are shown for 290 testing samples (29 samples/gesture sign) at each view angle. (a) Classification results of geometric moments at different view angles. (b) Classification results of Zernike moments at different view angles. (c) Classification results of Krawtchouk moments at different view angles.
analyzing the accuracy of the Krawtchouk moments as features in user and view invariant static hand gesture classification. The experiments are conducted on a large database consisting of 10 gesture classes and 4230 gesture samples. A detailed study of the Krawtchouk moments based classification is conducted in comparison with the geometric moments and the Zernike moments. Based on the results, we conclude that the Krawtchouk moments are robust features for achieving viewpoint invariant and user independent recognition of static hand gestures. Future work based on this paper may involve utilizing the proposed anthropometry based extraction method for normalizing user variations in the hand gesture. Despite scale normalization, user dependency arises from variations in the aspect ratio of the constituent regions of the hand. Since the anthropometry based extraction method is capable of separating the finger and the palm regions, it can be used to achieve user independence by normalizing the aspect ratio of the hand regions. Subsequent research can be extended to studying the efficiency of Krawtchouk moment features in recognizing occluded hand shapes and complex hand gestures, as in sign language communication.
Conflict of interest statement
None declared.
S. Padam Priyal received the B.Eng. degree in Electronics and Communication Engineering from Karunya Institute of Technology, Coimbatore, India, in 2002 and the M.Eng. degree in Communication Systems from the Mepco Schlenk Engineering College, Sivakasi, India, in 2004. Currently, she is a Research Scholar in the Department of Electronics and Electrical Engineering, Indian Institute of Technology, Guwahati. Her research interests include computer vision, pattern recognition and image analysis.
Prabin Kumar Bora received the B.Eng. degree in Electrical Engineering from Assam Engineering College, Guwahati, India, in 1984 and the M.Eng. and Ph.D. degrees in Electrical Engineering from the Indian Institute of Science, Bangalore, in 1990 and 1993 respectively. Currently, he is a Professor in the Department of Electronics and Electrical Engineering, Indian Institute of Technology, Guwahati. Previously, he was a Faculty Member with Assam Engineering College, Guwahati; Jorhat Engineering College, Jorhat; and Gauhati University, Guwahati, India. His research interests include computer vision, pattern recognition, video coding, image and video watermarking and perceptual video hashing.