Gait recognition for human identification based on ICA and fuzzy SVM through multiple views fusion

Pattern Recognition Letters 28 (2007) 2401–2411 www.elsevier.com/locate/patrec

Jiwen Lu a,*, Erhu Zhang b

a School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore
b Department of Information Science, Xi’an University of Technology, Xi’an 710048, China

Received 1 October 2005; received in revised form 25 July 2007
Available online 19 August 2007
Communicated by G. Sanniti di Baja

Abstract

This paper proposes a gait recognition method for human identification at a distance that uses multiple gait feature representations based on independent component analysis (ICA) and a genetic fuzzy support vector machine (GFSVM). Firstly, the moving human figures are extracted using simple background modeling to obtain binary silhouettes. Secondly, these silhouettes are characterized with three kinds of gait representations: Fourier descriptor, wavelet descriptor and pseudo-Zernike moment. Then, ICA and the GFSVM classifier are used for recognition, and the method is tested on two gait databases. The performance of the three feature representations is compared, and their combination achieves better performance than any one of them individually. Meanwhile, a multiple views fusion approach on the decision level based on the product of sum (POS) rule is introduced to overcome the limitation of most single view recognition methods; it achieves better performance than the traditional rank-based fusion rules. Experimental results show that our method has encouraging recognition accuracy. © 2007 Elsevier B.V. All rights reserved.

Keywords: Gait recognition; ICA; Genetic fuzzy SVM; Multiple feature representations; Multiple views fusion

1. Introduction

The demand for automatic human identification systems is growing strongly in many important applications, especially identification at a distance. The topic has recently gained great interest from pattern recognition and computer vision researchers because such systems are widely used in security-sensitive environments such as banks, parks and airports. Biometrics is a powerful tool for reliable human identification; it makes use of human physiological or behavioral characteristics such as face, iris, fingerprints and hand geometry. However, these biometric methodologies are either intrusive or restricted to controlled environments. For example, most face recognition methods are capable of recognizing only frontal or nearly frontal faces, and other biometrics such as fingerprint and iris are no longer applicable when a person suddenly appears in the surveillance scene. Therefore, new biometric recognition methods are strongly needed in many surveillance applications, especially at a distance.

As a new behavioral biometric, gait recognition aims at identifying a person by the way he or she walks. Compared with first-generation biometrics such as face, fingerprints and iris, which are widely applied in some commercial and legal applications, gait has the prominent advantages of being non-contact, non-invasive and unobtrusive and of requiring only low resolution. It is so far the only perceivable biometric feature for human identification at a distance, although it is also affected by factors such as drunkenness, pregnancy and injuries involving joints. Unlike face, gait is also difficult to conceal, and it has great potential in many situations, especially for human identification at a distance.

* Corresponding author. Tel.: +65 67906547. E-mail address: [email protected] (J. Lu).

0167-8655/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2007.08.004


Although gait recognition is a new research field, a number of studies have appeared in the recent literature (Wang et al., 2003; Foster et al., 2003; Wagg and Nixon, 2004; Mowbray and Nixon, 2003; Yu et al., 2004; Lee, 2002; Lu et al., 2005, 2006; Zhang et al., 2005). Current gait recognition approaches can be classified into two main classes, namely holistic-based methods (Wang et al., 2003; Mowbray and Nixon, 2003; Yu et al., 2004; Lu et al., 2005, 2006; Zhang et al., 2005) and model-based methods (Foster et al., 2003; Wagg and Nixon, 2004; Lee, 2002). Model-based methods aim to model the human body by analyzing body parts such as the hands, torso, thighs, legs and feet, and perform model matching in each frame of a walking sequence to measure these parameters. As the effectiveness of model-based techniques is still limited at the current stage, especially in human body modeling and parameter recovery from a walking sequence, most existing gait recognition methods are holistic-based, i.e. motion-based. Like previous holistic-based algorithms, we consider gait to be composed of a sequence of body poses and recognize it by the similarity of these body poses and silhouettes, at low computational cost.

It is well known that the use of a single gait representation has become the bottleneck in achieving high performance. Therefore, an ideal gait-based personal identification system should be able to recognize individuals reliably using all of the available discriminative information. Based on this assumption, this paper proposes a gait recognition method for human identification that combines three representations and achieves better performance than any one of them individually. Meanwhile, an improved combination rule, i.e. the product of sum (POS) rule, is proposed, which achieves higher performance than the traditional rank-based fusion rules.
Finally, a multiple views information fusion approach on the decision level based on the POS rule is adopted to overcome the limitation of most single view recognition methods.

The paper is organized as follows. Section 2 describes gait feature extraction and transformation in detail. Section 3 presents the principle of GFSVM and Section 4 gives the fusion strategy. Experimental results and analyses appear in Section 5. Finally, the conclusions of this work are summarized in Section 6.

2. Feature extraction and representation

Before training and recognition, each gait sequence containing one walking figure is converted at this preprocessing stage into a sequence of signals in the Fourier frequency domain, the wavelet domain and the pseudo-Zernike moment domain.

2.1. Segmentation of human motion

Human segmentation is the first step of our method and plays a key role in the whole gait recognition system. To extract the silhouettes of walking figures from the background, a simple motion detection approach using the median value is adopted to construct the background image from a small portion of the video sequence that includes moving objects. Let $P$ represent a sequence of $N$ frames. The resulting background $p(x, y)$ can be computed as follows:

$$p(x, y) = \operatorname{med}[p_1(x, y), p_2(x, y), \ldots, p_N(x, y)] \tag{1}$$

where $p(x, y)$ is the background brightness at pixel $(x, y)$ and med denotes the median. The value of $N$ is 60 for each sequence in our system. The median is taken rather than the mean of the pixel intensities over the $N$ frames because the mean is distorted by the large change in pixel intensity when the person moves past that pixel, while the median is unaffected by such spurious values. Two assumptions are made in this step: the person does not stand still over the analyzed frames, since otherwise the background extraction would classify the person as part of the background, and there is only one moving person in the scene.

It should be noted that no perfect image segmentation algorithm exists at present. Here we adopt the traditional histogram method to segment the foreground. For each image, the changing pixels are detected with a suitable threshold $T$ determined from the histogram, and the human silhouette is obtained by

$$D_{xy} = \begin{cases} 1 & \text{if } |p_i(x, y) - p(x, y)| \ge T \\ 0 & \text{if } |p_i(x, y) - p(x, y)| < T \end{cases} \qquad i = 1, 2, \ldots, N \tag{2}$$

This process is applied independently to each color channel (red, green and blue) of every frame. A pixel is classified as foreground if at least one of its three components satisfies the foreground condition of Eq. (2). This approach is tenable in most cases, except when the walking direction coincides with the camera’s viewing direction.
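The median background of Eq. (1) and the per-channel thresholding of Eq. (2) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the toy sequence and the fixed threshold of 30 are assumptions made for the example, whereas the paper derives $T$ from the image histogram.

```python
import numpy as np

def median_background(frames):
    """Eq. (1): per-pixel median over the N frames (shape N x H x W x 3)."""
    return np.median(frames, axis=0)

def foreground_mask(frame, background, threshold):
    """Eq. (2) applied to each colour channel; a pixel is foreground when
    at least one channel differs from the background by >= threshold."""
    diff = np.abs(frame.astype(np.float64) - background)
    return (diff >= threshold).any(axis=-1).astype(np.uint8)

# Toy sequence (hypothetical data): a static grey scene with a bright
# 2x2 "walker" that moves one pixel to the right in every frame.
N, H, W = 9, 4, 12
frames = np.full((N, H, W, 3), 50, dtype=np.uint8)
for i in range(N):
    frames[i, 1:3, i:i + 2] = 200

background = median_background(frames)
# threshold=30 is an arbitrary illustrative value, not taken from the paper.
mask = foreground_mask(frames[4], background, threshold=30)
```

Because the walker covers any given pixel in only two of the nine frames, the median recovers the empty scene, which is the robustness argument made above for preferring the median to the mean.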
In this case, although the person is walking in the scene, there is little positional variation in the 2D gait sequence. So, another background modeling method, the Gaussian background model, is adopted here. Supposing that a short section of video that does not contain any moving figure has been obtained, the background can be extracted using a statistical modeling technique. Under the hypothesis that the three color channels are independent, the mean and variance of each channel can be obtained. Let $\mu_R$, $\mu_G$ and $\mu_B$ be the mean values and $\sigma_R^2$, $\sigma_G^2$ and $\sigma_B^2$ the variance values; then the moving figures can be segmented with the following operator:

$$D_{xy} = \begin{cases} 1 & \text{if } |p_i(x,y,r) - p(x,y,r)| \ge 2\sigma_R \ \text{or}\ |p_i(x,y,g) - p(x,y,g)| \ge 2\sigma_G \ \text{or}\ |p_i(x,y,b) - p(x,y,b)| \ge 2\sigma_B \\ 0 & \text{if } |p_i(x,y,r) - p(x,y,r)| < 2\sigma_R \ \text{and}\ |p_i(x,y,g) - p(x,y,g)| < 2\sigma_G \ \text{and}\ |p_i(x,y,b) - p(x,y,b)| < 2\sigma_B \end{cases} \qquad i = 1, 2, \ldots, N \tag{3}$$

where $p_i(x, y, r)$, $p_i(x, y, g)$ and $p_i(x, y, b)$ are the red, green and blue values of frame $i$ at location $(x, y)$, and $p(x, y, r)$, $p(x, y, g)$ and $p(x, y, b)$ are the corresponding background values.

After that, some noise still remains in the foreground. Here, we use a simple shadow elimination method based on the RGB color space to eliminate the shadow caused by lighting and human motion. For any pixel in the gait image, a vector $I_b = [R_b, G_b, B_b]$ represents the background color model and another vector $I = [R, G, B]$ represents the current pixel’s color model, as shown in Fig. 1. The difference $C$ between the two vectors is the angle between them:

$$C = \arccos\left(\frac{\vec{OI} \cdot \vec{OI_b}}{|\vec{OI}| \, |\vec{OI_b}|}\right) \tag{4}$$

Given a preset color difference threshold $T_c$, for each foreground pixel segmented through Eq. (2) or (3), the shadow is eliminated by

$$M_{xy} = \begin{cases} 1 & \text{if } D_{xy} = 1 \text{ and } C_{xy} < T_c \\ 0 & \text{if } D_{xy} = 0 \text{ or } C_{xy} \ge T_c \end{cases} \tag{5}$$

Fig. 1. Color calculation model.

After the shadow elimination step, some small regions and noise may remain, so several filters such as erosion, dilation, connected component analysis and traditional edge tracking are adopted to obtain one connected edge. One example of background subtraction can be seen in Fig. 2(a)–(f). The NLPR gait database is used here as the data set for our gait experiments.

Fig. 2. One example of gait image preprocessing, using one frame from the NLPR gait database.

2.2. Gait feature extraction and representation

An important factor affecting gait recognition is how to represent human silhouettes and characterize their features. To make our method insensitive to changes in the color and texture of clothes, we use only the binary silhouettes. After the human silhouettes are obtained, features are extracted from each frame using the Fourier descriptor, the wavelet descriptor and the pseudo-Zernike moment, respectively.

2.2.1. Extraction of Fourier descriptor features

Fourier descriptors are a long-established method for representing the boundary of a two-dimensional shape. Their major advantage is that, when a shape is represented in the Fourier domain, its frequency components can be obtained easily (Wagg and Nixon, 2004; Mowbray and Nixon, 2003; Zhang et al., 2005). The general features of the shape are located in the lower frequencies while the detailed features are located in the higher frequencies. The steps for describing human silhouettes with discrete Fourier descriptors are as follows:

(1) Each contour is placed in the complex plane with its centroid as the origin. Each point on the contour can be represented by a complex number $s_i = x_i + j \cdot y_i$ ($i = 0, 1, 2, \ldots, N-1$), where $N$ is the number of contour points.
(2) The same number of points is selected to represent each contour; each contour is unwrapped counterclockwise from its top point and converted into a complex vector $[s_0, s_1, \ldots, s_{N-1}]$. Each gait sequence is therefore transformed into a sequence of complex vectors with the same dimension; the value of $N$ in our experiment is 256.
(3) For an $N$-length vector, the Fourier descriptors are obtained through the Fourier transform as

$$a_n = \frac{1}{N} \sum_{i=0}^{N-1} s_i \, e^{-j 2\pi n i / N}, \qquad n = 0, 1, \ldots, N-1 \tag{6}$$

So the Fourier descriptors used for recognition can be represented as

$$F = \left[ \frac{|a_2|}{|a_1|}, \frac{|a_3|}{|a_1|}, \ldots, \frac{|a_{N-1}|}{|a_1|} \right] \tag{7}$$

As we know, most of the energy of human silhouettes is concentrated in the low frequencies, so the high frequencies, which do not contain much energy, can be ignored. For computational convenience, we select only the 64 lowest frequency components, which loses the least silhouette information. One example of describing one frame of a human gait silhouette using the Fourier descriptor can be seen in Fig. 3.

2.2.2. Extraction of wavelet descriptor features

Wavelet descriptors have proved to be another good method for representing the boundary of a two-dimensional shape. Their major advantage is strong robustness against rotation, scale and linear transformation (Lu et al., 2006). When a shape is represented in the wavelet domain, its spatial and frequency components can be obtained simultaneously. We now apply discrete wavelet descriptors to describe the human silhouettes.

(1) As with the Fourier descriptor, each point on the contour is represented by a complex number $s_i = x_i + j \cdot y_i$ ($i = 0, 1, 2, \ldots, N-1$), where $N$ is the number of contour points.
(2) The same number of points is selected to represent each contour; each contour is unwrapped counterclockwise from its top point and converted into a one-dimensional vector $[d_1, d_2, \ldots, d_N]$, where $d_i$ is computed as

$$d_i = \sqrt{(x_i - x_c)^2 + (y_i - y_c)^2} \tag{8}$$

where $(x_c, y_c)$ is the centroid of the human boundary.
(3) Each gait sequence is transformed into a sequence of vectors with the same dimension; the value of $N$ in our experiment is 256.
(4) Wavelet descriptors are obtained through

$$D_{m,n} = \langle f, W_{m,n} \rangle = \int f(t) \, W_{m,n}(t) \, dt \tag{9}$$

where

$$W_{m,n}(t) = 2^{-m/2} \, W(2^{-m} t - n) \tag{10}$$
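The two boundary descriptors can be sketched as follows, assuming the contour has already been resampled to $N$ ordered points. Several choices here are assumptions made for illustration and are not stated in the paper: the centroid is taken as the mean of the boundary points, the "lowest frequency components" are taken to be the first entries of $F$, and the wavelet step uses three levels of Haar averaging (which reduces 256 samples to 32 coefficients) because the mother wavelet $W$ is not specified.

```python
import numpy as np

def fourier_descriptors(contour, keep=64):
    """Eqs. (6)-(7): normalised DFT magnitudes of a closed contour.

    contour: (N, 2) array of boundary points ordered along the boundary.
    """
    centred = contour - contour.mean(axis=0)      # centroid at the origin
    s = centred[:, 0] + 1j * centred[:, 1]        # s_i = x_i + j*y_i
    a = np.fft.fft(s) / len(s)                    # a_n of Eq. (6)
    f = np.abs(a[2:]) / np.abs(a[1])              # Eq. (7)
    return f[:keep]

def centroid_distance(contour):
    """Eq. (8): distance of every boundary point to the centroid."""
    return np.linalg.norm(contour - contour.mean(axis=0), axis=1)

def haar_lowpass(signal, levels=3):
    """Assumed Haar approximation coefficients; halves the length per level."""
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
    return approx

# Hypothetical test shape: a three-lobed closed curve sampled at 256 points.
t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
r = 1.0 + 0.3 * np.cos(3.0 * t)
contour = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)

fd = fourier_descriptors(contour)                 # 64 values
wd = haar_lowpass(centroid_distance(contour))     # 32 values
```

Dividing by $|a_1|$ and discarding $a_0$ is what makes the Fourier descriptor insensitive to scale and translation: scaling the contour scales every $a_n$ equally, and a translation only changes $a_0$.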

For computational convenience, we select only the 32 lowest frequency components, which loses the least silhouette information. One example of a wavelet descriptor describing the human boundary can be seen in Fig. 4.

Fig. 3. Feature representation using Fourier descriptor.

Fig. 4. Feature representation using wavelet descriptor.

2.2.3. Extraction of pseudo-Zernike moment features

Pseudo-Zernike moments are used in several pattern recognition applications as feature descriptors of image shape, and they have proven superior to other moment functions such as Zernike moments in terms of their feature representation capabilities (Kumar and Singh, 2005). Therefore, we apply pseudo-Zernike moments to extract the features of gait sequences. The kernel of pseudo-Zernike moments is the set of orthogonal pseudo-Zernike polynomials defined over the polar coordinates inside a unit circle. The two-dimensional pseudo-Zernike moments of order $p$ with repetition $q$ of an image intensity function $f(\rho, \theta)$ are defined as

$$Z_{pq} = \frac{p+1}{\pi} \int_{-\pi}^{\pi} \int_0^1 R_{pq}(\rho) \, e^{-jq\theta} f(\rho, \theta) \, \rho \, d\rho \, d\theta \tag{11}$$

$$R_{pq}(\rho) = \sum_{s=0}^{p-|q|} (-1)^s \frac{(2p+1-s)!}{s! \, (p+|q|+1-s)! \, (p-|q|-s)!} \, \rho^{p-s} \tag{12}$$

where $\rho = \sqrt{x^2 + y^2}$, $\theta = \tan^{-1}(y/x)$, $-1 < x, y < 1$, $0 \le |q| \le p$ and $p \ge 0$. Since it is easier to work with real functions, $Z_{pq}$ is often split into its real part $Z^{re}_{pq}$ and imaginary part $Z^{im}_{pq}$:

$$Z^{re}_{pq} = \frac{2(p+1)}{\pi} \int_{-\pi}^{\pi} \int_0^1 R_{pq}(\rho) \cos(q\theta) f(\rho, \theta) \, \rho \, d\rho \, d\theta \tag{13}$$

$$Z^{im}_{pq} = \frac{2(p+1)}{\pi} \int_{-\pi}^{\pi} \int_0^1 R_{pq}(\rho) \sin(q\theta) f(\rho, \theta) \, \rho \, d\rho \, d\theta \tag{14}$$

where $p \ge 0$ and $q < 0$.
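A minimal sketch of the radial polynomial of Eq. (12) and a discrete approximation of Eq. (11) is given below. The sampling scheme is an assumption made for illustration: the image is taken to be square and is mapped onto $[-1, 1]^2$, pixels outside the unit circle are ignored, and the integral is replaced by a sum weighted by the pixel area. The function names are hypothetical.

```python
from math import factorial

import numpy as np

def radial_poly(p, q, rho):
    """R_pq(rho) of Eq. (12), valid for 0 <= |q| <= p."""
    q = abs(q)
    if q > p:
        raise ValueError("requires |q| <= p")
    total = 0.0
    for s in range(p - q + 1):
        coeff = (-1) ** s * factorial(2 * p + 1 - s) / (
            factorial(s) * factorial(p + q + 1 - s) * factorial(p - q - s))
        total = total + coeff * rho ** (p - s)
    return total

def pseudo_zernike_moment(image, p, q):
    """Discrete approximation of Eq. (11) for a square n x n image."""
    n = image.shape[0]
    coords = np.linspace(-1.0, 1.0, n)
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0                            # keep the unit disc only
    kernel = radial_poly(p, q, rho) * np.exp(-1j * q * theta)
    pixel_area = (2.0 / (n - 1)) ** 2
    return (p + 1) / np.pi * np.sum(image[inside] * kernel[inside]) * pixel_area

# Hypothetical input: a constant image, for which Z_00 approximates 1.
disc = np.ones((201, 201))
z00 = pseudo_zernike_moment(disc, 0, 0)
z11 = pseudo_zernike_moment(disc, 1, 1)
```

For low orders the sum in Eq. (12) gives $R_{00} = 1$, $R_{10} = 3\rho - 2$ and $R_{20} = 10\rho^2 - 12\rho + 3$, each equal to 1 at $\rho = 1$, which is a convenient check on any implementation.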

Table 1 Feature representation using pseudo-Zernike moment (rows: order p; columns: repetition q)

p \ q   0        1        2        3        4        5        6        7        8
0       0.3183
1       0.2185   0.1906
2       0.1234   0.2936   0.1309
3       0.0415   0.1222   0.0703   0.2378
4       0.0398   0.0763   0.1250   0.3276   0.0623
5       0.0573   0.0585   0.0874   0.0507   0.2060   0.1360
6       0.0748   0.0790   0.1453   0.0999   0.1676   0.1278   0.1164
7       0.2154   0.1971   0.1796   0.0319   0.0071   0.0575   0.2492   0.0387
8       0.1175   0.0509   0.0359   0.1361   0.1462   0.1202   0.1209   0.0685   0.0940

For each frame of human gait silhouettes and a given $p$, we can obtain the pseudo-Zernike moment values. Table 1 gives the pseudo-Zernike moment values of Fig. 3(a), where the value of $p$ is 8 in our experiments.

2.3. Feature transformation and reduction using ICA

Many feature reduction and compression methods have been proposed by earlier researchers. Typical methods are principal component analysis (PCA) (Kumar and Zhang, 2005; Cao et al., 2003; Karhunen et al., 1998), kernel principal component analysis (KPCA), singular value decomposition (SVD) and hidden Markov models (Kale, 2002; Kim et al., 2002). At this stage, we extract and train gait features using ICA. ICA can be seen as a generalization of PCA; its basic idea is to represent a set of random variables using basis functions whose components are statistically independent, or as independent as possible (Hyvarinen and Oja, 2000). Let us denote the observed variables $x_i$ as a zero-mean random vector $X = (x_1, x_2, \ldots, x_m)^T$ and the component variables $s_i$ as a vector $S = (s_1, s_2, \ldots, s_n)^T$, with the model

$$X = AS \tag{15}$$

where $A$ is an unknown $m \times n$ matrix of full rank, called the mixing or feature matrix. The columns of $A$ represent the transformed gait features. To reduce the computational cost, the FastICA algorithm (Hyvarinen, 1999), which uses a fixed-point iteration, is introduced. When FastICA is applied to feature transformation, the random variables are the normalized training feature data of the gait images. We select 60 contour images for each class to construct the matrix $X$ and use the fixed-point algorithm to calculate the matrices $A$ and $S$. Let $x'_i$ be the feature data of one contour image; we can construct a training distance set $\{x'_1, x'_2, \ldots, x'_m\}$ of $m$ random variables, which are assumed to be linear combinations of $n$ unknown ICs, denoted by $\{s'_1, s'_2, \ldots, s'_n\}$. According to ICA theory, the matrix $S$ contains all the independent components, which are calculated from the set of training distances. The product $AS$ reconstructs the original signal $X$, and we select some ICs from $A$ such that the ratio of the within-class scatter to the between-class scatter is minimized, in order to reduce the computational cost and extract the most useful and discriminative ICs (Yuen and Lai, 2002). The method is as follows. If the matrix $X$ contains $n$ individual persons and each person has $m$ frame images, let $a_{ij}$ denote the entry of $A$ at the $i$th row and the $j$th column. The value $SB_j$, the mean within-class distance in the $j$th column, is then given by

$$SB_j = \frac{1}{nm(m-1)} \sum_{i=1}^{n} \sum_{u=1}^{m} \sum_{v=1}^{m} \left( a_{(i-1)m+u,j} - a_{(i-1)m+v,j} \right)^2 \tag{16}$$

The value $SI_j$, the mean between-class distance in the $j$th column, is

$$SI_j = \frac{1}{n(n-1)} \sum_{s=1}^{n} \sum_{t=1}^{n} \left( a'_{s,j} - a'_{t,j} \right)^2 \tag{17}$$

where

$$a'_{i,j} = \frac{1}{m} \sum_{u=1}^{m} a_{(i-1)m+u,j} \tag{18}$$

In this paper, we employ the ratio of the within-class distance to the between-class distance to select stable mixing features from $A$. The ratio $c_j$ is defined as

$$c_j = \frac{SB_j}{SI_j} \tag{19}$$
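The selection criterion of Eqs. (16)–(19) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the rows of $A$ are assumed to be grouped by person with $m$ consecutive rows each, the between-class term is normalised by the $n(n-1)$ ordered pairs of distinct classes (a constant factor that does not change the ranking of the columns), and the function name and toy matrix are hypothetical.

```python
import numpy as np

def select_ic_columns(A, n_classes, m_frames, k):
    """Rank the columns of A by c_j = SB_j / SI_j and keep the k smallest.

    A: (n_classes * m_frames, n_columns) array, rows grouped by class.
    Returns (indices of the selected columns, all ratios c_j).
    """
    n, m = n_classes, m_frames
    blocks = np.asarray(A, dtype=float).reshape(n, m, -1)  # (class, frame, column)

    # Eq. (16): squared differences over all frame pairs within each class.
    within = blocks[:, :, None, :] - blocks[:, None, :, :]
    sb = (within ** 2).sum(axis=(0, 1, 2)) / (n * m * (m - 1))

    # Eq. (18): class means per column; Eq. (17): differences between classes.
    means = blocks.mean(axis=1)
    between = means[:, None, :] - means[None, :, :]
    si = (between ** 2).sum(axis=(0, 1)) / (n * (n - 1))

    c = sb / si                                            # Eq. (19)
    return np.argsort(c)[:k], c

# Toy data: 3 persons, 4 frames each, 2 columns. Column 0 separates the
# persons well; column 1 varies mostly within each person.
jitter = np.array([-0.1, 0.1, -0.1, 0.1])
A = np.array([[10.0 * i + jitter[u], float(u) + 0.1 * i]
              for i in range(3) for u in range(4)])

selected, ratios = select_ic_columns(A, n_classes=3, m_frames=4, k=1)
```

On the toy matrix the first column is selected, since its within-class spread is small relative to the distance between class means.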

From the definition of $c_j$, the smaller $c_j$ is, the better the classifier will be. Using Eq. (19), we rank the columns by $c_j$ and select the top $k$ ($k < n$) column features from $A$ and $S$ with the smallest ratios.

3. The principle of GFSVM

Gait recognition is a traditional pattern classification problem, which can be solved by measuring similarities between the training database and the test sequence. The classification procedure is carried out with three different methods, namely the nearest neighbor (NN), support vector machine (SVM) and genetic fuzzy support vector machine (GFSVM) classifiers, all operating on the ICs. The NN classifier is very simple; we use the Euclidean distance to evaluate the dissimilarity of two gait sequences. The SVM classifier is based on structural risk minimization, where the risk is the expectation of the test error for the trained machine. The risk is represented as $R(\alpha)$, $\alpha$ being the parameters of the trained machine (Deniz et al., 2003; Kecman, 2001). Let $l$ be the number of training patterns and $0 \le \eta \le 1$; with probability $1 - \eta$, the following bound on the expected risk holds:

$$R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\left(\log\frac{2l}{h} + 1\right) - \log\frac{\eta}{4}}{l}} \tag{20}$$

where $R_{emp}(\alpha)$ is the empirical risk and $h$ is the VC dimension. SVM tries to minimize the second term of (20) for a fixed empirical risk.

As we know, a problem arises when the outputs of all SVMs are 1. In this situation, recognition cannot be completed successfully. There are two ways to solve this problem: one is to use the NN classifier for further recognition, the other is to use fuzzy SVM. When the outputs of all SVMs are 1, the NN–SVM method calculates the distances between the test data and the classification hyperplanes and assigns the test data to the class whose classification hyperplane is nearest. This method has an obvious disadvantage: the distances between the test data and the classification hyperplanes are not comparable, as they are obtained from different SVMs. Therefore, some improved SVM methods have been proposed in recent research; one typical approach is the fuzzy support vector machine (FSVM) (Li et al., 2002; Lin and Wang, 2002, 2004). The basic idea of FSVM is as follows. Suppose the training set is $S: (x_1, y_1, s_1), (x_2, y_2, s_2), \ldots, (x_l, y_l, s_l)$, where $x_i \in R^N$ is a training sample, $y_i$ is its class label and $s_i$ is its fuzzy membership, with $\sigma \le s_i \le 1$ for $i = 1, 2, \ldots, l$ and $\sigma$ a small positive number. Let $z = \varphi(x)$ denote the corresponding feature space vector, with a mapping $\varphi$ from $R^N$ to a feature space $Z$.
Since the fuzzy membership $s_i$ is the attitude of the corresponding point $x_i$ toward one class and the parameter $\xi_i$ is a measure of error in the SVM, the term $s_i \xi_i$ is a measure of error with different weighting. The optimal hyperplane problem is then regarded as the solution to

$$\min \ \frac{1}{2} \, w \cdot w + C \sum_{i=1}^{l} s_i \xi_i \qquad \text{subject to: } y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, l \tag{21}$$

where $C$ is a constant. Note that a smaller $s_i$ reduces the effect of the parameter $\xi_i$ in (21), so that the corresponding point $x_i$ is treated as less important. One typical way to solve this quadratic programming (QP) problem is to apply Lagrange multipliers; as many multipliers are needed and the computational cost is high, a genetic algorithm (GA) is adopted to solve the above nonlinear optimization problem and obtain the values of the FSVM parameters. GA optimization has four typical parameters, namely population size, selection probability, crossover probability and mutation probability, whose values are 100, 0.8, 0.5 and 0.02, respectively, in this experiment. The flowchart of GFSVM is shown in Fig. 5.

Fig. 5. Block diagram of GFSVM.

4. Fusion strategies

Because a single kind of gait feature is not robust and stable enough, recognition using a single feature cannot provide sufficient information; multiple feature fusion is therefore very important for a gait-based identification system. Meanwhile, human gait collected under different views provides different information, which is crucial to gait recognition, so gait recognition through fusion of different views is also vital and can overcome the limitation of most single view recognition methods.

4.1. Multiple feature fusion

The fusion strategy aims at improving the classification performance over that of a single gait representation alone (Kumar and Zhang, 2005). Due to the large and varying dimension of gait feature vectors, fusion at the feature level has not been considered in our work. We propose two kinds of decision-level fusion, over different kinds of gait features and over multiple views. Let $D_{Fourier}(t, t_e)$, $D_{Wavelet}(t, t_e)$ and $D_{Moment}(t, t_e)$ denote the matching results produced by the Fourier descriptor, wavelet descriptor and pseudo-Zernike moment classifiers, respectively. The combined matching score $D(t, t_e)$ using the fusion rules can be obtained as follows:

$$D(t, t_e) = C\{D_{Fourier}(t, t_e), D_{Wavelet}(t, t_e), D_{Moment}(t, t_e)\} \tag{22}$$

where $C$ is the selected fusion rule, which represents maximum, sum, product or minimum; $t$ represents the test sample while $t_e$ is the recognition result on its single level. One shortcoming of the rank-based fusion rules is that they require the individual classifiers to be independent, an assumption that may be poor, especially for the Fourier descriptor- and wavelet descriptor-based features. The sum rule can therefore be a better alternative for consolidating matching scores when combining Fourier descriptor- and wavelet descriptor-based features. These consolidated matching scores can be further combined with the pseudo-Zernike matching scores using the product rule, which is expected to perform better under the assumption of independent data representations. Therefore, unlike previous work, we propose an approach to gait recognition that uses multiple gait representations simultaneously with the best pair of fixed combination rules: the sum rule is applied between the Fourier descriptor and the wavelet descriptor, and the product rule is then applied with the pseudo-Zernike moment feature. The main reason for this fusion method is that the Fourier descriptor and the wavelet descriptor are correlated. The block diagram of the fusion method is shown in Fig. 6.

4.2. Multiple views fusion

Currently, most gait recognition approaches treat each view independently, i.e. they recognize human gait from a single view. As we know, different views contain different gait information, and there are many limitations when recognizing a person from a single view. Therefore, it is necessary to fuse human gait information from different views to improve the performance of gait recognition. Here, besides multiple feature fusion, we perform gait information fusion from multiple views on the decision level. As the pseudo-Zernike moment gives the best single-feature result, we select it as the feature in the different views. The block diagram is shown in Fig. 7.

5. Experimental results and analyses

5.1. Gait database

Two public gait databases, namely the Chinese National Laboratory of Pattern Recognition (NLPR) database and the Xi’an University of Technology (XAUT) database, are chosen to evaluate the capability of the proposed method. The NLPR database includes 20 subjects while the XAUT database contains 50 subjects; both have four sequences for each view angle and three view angles, namely lateral, oblique and frontal. In our experiment, we use two sequences for training and the remaining two for testing.

Fig. 6. Block diagram of experimental setup for three kinds of gait feature fusion recognition.

Fig. 7. Block diagram of experimental setup for three kinds of gait feature fusion recognition.
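The product of sum (POS) rule described in Section 4 can be sketched as follows. The example assumes that each classifier outputs one similarity score per enrolled class, scaled so that higher is better; the paper does not state how the matching scores are normalised, so the score vectors and the function name below are hypothetical.

```python
import numpy as np

def pos_fusion(sum_group, product_term):
    """Product of sum: add the scores of the correlated classifiers, then
    multiply by the scores of the classifier assumed to be independent.

    sum_group: list of (n_classes,) score vectors combined by the sum rule.
    product_term: (n_classes,) score vector combined by the product rule.
    """
    consolidated = np.sum(sum_group, axis=0)
    return consolidated * product_term

# Hypothetical per-class similarity scores for one test sequence.
d_fourier = np.array([0.50, 0.30, 0.20])
d_wavelet = np.array([0.35, 0.45, 0.20])
d_moment = np.array([0.30, 0.60, 0.10])

fused = pos_fusion([d_fourier, d_wavelet], d_moment)
predicted_class = int(np.argmax(fused))
```

The same function covers the multiple views case by passing the lateral and oblique scores as the summed group and the frontal scores as the product term.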


Fig. 8. Some sample images in NLPR gait database.

Fig. 9. Some sample images in XAUT gait database.

Some sample images from the NLPR database, whose size is 352 × 240, can be seen in Fig. 8, while sample images from the XAUT database, whose size is 320 × 240, can be seen in Fig. 9.

the averaging associated with the mean shape analysis owing to less severe shape variations in such gait patterns.

5.2. Recognition result of single feature and view

5.3. Fusion recognition result

Three kinds of feature such as Fourier descriptor, wavelet descriptor and pseudo-Zernike moment are extracted and the recognition are implemented in its single feature and view. Here, we use the NLPR and XAUT database to estimate the identification performance of the proposed method. The performance of each classifier using individual feature including Fourier descriptor, wavelet descriptor and pseudo-Zernike moment and GFSVM under different view angles are shown in Fig. 10(a)–(f), respectively, and the correct classification rates (CCR) are summarized in Table 2, where (a)–(c) are the results obtained on NLPR database and (d)–(f) are the results obtained in XAUT database. The goal of this experiment is to find the best gait feature representation among the three feature extraction approaches and the best view angle to collect human gait information. The CCR is calculated as follows:

Firstly, multiple feature fusion is performed in three separate views, i.e. lateral, oblique and frontal views and the fusion is implemented as the rule given in Fig. 6. Here, three kinds of gait features are fused on the decision level based on single decision result. Fig. 11 gives the fusion results in NLPR and XAUT database respectively, which have improved the recognition results greatly. Another useful classification performance measure that is probably more general than CCR is the rank order statistic, which was first introduced by the FERET protocol for the evaluation of face recognition algorithms (Zhang et al., 2003). It is defined as the cumulative probability that the real class of a test measurement is among its top matches. The performance statistics are reported as the cumulative match cores. By varying the decision threshold for the acceptance, various combination pairs of FAR and FRR are obtained. Fig. 12 shows the ROC curve in NLPR and XAUT database, from which we see that the EERs (Equal Error Rate) are about 13%, 16%, and 19% for NLPR database and 14%, 17% and 19% for XAUT database under 0, 45, and 90 views, respectively. Secondly, from the recognition result of single feature, we can see that pseudo-Zernike moment is the best among the there features and we apply it as gait feature and perform multiple views fusion. As there are lots of shape varieties of human gait in lateral and oblique views and the two

CCR ¼ N C =N  100%

ð23Þ

where NC is the total number of correct recognition samples while N is the number of the total gait samples. From the above figure and table, it can be seen that the recognition performance using pseudo-Zernike moment is the best feature among the three features, while wavelet descriptor is the second and Fourier is the worse. Meanwhile, the recognition performance under the frontal walking is better than other two views. This is probably due to

J. Lu, E. Zhang / Pattern Recognition Letters 28 (2007) 2401–2411

[Fig. 10: six plots of cumulative match score (0.6–1) versus rank (2–20), each with curves for the 0°, 45° and 90° views. Panels: (a) Fourier descriptor, NLPR database; (b) wavelet descriptor, NLPR database; (c) pseudo-Zernike moment, NLPR database; (d) Fourier descriptor, XAUT database; (e) wavelet descriptor, XAUT database; (f) pseudo-Zernike moment, XAUT database.]

Fig. 10. Recognition results using different features.

views carry correlated information in a single 2D image, we apply the sum rule to the lateral and oblique views and then combine the result with the frontal view using the product rule. Two fusion strategies, the traditional rank-based fusion rule and the POS rule (see Fig. 7), are evaluated in this experiment, and the results are given in Table 3.
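As a rough illustration of the fusion and evaluation steps described above, the following sketch applies the sum rule to the lateral and oblique scores, multiplies the result by the frontal score, and then computes the cumulative match score at rank k. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names, the toy score values, and the convention that each view yields one non-negative score per enrolled subject (higher meaning a better match) are all hypothetical.

```python
from typing import List, Sequence


def pos_fuse(lateral: Sequence[float],
             oblique: Sequence[float],
             frontal: Sequence[float]) -> List[float]:
    """Product-of-sum fusion per class: (lateral + oblique) * frontal.

    Assumption: each sequence holds one score per enrolled subject,
    in the same subject order, with higher scores meaning better matches.
    """
    if not (len(lateral) == len(oblique) == len(frontal)):
        raise ValueError("all views must score the same set of subjects")
    return [(l + o) * f for l, o, f in zip(lateral, oblique, frontal)]


def rank_of_true_class(scores: Sequence[float], true_class: int) -> int:
    """1-based rank of the true class when classes are sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return order.index(true_class) + 1


def cumulative_match_score(all_scores: Sequence[Sequence[float]],
                           true_classes: Sequence[int],
                           k: int) -> float:
    """Fraction of test samples whose true class is among the top k matches."""
    if not true_classes or len(all_scores) != len(true_classes):
        raise ValueError("need one true class per non-empty score list")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for scores, truth in zip(all_scores, true_classes)
               if rank_of_true_class(scores, truth) <= k)
    return hits / len(true_classes)


# Hypothetical toy data: two test sequences, three enrolled subjects.
fused_scores = [
    pos_fuse([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.5, 0.2]),  # true subject 0
    pos_fuse([0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.3, 0.3, 0.4]),  # true subject 2
]
true_subjects = [0, 2]

for k in (1, 2):
    print(k, cumulative_match_score(fused_scores, true_subjects, k))
```

Under these assumptions, the cumulative match score at rank 1 is the fraction of samples whose top match is correct, which corresponds to the CCR of Eq. (23). Whether the per-view scores are normalized before fusion is not stated in the text, so the sketch leaves them as given.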

From the above results, we can draw the following conclusions:

(1) In terms of top recognition accuracy, the lateral view is the best and the frontal view is the worst.


Table 2
The CCR of different classifiers using different features in three gait databases

Feature                  Classifier   UMD (%)   NLPR (%)   XAUT (%)
Fourier descriptor       NN           100       82.5       77.5
                         SVM          100       85.8       78.3
                         GFSVM        100       90.0       80.0
Wavelet descriptor       NN           100       83.3       78.3
                         SVM          100       87.5       82.8
                         GFSVM        100       90.8       83.2
Pseudo-Zernike moment    NN           100       84.2       80.0
                         SVM          100       88.3       84.8
                         GFSVM        100       92.5       87.3

[Fig. 11: plots of cumulative match score (0.6–1) versus rank (2–20), with curves for the 0°, 45° and 90° views before and after fusion. Panels: (a) NLPR database; (b) XAUT database.]

Fig. 11. Recognition results using multiple features fusion in two different databases.

[Fig. 12: plots of false reject rate versus false acceptance rate (both 0–1), with curves for the 0°, 45° and 90° views and the EER point marked. Panels: (a) NLPR database; (b) XAUT database.]

Fig. 12. ROC curves of the different gait databases under various views.

Table 3
Fusion results of different gait databases using different fusion strategies

Database   Fusion strategy   CCR (correct classification rate) (%)
NLPR       Rank-based        92.5
           POS-based         95.0
XAUT       Rank-based        86.0
           POS-based         92.0

(2) Fusing multiple features and views gives better recognition than any single feature or single view, and improves the recognition accuracy effectively. Meanwhile, the POS fusion method is better than the rank-based method.

(3) The larger the database, the lower the correct classification rate. The main reason is that as the gait database grows, the probability that two sequences are similar increases, which directly affects the recognition result. The best way to overcome this problem is to find a better and more effective similarity measure between two sequences.

6. Conclusions and future work

This paper has proposed a simple gait recognition method based on human silhouettes using multiple feature


representations and independent component analysis, and presented a gait recognition method using multiple features and views fusion based on GFSVM. The analysis shows that fusing multiple feature representations performs better than any single representation, and that the multiple views fusion approach can overcome the limitation of most single-view recognition methods and improve the recognition accuracy effectively.

Although our recognition accuracy is comparatively high and encouraging, we still cannot conclude much about gait. To provide a general approach to automatic gait-based human identification in real environments, much remains to be done. Further evaluation on a much larger and more varied database is still needed, and a standard, authoritative gait database is strongly needed. We are setting up such a gait database with more subjects, more sequences, more views and more variation in conditions, such as walkers wearing different clothes in different seasons. The lack of a general gait database, especially one with multiple views, is another limitation of most current gait recognition algorithms. Our proposed method recognizes humans through the fusion of only three views, i.e. perpendicular to, along and oblique to the walking direction. In real environments, the angle between the walker's direction and the camera is unpredictable. An experiment that determines the sensitivity of the features to different views should therefore be carried out, and fusion over more views should be performed, which would provide more convincing results.
Finally, seeking better similarity measures, designing more sophisticated classifiers, extracting more effective features, proposing better gait detection and segmentation algorithms (especially ones that reduce the computational cost), and combining holistic-based and model-based methods deserve more attention in future work.

Acknowledgements

The authors would like to express their thanks to the Institute of Automation, Chinese Academy of Sciences (CASIA) for the Human ID image database. Portions of the research in this paper use the CASIA Gait Database collected by the Institute of Automation, Chinese Academy of Sciences.

References

Cao, L.J., Chua, K.S., Chong, W.K., Lee, H.P., Gu, Q.M., 2003. A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 55, 321–336.


Deniz, O., Catrillon, M., Hernandez, M., 2003. Face recognition using independent component analysis and support vector machines. Pattern Recognition Lett. 24, 2153–2157.
Foster, Jeff P., Nixon, Mark S., Prugel-Bennett, Adam, 2003. Automatic gait recognition using area-based metrics. Pattern Recognition Lett. 24, 2489–2497.
Hyvarinen, A., 1999. A fast and robust fixed-point algorithm for independent component analysis. IEEE Trans. Neural Networks 3, 626–634.
Hyvarinen, A., Oja, E., 2000. Independent component analysis: Algorithm and applications. Neural Networks 13, 411–430.
Kale, A., Rajagopalan, A.N., Cuntoor, N., Kruger, V., 2002. Gait-based recognition of humans using continuous HMMs. In: IEEE Internat. Conf. on Automatic Face and Gesture Recognition.
Karhunen, J., Pajunen, P., Oja, E., 1998. The nonlinear PCA criterion in blind source separation: Relations with other approaches. Neurocomputing 22, 5–20.
Kecman, V., 2001. Learning and soft computing, support vector machines. Neural Networks and Fuzzy Logic Models. The MIT Press, Cambridge, MA.
Kim, M.S., Kim, D., Lee, S.Y., 2002. Face recognition using the embedded HMM with second-order block-specific observations. Pattern Recognition 36, 2723–2735.
Kumar, S., Singh, C., 2005. A study of Zernike moments and its use in Devanagari handwritten character recognition. In: Internat. Conf. on Cognition and Recognition, India.
Kumar, Ajay, Zhang, David, 2005. Personal authentication using multiple palmprint representation. Pattern Recognition 38, 1695–1704.
Lee, L., 2002. Gait Analysis for Classification. Massachusetts Institute of Technology, USA.
Li, K.L., Huang, K.H., Tian, S.F., 2002. Fuzzy support vector machine for multi-class classification. Acta Electron. Sinica 32 (6), 830–832.
Lin, Chun-Fu, Wang, Sheng-De, 2002. Fuzzy support vector machines. IEEE Trans. Neural Networks 13, 464–471.
Lin, Chun-Fu, Wang, Sheng-De, 2004. Training algorithms for fuzzy support vector with data. Pattern Recognition Lett. 25, 1647–1656.
Lu, J.W., Zhang, E.H., Zhang, Z.G., Xue, Y.X., 2005. Gait recognition using independent component analysis. In: Internat. Symposium on Neural Networks, China, pp. 183–188.
Lu, J.W., Zhang, E.H., Jing, C.N., 2006. Gait recognition using wavelet descriptor and independent component analysis. In: Internat. Symposium on Neural Networks, China, pp. 232–237.
Mowbray, Stuart D., Nixon, Mark S., 2003. Automatic gait recognition via Fourier descriptors of deformable objects. In: Internat. Conf. on Audio and Video-based Biometrics Person Authentication, Crans-Montana, pp. 566–573.
Wagg, D.K., Nixon, M.S., 2004. An automated model-based extraction and analysis of gait. In: IEEE Conf. on Automatic Face and Gesture Recognition, Seoul, Korea, pp. 11–16.
Wang, L., Tan, T., Ning, Z., Hu, W., 2003. Silhouette analysis-based gait recognition for human identification. IEEE Trans. Pattern Anal. Machine Intell. 25 (9), 1505–1518.
Yu, S., Wang, L., Hu, W., Tan, T., 2004. Gait analysis for human identification in frequency domain. In: Internat. Conf. on Image and Graphics, Hong Kong, China, pp. 282–285.
Yuen, P.C., Lai, J.H., 2002. Face representation using independent component analysis. Pattern Recognition 35, 1247–1257.
Zhang, David, Kong, Wai-Kin, You, Jane, Wong, Michael, 2003. Online palmprint identification. IEEE Trans. Pattern Anal. Machine Intell. 25 (9), 1041–1049.
Zhang, E.H., Lu, J.W., Duan, G.L., 2005. Gait recognition via independent component analysis based on support vector machine and neural network. In: Internat. Conf. on Natural Computation, China, pp. 610–649.