Gabor texture in active appearance models

Neurocomputing 72 (2009) 3174–3181

Xinbo Gao (a), Ya Su (a), Xuelong Li (b), Dacheng Tao (c)

(a) School of Electronic Engineering, Xidian University, Xi'an 710071, China
(b) School of Computer Science and Information Systems, Birkbeck College, University of London, London WC1E 7HX, UK
(c) School of Computer Engineering, Nanyang Technological University, Singapore

Article history: Received 4 December 2008; received in revised form 25 February 2009; accepted 3 March 2009. Communicated by S. Hu. Available online 1 April 2009.

Abstract

In computer vision applications, Active Appearance Models (AAMs) are usually used to model the shape and the gray-level appearance of an object of interest using statistical methods, such as PCA. However, the intensity values used in standard AAMs do not provide enough information for image alignment. In this paper, we propose to utilize Gabor filters to represent the image texture. The benefit of the Gabor-based representation is that it can express local structures of an image; as a result, this representation leads to more accurate matching when conditions change. Given the excessive storage and computational complexity of the full Gabor representation, three simplified Gabor-based image representations are used in AAMs: (1) GaborD is the sum of Gabor filter responses over directions, (2) GaborS is the sum of Gabor filter responses over scales, and (3) GaborSD is the sum of Gabor filter responses over scales and directions. Through a large number of experiments, we show that the proposed Gabor representations lead to more accurate and robust matching between model and images.

Keywords: Computer vision; Active appearance models (AAMs); Gabor; Texture representation

1. Introduction

In computer vision, data representation has long been a key topic [2,9,15,25], and model-based approaches to interpreting images of deformable objects have attracted increasing attention. One motivation is to achieve robust segmentation by using the model to constrain solutions to be valid examples of the class of images modeled. This provides an effective way to describe a wide range of phenomena using few parameters, and an automated system to 'understand' the images. For example, Active Shape Models (ASMs) [27], which inherit from Active Contour Models [16], build a statistical shape model from a set of training images and match a set of model points to a new image of the object. Active Appearance Models (AAMs) [28,29] are among the most powerful deformable models for this task: they model the variations of objects statistically and use the model to describe new ones. AAMs differ from ASMs in that the former model the appearance of the whole region, while the latter only model the image texture in small regions around each landmark point [31]. Moreover, AAMs match new images heuristically by modeling the linear relationship between texture residuals and parameter updates. Alternatively, Matthews and Baker [7,21] have combined AAMs with the Lucas–Kanade algorithm [1] to build an efficient AAM fitting method.


This method reverses the roles of the model template and the image and uses an efficient inverse compositional image alignment (ICIA) algorithm to 'project out' the appearance variation, which greatly improves fitting efficiency. However, both methods rely on an intensity-based texture representation.

In recent AAM research, much attention has been paid to the texture representation. On the one hand, researchers [3,23] have introduced wavelet-based methods to represent the texture using as few dimensions as possible; these methods efficiently reduce the number of model parameters while maintaining accuracy. On the other hand, many efforts address the fact that a texture representation based on intensity alone cannot describe the texture information accurately and robustly. The key to these solutions is to introduce better features, or a mixture of them, rather than individual intensity values. For example, Cootes and Taylor [26] proposed modeling the local orientation of structures using the appearance of edge strength. Stegmann and Larsen [14] chose a mixture of gray-scale, hue, and edge magnitude, and Scott et al. [8] selected a variety of alternative representations based on gradients and corner-like features. Kittipanya-ngam and Cootes [20] found that representations of the image structure in a region can improve AAMs in terms of both accuracy and robustness. In short, all these methods enhance the ability of AAMs to fit the object in an image. However, more information, such as scale, should be extracted from images of the object.

In this paper, we investigate the use of Gabor filters in the AAM texture representation. There are three major reasons for introducing a Gabor-based representation into AAMs: (1) human


brains appear to process information at multiple resolution levels [10,11,24], which can be simulated by controlling the scale parameter of Gabor functions; (2) Gabor functions are believed to be similar to the receptive field profiles of mammalian cortical simple cells [10,11,24]; and (3) Gabor-function-based representations have been successfully employed in many computer vision applications, such as texture analysis [5] and face and palmprint recognition [33,34]. It has been shown that a bank of Gabor filters is required to span the expected orientation and frequency domain of the textures of interest. Accordingly, Yu and Li [19] combined a Gabor-based feature-point tracking algorithm with AAMs to align faces precisely: they use Gabor-based feature extraction to provide initial feature positions for the AAM, which greatly reduces the computational cost of initialization and optimization. However, this approach still suffers from the problem described above, owing to the AAM texture representation. For this reason, we propose to utilize Gabor filters, rather than raw intensity values, to represent the image structure in AAMs. Although Gabor-function-based representations are effective for texture representation, their computational cost is high. Therefore, three simplified Gabor representations are introduced, as in [6]. Each representation is obtained by filtering an image with a certain sum of Gabor functions. These image representations are:

• GaborD: the sum of Gabor filter responses over directions;
• GaborS: the sum of Gabor filter responses over scales; and
• GaborSD: the sum of Gabor filter responses over scales and directions.

These simplified Gabor-based image representations reduce the number of filtering operations required. The rest of this paper is organized as follows. Section 2 briefly introduces the fundamentals of AAMs. Section 3 describes the details of the new Gabor-filter-based texture representation. Experiments and conclusions are given in the following sections to show the improved performance of the proposed texture representation scheme.


2. Background

2.1. Active appearance models

AAMs can represent both the shape and the texture variability present in a training set, which consists of images with key landmark points. The training set is usually labeled manually, although automatic methods have been developed [22,30]. AAMs can be thought of as generalizations of eigen-patches or eigenfaces [18], in which the shape and the texture of the region are modeled and allowed to deform. Given a training set, we can generate statistical models of shape and texture variation (see [28,29] for details). The shape of an object is represented as a vector s and the texture (gray-level or color values) as a vector g. Finally, the shape and the texture are controlled by the appearance parameters, c, according to

s = \bar{s} + Q_s c,   (1)
g = \bar{g} + Q_g c,   (2)

where \bar{s} is the mean shape, \bar{g} the mean texture, and Q_s and Q_g are matrices describing the modes of variation derived from the training set.
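To make (1) and (2) concrete, the following minimal sketch (our own illustration, not code from the paper) builds the linear bases by PCA over a training set; the array layouts and the 95% variance threshold are assumptions matching the experimental setup described in Section 4.

```python
import numpy as np

def pca_basis(X, var_frac=0.95):
    """PCA over the rows of X; keep enough modes to explain var_frac of variance.

    Returns the mean vector and a basis matrix Q with one mode per column,
    so samples are synthesized as x = mean + Q @ c (cf. eqs. (1) and (2)).
    """
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    frac = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(frac, var_frac)) + 1   # number of retained modes
    return mu, Vt[:k].T

# shapes:   (n_samples, 2 * n_landmarks) array of landmark coordinates
# textures: (n_samples, n_pixels) array of shape-free gray-level patches
# s_bar, Q_s = pca_basis(shapes)    # eq. (1): s = s_bar + Q_s c
# g_bar, Q_g = pca_basis(textures)  # eq. (2): g = g_bar + Q_g c
```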

Then these models can be matched to a new image given an initial approximation of the position. This is done using a fast linear update scheme, which modifies the model parameters to minimize the difference between a synthesized image and the target image.

As described above, AAMs simply represent the region of interest using linearly normalized intensity values, g. However, models based on raw intensity tend to be sensitive to changes in conditions such as imaging parameters or biological variability. As a result, models built on one data set may not perform well on images taken under different conditions. Moreover, intensity models cannot describe the local structure of the image, which may prevent accurate fitting during the AAM search.

2.2. Gabor functions

As shown in [10,11,24], Gabor functions can model the responses of the visual cortex because they are similar to the receptive field profiles of mammalian cortical simple cells. Lee [32] has shown that 2D Gabor functions can represent images well. A Gabor (wavelet, kernel, or filter) function is the product of an elliptical Gaussian envelope and a complex plane wave, defined as

\psi_{s,d}(x, y) = \psi_k(\mathbf{x}) = \frac{\|k\|^2}{\sigma^2} \exp\left(-\frac{\|k\|^2 \|\mathbf{x}\|^2}{2\sigma^2}\right) \left[\exp(i k \cdot \mathbf{x}) - \exp\left(-\frac{\sigma^2}{2}\right)\right],   (3)

where \mathbf{x} = (x, y) is the variable in the spatial domain and k is the frequency vector, which determines the scale and direction of the Gabor function: k = k_s e^{i\phi_d}, with k_s = k_{max}/f^s and k_{max} = \pi/2. In our application, f = 2, s = 0, 1, 2, 3, 4, and \phi_d = \pi d/8 for d = 0, 1, ..., 7. The term \exp(-\sigma^2/2) is the DC component, which is subtracted in order to make the kernel DC-free and therefore insensitive to illumination. We use Gabor functions with five different scales and eight different orientations, making a total of 40 Gabor functions. The number of oscillations under the Gaussian envelope is determined by \sigma = 2\pi.

3. Gabor-based texture representation

3.1. Gabor-based texture representation

The Gabor representation of an image texture is obtained by convolving the image with Gabor functions. Concretely, the first two indices x and y give the pixel location, the third index s the scale, and the fourth index d the direction. Examples of the real part of Gabor functions with five different scales and eight different directions are shown in Fig. 1. After convolution, there are 40 components (images) for an image; each component is the magnitude of the output obtained by convolving the image with one Gabor function. An example of the convolution result is shown in Fig. 2.
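For concreteness, the 40-kernel bank of (3) can be generated as in the following minimal numpy sketch, under the stated parameters (sigma = 2*pi, k_max = pi/2, five scales, eight directions); the kernel window size is our assumption, not a value from the paper.

```python
import numpy as np

def gabor_kernel(s, d, size=32, sigma=2*np.pi, kmax=np.pi/2, f=2.0):
    """Complex Gabor kernel psi_{s,d} of eq. (3) on a size x size grid."""
    k = (kmax / f**s) * np.exp(1j * np.pi * d / 8)       # k = k_s * exp(i * phi_d)
    y, x = np.mgrid[-(size // 2):size - size // 2, -(size // 2):size - size // 2]
    z = x + 1j * y                                       # pixel position as a complex number
    k2, r2 = np.abs(k)**2, np.abs(z)**2
    envelope = (k2 / sigma**2) * np.exp(-k2 * r2 / (2 * sigma**2))
    carrier = np.exp(1j * (k * z.conjugate()).real)      # exp(i * k . x)
    return envelope * (carrier - np.exp(-sigma**2 / 2))  # subtract the DC component

# five scales x eight directions = 40 kernels
bank = [[gabor_kernel(s, d) for d in range(8)] for s in range(5)]
```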

3.2. Variations of Gabor-based texture representation

Although the Gabor representation is powerful, its computational cost is high compared with the original intensity-based method. Therefore, three new representations of the image texture are used here, as in [6]: the sum over directions of the Gabor representation (GaborD), the sum over scales (GaborS), and the sum over scales and directions (GaborSD). The most important benefit of these new representations is that the computational cost decreases with little loss of accuracy.

Suppose I(x, y) is the image intensity and \psi_{s,d}(x, y) is the Gabor function defined in (3). GaborD is the magnitude of the output generated by convolving the image with the sum of Gabor functions over the eight directions, with the scale fixed:

GaborD(x, y) = \left| \sum_{d} I(x, y) * \psi_{s,d}(x, y) \right| = \left| I(x, y) * \sum_{d} \psi_{s,d}(x, y) \right|.   (4)

Fig. 1. The real part of Gabor functions for five different scales and eight different directions.

Here GaborD(x, y) is the output of the GaborD representation. Therefore, we have five different outputs to represent the original image in the GaborD decomposition (see Fig. 2b).

Fig. 2. Gabor-based textures by convolving an image with Gabor functions, which contain five different scales and eight different directions. The results include (a) Gabor textures with five different scales and eight different directions, (b) GaborD with five different scales, (c) GaborS with eight different directions, and (d) GaborSD with one direction and one scale.


GaborS is the magnitude of the output generated by convolving the image with the sum of Gabor functions over the five scales, with the direction fixed:

GaborS(x, y) = \left| \sum_{s} I(x, y) * \psi_{s,d}(x, y) \right| = \left| I(x, y) * \sum_{s} \psi_{s,d}(x, y) \right|,   (5)

where GaborS(x, y) is the output of the GaborS representation. Therefore, we have eight different outputs to represent the original image in the GaborS decomposition (see Fig. 2c). Similarly, GaborSD is the magnitude of the output generated by convolving the image with the sum of Gabor functions over all scales and directions:

GaborSD(x, y) = \left| \sum_{s} \sum_{d} I(x, y) * \psi_{s,d}(x, y) \right| = \left| I(x, y) * \sum_{s} \sum_{d} \psi_{s,d}(x, y) \right|,   (6)

where GaborSD(x, y) is the output of the GaborSD representation. Therefore, we have only one output to represent the original image in the GaborSD decomposition (see Fig. 2d).
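Because convolution is linear, the summation in (4)-(6) can be folded into the kernels, so each simplified representation costs a single convolution instead of up to 40. A sketch of this, reusing the hypothetical `bank` built in the earlier snippet:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_d(img, bank, s):
    """Eq. (4): magnitude of convolution with the kernels summed over directions."""
    return np.abs(fftconvolve(img, np.sum(bank[s], axis=0), mode='same'))

def gabor_s(img, bank, d):
    """Eq. (5): magnitude of convolution with the kernels summed over scales."""
    kernel = np.sum([row[d] for row in bank], axis=0)
    return np.abs(fftconvolve(img, kernel, mode='same'))

def gabor_sd(img, bank):
    """Eq. (6): magnitude of convolution with all kernels summed."""
    kernel = np.sum([kern for row in bank for kern in row], axis=0)
    return np.abs(fftconvolve(img, kernel, mode='same'))
```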

3.3. Gabor-based AAMs

To make AAMs more robust and accurate, the Gabor texture representations are used to build the appearance model. More specifically, the object in a training image is first normalized to a shape vector, s, and a shape-free patch. Then a set of multi-orientation, multi-scale Gabor filters is convolved with the shape-free patch to extract Gabor features. This results in a set of Gabor-filtered feature maps:

F = \{Gabor_s^d\}, \quad s = 0, 1, \ldots, 4, \; d = 0, 1, \ldots, 7,   (7)

where Gabor_s^d denotes the feature map obtained with a Gabor filter at scale s and direction d. These feature maps are then concatenated into the texture vector, g, as in [8,14,19,26]. Finally, the model is built on these shape vectors and texture vectors, and the appearance parameters, c, are generated according to (1) and (2).

In the subsequent fitting procedure, the trained model is used to find the appearance parameters that best match a new image. Given a set of initial parameters, the fitting technique minimizes the energy function

E(p) = r^T r,   (8)

where p^T = (c^T \,|\, t^T \,|\, u^T) denotes all the parameters of the model (c the appearance model parameters, t the pose transformation parameters, and u the texture normalization parameters), and r(p) = g_s - g_m is the residual texture between the sampled texture g_s and the current model texture g_m. Notably, we replace the texture vector with the Gabor feature vector, the same as in the modeling procedure. The fitting problem can then be treated as an optimization problem and solved by gradient descent algorithms such as Gauss–Newton [28,29].

Although the Gabor function can be introduced into the feature maps as in (7), it still suffers from a high computational cost. Consequently, the three variations of the Gabor function are considered for efficiency. Fortunately, the substitution is straightforward: the only difference is that the Gabor feature maps are obtained with the variations of the Gabor function, which improves the efficiency of Gabor-based AAMs. The performance of these variations is analyzed in Section 4.
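A sketch of how the feature maps of (7) and the residual energy of (8) might be computed follows; the normalization step is a placeholder of our own, since the paper only states that the feature maps are concatenated into the texture vector g.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_texture_vector(patch, bank):
    """Eq. (7): concatenate all Gabor magnitude maps of a shape-free patch into g."""
    maps = [np.abs(fftconvolve(patch, bank[s][d], mode='same'))
            for s in range(len(bank)) for d in range(len(bank[0]))]
    g = np.concatenate([m.ravel() for m in maps])
    return g / np.linalg.norm(g)        # placeholder normalization

def fitting_energy(g_sampled, g_model):
    """Eq. (8): E(p) = r^T r with r = g_s - g_m."""
    r = g_sampled - g_model
    return float(r @ r)
```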

4. Experiments

This section first briefly describes the databases used in our experiments. We then compare the performance of our texture representation with several other established representations for AAMs.

4.1. Setup

We carry out all the experiments on three databases: the XM2VTS database [12], the IMM face database [17], and the IMM hand database [13]. For each database, 4-fold cross-validation is used to evaluate the algorithms.

• The XM2VTS frontal data set contains 2360 mug shots of 295 individuals collected over four sessions. These images are of different people in completely different lighting conditions, taken with different cameras. From the database, 400 images are selected to test the performance of the algorithms. Every image is annotated with 58 landmarks by hand. An example can be seen in Fig. 3.
• The IMM face database consists of 240 annotated images of 40 different human faces. Every image is annotated with 58 landmarks by hand.
• The IMM hand database contains a set of 40 images of four human hands. Each image is annotated with 56 landmarks. Sample images are shown in Fig. 4.

Two algorithms, SGrad and GEC [8], are compared with the proposed methods.

• SGrad: sigmoidal normalization of the gradient at each pixel (see the sketch after this list):

g_n = \frac{(g_x, g_y)^T}{|g| + \overline{|g|}},   (9)

where (g_x, g_y) denotes the gradient of a pixel, |g| the magnitude of the gradient, and \overline{|g|} the mean of |g|; the gradient is thus normalized by the sigmoid function

f(m) = \frac{m}{m + \bar{m}},   (10)

where \bar{m} denotes the mean of m.
• GEC: three measurements are combined to describe the local image structure: image gradient, edge, and corner-ness. Here edge and corner-ness are generated by the Harris corner detector [4]. Detailed information can be found in [8].
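For reference, a sketch of the SGrad feature as reconstructed in (9) and (10); the small epsilon is our addition to avoid division by zero and is not part of the original definition.

```python
import numpy as np

def sgrad(img, eps=1e-8):
    """Sigmoid-normalized gradient, eqs. (9)-(10): g / (|g| + mean(|g|))."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    denom = mag + mag.mean() + eps          # eq. (10) applied to the gradient magnitude
    return np.stack([gx / denom, gy / denom], axis=-1)
```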


Fig. 3. Fitting comparison between AAMs and Gabor-based AAMs on XM2VTS database.


Fig. 4. Fitting comparison between AAMs and Gabor-based AAMs on IMM hand database.


For comparison, the building and fitting procedures are the same for all representations. When building the model, we first construct a subspace representing 95% of the shape variation. We then warp all images into the mean shape, which means that at the highest resolution the normalized frame has around 11,000 pixels. Based on these normalized textures, we construct a texture subspace representing 95% of the texture variation. Finally, an appearance subspace is created to represent 99% of the total variation in shape and texture. In the fitting procedure, the regression technique is used as in [28,29].

After each convergence, to measure how successfully the model fitting converges, we use the mean of the point-to-point errors between model points and hand-labeled points (pt–pt). If the shape is represented as

x = [x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n]^T,   (11)

the pt–pt error can be expressed as

E_{pt-pt}(x, x_{gt}) = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(x_i - x_{gt,i})^2 + (y_i - y_{gt,i})^2},   (12)

where x_{gt} = [x_{gt,1}, x_{gt,2}, \ldots, x_{gt,n}, y_{gt,1}, y_{gt,2}, \ldots, y_{gt,n}]^T denotes the ground truth of the shape. Each search is considered successful if the pt–pt error is less than 5 pixels:

hit = \begin{cases} 1 & \text{if } E_{pt-pt} < 5, \\ 0 & \text{otherwise}. \end{cases}   (13)

Table 1
Point-to-point errors for different AAM texture representations on the XM2VTS database.

Texture representation   E_pt-pt (pts)
                         Mean     Std      Median
Intensity                2.2937   0.8474   2.0337
Gabor                    2.0484   0.7304   2.0840
GaborD                   2.4171   0.8667   2.2631
GaborS                   2.1273   0.7487   1.9185
GaborSD                  2.3344   0.7446   2.1982
GEC                      1.9846   0.4792   2.0002
SGrad                    1.9036   0.9252   2.1084

Table 2
Point-to-point errors for different AAM texture representations on the IMM face database.

Texture representation   E_pt-pt (pts)
                         Mean     Std      Median
Intensity                2.8229   0.7227   2.6933
Gabor                    2.5089   0.5428   2.5715
GaborD                   2.9153   0.6510   2.8340
GaborS                   2.5025   0.6901   2.6081
GaborSD                  2.7771   0.7070   2.6438
GEC                      2.2426   0.4194   2.1948
SGrad                    2.7660   0.7856   2.8000
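The evaluation metric of (11)-(13) amounts to a few lines; a sketch assuming shapes stored in the stacked [x_1..x_n, y_1..y_n] layout of (11):

```python
import numpy as np

def pt_pt_error(x, x_gt):
    """Eq. (12): mean Euclidean distance between model and ground-truth points."""
    n = len(x) // 2
    return float(np.mean(np.hypot(x[:n] - x_gt[:n], x[n:] - x_gt[n:])))

def hit(x, x_gt, threshold=5.0):
    """Eq. (13): a search counts as successful below a 5-pixel pt-pt error."""
    return 1 if pt_pt_error(x, x_gt) < threshold else 0
```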

4.2. Accuracy comparison

To verify the fitting accuracy of the proposed methods, the standard shape of every image is displaced by 10 pixels in x and y, respectively, to give the initial fitting location. An example of fitting is shown in Fig. 3, where the searching result of the Gabor-based AAM is more accurate than that of the intensity-based AAM. Quantitative results on the different databases are shown in Tables 1 and 2.

From Table 1 we can see, first, that although Gabor does not outperform GEC and SGrad, the three have similar matching accuracy, all better than intensity on the XM2VTS database. Second, among the variations of the Gabor-based representation, GaborS performs best, only slightly worse than Gabor, while GaborD and GaborSD perform even worse than intensity.

From Table 2 we can see that Gabor also gives a better result than intensity on the IMM face database. Although Gabor does not outperform GEC, it performs better than SGrad. Among the variations of the Gabor-based representation, GaborS obtains the lowest point-to-point error, even lower than Gabor. However, GaborD does not outperform intensity, and GaborSD gives a result similar to intensity.

In summary, these representations perform differently on different databases, and we find that:

• The Gabor-based representation improves the accuracy of model fitting on different databases. This is because Gabor filters provide more information than intensity, which helps the model match more accurately.
• GEC is more accurate than Gabor on both databases. This suggests that gradients, corners, and edges are important for matching accuracy.
• Although it uses the information of all scales and directions, Gabor does not perform much better than GaborS. This suggests that GaborS captures the most important information of the Gabor transformation for matching accuracy.

4.3. Convergence ratio comparison

To investigate the performance of our methods in terms of convergence ratio, each correct shape is displaced by 10 to 60 pixels in x and y, and the fitting procedure starts from these displaced shapes. The convergence ratio is defined as

CR = \frac{\sum hit}{n},   (14)

where hit is defined in (13) and n is the total number of images. The searching results of correctly converged fittings are shown in Figs. 5 and 6. We evaluate the robustness of the various representations using the disturbance region (the mean over the two directions) with more than 90% convergence ratio (90%CR): the greater the 90%CR, the more robust the representation.
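Computing (14) over a displacement sweep is then direct; `fit_from_displacement` below is a hypothetical stand-in for the AAM search started from a displaced initial shape.

```python
import numpy as np

def convergence_ratio(hits):
    """Eq. (14): CR = sum(hit) / n over all test images."""
    return float(np.asarray(hits).mean())

# for disp in range(10, 61, 10):                          # displacements of 10..60 pixels
#     hits = [hit(fit_from_displacement(img, disp), gt)   # hypothetical fitting call
#             for img, gt in test_set]
#     print(disp, convergence_ratio(hits))
```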

Fig. 5. Convergence ratio vs. initial displacement on the XM2VTS database.

Table 3
CPU time comparison between texture representation schemes.

Texture representation   Building time (s)   Fitting time (s)   Total time (s)
Intensity                76.5242             77.8706            154.3948
Gabor                    2251.5042           2260.0423          4511.5465
GaborD                   254.7418            256.9688           511.7106
GaborS                   446.3463            448.9362           895.2825
GaborSD                  80.1537             81.4738            161.6275
GEC                      223.3184            224.8448           448.1632
SGrad                    128.4063            130.0342           258.4405

Fig. 6. Convergence ratio vs. initial displacement on the IMM face database.

Fig. 7. CPU time comparison between texture representations.

From Fig. 5 we can see that Gabor is the most robust to displacements on the XM2VTS database; its 90%CR is about 30 pixels. This is because Gabor utilizes the most information over both scale and direction. In contrast, GaborS, GaborD, and GEC perform similarly to the original AAMs: their 90%CR is around 20 pixels, a little worse than Gabor. GaborSD is the least robust, at only 10 pixels.

From Fig. 6 we can see that the Gabor-based representation and its variations have similar robustness to displacements, with 90%CR above 30 pixels. SGrad and GEC do not perform as well, giving 10 and 20 pixels, respectively. These results show that features derived from Gabor filters have robustness similar to the original AAMs on the IMM face database.

4.4. Computational efficiency comparison

CPU time is an important indicator of algorithm performance. To test the computational efficiency of the proposed algorithms, the same strategy as in the accuracy comparison is used: the standard shape of every image is displaced by 10 pixels in x and y to give the initial fitting location, and the model is fitted from there. Two sets of 100 images are used, respectively, for training and testing. The CPU time of the two procedures is recorded in Table 3 and Fig. 7. All experiments are conducted on a PC with a P4 CPU and 1 GB of memory.

From Table 3 we can see that:

• Gabor takes the most time in both the training and fitting procedures, around thirty times more than intensity. This is because the original model uses only intensity information, while Gabor uses forty times more information than intensity.
• Among the variations of Gabor, GaborSD has efficiency similar to intensity, and GaborS is the most time-consuming. GaborD performs similarly to GEC and takes about twice as much time as SGrad.

In summary, the Gabor-based representation is more time-consuming than the original AAMs, but the variations of Gabor greatly improve its efficiency. The ranked order of efficiency of the compared methods is

Intensity > GaborSD > SGrad > GEC > GaborD > GaborS > Gabor.   (15)

4.5. Generalization analysis

To test the generalization of the proposed algorithms to different objects, we apply them to the IMM hand database [13]. First, the fitting accuracy of the proposed methods is evaluated on this data set: 40 images are chosen, and 4-fold cross-validation is used to test the models. As in the accuracy evaluation above, the standard shape of every image is displaced by 10 pixels in x and y, respectively, to give the initial fitting location. GEC and SGrad are used for comparison, and the fitting results are shown in Table 4.


Table 4
Point-to-point errors for different AAM texture representations on the IMM hand database.

Texture representation   E_pt-pt (pts)
                         Mean     Std      Median
Intensity                0.9373   0.7210   0.6946
Gabor                    0.6518   0.6075   0.6274
GaborD                   0.7889   0.3244   0.6900
GaborS                   0.6469   0.1536   0.6099
GaborSD                  0.9043   0.5074   0.8597
GEC                      0.5994   0.1388   0.6060
SGrad                    1.0874   0.9664   0.6910

Fig. 8. Convergence ratio vs. initial displacement on the IMM hand database.

From Table 4 we find that Gabor, GaborD, and GaborS obtain better matching accuracy than intensity. This indicates that the features derived from Gabor filters also apply to the IMM hand database. An example comparing the fitting accuracy of Gabor and intensity is shown in Fig. 4, where the searching result of the Gabor-based AAM is more accurate than that of the intensity-based AAM.

Next, the robustness of the proposed methods is evaluated on the IMM hand database. Correct shapes are displaced by 10 to 60 pixels in x and y, respectively, and the fitting procedures start from these displaced shapes. The results are shown in Fig. 8. From the figure we can see that Gabor performs similarly to GaborS and GEC, all better than intensity. This again verifies the robustness of Gabor features across databases.

5. Discussion and conclusions

This paper proposes to utilize Gabor filters in the texture representation of AAMs, replacing the intensity values of the original AAMs with the outputs of Gabor filters. This method improves the fitting accuracy and robustness of AAMs, because Gabor functions can model the responses of the visual cortex and provide more information for image matching than intensity. Given the high computational cost of the Gabor-based texture representation, three simplified Gabor-function-based image representations, GaborS, GaborD, and GaborSD, are proposed to alleviate the problem. These simplified representations greatly improve the computational efficiency because they reduce the number of filtering operations required.

Experiments on multiple databases show that the Gabor-based texture representation improves model matching in terms of both accuracy and robustness. However, the computational complexity of the Gabor-based model is high (about 30 times that of the intensity-based model). At the same time, the three variations show their improvement in efficiency. Among the proposed methods, GaborS has the performance closest to Gabor while keeping a lower computational cost. This illustrates that there is much redundancy in the Gabor-based texture representation; efficient algorithms to reduce this redundancy should be proposed as a next step.

Acknowledgment

We thank the anonymous reviewers for their helpful comments and suggestions. This research was supported by the National Science Foundation of China (60771068, 60702061, and 60832005), the Open-End Fund of the National Laboratory of Pattern Recognition, China, the National Laboratory of Automatic Target Recognition, Shenzhen University, China, and the Program for Changjiang Scholars and Innovative Research Teams in University of China (IRT0645).

References

[1] B.D. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, in: Proceedings of the International Joint Conference on Artificial Intelligence, 1981, pp. 674–679.
[2] X. Li, S. Lin, S. Yan, D. Xu, Discriminant locally linear embedding with high-order tensor data, IEEE Transactions on Systems, Man, and Cybernetics, Part B 38 (2) (2008) 342–352.
[3] C.B.H. Wolstenholme, C.J. Taylor, Wavelet compression of active appearance models, in: Proceedings of the Second International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 1679, 1999, pp. 544–554.
[4] C. Harris, M. Stephens, A combined corner and edge detector, in: Alvey Vision Conference, 1988, pp. 147–151.
[5] D. Dunn, W.E. Higgins, J. Wakeley, Texture segmentation using 2D Gabor elementary functions, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (2) (1994) 130–149.
[6] D. Tao, X. Li, X. Wu, S.J. Maybank, General tensor discriminant analysis and Gabor features for gait recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (10) (2007) 1700–1715.
[7] I. Matthews, S. Baker, Active appearance models revisited, International Journal of Computer Vision 60 (2) (2004) 135–164.
[8] I.M. Scott, T.F. Cootes, C.J. Taylor, Improving appearance model matching using local image structure, in: Proceedings of the International Conference on Information Processing in Medical Imaging, vol. 2732, 2003, pp. 258–269.
[9] D. Tao, X. Li, X. Wu, S.J. Maybank, Geometric mean for subspace selection, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2) (2008) 260–274.
[10] J.G. Daugman, Uncertainty relation for resolution in space, spatial frequency and orientation optimized by two-dimensional visual cortical filters, Journal of the Optical Society of America 2 (7) (1985) 1160–1169.
[11] J.G. Daugman, Two-dimensional spectral analysis of cortical receptive field profiles, Vision Research 20 (5) (1980) 847–856.
[12] K. Messer, J. Matas, J. Kittler, J. Luettin, G. Maitre, XM2VTSDB: the extended M2VTS database, in: Audio- and Video-based Biometric Person Authentication, 1999, pp. 72–77.
[13] M.B. Stegmann, D.D. Gomez, A brief introduction to statistical shape analysis, March 2002, http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/403/pdf/imm403.pdf.
[14] M.B. Stegmann, R. Larsen, Multi-band modelling of appearance, Image and Vision Computing 21 (1) (2003) 61–67.
[15] D. Tao, M. Song, X. Li, J. Shen, J. Sun, X. Wu, C. Faloutsos, S.J. Maybank, Bayesian tensor approach for 3-D face modelling, IEEE Transactions on Circuits and Systems for Video Technology 18 (10) (2008) 1397–1410.
[16] M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models, in: International Conference on Computer Vision, 1987, pp. 259–268.
[17] M.M. Nordstrøm, M. Larsen, J. Sierakowski, M.B. Stegmann, The IMM face database—an annotated dataset of 240 face images, Technical Report, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, May 2004.
[18] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991) 71–86.


[19] M. Yu, S. Li, Face alignment based on statistical models and Gabor wavelets, International Journal of Robotics and Automation 23 (1) (2008).
[20] P. Kittipanya-ngam, T.F. Cootes, The effect of texture representations on AAM performance, in: Proceedings of the International Conference on Pattern Recognition, vol. 2, 2006, pp. 328–331.
[21] S. Baker, I. Matthews, Equivalence and efficiency of image alignment algorithms, in: Proceedings of the 2001 IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2001, pp. 1090–1097.
[22] S. Baker, I. Matthews, J. Schneider, Automatic construction of active appearance models as an image coding problem, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (10) (2004) 1380–1384.
[23] S. Darkner, R. Larsen, M.B. Stegmann, B.K. Ersboll, Wedgelet enhanced appearance models, in: Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop, vol. 12, 2004, p. 177.
[24] S. Marcelja, Mathematical description of the responses of simple cortical cells, Journal of the Optical Society of America 70 (11) (1980) 1297–1300.
[25] T. Zhang, D. Tao, J. Yang, Discriminative locality alignment, in: The Tenth European Conference on Computer Vision, 2008, pp. 725–738.
[26] T.F. Cootes, C.J. Taylor, On representing edge structure for model matching, in: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2001, pp. 1114–1119.
[27] T.F. Cootes, D.H. Cooper, C.J. Taylor, J. Graham, Active shape models—their training and application, Computer Vision and Image Understanding 61 (1) (1995) 38–59.
[28] T.F. Cootes, G.J. Edwards, C.J. Taylor, Active appearance models, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6) (2001) 681–685.
[29] T.F. Cootes, G.J. Edwards, C.J. Taylor, Active appearance models, in: Proceedings of the European Conference on Computer Vision, vol. 2, 1998, pp. 484–498.
[30] T.F. Cootes, S. Marsland, C. Twining, K. Smith, C.J. Taylor, Groupwise diffeomorphic non-rigid registration for automatic model building, in: Proceedings of the European Conference on Computer Vision, vol. 3024, 2004, pp. 316–327.
[31] T.F. Cootes, G.J. Edwards, C.J. Taylor, Comparing active shape models with active appearance models, in: British Machine Vision Conference, vol. 1, 1999, pp. 173–182.
[32] T.S. Lee, Image representation using 2D Gabor wavelets, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (10) (1996) 959–971.
[33] X. Pan, Q. Ruan, Palmprint recognition using Gabor feature-based (2D)²PCA, Neurocomputing 71 (13–15) (2008) 3032–3036.
[34] Y. Yao, X. Jing, H. Wong, Face and palmprint feature level fusion for single sample biometrics recognition, Neurocomputing 70 (7–9) (2007) 1582–1586.

Xinbo Gao received the B.Sc., M.Sc., and Ph.D. degrees in Signal and Information Processing from Xidian University, China, in 1994, 1997, and 1999, respectively. He joined the Department of Electric Engineering, Xidian University as a Lecturer in 1999 and is currently Professor, Director of the VIPS Lab and Director of International Office of Xidian University. His research interests are computational intelligence, machine learning, information processing and analysis, pattern recognition, and artificial intelligence. In these areas, he has published 4 books and around 100 technical


articles in refereed journals and proceedings, including IEEE TCSVT and TNN. He is on the editorial boards of journals including EURASIP Signal Processing (Elsevier), Journal of Test and Measurement Technology, and Journal of Data Acquisition and Processing. He has served as general chair/co-chair, program committee chair/co-chair, or PC member for around 30 major international conferences.

Ya Su received his Bachelor's and Master's degrees in Signal and Information Processing from Xidian University, Xi'an, China. He is now pursuing a Ph.D. degree at Xidian University. His research interests include pattern recognition and machine learning.

Xuelong Li received B.Eng. and Ph.D. degrees from University of Science and Technology of China (USTC). Currently he works at University of London. He is also a Visiting Professor at Tianjin University and a Guest Professor at USTC.

Dacheng Tao received the Ph.D. from the University of London (UoL). Currently, he holds a Nanyang titled academic post in the Nanyang Technological University. His research interests include artificial intelligence, biometrics, computer vision, data mining, machine learning, multimedia, statistics, and visual surveillance.