Boosting encoded dynamic features for facial expression recognition


Pattern Recognition Letters 30 (2009) 132–139


Peng Yang a,*, Qingshan Liu a,b, Dimitris N. Metaxas a

a Department of Computer Science, Rutgers, The State University of New Jersey, 110 Frelinghuysen Road, Piscataway, NJ 08854-8019, USA
b National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing 100080, China

Article history: Available online 4 April 2008

Keywords: Pattern recognition; Facial expression; Video understanding; Boosting

Abstract

It is well known that the extraction of dynamic features is a key issue for video-based face analysis. In this paper, we present a novel approach to facial expression recognition based on encoded dynamic features. In order to capture the dynamic characteristics of facial events, we design dynamic haar-like features to represent the temporal variations of facial appearance. Inspired by binary pattern coding, we further encode the dynamic features into binary pattern features, which are useful for constructing weak classifiers for boosting learning. Finally, Adaboost is performed to learn a set of discriminating encoded dynamic features for facial expression recognition. We conduct experiments on the CMU expression database, and the results show the power of the proposed method. We also extend the method to action unit (AU) recognition, and obtain a promising performance. Published by Elsevier B.V.

1. Introduction

Automatic facial expression recognition has attracted much attention in recent years due to its potential applications. Previous work can be categorized into two classes: image-based methods and video-based methods (Fasel and Luettin, 2003; Pantic and Rothkrantz, 2000; Zeng et al., 2007). Image-based methods (Shan et al., 2005; Bartlett et al., 2003; Pantic and Rothkrantz, 2004) assume that facial expressions are static and that recognition can be conducted on a single image. Actually, a natural facial event evolves over time through its onset, apex, and offset, and so does a facial expression. Therefore, image-based expression recognition cannot achieve good performance in practical systems. Video-based methods have been demonstrated to be a good option; they aim to integrate the temporal information of facial appearance (Zhao and Pietikainen, 2007; Black and Yacoob, 1997; Yacoob and Davis, 1994; Cohen et al., 2003). Although much progress has been made by many researchers (Cohen et al., 2003; Tian, 2004), achieving high accuracy is still a challenge due to the complicated variation of facial dynamics, and how to extract and represent the dynamic features is a key issue. Normally, there are two popular approaches to extracting facial features: geometric feature-based methods (Gu and Ji, 2004; James Lien et al., 1999; Valstar et al., 2005; Zhang et al., 1998; Tian et al., 2002) and appearance-based methods (Bartlett et al., 2003; Shan et al., 2005). Gabor appearance features are widely used to describe local appearance (Liu and Wechsler, 2002; Bartlett et al., 2003; Daugman, 2003), but their computational expense is much higher than that of other local features such as local binary patterns (LBP) (Ojala and Pietikainen, 2002) and haar-like features (Paul and Michael, 2002).

In this paper, we propose a new approach to facial expression recognition. In order to capture the temporal characteristics of facial expressions, we design dynamic haar-like features to represent facial images. Compared to Gabor features, haar-like features are very simple, since they are based only on simple addition and subtraction operations (Paul and Michael, 2002). Moreover, our experimental results show that the haar-like features are a little better than the Gabor features for describing facial expressions. The dynamic haar-like features are encoded into binary patterns for further effective analysis, inspired by Ojala and Pietikainen (2002). Finally, based on the encoded dynamic features, Adaboost is employed to learn a combination of discriminative features to construct the classifier. We test the proposed method on the CMU facial expression database, and extensive experiments prove its effectiveness. We also extend it to AU recognition and obtain a promising performance.

This paper is organized as follows: we give an overview of our proposed method in Section 2 and design the dynamic haar-like features in Section 3. How to create the code book and encode the dynamic features is addressed in Section 4. The boosting learning is introduced in Section 5. Experiments are reported in Section 6, followed by the conclusions in Section 7.

2. Overview of the proposed approach


Since facial expression variations are dynamic in the temporal domain, it is a natural trend to use temporal variation information for facial expression recognition (Zhao and Pietikainen, 2007).


Fig. 1. The structure of the proposed framework.

Fig. 2. The flowchart of dynamic features extraction.

In this paper, we propose a novel framework for facial expression recognition based on encoded dynamic features. Fig. 1 illustrates the structure of our framework, which has three main components: dynamic feature extraction, dynamic feature encoding, and Adaboost learning. For the dynamic feature extraction component, we design dynamic haar-like features to capture the temporal variations of facial expressions. Inspired by binary pattern coding (Ojala and Pietikainen, 2002), we analyze the distribution of each dynamic feature and create a code book for it. Based on the code books, the dynamic features are further mapped into binary pattern features. Finally, Adaboost is used to learn a set of discriminating coded features for facial expression recognition.

3. Dynamic feature representation

The haar-like features achieved great performance in face detection (Paul and Michael, 2002). Due to their much lower computation cost compared to the Gabor features, we exploit the haar-like features to represent face images, and we extend them to represent the dynamic characteristics of facial expressions. Fig. 3 shows an example of haar-like features extracted from a face image. The dynamic haar-like features are built in two steps: (1) thousands of haar-like features are extracted in each frame, and (2) the same haar-like features in consecutive frames are combined into dynamic features. Fig. 2 shows the flowchart of the dynamic haar-like feature extraction, and the details are described in the following.

For simplicity, we denote an image sequence with $n$ frames as $I = \{I_1, I_2, \ldots, I_n\}$. We define $H_i = \{h_{i,1}, h_{i,2}, \ldots, h_{i,m}\}$ as the haar-like feature set of image $I_i$, where $m$ is the number of haar-like features. Corresponding to each haar-like feature $h_{i,j}$, we build a dynamic haar-like feature $u_{i,j}$ over a temporal window as $u_{i,j} = \{h_{i,j}, h_{i+1,j}, \ldots, h_{i+l,j}\}$. Fig. 4 gives an illustration. We call each $u_{i,j}$ a dynamic feature; its size is decided by the temporal window size. The temporal variation of facial expressions can be effectively described by all the $u_{i,j}$.
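To make this construction concrete, the following is a minimal sketch (our own illustration, not the authors' code) of computing one two-rectangle haar-like feature per frame via an integral image and stacking it across a temporal window; the specific filter layout, window parameters, and function names are illustrative assumptions.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row/left column for easy lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of pixel values inside the rectangle at (y, x) of size (h, w)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, y, x, h, w):
    """A horizontal two-rectangle haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)

def dynamic_haar_feature(frames, y, x, h, w, start, l):
    """u_{i,j}: the same haar-like feature h_{i,j} collected over frames i..i+l.
    (Recomputing the integral image per call is wasteful; fine for a sketch.)"""
    return [haar_two_rect(integral_image(frames[t]), y, x, h, w)
            for t in range(start, start + l + 1)]

# Example: a random 64x64 sequence, temporal window of l + 1 = 3 frames
frames = [np.random.rand(64, 64) for _ in range(10)]
u = dynamic_haar_feature(frames, y=10, x=10, h=8, w=16, start=0, l=2)
print(u)  # one dynamic feature unit: three scalar haar responses
```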

Fig. 3. Example of haar-like features superimposed onto a face image.

Fig. 4. Example of one dynamic feature unit ui;j in an image sequence.

Fig. 5. Example of coding one dynamic feature unit ui;j .


Fig. 6. The procedure of the expression recognition.

Fig. 7. Examples of six basic expressions (anger, disgust, fear, happiness, sadness and surprise).

Table 1. The area under the ROC curves (haar-like feature and Gabor feature)

Expression   Haar-like, static   Haar-like, encoded dynamic   Gabor, static   Gabor, encoded dynamic
Angry        0.883               0.982                        0.790           0.970
Disgust      0.933               0.987                        0.931           0.980
Fear         0.858               0.830                        0.835           0.600
Happiness    0.982               0.983                        0.960           0.960
Sadness      0.895               0.945                        0.891           0.957
Surprise     0.996               0.996                        0.985           1.000

Table 2. The area under the ROC curves with different window sizes (encoded dynamic haar-like feature + AdaBoost)

Expression   Window size 3   Window size 4   Window size 5   Window size 6   Window size 7
Angry        0.982           0.964           0.977           0.918           0.945
Disgust      0.987           0.983           0.966           0.963           0.967
Fear         0.830           0.873           0.838           0.844           0.840
Happiness    0.983           0.977           0.990           0.979           0.978
Sadness      0.946           0.967           0.960           0.954           0.954
Surprise     0.996           0.998           0.990           0.998           1.000

Table 3. The area under the ROC curves with different window sizes (encoded dynamic Gabor feature + AdaBoost)

Expression   Window size 3   Window size 4   Window size 5   Window size 6   Window size 7
Angry        0.970           0.986           0.976           0.953           0.991
Disgust      0.980           0.941           0.910           0.890           0.954
Fear         0.600           0.610           0.609           0.690           0.770
Happiness    0.960           0.946           0.989           0.940           0.925
Sadness      0.957           0.983           0.929           0.954           0.975
Surprise     1.000           1.000           0.990           0.999           0.997

Table 4. The area under the ROC curves for different T (encoded dynamic haar-like feature + AdaBoost, window size 3)

Expression   T = 2.4   T = 1.9   T = 1.65   T = 1.5   T = 1.2   T = 1.0   T = 0.6   T = 0.3
Angry        0.913     0.991     0.982      0.978     0.977     0.969     0.953     0.73
Disgust      0.978     0.983     0.987      0.993     0.996     0.995     0.834     0.82
Fear         0.799     0.78      0.83       0.81      0.84      0.83      0.943     0.92
Happiness    0.965     0.975     0.983      0.974     0.978     0.965     0.903     0.75
Sadness      0.92      0.955     0.9456     0.942     0.933     0.938     0.897     0.75
Surprise     0.995     0.995     0.996      0.998     0.997     0.998     1.00      0.99

Fig. 8. Different facial motion speed of different subjects.

Fig. 9. ROC curves of expressions based on coded dynamic features and static features. (Six panels: Angry, Disgust, Fear, Happy, Sadness and Surprise; each plots the true positive rate against the false positive rate for the coded dynamic feature and the static feature.)

4. Coding the dynamic features

As mentioned above, a dynamic feature unit is composed of a set of haar-like features at the same position along the temporal window. Thus, each dynamic unit is a feature vector. In this paper, we further code the dynamic feature unit into a binary pattern, inspired by Daugman (2003) and Ojala and Pietikainen (2002). This coding scheme has two advantages: (1) it is easier to construct weak learners for Adaboost learning with a scalar feature than with a feature vector, and (2) the proposed binary coding is based on the statistical distribution, so the encoded features are robust to noise.


We create one code book for each facial expression as follows. First, we analyze the distribution of each feature $h_{\cdot,j}$ under each expression, and estimate the mean $\mu_j$ and the variance $\sigma_j$ from this distribution; the Gaussian distribution $N_j(\mu_j, \sigma_j)$ is adopted to model the distribution of the feature unit $h_{\cdot,j}$. Second, we obtain the code book $(N_1(\mu_1, \sigma_1), N_2(\mu_2, \sigma_2), \ldots, N_m(\mu_m, \sigma_m))$ for each expression. Based on the code book, we can map each haar-like feature $h_{i,j}$ to a $\{1, 0\}$ pattern by the formula:

$$C_{i,j} = \begin{cases} 0 & \text{if } \|h_{i,j} - \mu_j\| / \sigma_j > T \\ 1 & \text{if } \|h_{i,j} - \mu_j\| / \sigma_j \le T \end{cases} \qquad (1)$$

where $T$ is a threshold; we give a detailed analysis of the influence of different $T$ values in Section 6, where we find that our method is not sensitive to $T$. Based on formula (1), we get the binary pattern $E_{i,j}$. Fig. 5 gives the procedure of creating the coded feature $E_{i,j}$ of length $L$, and we can convert the binary vector $E_{i,j}$ to a scalar:

$$E_{i,j} = \{C_{i,j}, C_{i+1,j}, \ldots, C_{i+L-2,j}, C_{i+L-1,j}\} \qquad (2)$$
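As an illustration of Eqs. (1) and (2), the following sketch builds a per-expression code book from training feature responses and encodes one dynamic feature unit into its binary pattern and scalar code; it is our own illustration, and the array shapes and variable names are assumptions.

```python
import numpy as np

def build_codebook(train_features):
    """Estimate N_j(mu_j, sigma_j) for each haar-like feature j from the
    training responses of one expression class.
    train_features: array of shape (num_samples, m)."""
    mu = train_features.mean(axis=0)
    sigma = train_features.std(axis=0) + 1e-8  # guard against zero variance
    return mu, sigma

def encode_dynamic_feature(u, j, mu, sigma, T=1.65):
    """Eqs. (1)/(2): map the dynamic feature u_{i,j} = (h_{i,j}, ..., h_{i+L-1,j})
    to the binary pattern E_{i,j}, then to a scalar code."""
    u = np.asarray(u, dtype=np.float64)
    bits = (np.abs(u - mu[j]) / sigma[j] <= T).astype(int)  # C_{i,j}, ..., C_{i+L-1,j}
    code = int("".join(map(str, bits)), 2)                  # binary vector -> scalar
    return bits, code

# Example with L = 3: feature values near the class mean are coded 1
mu, sigma = np.array([0.0]), np.array([1.0])
bits, code = encode_dynamic_feature([0.2, -2.5, 1.0], j=0, mu=mu, sigma=sigma)
print(bits, code)  # [1 0 1] 5
```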

5. Boosting the coded dynamic features

As we know, thousands of haar-like features can be extracted from one image, so we can obtain thousands of coded dynamic features. A set of discriminating features should be selected to construct the final expression classifier. In this paper, we use Adaboost learning to achieve this goal, and a weak learner is built on each coded feature as in Paul and Michael (2002). For each expression, we set the image sequences of this expression as the positive examples, and the image sequences of the other expressions as the negative examples. Therefore, one classifier for each expression can be built on the corresponding coded features. The learning procedure is given in Algorithm 1, and the testing procedure is displayed in Fig. 6. We take six expression categories into account: happiness, sadness, surprise, disgust, fear, and anger. Six Adaboost learners are trained to discriminate each expression from the others.

Algorithm 1 (Learning procedure).
1. Given training image sequences $(x_1, y_1), \ldots, (x_n, y_n)$, $y_i \in \{1, 0\}$ for the specified expression and the other expressions, respectively.
2. Initialize the weights $D_1(i) = 1/n$.
3. Extract the dynamic features from each image sequence.
4. Code the dynamic features based on the corresponding code book to get $E_{i,j}$, and build one weak classifier on each coded feature.
5. Use Adaboost to obtain the strong classifier $H(x)$ for expression recognition.
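To make the learning step concrete, here is a minimal, self-contained sketch of discrete Adaboost with decision-stump weak learners over the encoded scalar features, in the spirit of Paul and Michael (2002). It is our own illustration under simplified assumptions (exhaustive stump search, one-vs-rest labels), not the authors' implementation.

```python
import numpy as np

def train_stump(X, y, w):
    """Best decision stump over coded scalar features under sample weights w.
    X: (n, d) coded features; y in {0, 1}. Returns (feature, threshold, polarity, error)."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = (pol * X[:, j] < pol * thr).astype(int)
                err = np.sum(w * (pred != y))
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, rounds=20):
    """Discrete Adaboost: reweight the sequences and collect weighted stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        j, thr, pol, err = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = np.log((1 - err) / err)
        pred = (pol * X[:, j] < pol * thr).astype(int)
        w *= np.exp(alpha * (pred != y))  # boost the misclassified sequences
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def predict(ensemble, X):
    """Strong classifier: weighted vote against half the total weight."""
    score = np.zeros(len(X))
    for j, thr, pol, alpha in ensemble:
        score += alpha * (pol * X[:, j] < pol * thr).astype(int)
    return (score >= 0.5 * sum(a for *_, a in ensemble)).astype(int)

# Toy usage: rows are coded features per sequence, y the one-vs-rest labels
X = np.array([[5, 7], [5, 6], [2, 1], [3, 0]], dtype=float)
y = np.array([1, 1, 0, 0])
print(predict(adaboost(X, y, rounds=5), X))  # [1 1 0 0]
```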

6. Experiment

Our experiments are conducted on the Cohn–Kanade facial expression database (James Lien et al., 1999), which is widely used to evaluate facial expression recognition algorithms. It consists of 100 university students aged from 18 to 30 years; 65% of the subjects are female, 15% are African–American, and 3% are Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, six of which are the prototypic emotions mentioned above. We select 96 subjects and 300 image sequences from the database, and each subject has six basic expression sequences. The six basic expression samples are shown in Fig. 7. We randomly select 60 subjects as the training set, and the remaining subjects as the testing set. We use two strategies to organize the data set: one is to manually fix the eyes' locations and crop the images, which excludes the effect of misalignment on the recognition results; the other is to automatically detect the face (Paul and Michael, 2002) and align it, which simulates a practical application. The images are cropped and normalized to $64 \times 64$ for both strategies. In Section 6.1, we use the manually cropped data set to investigate the characteristics of the proposed method. In Section 6.2, we use the automatically cropped data set to verify the performance of our method in a practical setting. Shan et al. (2005) and Whitehill and Omlin (2006) gave recognition ratios on each expression or AU; however, the false alarm rate was ignored. To evaluate the performance completely, we use the ROC curve in our experiments.

6.1. Analysis of the characteristics of the proposed method

In order to analyze the performance of our method without the influence of face misalignment (Shan et al., 2004), we use the manually cropped data set in this section, and compare the dynamic haar-like features with the dynamic Gabor features. Our method has only two parameters: $T$, the threshold in formula (1), and $L$, the size of the window in Eq. (2). We use the last frame of each sequence to evaluate the performance of the static image-based method.

6.1.1. Evaluation of the dynamic haar-like and the dynamic Gabor features

In this experiment, we fix the window size $L = 3$, i.e.

$$E_{i,j} = \{C_{i,j}, C_{i+1,j}, C_{i+2,j}\} \qquad (3)$$

to evaluate the performance of the haar-like and Gabor features, respectively. The influence of different $L$ is discussed in Section 6.1.2. We set $T = 1.65$ for all the expressions because, for the standard normal Gaussian distribution $x \sim N(0, 1)$, $\Pr(x \le 1.65) \approx 95\%$. In a practical system, a different optimal threshold $T$ could be used for each expression to get the best performance; the details of the influence of $T$ on the recognition performance are discussed in Section 6.1.3. The area under the ROC curves for the different features is shown in Table 1. We can see that the encoded dynamic features outperform the static features, and the dynamic haar-like features are a little better than the dynamic Gabor features.
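Since all results below are reported as areas under ROC curves, a short self-contained sketch of computing such an AUC directly from classifier scores may be helpful; this is our own illustration via the rank-statistic (Wilcoxon–Mann–Whitney) identity, not code from the paper.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve from classifier scores and {0, 1} labels:
    the fraction of (positive, negative) pairs ranked correctly, ties counting 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0: perfect ranking
```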

Table 5. The area under the ROC curves (expressions)

Expression   Static feature + AdaBoost   Coded dynamic feature + AdaBoost
Angry        0.856                       0.973
Disgust      0.898                       0.941
Fear         0.841                       0.916
Happiness    0.951                       0.9913
Sadness      0.917                       0.978
Surprise     0.974                       0.998

Fig. 10. Example of the AUs.

Fig. 11. ROC curves of AUs based on coded dynamic features and static features. (Eight panels: AU1, AU2, AU4, AU5, AU10, AU12, AU14 and AU20; each plots the true positive rate against the false positive rate for the static feature and the coded dynamic feature.)

Although the static feature beats the dynamic feature for the fear expression, this is not surprising, because the fear expressions of different people are quite different.


Table 6. The area under the ROC curves (AUs)

AU     Static feature + AdaBoost   Coded dynamic feature + AdaBoost
AU1    0.718                       0.7715
AU2    0.714                       0.784
AU4    0.606                       0.8213
AU5    0.652                       0.7386
AU10   0.660                       0.663
AU12   0.850                       0.8624
AU14   0.735                       0.7748
AU20   0.601                       0.7163

In the following two parts, we analyze the relation between the recognition results and the two parameters L and T.

6.1.2. Analysis of the window size L

One of the key parameters in our method is the size of the temporal window, since it directly determines the dynamic features. In this section, we set $L = 3, 4, 5, 6, 7$ as the size of the temporal window to investigate the performance of our method. Tables 2 and 3 report the experimental results. We can see that different values of L have a slight effect on different expressions. This is caused by the fact that different expressions have different motion patterns and different subjects have different facial motion speeds; an example is shown in Fig. 8. However, the performance is acceptable over all the L values. For simplicity, we set L = 3 for all the expressions in the following experiments.

6.1.3. Analysis of the threshold T

The other parameter is the threshold T, which quantizes the real-valued features into binary values. In this part, we examine the influence of the parameter T. We set the window size to 3 and use the haar-like feature as the basic feature unit, because in Section 6.1.1 we found that the haar-like features are a little better than the Gabor features. Different thresholds $T \in \{0.3, 0.6, 1.0, 1.2, 1.5, 1.65, 1.9, 2.4\}$ are used to train the models, and the corresponding recognition results are shown in Table 4. From Table 4 we can clearly see that our dynamic features are not sensitive to the threshold T, except for the fear expression. Even when the threshold decreases to 0.6, our method still works well and achieves good performance. The recognition result for the fear expression varies with T, perhaps because its variance is large.
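As a quick numerical check on the default threshold $T = 1.65$ chosen in Section 6.1.1 from the standard normal distribution, the following snippet (our own illustration, assuming SciPy is available) prints the one-sided and two-sided probability mass:

```python
from scipy.stats import norm

# For x ~ N(0, 1):
print(norm.cdf(1.65))                    # ~0.951: Pr(x <= 1.65), one-sided
print(norm.cdf(1.65) - norm.cdf(-1.65))  # ~0.901: Pr(|x| <= 1.65), two-sided
```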

6.2. Experiments on the automatically cropped data set

In this section, we test the proposed method on the automatically cropped data set. We use Viola's method (Paul and Michael, 2002) to detect the face and normalize it automatically based on the location of the eyes. L is set to 3 and T to 1.65. Fig. 9 shows the ROC curves of the six expressions, and the area under the ROC curves is listed in Table 5. We can see that the ROC curves based on the encoded dynamic features are much better than the ones based on the static features.

6.3. Experiments on facial AUs

In this section, we extend the proposed method to facial AU recognition. According to the facial action coding system (FACS) of Ekman's group (Ekman et al., 2002), there are 71 primitive units in total, called action units (AUs). Based on them, any expression can be represented by a single AU or a combination of several AUs. We test our approach on a facial AU database of 33 subjects. Because some AUs have only a few samples, our experiments focus on eight AUs: AU1, AU2, AU4, AU5, AU10, AU12, AU14 and AU20. Examples are shown in Fig. 10. We randomly select 22 subjects as the training set and the other 11 subjects as the testing set. The ROC curves are reported in Fig. 11; we can clearly see that the coded dynamic features are much better than the static features. The area under the ROC curves is listed in Table 6.

7. Conclusions

This paper presented a novel approach for video-based expression recognition based on encoded dynamic features. We first extracted dynamic haar-like features to capture the temporal information of facial expressions, and then coded them into binary pattern features, inspired by binary pattern coding. Finally, Adaboost was performed to learn a set of discriminating encoded features to construct the final classifier. Experiments on the CMU facial expression database and our own facial AU database showed that the proposed method has a promising performance.

References

Bartlett, M., Littlewort, G., Fasel, I., Movellan, J., 2003. Real time face detection and facial expression recognition: Development and applications to human computer interaction. In: Proc. Internat. Conf. on CVPR Workshop on Computer Vision and Pattern Recognition for Human–Computer Interaction.
Black, M.J., Yacoob, Y., 1997. Recognizing facial expressions in image sequences using local parameterized models of image motion. Internat. J. Comput. Vision.
Cohen, I., Sebe, N., Chen, L., Garg, A., Huang, T., 2003. Facial expression recognition from video sequences: Temporal and static modeling. Comput. Vision Image Understand.
Daugman, J., 2003. Demodulation by complex-valued wavelets for stochastic pattern recognition. Internat. J. Wavelets Multiresolution Inform. Process.
Ekman, P., Friesen, W.V., Hager, J.C., 2002. Facial Action Coding System: Manual.
Fasel, B., Luettin, J., 2003. Automatic facial expression analysis: A survey. Pattern Recognition 36, 259–275.
Gu, H., Ji, Q., 2004. Facial event classification with task oriented dynamic Bayesian network. In: Proc. Internat. Conf. on Computer Vision and Pattern Recognition.
James Lien, J.-J., Kanade, T., Cohn, J., Li, C., 1999. Detection, tracking, and classification of action units in facial expression. J. Robotics Autonomous Syst.
Liu, C., Wechsler, H., 2002. Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Trans. Image Process.
Ojala, T., Pietikainen, M., 2002. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Machine Intell.
Pantic, M., Rothkrantz, L.J.M., 2000. Automatic analysis of facial expressions: The state of the art. IEEE Trans. Pattern Anal. Machine Intell. 22 (12), 1424–1445.
Pantic, M., Rothkrantz, J., 2004. Facial action recognition for facial expression analysis from static face images. IEEE Trans. Syst. Man Cybernet.
Paul, V., Michael, J., 2002. Robust real-time object detection. Internat. J. Comput. Vision.
Shan, S., Gao, W., Chang, Y., Cao, B., Yang, P., 2004. Review the strength of Gabor features for face recognition from the angle of its robustness to mis-alignment. In: Proc. ICPR.
Shan, C., Gong, S., McOwan, P.W., 2005. Conditional mutual information based boosting for facial expression recognition. In: British Machine Vision Conf.
Tian, Y., 2004. Evaluation of face resolution for expression analysis. In: Proc. Internat. Conf. on CVPR Workshop on Face Processing in Video (FPIV'04).
Tian, Y., Kanade, T., Cohn, J., 2002. Evaluation of Gabor wavelet-based facial action unit recognition in image sequences of increasing complexity. In: Proc. Internat. Conf. on Automatic Face and Gesture Recognition (FG'02).
Valstar, M.F., Patras, I., Pantic, M., 2005. Facial action unit detection using probabilistic actively learned support vector machines on tracked facial point data. In: Proc. Internat. Conf. on CVPR Workshop on Computer Vision and Pattern Recognition for Human–Computer Interaction.
Whitehill, J., Omlin, C.W., 2006. Haar features for FACS AU recognition. In: Proc. Internat. Conf. on Automatic Face and Gesture Recognition.
Yacoob, Y., Davis, L., 1994. Computing spatio-temporal representations of human faces. In: Proc. Internat. Conf. on Computer Vision and Pattern Recognition.
Zeng, Z., Pantic, M., Roisman, G.I., Huang, T.S., 2007. A survey of affect recognition methods: Audio, visual and spontaneous expressions. In: Internat. Conf. on Multimodal Interfaces.
Zhang, Z., Lyons, M., Schuster, M., Akamatsu, S., 1998. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron. In: Proc. Internat. Conf. on Automatic Face and Gesture Recognition.
Zhao, G., Pietikainen, M., 2007. Texture recognition using local binary patterns with an application to facial expressions. IEEE Trans. Pattern Anal. Machine Intell.