Total variation norm-based nonnegative matrix factorization for identifying discriminant representation of image patterns

Taiping Zhang, Bin Fang, Weining Liu, Yuan Yan Tang, Guanghui He, Jing Wen

College of Computer Science, Chongqing University, Chongqing 400044, PR China

Neurocomputing 71 (2008) 1824–1831. Available online 29 February 2008.

Abstract

The low-rank approximation technique of nonnegative matrix factorization (NMF) has recently emerged for finding parts-based structure of nonnegative data by minimizing least-squares error (the L2 norm). However, it has been observed that the proper norm for image processing is the total variation norm (TVN) rather than the L2 norm, and image denoising methods based on the TVN preserve local features such as edges and texture more clearly than L2-based methods. In this paper, we propose a robust TVN-based NMF algorithm for identifying discriminant representation of image patterns. We provide the update rule for the optimality search process and mathematically prove the convergence of the iteration. Experimental results show that the proposed TVN-NMF describes local discriminant representation of image patterns more effectively than NMF.

Keywords: Nonnegative matrix factorization; Total variation norm; Discriminant representation of image patterns

1. Introduction

Data analysis aims to reveal the lowest possible dimensional structure of patterns observed in high-dimensional spaces [11,5,13,20–22]. A fundamental problem in data analysis is to find a suitable low-rank representation of the data; an optimal low-rank representation typically makes latent structure in the data explicit. PCA and ICA [3,23], as data analysis methods, aim at learning holistic, not parts-based, representations of data, where the low-rank representation is achieved by discarding the least significant components. The resulting components are global interpretations, and these methods are unable to extract basis components manifesting local features such as image edges. Since local features are important for pattern recognition and classification, learning local parts-based representations of visual patterns has long been a research focus in computer vision.

Nonnegative matrix factorization (NMF), a learning technique for local parts-based representation of patterns, was recently developed for data analysis and has become increasingly popular in dimension reduction, compression, feature extraction and computer vision applications [1,24]. The NMF method is designed to capture alternative structures inherent in the data, and possibly to provide more biological insight. For example, NMF can yield a decomposition of human faces into parts reminiscent of features such as lips, eyes and nose. However, it has been observed that the proper norm for image processing is the total variation norm (TVN) and not the L2 norm [17,16], and in image denoising the TVN approach preserves finer-scale image features, such as edges and texture, better than the L2 norm [2].

In this paper, we propose a robust NMF algorithm that minimizes the TVN instead of the original L2 norm (Euclidean distance) to identify discriminant image patterns. Compared with the L2 norm-based NMF approach, the TVN-based NMF algorithm is able to present a parts-based discriminant representation of image patterns. Reconstruction simulations for image patterns by the proposed method also show an advantage over the original NMF technique.

This paper is structured as follows. In Section 2 we describe the general NMF. Section 3 discusses how to incorporate TVN minimization into the NMF algorithm. Section 4 provides experimental results that verify our algorithm. Finally, a conclusion is given in Section 5.


2. Nonnegative matrix factorization

NMF is a linear, nonnegative data representation technique. Given a nonnegative $n \times m$ matrix $V$ and a constant rank $r$, NMF [9] finds a nonnegative $n \times r$ matrix $W$ and a nonnegative $r \times m$ matrix $H$ such that the product $WH$ approximates $V$, that is,

$$V \approx WH \quad \text{or} \quad V_{ij} \approx \sum_{k=1}^{r} W_{ik} H_{kj}.$$

This can be interpreted as follows: each column of the matrix $W$ contains a basis vector, while each column of $H$ contains the weights needed to approximate the corresponding column of $V$. Hence the product $WH$ can be regarded as a compressed form of the data in $V$. The constant rank $r$ of the factors $W$ and $H$ is generally chosen such that $(n+m)r < nm$.

NMF has become increasingly popular in machine learning, signal processing, pattern recognition and computer vision applications [1,24]. One reason for this popularity is that NMF can produce parts-based discriminant representations where the results of PCA and ICA are difficult to interpret. Another is that NMF codes naturally favor sparseness: the entries of $W$ and $H$ are combined additively (not subtractively). Hence, these constraints can be useful for extracting parts-based representations of image patterns with low feature dimensionality [7,8].

In order to find an approximate factorization $V \approx WH$, the optimal matrices $W$ and $H$ are defined to be those nonnegative matrices that minimize the squared error (Euclidean distance) between $V$ and $WH$:

$$E(W, H) = \|V - WH\|^2 = \sum_{i,j} \big(V_{ij} - (WH)_{ij}\big)^2.$$

Although the minimization problem is convex in $W$ and $H$ separately, it is not convex in both simultaneously. To deal with this problem, the authors of [10] devised an iteration rule that is simple to implement and shows good performance:

$$H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}, \qquad W_{ia} \leftarrow W_{ia} \frac{(V H^T)_{ia}}{(W H H^T)_{ia}}. \tag{1}$$

In fact, this update rule for the optimality search procedure is based on gradient descent computation.
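For concreteness, the following is a minimal NumPy sketch of update rule (1); it is our own illustration, not the authors' code. The random initialization, the iteration count, and the small eps added to the denominators to avoid division by zero are implementation assumptions not specified in the paper.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||V - WH||^2, i.e. update rule (1)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))  # random nonnegative initialization (our choice)
    H = rng.random((r, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # H <- H * (W^T V) / (W^T W H)
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # W <- W * (V H^T) / (W H H^T)
    return W, H
```

Because each step multiplies the current entries by a nonnegative ratio, nonnegativity of $W$ and $H$ is preserved automatically.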

3. NMF with minimizing TVN

In this section, we describe the basic idea of incorporating the TVN into the NMF framework for extracting local discriminant representations of image patterns, and derive the whole algorithm, including the iteration rule and a convergence proof, for real-world image processing.

3.1. Total variation norm

To define the discrete TVN, we denote the Euclidean space $\mathbb{R}^{N \times N}$ by $X$, where $N \times N$ is the size of the two-dimensional matrices of images. We first introduce a discrete gradient operator. If $u \in X$, the gradient $\nabla u$ is a vector given by

$$(\nabla u)_{i,j} = \big((d_x u)_{i,j}, (d_y u)_{i,j}\big)$$

with

$$(d_x u)_{i,j} = \begin{cases} u_{i+1,j} - u_{i,j}, & i < N, \\ 0, & i = N, \end{cases} \qquad (d_y u)_{i,j} = \begin{cases} u_{i,j+1} - u_{i,j}, & j < N, \\ 0, & j = N. \end{cases}$$

The TVN of $u$ is defined by

$$J(u) = \sum_{1 \le i,j \le N} |(\nabla u)_{i,j}|$$

with $|x| = \sqrt{x_1^2 + x_2^2}$ for every $x = (x_1, x_2) \in \mathbb{R}^2$.

The TVN was first introduced into image processing in Refs. [17,16], whose authors pointed out that the proper norm for image processing is the TVN and not the L2 norm. TVN-based methods have been widely used to solve denoising problems and remain an active area of research in mathematical image processing. The TVN has two fundamental properties, edge preservation and multiscale additive signal decomposition, whose theoretical justification was provided in [19]. Compared with the L2 norm, the TVN is able to preserve finer-scale image features such as edges and texture while denoising [2]. These fine-scale details offer advantages for pattern recognition. Fig. 1 demonstrates a comparison of the two techniques for image denoising. The restoration by TVN approximation clearly looks better than the "same" L2 approximation, where "same" means an approximation procedure subject to the same constraints [17].

Fig. 1. Image restoration results based on the L2 norm and the TVN. (a) Original 256 × 256 Lena image. (b) Original image with Gaussian noise (0, 0.004). (c) Reconstruction by Wiener filter (based on L2 norm minimization). (d) Reconstruction by TVN minimization. Notice that the discontinuities (fine-scale details) are much clearer in the TVN reconstruction than in the L2 one.
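As a small illustration (our own sketch, not code from the paper), the discrete gradient and the TVN above can be computed in a few lines of NumPy; the last row and column are set to zero exactly as in the definitions of $d_x$ and $d_y$.

```python
import numpy as np

def dX(u):
    """(d_x u)_{i,j} = u_{i+1,j} - u_{i,j} for i < N, and 0 on the last row."""
    d = np.zeros_like(u, dtype=float)
    d[:-1, :] = u[1:, :] - u[:-1, :]
    return d

def dY(u):
    """(d_y u)_{i,j} = u_{i,j+1} - u_{i,j} for j < N, and 0 on the last column."""
    d = np.zeros_like(u, dtype=float)
    d[:, :-1] = u[:, 1:] - u[:, :-1]
    return d

def tv_norm(u):
    """J(u): sum over (i,j) of the Euclidean length of (grad u)_{i,j}."""
    return np.sqrt(dX(u) ** 2 + dY(u) ** 2).sum()
```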


3.2. NMF by minimizing TVN

In order to obtain desired characteristics, several researchers have suggested extensions and modifications of the original NMF model, including auxiliary constraints on $W$ and $H$ or penalty terms that enforce auxiliary constraints, and extensions of the cost function of the original problem. Li et al. [12] noted that NMF found only global features on the ORL face image database and suggested an extension they call local nonnegative matrix factorization (LNMF). Hoyer [7,8] extended NMF with the option to control sparseness explicitly. In this paper, we replace the cost function of the basic model (least-squares error minimization) with TVN minimization in order to identify local detailed representations in image patterns. For an $N \times M$ matrix $V$, the model can be expressed mathematically as

$$\min \; J^2(WH) \quad \text{s.t.} \quad V \approx WH, \quad W_{ia} \ge 0, \; H_{bj} \ge 0 \;\; \forall i, a, b, j, \quad \sum_i W_{ia} = 1 \;\; \forall a,$$

where

$$J(WH) = \sum_{\substack{1 \le i \le N \\ 1 \le j \le M}} |(\nabla WH)_{i,j}|$$

is the TVN of $WH$. This optimization problem is equivalent to the following one:

$$E(W, H) = \frac{1}{2}\|V - WH\|^2 + \lambda J^2(WH) = \frac{1}{2}\sum_{i,j} \big(V_{ij} - (WH)_{ij}\big)^2 + \lambda \sum_{i,j} |\nabla(WH)_{i,j}|^2,$$

$$\min \; E(W, H) \quad \text{s.t.} \quad W_{ia} \ge 0, \; H_{bj} \ge 0 \;\; \forall i, a, b, j, \quad \sum_i W_{ia} = 1 \;\; \forall a, \tag{2}$$

where $\lambda$ is a positive parameter that controls the tradeoff between goodness of fit to the data matrix $V$ and the TVN penalty.
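As an illustrative sketch (again ours, reusing the dX and dY helpers from the Section 3.1 snippet), the expanded cost in Eq. (2) can be evaluated directly:

```python
import numpy as np

def tvnmf_objective(V, W, H, lam):
    """E(W, H) = 0.5 * ||V - WH||_F^2 + lam * sum_{i,j} |grad(WH)_{i,j}|^2,
    the expanded cost of Eq. (2); dX and dY are defined in the Section 3.1 sketch."""
    WH = W @ H
    fit = 0.5 * np.sum((V - WH) ** 2)
    tv = lam * np.sum(dX(WH) ** 2 + dY(WH) ** 2)
    return fit + tv
```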

3.3. The iteration rule

Similar to update rule (1) in [10], we derive an algorithm that solves Eq. (2) with its constraints as follows:

$$H_{ij}^{k+1} = H_{ij}^k - \varphi_{ij} \frac{\partial E(W^k, H^k)}{\partial H_{ij}}, \qquad W_{ij}^{k+1} = W_{ij}^k - \theta_{ij} \frac{\partial E(W^k, H^k)}{\partial W_{ij}} \quad \forall i, j.$$

The algorithm is thus an alternating gradient descent method. For updating $H_{ij}^k$, the step size $\varphi_{ij}$ is chosen as

$$\varphi_{ij} = \frac{\hat{H}_{ij}^k}{\big((1+\lambda)(W^k)^T W^k \hat{H}^k + \lambda (d_y W^k)^T (d_y W^k) \hat{H}^k\big)_{ij} + \epsilon},$$

where

$$\hat{H}_{ij}^k = \begin{cases} H_{ij}^k, & \nabla_H E(W^k, H^k)_{ij} \ge 0, \\ \sigma, & \nabla_H E(W^k, H^k)_{ij} < 0, \end{cases}$$

and both $\sigma$ and $\epsilon$ are pre-defined small positive numbers. The definition of $\hat{H}_{ij}^k$, similar to [14], avoids the case in which $H_{ij}^k$ cannot change because the numerator of the step size is zero while the gradient $\nabla_H E(W^k, H^k)_{ij} < 0$. Similarly, we can calculate $W_{ij}^{k+1}$ by defining $\theta_{ij}$. To simplify the notation, we define

$$(\text{stepH}_{\text{numerator}})_{ij} = \big((1+\lambda)(W^k)^T W^k \hat{H}^k + \lambda (d_y W^k)^T (d_y W^k) \hat{H}^k\big)_{ij} + \epsilon,$$

$$(\text{stepW}_{\text{numerator}})_{ij} = \big((1+\lambda) \hat{W}^k H^k (H^k)^T + \lambda \hat{W}^k (d_x H^k)(d_x H^k)^T\big)_{ij} + \epsilon.$$

These considerations lead to the iteration scheme for the optimal search of the proposed TVN-NMF model in Algorithm 1.

Algorithm 1.
1. Given $0 < \lambda \le 1$, $\delta > 0$ and $\sigma > 0$, initialize $W_{ij}^1 \ge 0$, $H_{ij}^1 \ge 0$, $\forall i, j$.
2. For $k = 1, 2, \ldots, N$, where $N$ is large enough to guarantee convergence:
$$H_{ij}^{k+1} = H_{ij}^k - \frac{\hat{H}_{ij}^k}{(\text{stepH}_{\text{numerator}})_{ij}} \nabla_H E(W^k, H^k)_{ij},$$
$$W_{ij}^{k+1} = W_{ij}^k - \frac{\hat{W}_{ij}^k}{(\text{stepW}_{\text{numerator}})_{ij}} \nabla_W E(W^k, H^k)_{ij},$$
end.
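The H-step of Algorithm 1 can be transcribed into NumPy as below (our own sketch, reusing dX and dY from the Section 3.1 snippet). The gradient expression follows the formula for $\nabla_H E$ stated later in the proof of Theorem 2. The paper leaves the action of $d_x$ and $d_y$ on rectangular factors such as $W$ implicit, so the literal application below is one possible reading, and sigma and eps are arbitrary small values.

```python
import numpy as np

def update_H(V, W, H, lam, sigma=1e-9, eps=1e-9):
    """One H-step of Algorithm 1, under a literal reading of the paper's formulas.

    grad_H E = -(W^T (V - WH)) + lam * (W^T dX(WH) + dY(W)^T dY(WH))
    """
    WH = W @ H
    grad = -(W.T @ (V - WH)) + lam * (W.T @ dX(WH) + dY(W).T @ dY(WH))
    # H_hat keeps H where the gradient is nonnegative and substitutes sigma otherwise
    H_hat = np.where(grad >= 0, H, sigma)
    # (stepH_numerator)_{ij} from Section 3.3
    denom = (1 + lam) * (W.T @ W @ H_hat) + lam * (dY(W).T @ dY(W) @ H_hat) + eps
    return H - (H_hat / denom) * grad
```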

3.4. Convergence proof and optimality

To prove the convergence of the proposed iteration rule for the TVN-based NMF algorithm, we first introduce the auxiliary function used in [10].


Definition 1. $G(h, h')$ is an auxiliary function for $E(h)$ if the conditions $G(h, h') \ge E(h)$ and $G(h, h) = E(h)$ are satisfied.

Lemma 1. If $G$ is an auxiliary function, then $E$ is nonincreasing under the update rule

$$h^{k+1} = \arg\min_h G(h, h^k).$$

Considering any column $h$ of $H$, we define

$$G(h, h^k) = E(h^k) + (h - h^k)^T \nabla E(h^k) + \tfrac{1}{2}(h - h^k)^T K(h^k)(h - h^k),$$

$$E(h) = \frac{1}{2}\sum_i \Big(v_i - \sum_a W_{ia} h_a\Big)^2 + \lambda \sum_i \Big|\Big(\nabla \sum_a W_{ia} h_a\Big)_i\Big|^2,$$

where

$$K_{ab}(h^k) = \delta_{ab} (W^T W h^k)_a / h_a^k + \lambda \delta_{ab} \big(((d_y W)^T (d_y W) h^k + W^T W h^k)_a + \epsilon\big) / h_a^k.$$

Similar to [10], it is easy to verify that $G(h, h^k)$ above is an auxiliary function for $E(h)$, from which the update rule of Algorithm 1 can be derived.

Lee and Seung [10] proved, via an auxiliary function, that update rule (1) causes the objective function value to be nonincreasing, but this does not preclude descent to a saddle point. While the objective function $E(W, H)$ is convex in $W$ alone or in $H$ alone, it is not convex in both variables together. It is therefore difficult to find a global minimum, but stationarity remains important as a necessary condition for a local minimum. The authors of [6] presented numerical examples where update rule (1) fails to approach a stationary point. Next we show that the update rule of Algorithm 1 converges to a stationary point. By Lemma 1, the cost function $E$ is nonincreasing under the update rule of Algorithm 1, so the sequence $\{E(W^k, H^k)\}$ generated by Algorithm 1 is a bounded decreasing sequence; that is, $\{E(W^k, H^k)\}$ is a convergent sequence.

Theorem 1. If Algorithm 1 generates an infinite sequence $(W^k, H^k)$, then $(W^k, H^k)$ is a convergent sequence. Equivalently, if $\lim_{k\to\infty} E(W^k, H^k) = E(W^*, H^*)$, then $\lim_{k\to\infty} H^k = H^*$ and $\lim_{k\to\infty} W^k = W^*$.

Proof. Since $\{E(W^k, H^k)\}$ is a convergent sequence,

$$\lim_{k\to\infty} E(W^k, H^k) - E(W^k, H^*) = 0.$$

Considering any column of $H$, we have

$$\lim_{k\to\infty} E(h^k) - E(h^*) = 0.$$

From Lemma 1, $G(h, h^k)$ is an auxiliary function for $E(h)$, so

$$E(h^*) \le G(h^*, h^k) \le G(h^k, h^k) = E(h^k).$$

Then consider

$$G(h^*, h^k) - E(h^k) = (h^* - h^k)^T \big(K(h^k) - W^T W - \lambda(W^T W + (d_y W)^T (d_y W))\big)(h^* - h^k);$$

since $0 \le G(h^*, h^k) - E(h^k) \le E(h^k) - E(h^*)$, taking the limit of this inequality yields

$$\lim_{k\to\infty} (h^* - h^k)^T \big(K(h^k) - W^T W - \lambda(W^T W + (d_y W)^T (d_y W))\big)(h^* - h^k) = 0.$$

Referring to the proof of Lemma 2 in [10], the matrix $K(h^k) - W^T W - \lambda(W^T W + (d_y W)^T (d_y W))$ is positive definite, so

$$\lim_{k\to\infty} (h^* - h^k) = 0.$$

Since $h$ is an arbitrary column of $H$, $\lim_{k\to\infty} H^k = H^*$. Similarly, we can prove that $\lim_{k\to\infty} W^k = W^*$. $\square$

Next we prove that any limit point $(W^*, H^*)$ satisfies the Karush–Kuhn–Tucker (KKT) optimality conditions [4].

Theorem 2. If $(W^*, H^*)$ is a local minimizer of the objective functional $E$ in (2), then $(W^*, H^*)$ satisfies the following KKT optimality conditions:
1. $W_{ij}^* \ge 0$, $H_{ij}^* \ge 0$, $\forall i, j$.
2. $W_{ij}^* \cdot \nabla_W E(W^*, H^*)_{ij} = 0$ and $H_{ij}^* \cdot \nabla_H E(W^*, H^*)_{ij} = 0$, $\forall i, j$.
3. $\nabla_W E(W^*, H^*)_{ij} \ge 0$ and $\nabla_H E(W^*, H^*)_{ij} \ge 0$, $\forall i, j$.

Proof. 1. Consider $H_{ij}^k$, $\forall i, j$. For $k = 1$ the claim holds by the initial conditions. Using induction, assume $W_{ij}^k \ge 0$ and $H_{ij}^k \ge 0$, $\forall i, j$, at step $k$. Consider

$$H_{ij}^{k+1} = H_{ij}^k - \frac{\hat{H}_{ij}^k}{(\text{stepH}_{\text{numerator}})_{ij}} \nabla_H E(W^k, H^k)_{ij},$$

where

$$\nabla_H E(W^k, H^k)_{ij} = -\big((W^k)^T (V - W^k H^k)\big)_{ij} + \lambda\big((W^k)^T d_x(W^k H^k) + (d_y W^k)^T d_y(W^k H^k)\big)_{ij}.$$


It follows that

$$H_{ij}^{k+1} \ge H_{ij}^k \frac{\big((W^k)^T V\big)_{ij}}{(\text{stepH}_{\text{numerator}})_{ij}} + H_{ij}^k \frac{\lambda\big((W^k)^T W^k H^k + (d_y W^k)^T (d_y W^k) H^k\big)_{ij}}{(\text{stepH}_{\text{numerator}})_{ij}} - H_{ij}^k \frac{\lambda\big((W^k)^T d_x(W^k H^k) + (d_y W^k)^T d_y(W^k H^k)\big)_{ij}}{(\text{stepH}_{\text{numerator}})_{ij}}.$$

Since $d_y(W^k H^k) = (d_y W^k) H^k$,

$$H_{ij}^{k+1} = H_{ij}^k \frac{\big((W^k)^T V\big)_{ij}}{(\text{stepH}_{\text{numerator}})_{ij}} + H_{ij}^k \frac{\lambda\big((W^k)^T (W^k H^k - d_x(W^k H^k))\big)_{ij}}{(\text{stepH}_{\text{numerator}})_{ij}}.$$

By the definition of $d_x(W^k H^k)$, $(W^k H^k - d_x(W^k H^k)) \ge 0$. Hence $H_{ij}^{k+1} \ge 0$. The proof of $W_{ij}^k \ge 0$ is similar.

2. Consider $H_{ij}^* \cdot \nabla_H E(W^*, H^*)_{ij}$. From Theorem 1 we have

$$\lim_{k\to\infty} H_{ij}^{k+1} - H_{ij}^k = -\frac{H_{ij}^* \cdot \nabla_H E(W^*, H^*)_{ij}}{(\text{stepH}_{\text{numerator}})_{ij}} = 0.$$

Thus $H_{ij}^* \cdot \nabla_H E(W^*, H^*)_{ij} = 0$. The same argument shows that $W_{ij}^* \cdot \nabla_W E(W^*, H^*)_{ij} = 0$.

3. If $H_{ij}^* > 0$, then $\nabla_H E(W^*, H^*)_{ij} = 0$; and if $H_{ij}^* = 0$, then $\nabla_H E(W^*, H^*)_{ij} \ge 0$. Thus $\nabla_H E(W^*, H^*)_{ij} \ge 0$, and similarly for $W$, using conclusions similar to those in [15]. $\square$
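As with the earlier snippets, the KKT conditions of Theorem 2 can be checked numerically for the $H$ factor. This is our own sketch, using the same literal reading of $\nabla_H E$ and the dX, dY helpers from Section 3.1, with tol an arbitrary tolerance.

```python
import numpy as np

def check_kkt_H(V, W, H, lam, tol=1e-6):
    """Test Theorem 2's conditions for H: nonnegativity,
    complementary slackness H * grad = 0, and grad >= 0."""
    grad = -(W.T @ (V - W @ H)) + lam * (W.T @ dX(W @ H) + dY(W).T @ dY(W @ H))
    return bool((H >= -tol).all()
                and np.abs(H * grad).max() <= tol
                and (grad >= -tol).all())
```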

Fig. 2. The first row shows 25 images from the ORL, CBCL and Yale databases, respectively (from left to right). The second row shows the results of face learning on the ORL face database, with all images displayed as a whole: NMF (left), LNMF (center), TVN-NMF (right). The third row shows the results of face learning on the CBCL face database, displayed likewise: NMF (left), LNMF (center), TVN-NMF (right). The last row shows the results of face learning on the Yale face database, displayed individually: NMF (left), LNMF (center), TVN-NMF (right).


Fig. 3. The first row shows reconstructions by NMF (left), LNMF (center) and TVN-NMF (right) with dimension 48 and under 100 iteration steps; the PSNR is 24.9, 13.4 and 24.7, respectively. The second row shows reconstructions by NMF (left), LNMF (center) and TVN-NMF (right) after 2000 iterations; the PSNR is 27.79, 14.31 and 28.41, respectively. This demonstrates that NMF and LNMF discard more image details than TVN-NMF.


We have thus proved the convergence of the update rule and the optimality of the solution for the proposed TVN-NMF method, which can be used in image processing.

4. Experiments

In this section, we show that NMF by minimizing the TVN is more effective at identifying and extracting discriminant local details of image patterns than standard NMF and LNMF [12]. The first experiment performs face learning on different face databases. The second compares TVN-NMF with NMF and LNMF in reconstruction accuracy under the same conditions.

4.1. Identifying discriminant representation of image patterns

NMF is able to decompose real-world images such as human faces into parts-based features such as lips, eyes and nose. Fig. 2 shows the results of face learning on the ORL [18], CBCL [25] and Yale face databases (Yale Univ. Face Database, http://cvc.yale.edu/projects/yalefaces/yalefaces.html, 2002); for the ORL and CBCL databases all images are displayed as a whole, while single images are depicted for the Yale database. As can be seen, TVN-NMF produces a parts-based and localized representation of the data similar to NMF and LNMF on the CBCL database. However, on the ORL database, the TVN-NMF discriminant representation is both parts-based and local-detail preserving, whereas NMF is parts-based but holistic, and LNMF is only localized parts-based. Owing to its inherent local-detail preservation, TVN-NMF is better able than NMF and LNMF to detect expression manifold structure.

4.2. Reconstruction

NMF is used as a low-rank approximation technique. As the dimensionality decreases, more detailed information about the image patterns is lost. In comparison with NMF and LNMF, TVN-NMF can preserve detailed image information. Fig. 3 shows reconstruction results for the Lena image (256 × 256) using TVN-NMF, NMF and LNMF at the relevant dimensions. As can be seen, TVN-NMF approximates the original image better than NMF and LNMF. Because TVN-NMF preserves local details, its reconstruction has higher fidelity than that of NMF even with a slightly lower PSNR. In order to evaluate the performance quantitatively, we define the approximation error as the Euclidean distance between the original image data and the reconstructed data.
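For reference, here is a minimal sketch (ours, not the paper's code) of the two quantities reported in this section, assuming 8-bit images with a peak value of 255 for the PSNR:

```python
import numpy as np

def approx_error(V, W, H):
    """Approximation error: Euclidean (Frobenius) distance ||V - WH||."""
    return np.linalg.norm(V - W @ H)

def psnr(V, W, H, peak=255.0):
    """Peak signal-to-noise ratio of the reconstruction WH against V."""
    mse = np.mean((V - W @ H) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```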

Fig. 4. Approximation error (×10^7) versus number of iteration steps using NMF, TVN-NMF and LNMF (error divided by 10), with compressive dimension 48, for the Lena image (256 × 256).

Fig. 4 shows the approximation error curves of TVN-NMF, NMF and LNMF over the iterations. NMF converges faster in the first few iterations, but as the number of iteration steps increases, TVN-NMF approximates the original data most closely, while LNMF retains a much larger approximation error even after many iterations. As can be seen, TVN-NMF is more effective at image decomposition than NMF and LNMF.

5. Conclusion

In this paper, we proposed a robust NMF method based on minimizing the TVN to identify local discriminant representations of image patterns. We designed the model and the iteration rule for searching for the optimal solution, and provided a mathematical proof of the convergence of the iteration rule and the optimality of the solution. Compared with NMF, TVN-NMF proves more effective at identifying local discriminant representations of image patterns, owing to the inherent suitability of the TVN for image processing. This advantage makes the proposed method a potential tool for high-dimensional pattern classification problems such as face recognition.

Acknowledgment

The authors would like to thank the associate editor and the reviewers for helpful comments that greatly improved the paper. This work is supported by the Program for New Century Excellent Talents of the Educational Ministry of China (NCET-06-0762) and the Natural Science Foundations of Chongqing CSTC (CSTC2007BA2003 and CSTC2006BB2003).

References

[1] B.J. Shastri, M.D. Levine, Face recognition using localized features based on non-negative sparse coding, Mach. Vision Appl. 18 (2) (2007) 107–122.
[2] T. Chan, S. Esedoglu, F. Park, A. Yip, Recent developments in total variation image restoration, in: Handbook of Mathematical Models in Computer Vision, Springer-Verlag, 2005, pp. 17–30.
[3] P. Comon, Independent component analysis, a new concept?, Signal Process. 36 (1994) 287–314.
[4] D.P. Bertsekas, Nonlinear Programming, second ed., Athena Scientific, Belmont, MA, 1999.
[5] D.L. Donoho, M. Vetterli, R.A. DeVore, I. Daubechies, Data compression and harmonic analysis, IEEE Trans. Inf. Theory 44 (6) (1998) 2435–2476.
[6] E.F. Gonzales, Y. Zhang, Accelerating the Lee–Seung algorithm for non-negative matrix factorization, Technical Report, Department of Computational and Applied Mathematics, Rice University, 2005.
[7] P.O. Hoyer, Non-negative sparse coding, in: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, 2002.
[8] P.O. Hoyer, Non-negative matrix factorization with sparseness constraints, J. Mach. Learn. Res. 5 (2004) 1457–1469.
[9] D.D. Lee, H.S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999) 788–791.
[10] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in: Advances in Neural Information Processing Systems, 2000.
[11] K.C. Li, Sliced inverse regression for dimension reduction, J. Am. Stat. Assoc. 86 (1991) 316–342.
[12] S.Z. Li, X.W. Hou, H.J. Zhang, Q.S. Cheng, Learning spatially localized, parts-based representation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2001.
[13] X. Li, S. Lin, S. Yan, D. Xu, Discriminant locally linear embedding with high-order tensor data, IEEE Trans. Syst. Man Cybern. Part B 38 (2) (2008).
[14] C.-J. Lin, Projected gradient methods for non-negative matrix factorization, Neural Comput. 19 (2007) 2756–2779.
[15] C.-J. Lin, On the convergence of multiplicative update algorithms for non-negative matrix factorization, IEEE Trans. Neural Networks 18 (6) (2007) 1589–1596.
[16] L. Rudin, Images, numerical analysis of singularities and shock filters, Caltech, C.S. Dept. Report TR:5250:87, 1987.
[17] L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D 60 (1992) 259–268.
[18] F. Samaria, A. Harter, Parameterisation of a stochastic model for human face identification, in: Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, FL, December 1994.
[19] D. Strong, T. Chan, Edge-preserving and scale-dependent properties of total variation regularization, Inverse Problems 19 (2003) 165–187.
[20] D. Tao, X. Li, X. Wu, S.J. Maybank, General tensor discriminant analysis and Gabor features for gait recognition, IEEE Trans. Pattern Anal. Mach. Intell. 29 (10) (2007) 1700–1715.
[21] D. Tao, X. Li, W. Hu, S.J. Maybank, X. Wu, Supervised tensor learning, Knowledge Inf. Syst. 13 (1) (2007) 1–42.
[22] D. Tao, M. Song, X. Li, J. Shen, J. Sun, X. Wu, C. Faloutsos, S.J. Maybank, Bayesian tensor approach for 3D face modelling, IEEE Trans. Circuits Syst. Video Technol. 18 (2008), in press.
[23] M. Turk, A.P. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71–86.
[24] Y. Wang, Y. Jia, C. Hu, M. Turk, Fisher non-negative matrix factorization for learning local features, Int. J. Pattern Recognition Artif. Intell. 19 (4) (2005) 495–511.
[25] B. Weyrauch, J. Huang, B. Heisele, V. Blanz, Component-based face recognition with 3D morphable models, in: First IEEE Workshop on Face Processing in Video, Washington, DC, 2004.

Taiping Zhang received his B.Sc. and M.Sc. degrees in computational mathematics from Chongqing University in 1999 and 2001, respectively. He has been a doctoral student in the Department of Computer Science at Chongqing University since 2005. His research interests include pattern recognition, image processing, machine learning, and computational mathematics.

Bin Fang received the B.Eng. degree in electrical engineering from Xi’an Jiaotong University, Xi’an, China, the M.Sc. degree in electrical engineering from Sichuan University, Chengdu, China, and the Ph.D. degree in electrical engineering from the University of Hong Kong, Hong Kong, China. He is currently a Professor in the Department of Computer Science at Chongqing University. His research interests include computer vision, pattern recognition, medical image processing, biometrics applications, and document analysis. Weining Liu received her M.Sc. and Ph.D. degrees in computer science from Chongqing University in 1989 and 1999, respectively. She is currently a Professor in the Department of Computer Science at Chongqing University. Her research interests include information security, computer networks and communications.

Yuan Yan Tang received the B.S. degree in electrical and computer engineering from Chongqing University, Chongqing, China, the M.Eng. degree in electrical engineering from the Graduate School of Post and Telecommunications, Beijing, China, and the Ph.D. degree in computer science from Concordia University, Montreal, Canada. He is presently a Professor in the Department of Computer Science at Chongqing University, a Chair Professor in the Department of Computer Science at Hong Kong Baptist University, and an Adjunct Professor in computer science at Concordia University. He is an Honorary Lecturer at the University of Hong Kong and an Advisory Professor at many institutes in China. His current interests include wavelet theory and applications, pattern recognition, image processing, document processing, artificial intelligence, parallel processing, Chinese computing and VLSI architecture. He has published more than 250 technical papers and is the author/co-author of 21 books/book chapters on subjects ranging from electrical engineering to computer science. He has served as General Chair, Program Chair and Committee Member for many international conferences, and will be the General Chair of the 19th International Conference on Pattern Recognition (ICPR). He is the Founder and Editor-in-Chief of the International Journal on Wavelets, Multiresolution, and Information Processing (IJWMIP) and an Associate Editor of several international journals related to pattern recognition and artificial intelligence.

Guanghui He received the B.E. degree in computational mathematics and the M.Sc. degree in computer science from Chongqing University, Chongqing, China, in 1999 and 2003, respectively. He is currently working toward the Ph.D. degree at the School of Computer Science of Chongqing University and is a Teacher in the College of Mathematics and Physics Science at Chongqing University. His research interests include pattern recognition, image processing, and biometrics applications.

Jing Wen received the B.E. and M.E. degrees in computer science from Southwest Petroleum University, Nanchong, China, in 2000 and 2003, respectively. She is currently working toward the Ph.D. degree at the College of Computer Science of Chongqing University. Her research interests include pattern recognition, biometrics, etc.