
Parameter Selection for Nonnegative $l_1$ Matrix/Tensor Sparse Decomposition

Yiju Wang$^{a,*}$, Wanquan Liu$^{b}$, Louis Caccetta$^{c}$, Guanglu Zhou$^{c}$

$^{a}$ School of Management Science, Qufu Normal University, Rizhao, Shandong, China
$^{b}$ Department of Computing, Curtin University, Perth, Western Australia
$^{c}$ Department of Mathematics & Statistics, Curtin University, Perth, Western Australia

$^{*}$ Corresponding author. Tel.: +86 86-633-3980468. E-mail addresses: [email protected] (Y. Wang), [email protected] (W. Liu), [email protected] (L. Caccetta), [email protected] (G. Zhou).

Abstract. For the nonnegative $l_1$ matrix/tensor sparse decomposition problem, we derive a threshold bound on the regularization parameters beyond which all the decomposition factors are zero. The result provides a guideline for the selection of $l_1$ regularization parameters and extends the corresponding result for the Lasso optimization problem.

Keywords: Sparse decomposition; Regularization parameter; Global optimal solution; Threshold bound; Zero solution.

1. Introduction

Obtaining a low-rank matrix from given multi-dimensional data is a classical feature extraction process in data mining, usually formulated as a low-rank matrix decomposition (approximation) problem [4]. For example, for a set of $n$-dimensional observations, principal component analysis (PCA) amounts to computing the singular value decomposition of the data matrix and projecting the $n$-dimensional data along several principal orthogonal eigenvectors [9]. Meanwhile, low-rank tensor decomposition (approximation) based on multilinear algebra, such as the CANDECOMP/PARAFAC (CP) and Tucker models [4], provides a unified framework for higher-order data analysis [4, 7].

It should be noted that the density of the latent factors in matrix/tensor low-rank decomposition may destroy the supporting information behind the data, and hence the decomposition cannot provide sufficient information [15]. For example, for a gene expression data set with 5000 genes for cancer patients, PCA can give a low-dimensional representation which can help cluster cancer versus healthy patients [8]. However, in reality we do not know in advance which genes should be expressed, and hence the dense factors cannot provide sufficient information. To obtain interpretable factors, sparsity needs to be imposed on the latent factors so that one can associate cancer versus no cancer with a small group of genes; this leads to matrix/tensor sparse decompositions [5]. The sparsity strategy is now widely used in signal processing, biostatistics, etc. [13, 14, 15].

Another application of matrix/tensor sparse decompositions is clustering, which is widely used in information retrieval, databases, text and data mining, bioinformatics, market-basket analysis, and so on [2, 10]. By grouping the data, clustering can identify distinctive "checkerboard" patterns in a given data set, and hence some useful features can be extracted from the mass of data. In essence, clustering groups a set of objects (represented typically by feature vectors) into distinct classes; this can be modeled as partitioning the data set into clusters such that the feature vectors falling in the same cluster are close to each other while the vectors in different clusters are far away from each other [10, 12]. The clustering problem can also be formulated as a matrix/tensor sparse decomposition [12].

It is well known that $l_1$ regularization is an efficient way to control the sparsity of latent factors in sparse optimization, and the strategy is widely used in signal processing [3, 14] and biostatistics [15]. In traditional $l_1$ regularization problems, such as compressed sensing [16] and speech emotion recognition [17], the selection of the regularization parameter has been investigated theoretically [6, 11]. However, for $l_1$-regularized matrix/tensor nonnegative sparse decompositions, the regularization parameter selection has not been investigated systematically in the literature. Based on this observation, we consider the selection of the $l_1$ regularization parameters for the matrix/tensor nonnegative sparse decomposition in this paper. More precisely, we provide a threshold bound on the regularization parameters beyond which all the optimal decomposition factors are zero. The obtained result provides a guideline for the selection of $l_1$ regularization parameters. Furthermore, the result improves the norm-balancing property for $l_1$ regularization parameters [12] and extends the corresponding results on the original Lasso function [11].

To end this section, we present some notation used in this paper. Throughout, we use a lowercase letter, say $x$, to denote a scalar, a bold lowercase letter, say $\mathbf{x}$, to denote a vector, and $x_i$ to denote the $i$-th entry of the vector $\mathbf{x}$. We use a bold uppercase letter, say $\mathbf{A}$, to denote a matrix and $A_{ij}$ to denote its $ij$-th entry. We use a calligraphic letter, say $\mathcal{A}$, to denote a higher-order tensor. We use $x^{(1)} \circ x^{(2)} \circ \cdots \circ x^{(m)}$ or $\circ_{i=1}^{m} x^{(i)}$ to denote the outer product of vectors $x^{(1)} \in \mathbb{R}^{n_1}, x^{(2)} \in \mathbb{R}^{n_2}, \dots, x^{(m)} \in \mathbb{R}^{n_m}$, whose $i_1 i_2 \cdots i_m$-th entry is $x^{(1)}_{i_1} x^{(2)}_{i_2} \cdots x^{(m)}_{i_m}$, $1 \le i_j \le n_j$, $j = 1, 2, \dots, m$. We use $\|\cdot\|_1$ to denote the $l_1$-norm of a vector or a matrix, i.e., the sum of the absolute values of its entries, $\|\cdot\|_2$ to denote the $l_2$-norm of a vector, and $\|\cdot\|_\infty$ to denote the maximum of the absolute values of the vector entries. The inner product of two matrices $A, B \in \mathbb{R}^{m \times n}$ or two higher-order tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{n_1 \times \cdots \times n_m}$ is defined as
$$\langle A, B \rangle = \sum_{i,j=1}^{m,n} A_{ij} B_{ij}, \qquad \langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1, \dots, i_m = 1}^{n_1, \dots, n_m} \mathcal{A}_{i_1 i_2 \cdots i_m} \mathcal{B}_{i_1 i_2 \cdots i_m}.$$
The Frobenius norms of a matrix $A$ and a tensor $\mathcal{A}$ are respectively defined as
$$\|A\|_F = \sqrt{\langle A, A \rangle}, \qquad \|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}.$$
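As a concrete illustration of this notation, the following NumPy sketch (with arbitrary small dimensions chosen purely for illustration) forms a third-order outer product and evaluates the inner product, Frobenius norm, and $l_1$-norm defined above.

```python
import numpy as np

# Arbitrary small dimensions for a third-order example (m = 3).
rng = np.random.default_rng(0)
x1, x2, x3 = rng.random(4), rng.random(3), rng.random(5)

# Outer product x1 ∘ x2 ∘ x3: entry (i1, i2, i3) equals x1[i1] * x2[i2] * x3[i3].
T = np.einsum('i,j,k->ijk', x1, x2, x3)

# Inner product of two tensors and the induced Frobenius norm.
A = rng.random((4, 3, 5))
inner = np.sum(A * T)                          # <A, T>
fro = np.sqrt(np.sum(A * A))                   # ||A||_F as defined above
assert np.isclose(fro, np.linalg.norm(A))      # agrees with NumPy's flattened 2-norm

# l1-norm of a vector: sum of the absolute values of its entries.
l1_x1 = np.sum(np.abs(x1))
print(inner, fro, l1_x1)
```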

2. Sparsity analysis of latent factors in matrix/tensor decomposition

As a powerful tool for analyzing data with multiple arrays, tensors have received much attention from researchers, and, by analogy with the matrix case, tensor decompositions have been developed by exploiting multilinear algebra [4]. As massive amounts of data often lead to limitations and challenges in analysis, sparsity is often imposed on the latent factors to improve analysis and inference learning [13]. Mathematically, the nonnegative sparse tensor decomposition is formulated as follows [12]:
$$\min \Big\| \mathcal{A} - \sum_{j=1}^{K} \circ_{i=1}^{m} x^{(i,j)} \Big\|_F^2 \quad \text{s.t. } x^{(i,j)} \in \mathbb{R}^{n_i} \text{ is nonnegative and sparse}, \; i = 1, \dots, m; \; j = 1, \dots, K,$$
where the tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_m}$ is the given data, which is often nonnegative. As this problem is NP-hard [1], it can be relaxed by introducing $l_1$ regularization into the model:
$$\min_{x^{(i,j)} \ge 0, \; i=1,\dots,m; \; j=1,\dots,K} \; \Big\| \mathcal{A} - \sum_{j=1}^{K} \circ_{i=1}^{m} x^{(i,j)} \Big\|_F^2 + \sum_{i=1}^{m} \lambda^{(i)} \| X^{(i)} \|_1, \qquad (2.1)$$

where the positive numbers $\lambda^{(i)}$, $i = 1, 2, \dots, m$, are regularization parameters used to control the sparsity of the latent factors $X^{(i)} = (x^{(i,1)}, x^{(i,2)}, \dots, x^{(i,K)})$, $i = 1, 2, \dots, m$. If $K = 1$, then problem (2.1) reduces to
$$\min_{x^{(i)} \ge 0, \; i=1,\dots,m} \; \| \mathcal{A} - \circ_{i=1}^{m} x^{(i)} \|_F^2 + \sum_{i=1}^{m} \lambda^{(i)} \| x^{(i)} \|_1. \qquad (2.2)$$
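To make formulation (2.2) concrete, the following sketch evaluates its objective for a third-order tensor; the data tensor, factors, and regularization values are arbitrary placeholders, and the helper name objective_22 is ours, not the authors'.

```python
import numpy as np

def objective_22(A, xs, lams):
    """Objective of (2.2): ||A - x^(1) ∘ ... ∘ x^(m)||_F^2 + sum_i lam_i * ||x^(i)||_1."""
    rank_one = xs[0]
    for x in xs[1:]:
        rank_one = np.multiply.outer(rank_one, x)   # build the outer product factor by factor
    residual = np.sum((A - rank_one) ** 2)          # squared Frobenius norm of the fit error
    penalty = sum(lam * np.sum(np.abs(x)) for lam, x in zip(lams, xs))
    return residual + penalty

# Arbitrary placeholder data with m = 3.
rng = np.random.default_rng(1)
A = rng.random((4, 3, 5))
xs = [rng.random(4), rng.random(3), rng.random(5)]  # nonnegative factor vectors
lams = [0.1, 0.1, 0.1]
print(objective_22(A, xs, lams))
```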

If the tensor order is 2, then the tensor $\mathcal{A}$ reduces to a matrix $A$ and problem (2.1) reduces to the following nonnegative matrix sparse decomposition arising in data mining, such as PCA [5, 8] and co-clustering [12]:
$$\min_{X, Y \ge 0} \; \| A - X Y^\top \|_F^2 \quad \text{s.t. } X \in \mathbb{R}^{I \times K}, \; Y \in \mathbb{R}^{J \times K} \text{ are both sparse}.$$
Correspondingly, its $l_1$-relaxed form is
$$\min_{X, Y \ge 0} \; \| A - X Y^\top \|_F^2 + \lambda_x \| X \|_1 + \lambda_y \| Y \|_1, \qquad (2.3)$$
where $X = (x^{(1)}, x^{(2)}, \dots, x^{(K)}) \in \mathbb{R}^{I \times K}$ and $Y = (y^{(1)}, y^{(2)}, \dots, y^{(K)}) \in \mathbb{R}^{J \times K}$. Furthermore, if the matrix $X Y^\top$ is of rank 1, i.e., $K = 1$, then the matrices $X, Y$ reduce to vectors $x \in \mathbb{R}^{I}$, $y \in \mathbb{R}^{J}$ and problem (2.3) reduces to
$$\min_{x, y \ge 0} \; \| A - x y^\top \|_F^2 + \lambda_x \| x \|_1 + \lambda_y \| y \|_1, \qquad (2.4)$$
which is reminiscent of the regularization form of the Lasso model in compressed sensing [3, 6]:
$$\min_{x} \; \| A x - b \|_2^2 + \lambda \| x \|_1.$$

In the Lasso model, the Lagrange multiplier $\lambda$ serves as the regularization factor controlling the sparsity of the latent vector $x$, and it was shown that when the regularization factor $\lambda > 0$ is sufficiently large, more precisely, if $\lambda \ge 2\|A^\top b\|_\infty$, then its solution is the zero vector [11]. A similar question arises naturally for models (2.1) and (2.3): do sufficiently large values of the regularization factors guarantee that the optimal solutions of problems (2.1) and (2.3) are zero, and what is the relation of the regularization parameters in controlling the sparsity of the latent factors? To investigate these questions, we first present the norm-balancing property of problem (2.1) established in [12] and the assumption needed in the subsequent analysis.

Lemma 2.1. For any optimal solution $(X^{(1)}, \dots, X^{(m)})$ of problem (2.1) with regularization factors $(\lambda^{(1)}, \dots, \lambda^{(m)})$, it holds that
$$\lambda^{(1)} \|X^{(1)}\|_1 = \cdots = \lambda^{(m)} \|X^{(m)}\|_1.$$

Assumption 2.1. For any fixed positive regularization factors $(\lambda^{(1)}, \dots, \lambda^{(m)})$, the optimization problems (2.1) and (2.2) both have a unique solution.

In the following analysis, we first consider problem (2.2) and then extend the obtained results to problem (2.1).

Lemma 2.2. Suppose Assumption 2.1 holds. Let $(x^{(1)}, \dots, x^{(m)})$ and $(y^{(1)}, \dots, y^{(m)})$ be the optimal solutions of problem (2.2) with regularization factors $(\lambda_x^{(1)}, \dots, \lambda_x^{(m)})$ and $(\lambda_y^{(1)}, \dots, \lambda_y^{(m)})$, respectively, such that $\prod_{i=1}^{m} \lambda_x^{(i)} = \prod_{i=1}^{m} \lambda_y^{(i)}$. Then $\|\mathcal{A} - \circ_{i=1}^{m} x^{(i)}\|_F^2 = \|\mathcal{A} - \circ_{i=1}^{m} y^{(i)}\|_F^2$, and problem (2.2) attains the same optimal objective value under both choices of regularization factors.

Proof. For the given regularization factors $(\lambda_x^{(1)}, \lambda_x^{(2)}, \dots, \lambda_x^{(m)})$ and $(\lambda_y^{(1)}, \lambda_y^{(2)}, \dots, \lambda_y^{(m)})$, from the assumption on them there exist $t_i$, $i = 1, 2, \dots, m$, such that $\lambda_x^{(i)} = t_i \lambda_y^{(i)}$ and $\prod_{i=1}^{m} t_i = 1$. Since $(t_1 x^{(1)}, \dots, t_m x^{(m)})$ is feasible for the problem with factors $(\lambda_y^{(1)}, \dots, \lambda_y^{(m)})$, the optimality of $(y^{(1)}, \dots, y^{(m)})$ gives
$$\|\mathcal{A} - \circ_{i=1}^{m} x^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda_x^{(i)} \|x^{(i)}\|_1 = \|\mathcal{A} - \circ_{i=1}^{m} t_i x^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda_y^{(i)} \|t_i x^{(i)}\|_1 \ge \|\mathcal{A} - \circ_{i=1}^{m} y^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda_y^{(i)} \|y^{(i)}\|_1.$$
Similarly,
$$\|\mathcal{A} - \circ_{i=1}^{m} y^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda_y^{(i)} \|y^{(i)}\|_1 \ge \|\mathcal{A} - \circ_{i=1}^{m} x^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda_x^{(i)} \|x^{(i)}\|_1.$$
Thus,
$$\|\mathcal{A} - \circ_{i=1}^{m} x^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda_x^{(i)} \|x^{(i)}\|_1 = \|\mathcal{A} - \circ_{i=1}^{m} t_i x^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda_y^{(i)} \|t_i x^{(i)}\|_1 = \|\mathcal{A} - \circ_{i=1}^{m} y^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda_y^{(i)} \|y^{(i)}\|_1.$$
From Assumption 2.1, $t_i x^{(i)} = y^{(i)}$, $i = 1, 2, \dots, m$. Hence
$$\|\mathcal{A} - \circ_{i=1}^{m} y^{(i)}\|_F^2 = \|\mathcal{A} - \circ_{i=1}^{m} x^{(i)}\|_F^2,$$
and the proof is completed. □
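The scaling identity used in the first equality of the proof is easy to verify numerically: multiplying each factor by $t_i$ with $\prod_i t_i = 1$ leaves the rank-one term unchanged, while the weighted $l_1$ terms transfer exactly between $(\lambda_x^{(i)})$ and $(\lambda_y^{(i)})$. A minimal sketch with placeholder data:

```python
import numpy as np

rng = np.random.default_rng(2)
xs = [rng.random(4), rng.random(3), rng.random(5)]      # factors x^{(i)} of (2.2)
lam_y = np.array([0.2, 0.3, 0.5])
t = np.array([2.0, 0.5, 1.0])                           # prod(t_i) = 1
lam_x = t * lam_y                                       # lambda_x^{(i)} = t_i * lambda_y^{(i)}

def rank_one(vs):
    out = vs[0]
    for v in vs[1:]:
        out = np.multiply.outer(out, v)
    return out

scaled = [ti * x for ti, x in zip(t, xs)]

# prod(t_i) = 1 keeps the rank-one term (and hence the residual) unchanged ...
assert np.allclose(rank_one(xs), rank_one(scaled))
# ... while the weighted l1 terms transfer exactly between lambda_x and lambda_y.
lhs = sum(l * np.sum(np.abs(x)) for l, x in zip(lam_x, xs))
rhs = sum(l * np.sum(np.abs(x)) for l, x in zip(lam_y, scaled))
assert np.isclose(lhs, rhs)
```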


For the value $\|\mathcal{A} - \circ_{i=1}^{m} x^{(i)}\|_F$ at the optimal solution of problem (2.2), it can also be shown that it is nondecreasing with respect to the regularization factor product $\prod_{i=1}^{m} \lambda_x^{(i)}$.

Based on Lemmas 2.1 and 2.2, we can establish the following significant conclusion for problem (2.2).

Proposition 2.1. Under Assumption 2.1, the optimal solution of problem (2.2) is zero provided that
$$\Big( \prod_{i=1}^{m} \lambda^{(i)} \Big)^{1/m} > \max\Big\{ 2\|\mathcal{A}\|_F + 2, \; \tfrac{1}{m} \|\mathcal{A}\|_F^2 \Big\}.$$

Proof. From Lemma 2.2, we need only consider the case $\lambda^{(i)} = \lambda \triangleq \big( \prod_{i=1}^{m} \lambda^{(i)} \big)^{1/m}$, $i = 1, 2, \dots, m$. For $\lambda \ge \frac{1}{m} \|\mathcal{A}\|_F^2$, we claim that the optimal solution of problem (2.2) satisfies $\|x^{(i)}\|_1 \le 1$, $i = 1, 2, \dots, m$. Otherwise, by the norm-balancing property (Lemma 2.1), $\|x^{(i)}\|_1 > 1$ for all $i = 1, 2, \dots, m$, and the zero point would yield a smaller objective value, since
$$\|\mathcal{A} - \circ_{i=1}^{m} x^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda \|x^{(i)}\|_1 \ge \sum_{i=1}^{m} \lambda \|x^{(i)}\|_1 > m\lambda \ge \|\mathcal{A}\|_F^2 + \sum_{i=1}^{m} \lambda \|0\|_1,$$
contradicting the optimality of $(x^{(1)}, \dots, x^{(m)})$.

As the objective function of problem (2.2) can be written as
$$\|\mathcal{A} - \circ_{i=1}^{m} x^{(i)}\|_F^2 + \sum_{i=1}^{m} \lambda \|x^{(i)}\|_1 = \|\mathcal{A}\|_F^2 - 2\,\mathcal{A} x^{(1)} \cdots x^{(m)} + \prod_{i=1}^{m} \|x^{(i)}\|_2^2 + \sum_{i=1}^{m} \lambda \|x^{(i)}\|_1,$$
where $\mathcal{A} x^{(1)} \cdots x^{(m)} = \sum_{i_1, i_2, \dots, i_m}^{n_1, n_2, \dots, n_m} \mathcal{A}_{i_1 i_2 \cdots i_m} x^{(1)}_{i_1} \cdots x^{(m)}_{i_m}$, the optimality condition of problem (2.2) with respect to $x^{(j)}$, for some $1 \le j \le m$, reads
$$-2\,\mathcal{A} x^{(1)} \cdots x^{(j-1)} x^{(j+1)} \cdots x^{(m)} + 2 \prod_{i=1, i \ne j}^{m} \|x^{(i)}\|_2^2 \, x^{(j)} + \lambda \xi = \mu, \qquad (2.5)$$
where $\mathcal{A} x^{(1)} \cdots x^{(j-1)} x^{(j+1)} \cdots x^{(m)} \in \mathbb{R}^{n_j}$ has entries
$$\sum_{i_1, \dots, i_{j-1}, i_{j+1}, \dots, i_m}^{n_1, \dots, n_{j-1}, n_{j+1}, \dots, n_m} \mathcal{A}_{i_1 \cdots i_{j-1} i_j i_{j+1} \cdots i_m} \, x^{(1)}_{i_1} \cdots x^{(j-1)}_{i_{j-1}} x^{(j+1)}_{i_{j+1}} \cdots x^{(m)}_{i_m}, \qquad i_j = 1, 2, \dots, n_j,$$
and, for $i_j = 1, 2, \dots, n_j$,
$$\mu_{i_j} \begin{cases} = 0, & \text{if } x^{(j)}_{i_j} > 0, \\ \ge 0, & \text{if } x^{(j)}_{i_j} = 0, \end{cases} \qquad \xi_{i_j} \begin{cases} = 1, & \text{if } x^{(j)}_{i_j} > 0, \\ \in [-1, 1], & \text{if } x^{(j)}_{i_j} = 0. \end{cases}$$

For $\lambda \ge \frac{1}{m} \|\mathcal{A}\|_F^2$, from the claim just established, any optimal solution $(x^{(1)}, \dots, x^{(m)})$ of problem (2.2) satisfies $\|x^{(i)}\|_1 \le 1$, and thus $\|x^{(i)}\|_2 \le 1$, for $i = 1, 2, \dots, m$. Consider the equalities in (2.5) for the entries with $x^{(j)}_{i_j} > 0$. From the inequality $\|\mathcal{A} x^{(1)} \cdots x^{(j-1)} x^{(j+1)} \cdots x^{(m)}\|_2 \le \|\mathcal{A}\|_F \prod_{i \ne j} \|x^{(i)}\|_2$ and $\|x^{(i)}\|_2 \le 1$ for $i = 1, 2, \dots, m$, we conclude that
$$\lambda \le 2 \|\mathcal{A} x^{(1)} \cdots x^{(j-1)} x^{(j+1)} \cdots x^{(m)}\|_2 + 2 \prod_{i=1, i \ne j}^{m} \|x^{(i)}\|_2^2 \, \|x^{(j)}\|_2 \le 2\|\mathcal{A}\|_F + 2.$$
Thus, $x^{(j)} = 0$ if $\lambda > \max\{2\|\mathcal{A}\|_F + 2, \frac{1}{m}\|\mathcal{A}\|_F^2\}$, and hence $x^{(i)} = 0$ for $i = 1, 2, \dots, m$ by Lemma 2.1. This completes the proof. □
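As a practical reading of Proposition 2.1, the threshold is computable directly from the data tensor, and any reasonable solver for (2.2) should return zero factors once the common $\lambda$ exceeds it. The sketch below checks this with a simple exact block-coordinate scheme for the third-order case; the scheme, its names, and the test data are our own illustrative choices, not an algorithm proposed in this paper.

```python
import numpy as np

def threshold(A, m):
    """Threshold of Proposition 2.1: max{2*||A||_F + 2, ||A||_F^2 / m}."""
    fro = np.linalg.norm(A)
    return max(2.0 * fro + 2.0, fro ** 2 / m)

def solve_22(A, lam, iters=300, seed=0):
    """Exact alternating (block-coordinate) minimization for problem (2.2) with m = 3."""
    rng = np.random.default_rng(seed)
    xs = [rng.random(n) for n in A.shape]
    contract = ['ijk,j,k->i', 'ijk,i,k->j', 'ijk,i,j->k']   # A contracted with the other factors
    for _ in range(iters):
        for j in range(3):
            others = [xs[i] for i in range(3) if i != j]
            c = np.prod([np.sum(v ** 2) for v in others])    # prod_{i != j} ||x^{(i)}||_2^2
            if c == 0.0:
                xs[j] = np.zeros_like(xs[j])
                continue
            b = np.einsum(contract[j], A, *others)
            # Exact minimizer of the block subproblem: nonnegative soft-threshold.
            xs[j] = np.maximum(0.0, (b - lam / 2.0) / c)
    return xs

rng = np.random.default_rng(3)
A = rng.random((4, 3, 5))
lam = 1.01 * threshold(A, m=3)                    # common lambda just above the bound
xs = solve_22(A, lam)
print([np.linalg.norm(x) for x in xs])            # expected: zero factors above the threshold
```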

Now, we turn to problem (2.1) and have the following result.

Lemma 2.3. For the optimal solution of problem (2.1), it holds that
$$\|\mathcal{A}\|_F^2 \ge \Big\| \mathcal{A} - \sum_{j=1, j \ne j_0}^{K} \circ_{i=1}^{m} x^{(i,j)} \Big\|_F^2, \qquad \forall \; 1 \le j_0 \le K.$$


Proof. Suppose, on the contrary, that there exists $1 \le j_0 \le K$ such that $\|\mathcal{A}\|_F^2 < \big\| \mathcal{A} - \sum_{j=1, j \ne j_0}^{K} \circ_{i=1}^{m} x^{(i,j)} \big\|_F^2$. Then, from the nonnegativity of the latent factors, one has
$$\begin{aligned}
& \Big\| \mathcal{A} - \sum_{j=1}^{K} \circ_{i=1}^{m} x^{(i,j)} \Big\|_F^2 + \sum_{i=1}^{m} \lambda^{(i)} \|X^{(i)}\|_1 \\
&\quad = \Big\| \mathcal{A} - \sum_{j=1, j \ne j_0}^{K} \circ_{i=1}^{m} x^{(i,j)} \Big\|_F^2 - 2\,\mathcal{A} x^{(1,j_0)} \cdots x^{(m,j_0)} + \prod_{i=1}^{m} \|x^{(i,j_0)}\|_2^2 + 2 \sum_{j=1, j \ne j_0}^{K} \prod_{i=1}^{m} \langle x^{(i,j)}, x^{(i,j_0)} \rangle + \sum_{i=1}^{m} \lambda^{(i)} \|X^{(i)}\|_1 \\
&\quad > \|\mathcal{A}\|_F^2 - 2\,\mathcal{A} x^{(1,j_0)} \cdots x^{(m,j_0)} + \prod_{i=1}^{m} \|x^{(i,j_0)}\|_2^2 + \sum_{i=1}^{m} \lambda^{(i)} \|x^{(i,j_0)}\|_1 \\
&\quad \ge \Big\| \mathcal{A} - \circ_{i=1}^{m} x^{(i,j_0)} \Big\|_F^2 + \sum_{i=1}^{m} \lambda^{(i)} \|x^{(i,j_0)}\|_1,
\end{aligned}$$
where the strict inequality uses the contradiction hypothesis together with $\langle x^{(i,j)}, x^{(i,j_0)} \rangle \ge 0$ and $\|X^{(i)}\|_1 \ge \|x^{(i,j_0)}\|_1$. Let $\bar{X}^{(i)} = (\bar{x}^{(i,1)}, \bar{x}^{(i,2)}, \dots, \bar{x}^{(i,K)})$, $i = 1, 2, \dots, m$, be such that
$$\bar{x}^{(i,j)} = \begin{cases} x^{(i,j)}, & \text{if } j = j_0, \\ 0, & \text{if } j \ne j_0. \end{cases}$$
Then
$$\Big\| \mathcal{A} - \sum_{j=1}^{K} \circ_{i=1}^{m} x^{(i,j)} \Big\|_F^2 + \sum_{i=1}^{m} \lambda^{(i)} \|X^{(i)}\|_1 > \Big\| \mathcal{A} - \sum_{j=1}^{K} \circ_{i=1}^{m} \bar{x}^{(i,j)} \Big\|_F^2 + \sum_{i=1}^{m} \lambda^{(i)} \|\bar{X}^{(i)}\|_1.$$
This contradicts the fact that $(X^{(1)}, \dots, X^{(m)})$ is an optimal solution of problem (2.1), and thus the conclusion follows. □

Based on Proposition 2.1 and Lemma 2.3, we can obtain the main result of this section.

Proposition 2.2. Under Assumption 2.1, the optimal solution of problem (2.1) is zero provided that
$$\Big( \prod_{i=1}^{m} \lambda^{(i)} \Big)^{1/m} > \max\Big\{ 2\|\mathcal{A}\|_F + 2, \; \tfrac{1}{m} \|\mathcal{A}\|_F^2 \Big\}.$$

Proof. Suppose $(X^{(1)}, \dots, X^{(m)})$ is the optimal solution of problem (2.1) with regularization factors $(\lambda^{(1)}, \dots, \lambda^{(m)})$. Then for any $1 \le j_0 \le K$, $(x^{(1,j_0)}, \dots, x^{(m,j_0)})$ is the optimal solution of the following problem:
$$\min_{x^{(i,j_0)} \ge 0, \; i=1,2,\dots,m} \; \Big\| \Big( \mathcal{A} - \sum_{j=1, j \ne j_0}^{K} \circ_{i=1}^{m} x^{(i,j)} \Big) - \circ_{i=1}^{m} x^{(i,j_0)} \Big\|_F^2 + \sum_{i=1}^{m} \lambda^{(i)} \|x^{(i,j_0)}\|_1.$$
By Lemma 2.3,
$$\|\mathcal{A}\|_F^2 \ge \Big\| \mathcal{A} - \sum_{j=1, j \ne j_0}^{K} \circ_{i=1}^{m} x^{(i,j)} \Big\|_F^2.$$
Recalling Proposition 2.1, applied to this subproblem with data tensor $\mathcal{A} - \sum_{j=1, j \ne j_0}^{K} \circ_{i=1}^{m} x^{(i,j)}$, whose Frobenius norm does not exceed $\|\mathcal{A}\|_F$, we conclude that $x^{(i,j_0)} = 0$, $i = 1, 2, \dots, m$. From the arbitrariness of $j_0$, the conclusion follows. □

Since problems (2.3) and (2.4) are special cases of problems (2.1) and (2.2), respectively, Propositions 2.1 and 2.2 yield the following conclusion as a consequence.

Corollary 2.1. Suppose problems (2.3) and (2.4) both have a unique solution. Then their optimal solutions are zero provided that $\sqrt{\lambda_x \lambda_y} > \max\{ 2\|A\|_F + 2, \; \tfrac{1}{2}\|A\|_F^2 \}$.
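In practice, Corollary 2.1 delimits the useful range of $(\lambda_x, \lambda_y)$ for problems (2.3) and (2.4): any pair whose geometric mean exceeds the bound forces both factors to zero, so nontrivial sparsity levels must be sought below it. A hedged sketch of such a selection rule follows; the fraction scale and the ratio parameter are illustrative choices of ours, not prescriptions from the paper.

```python
import numpy as np

def matrix_threshold(A):
    """Corollary 2.1 bound: sqrt(lam_x * lam_y) must stay below this for nonzero factors."""
    fro = np.linalg.norm(A, 'fro')
    return max(2.0 * fro + 2.0, 0.5 * fro ** 2)

def pick_lambdas(A, scale=0.1, ratio=1.0):
    """Pick (lam_x, lam_y) with geometric mean at 'scale' times the bound and lam_x/lam_y = ratio."""
    gm = scale * matrix_threshold(A)          # geometric mean of the two parameters
    lam_x = gm * np.sqrt(ratio)
    lam_y = gm / np.sqrt(ratio)
    return lam_x, lam_y

rng = np.random.default_rng(4)
A = rng.random((30, 20))
print(matrix_threshold(A), pick_lambdas(A))
```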

3. Conclusions

In this paper, we derived a threshold bound on the regularization factors for $l_1$-regularized nonnegative matrix/tensor sparse decompositions, beyond which the optimal solution of the concerned problem is zero. The obtained result improves the norm-balancing property for $l_1$ regularization parameters and extends the corresponding results on the Lasso function. The results also provide theoretical support for the choice of regularization factors in numerical implementations.

Acknowledgments. The authors wish to give their sincere thanks to the Associate Editor, Patrice Marcotte, and the anonymous referees for their valuable suggestions and insightful comments, which improved the presentation of this paper. This work is partially supported by NSFC (Grant No. 11171180), the Scientific Research Innovation Team in Colleges and Universities of Shandong Province, and the Taishan Scholarship of Shandong Province.


References
[1] A. Anagnostopoulos, A. Dasgupta, R. Kumar, Approximation algorithms for co-clustering, Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, (2008) 201-210.
[2] S. Busygin, O.A. Prokopyev, P.M. Pardalos, Biclustering in data mining, Computers and Operations Research, 35(9) (2008) 2964-2987.
[3] E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory, 52(2) (2006) 489-509.
[4] A. Cichocki, R. Zdunek, A. Phan, S. Amari, Nonnegative Matrix and Tensor Factorizations, Wiley, 2009.
[5] A. d'Aspremont, F. Bach, L. El Ghaoui, Optimal solutions for sparse principal component analysis, J. Machine Learning Research, 9 (2008) 1269-1294.
[6] D.L. Donoho, For most large underdetermined systems of linear equations, the minimal l1-norm solution is also the sparsest solution, Commun. Pure Appl. Math., 59(6) (2006) 797-829.
[7] J.H. Hou, L.P. Chau, M.T. Nadia, Y. He, Scalable and compact representation for motion capture data using tensor decomposition, IEEE Signal Processing Letters, 21(3) (2014) 255-259.
[8] E. Hyman, P. Kauraniemi, S. Hautaniemi, M. Wolf, et al., Impact of DNA amplification on gene expression patterns in breast cancer, Cancer Research, 62 (2002) 6240-6245.
[9] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, 1986.
[10] A. Li, D. Tuck, An effective tri-clustering algorithm combining expression data with gene regulation information, Gene Regul Syst Bio., 3 (2009) 49-64.
[11] M.R. Osborne, B. Presnell, B.A. Turlach, On the Lasso and its dual, J. Comput. Graphical Statistics, 9(2) (2000) 319-337.
[12] E.E. Papalexakis, N.D. Sidiropoulos, R. Bro, From K-means to higher-way co-clustering: multilinear decomposition with sparse latent factors, IEEE Trans. Signal Processing, 61(2) (2013) 493-506.
[13] N.D. Sidiropoulos, A. Kyrillidis, Multi-way compressed sensing for sparse low-rank tensors, IEEE Signal Processing Letters, 19(11) (2012) 757-760.
[14] Y. Tsaig, D.L. Donoho, Breakdown of equivalence between the minimal l1-norm solution and the sparsest solution, Signal Processing, 86(3) (2006) 533-548.
[15] D.M. Witten, R. Tibshirani, T. Hastie, A penalized matrix decomposition with applications to sparse principal components and canonical correlation analysis, Biostatistics, 10 (2009) 515-534.
[16] X.Y. Zhang, S. Li, Compressed sensing via dual frame based l1-analysis with Weibull matrices, IEEE Signal Processing Letters, 20(3) (2013) 265-268.
[17] W.M. Zheng, M.H. Xin, X.L. Wang, B. Wang, A novel speech emotion recognition method via incomplete sparse least square regression, IEEE Signal Processing Letters, 21(5) (2014) 569-572.
