Low Rank Representation on SPD matrices with Log-Euclidean metric



Boyue Wang b,d, Yongli Hu a,b,∗, Junbin Gao c, Muhammad Ali d, David Tien d, Yanfeng Sun a,b, Baocai Yin a,b,e

a Beijing Advanced Innovation Center for Future Internet Technology, Beijing 100124, China
b Beijing Municipal Key Lab of Multimedia and Intelligent Software Technology, College of Metropolitan Transportation, Beijing University of Technology, Beijing 100124, China
c Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, NSW 2006, Australia
d School of Computing and Mathematics, Charles Sturt University, Bathurst, NSW 2795, Australia
e College of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116620, China

∗ Corresponding author. E-mail addresses: [email protected], [email protected] (B. Wang), [email protected] (Y. Hu), [email protected] (J. Gao), [email protected] (M. Ali), [email protected] (D. Tien), [email protected] (Y. Sun).

Article history: Received 15 November 2016; Revised 26 June 2017; Accepted 5 July 2017.

Keywords: Symmetric positive definite matrices; Log-Euclidean metric; Low Rank Representation; Subspace clustering

Abstract: Symmetric Positive Definite (SPD) matrices, as a kind of effective feature descriptor, have been widely used in pattern recognition and computer vision tasks. The Affine-Invariant Metric (AIM) is a popular way to measure the distance between SPD matrices, but it imposes a high computational burden in practice. Compared with AIM, the Log-Euclidean metric embeds the SPD manifold, via the matrix logarithm, into a Euclidean space in which only classical Euclidean computation is involved. The advantage of using this metric for the non-linear representation of data by SPD matrices has been recognized in domains such as compressed sensing; however, little attention has been paid to this metric in data clustering. In this paper, we propose a novel Low Rank Representation (LRR) model on the space of SPD matrices with the Log-Euclidean metric (LogELRR), which enables us to handle non-linear data in a linear manner. To further explore the intrinsic geometric distance between SPD matrices, we embed the SPD matrices into a Reproducing Kernel Hilbert Space (RKHS) to form a family of kernels on SPD matrices based on the Log-Euclidean metric, and construct a novel kernelized LogELRR method. The clustering results on a wide range of datasets, including object images, facial images, 3D objects, texture images and medical images, show that our proposed methods outperform conventional clustering methods. © 2017 Published by Elsevier Ltd.

1. Introduction

In recent years, with massive numbers of unlabeled images being generated on the Internet (e.g., Facebook, Flickr), subspace clustering is playing an increasingly important role in many computer vision applications [5,17,37]. Subspace clustering groups a set of data drawn from a union of subspaces into their underlying subspaces. To investigate and represent the underlying subspace structure, many methods have been proposed, including conventional iterative [44], statistical [16,21], factorization-based algebraic [26,33,35], and spectral clustering [11,13,29,50] methods. Among these, the spectral clustering based methods represent the state of the art with good prospects; their key ingredient is the construction of a 'good' affinity matrix.


Motivated by the success of compressed sensing in exploring underlying structures hidden in data [3,9], self-expression regularization has been introduced to learn a 'good' affinity matrix for clustering [17,29,34,54]. The general clustering framework can be written as

$$\min_{E,\,Z}\ \Omega(Z) + \lambda \Psi(E) \quad \text{s.t.}\quad X = XZ + E, \tag{1}$$

where $X = [x_1, \ldots, x_N] \in \mathbb{R}^{d\times N}$ denotes a set of sample data; $\Psi(E)$ represents the regularization of the reconstruction error $E$, which can be chosen as $\|E\|_0$, $\|E\|_{2,1}$ or $\|E\|_F^2$ depending on the practical task (see [29]); and $\Omega(Z)$ stands for the regularizer on the coefficient matrix $Z \in \mathbb{R}^{N\times N}$. Sparse Subspace Clustering (SSC) [46] employs the $\ell_1$-norm $\|Z\|_1$ to obtain the sparsest representation for each data object. However, SSC neglects the inherent joint structure among the representations of a data set, which is very important for clustering applications. Low Rank Representation (LRR) adopts the low rank or nuclear norm $\|Z\|_*$ to capture the underlying global structure hidden in data. However, SSC, LRR and other clustering methods are designed to work with vector-valued data.
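The following minimal sketch (ours, not the authors' code) illustrates how the terms of model (1) can be evaluated with NumPy for given X, Z and E; the function names are illustrative only.

```python
# Illustrative sketch: evaluating the terms of the self-expression model (1).
import numpy as np

def nuclear_norm(Z):
    """||Z||_* : sum of singular values, the low-rank regularizer used by LRR."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def error_norms(E):
    """Common choices for the error regularizer Psi(E) mentioned above."""
    return {
        "l0":   np.count_nonzero(E),               # ||E||_0
        "l21":  np.linalg.norm(E, axis=0).sum(),   # ||E||_{2,1}, column-wise l2
        "fro2": np.linalg.norm(E, "fro") ** 2,     # ||E||_F^2
    }

# Toy usage: the constraint X = XZ + E of (1) defines E for a candidate Z.
d, N = 10, 30
X = np.random.randn(d, N)
Z = 0.01 * np.random.randn(N, N)
E = X - X @ Z
objective = nuclear_norm(Z) + 0.1 * error_norms(E)["l21"]
```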


In practice, structured data can preserve structural or spatial information hidden in data, which has more discriminative power for pattern recognition tasks than vector-valued data [20,48]. As a widely used image feature descriptor derived from the covariance of a set of raw features, the SPD matrix has been shown to possess a Riemannian geometry, and it has been successfully used in object recognition [6,23,32], face recognition [28], and texture classification [12]. Because of its non-linearity, most properties and operators of a linear space are not suitable for the space of SPD matrices. As proper distances on the space of SPD matrices, the two most popular Riemannian geodesic distance measures are the Affine-invariant metric (AIM) [39] and the Log-Euclidean metric [4,24]. On the SPD manifold, the AIM distance between $Y_1$ and $Y_2$ is defined as

$$d_{AIM}(Y_1, Y_2) = \big\|\log(Y_1^{-1/2} \cdot Y_2 \cdot Y_1^{-1/2})\big\|_F, \tag{2}$$

where $Y_1, Y_2 \in S_+^n$ denote $n\times n$ SPD matrices, and $\|\cdot\|_F$ and $\log(\cdot)$ denote the Frobenius norm and the matrix logarithm, respectively. Due to the curvature of the SPD manifold, the AIM metric (2) imposes a high computational burden. Thus, AIM-based algorithms are slow in many cases and are even hard to implement when the size of the SPD matrices is large. Additionally, from the viewpoint of mathematical analysis, the product/multiplication on the SPD manifold under AIM is not associative; without associativity, the group structure and many fundamental properties disappear [2]. The SPD manifold equipped with the Log-Euclidean metric has a Lie group structure, in which the manifold is flattened by the matrix logarithm operation. In the flat space, the distance between SPD matrices becomes the conventional Euclidean distance, which improves computational efficiency. Meanwhile, the Log-Euclidean metric still preserves excellent theoretical properties, such as associativity [1,2]. Besides, two types of Bregman divergences, namely the Jeffrey and Stein divergences, are also widely used to measure similarities between SPD matrices. Compared with AIM, the Bregman divergences give an ideal analogical distance with less computation, although they are not the true geodesic distance on the SPD manifold [7]. In this paper, we extend the LRR model in Euclidean space onto the space of SPD matrices as LogELRR by utilizing the convenience that the Log-Euclidean metric provides. In order to embed the SPD matrices into a high-dimensional Hilbert space, we employ a set of kernel functions based on the Log-Euclidean metric on the SPD manifold. From this, we further propose a kernelized version of the LogELRR algorithm.
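As a rough illustration of the two metrics discussed above, the following sketch (our own helper names, assuming NumPy) computes the AIM distance (2) and the Log-Euclidean distance for a pair of toy SPD matrices via eigendecomposition.

```python
# Minimal sketch: AIM distance (Eq. (2)) vs. Log-Euclidean distance for SPD matrices.
import numpy as np

def _spd_fun(Y, fun):
    """Apply a scalar function to the eigenvalues of a symmetric (SPD) matrix Y."""
    w, R = np.linalg.eigh(Y)
    return (R * fun(w)) @ R.T

def dist_aim(Y1, Y2):
    """d_AIM(Y1,Y2) = ||log(Y1^{-1/2} Y2 Y1^{-1/2})||_F."""
    Y1_isqrt = _spd_fun(Y1, lambda w: 1.0 / np.sqrt(w))
    M = Y1_isqrt @ Y2 @ Y1_isqrt
    return np.linalg.norm(_spd_fun(M, np.log), "fro")

def dist_log_euclidean(Y1, Y2):
    """d_LEM(Y1,Y2) = ||log(Y1) - log(Y2)||_F: only Euclidean operations after the log map."""
    return np.linalg.norm(_spd_fun(Y1, np.log) - _spd_fun(Y2, np.log), "fro")

A = np.random.randn(5, 5); Y1 = A @ A.T + 5 * np.eye(5)   # toy SPD matrices
B = np.random.randn(5, 5); Y2 = B @ B.T + 5 * np.eye(5)
print(dist_aim(Y1, Y2), dist_log_euclidean(Y1, Y2))
```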

1.1. Related work

For spectral clustering algorithms, the ideal outcome is a block-diagonal affinity matrix. To achieve this goal, building on classical SSC and LRR, Lu et al. [30] propose Least Squares Regression (LSR), whose grouping effect pulls highly correlated data together; Lu et al. [31] further use the trace lasso to simultaneously encourage sparsity and the grouping effect, balancing between SSC and LSR. Hu et al. [22] theoretically analyze the effectiveness of the grouping effect. Feng et al. [14] introduce a graph Laplacian constraint to construct a block-diagonal affinity matrix. For SPD matrices, Sra and Cherian [43] used the Frobenius norm to measure the similarity between SPD matrices, which neglects their Riemannian geometry. There are three main schemes for handling the non-linearity of SPD matrices. 1) Differential geometric schemes, which map SPD matrices to the tangent space: Yuan et al. [53] were among the first to employ the Log-Euclidean metric to compare two SPD matrices for human action recognition. 2) Statistical schemes: Cherian et al. [8] proposed a nonparametric Bayesian framework for clustering SPD matrices by extending Dirichlet process mixture models, using the log-determinant divergence

to measure SPD matrices. 3) Kernelized schemes, which map SPD matrices to a Reproducing Kernel Hilbert Space; this scheme has received the most attention in recent years. Wang et al. [51] first defined a linear kernel based on the Log-Euclidean metric for SPD matrices. Li et al. [28] further defined polynomial and exponential kernels based on the Log-Euclidean metric. Harandi et al. [19] embedded the SPD manifold into an RKHS with the aid of two Bregman divergences, namely the Stein and Jeffrey divergences. Jayasumana et al. [25] theoretically analyzed the positive definiteness of Gaussian kernels on manifold-valued data. Based on the Log-Euclidean metric, Vemulapalli et al. [45] applied a multiple kernel learning classifier to exploit more discriminative information between SPD matrices. The purpose of this paper is to perform low rank representation on SPD matrices to explore the low rank structure on the manifold. Fu et al. [15] employ AIM to measure the distance between SPD matrices and construct an LRR model (AIM-RLRR),

$$\min_{Z}\ \frac{1}{2}\sum_{i=1}^{N} z_i Q_i z_i^T + \lambda \|Z\|_* \quad \text{s.t.}\quad \sum_{j} z_{ij} = 1,\ \ i = 1, 2, \ldots, N, \tag{3}$$

where $z_i$ is the $i$th row of the coefficient matrix $Z$, $Q_i = \big[\mathrm{tr}\big(\log(Y_i^{-1/2} Y_j Y_i^{-1/2})\log(Y_i^{-1/2} Y_k Y_i^{-1/2})\big)\big]_{j,k}$, and $Y_i$ denotes the $i$th SPD matrix. When $N$ becomes large (e.g., 1000), the size of $Q = [Q_1, \ldots, Q_N] \in \mathbb{R}^{N\times N\times N}$ grows cubically, which is hard to handle on a regular workstation.

1.2. Contributions

Our main contributions in this paper are summarized as follows:

• Proposing a novel LRR model on the space of SPD matrices with the Log-Euclidean metric, namely LogELRR. This is different from the model proposed in [15], where the LRR was implemented in the tangent space of the manifold and is thus a first-order approximation to the manifold;
• Providing kernelized extensions of the LogELRR method; and
• Deriving an optimization problem which has a closed-form solution, making our proposed algorithms faster.

The rest of the paper is organized as follows. In Section 2, we review some necessary properties of SPD matrices. In Section 3, we propose a new LRR model on SPD matrices based on the Log-Euclidean metric. In Section 4, we introduce some kernels on SPD matrices based on the Log-Euclidean metric and give a set of kernelized extensions of the LogELRR model. In Section 5, the proposed methods are tested on several public datasets. Finally, conclusions are drawn in Section 6.

2. Background

In this section, we briefly introduce some properties of SPD matrices that are necessary to understand the proposed approaches.

2.1. Notations

Bold uppercase letters (i.e., X, Y, ...) denote matrices; the corresponding transposes ($X^T$, $Y^T$, ...) and the $i$th matrices ($X_i$, $Y_j$, ...) are defined accordingly. Bold lowercase letters (x, y, ...) stand for vectors, and $x_i$ is the $i$th vector or the $i$th column of matrix X, to be understood according to the context. The $j$th element in the $i$th row of matrix X is denoted $x_{ij}$. Italic letters (N, n, ...) denote scalar values or integers. Among frequently used operators, $\|X\|_F$ and $\|X\|_*$ stand for the Frobenius norm ($\sqrt{\sum_{i}\sum_{j} x_{ij}^2}$) and the nuclear norm (the sum of all singular values) of matrix X, respectively, and $\|x\|_2$ denotes the $\ell_2$ norm of a vector x.


Moreover, $\langle X, Y\rangle$ is the inner product of two matrices of the same dimensions, which equals $\mathrm{tr}(X^T Y)$. Finally, I denotes the identity matrix. Other special notations will be explained when they are used.

2.2. Log-Euclidean metric on the Lie group of SPD matrices

2.2.1. Logarithm and exponent of an SPD matrix

Any SPD matrix $Y \in S_+^n$ has a unique real and symmetric logarithm [2]. Since an SPD matrix can be diagonalized on an orthogonal basis, the logarithm (resp. the exponent) has a particularly simple expression obtained by applying the logarithm (resp. the exponent) to the eigenvalues:

$$\log(Y) = R \cdot \mathrm{diag}(\log(\mathrm{diag}(\Sigma))) \cdot R^T, \qquad \exp(Y) = R \cdot \mathrm{diag}(\exp(\mathrm{diag}(\Sigma))) \cdot R^T, \tag{4}$$

where $R \cdot \Sigma \cdot R^T = Y$ is obtained by performing Eigen-decomposition on the SPD matrix Y.
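A minimal sketch of Eq. (4) in NumPy (helper names are ours): the logarithm and exponential of an SPD matrix are obtained by applying the scalar log/exp to its eigenvalues.

```python
# Minimal sketch of Eq. (4): log/exp of an SPD matrix via its eigendecomposition.
import numpy as np

def logm_spd(Y):
    """log(Y) = R diag(log(sigma_i)) R^T for SPD Y."""
    sigma, R = np.linalg.eigh(Y)
    return (R * np.log(sigma)) @ R.T

def expm_spd(S):
    """exp(S) = R diag(exp(sigma_i)) R^T for symmetric S; inverse of logm_spd."""
    sigma, R = np.linalg.eigh(S)
    return (R * np.exp(sigma)) @ R.T

A = np.random.randn(4, 4)
Y = A @ A.T + 4 * np.eye(4)                    # a toy SPD matrix
assert np.allclose(expm_spd(logm_spd(Y)), Y)   # exp(log(Y)) recovers Y
```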

2.2.2. Logarithm product

Before presenting the logarithm product, we first recall the product between two SPD matrices $Y_1, Y_2 \in S_+^n$ defined under AIM [39],

$$Y_1 \otimes Y_2 = Y_1^{1/2} \cdot Y_2 \cdot Y_1^{1/2},$$

where $\otimes$ denotes the abstract AIM multiplication. With this multiplication, formula (2) can be derived from $d_{AIM} = \|Y_1^{-1} \otimes Y_2\|_{AIM} = \|\log(Y_1^{-1} \otimes Y_2)\|_F$. Similarly, Arsigny et al. [2] proposed a logarithm product between SPD matrices as follows.

Definition 1 (Logarithm Product). Let $Y_1, Y_2 \in S_+^n$; the logarithm product between $Y_1$ and $Y_2$ is defined by

$$Y_1 \odot Y_2 = \exp(\log(Y_1) + \log(Y_2)), \tag{5}$$

where $\odot$ is the new abstract multiplication. This logarithmic product is important for defining the Log-Euclidean metric. Li et al. [28] further define the logarithm inner product between SPD matrices as follows.

Definition 2 (Logarithm Inner Product). For two SPD matrices $Y_1, Y_2 \in S_+^n$, the function from the product space of $S_+^n$ to $\mathbb{R}$,

$$\langle Y_1, Y_2\rangle_{\log} = \mathrm{tr}(\log(Y_1)^T \log(Y_2)), \tag{6}$$

is an induced inner product based on the logarithmic map of SPD matrices.

2.2.3. Log-Euclidean metric (distance)

With the definitions of the logarithm product and inner product, and inspired by the AIM distance (2), we can deduce the Log-Euclidean metric between two SPD matrices $Y_1, Y_2 \in S_+^n$ as

$$d_{LEM}^2(Y_1, Y_2) = \|Y_1^{-1} \odot Y_2\|_{\log}^2 = \langle Y_1^{-1} \odot Y_2,\ Y_1^{-1} \odot Y_2\rangle_{\log} = \mathrm{tr}\big((\log(Y_2) - \log(Y_1))^T (\log(Y_2) - \log(Y_1))\big) = \|\log(Y_2) - \log(Y_1)\|_F^2, \tag{7}$$

which equals the geodesic distance [1,2], where the logarithm norm is defined by $\|Y\|_{\log}^2 := \langle Y, Y\rangle_{\log}$.
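The following sketch (ours) numerically illustrates Definitions 1–2 and Eq. (7); it assumes NumPy and re-defines a small eigendecomposition helper like the one used in the previous sketch.

```python
# Minimal sketch of Definitions 1-2 and Eq. (7); operator symbols above are reconstructions.
import numpy as np

def _eig_fun(Y, f):
    w, R = np.linalg.eigh(Y)
    return (R * f(w)) @ R.T

logm_spd = lambda Y: _eig_fun(Y, np.log)
expm_spd = lambda S: _eig_fun(S, np.exp)

def log_product(Y1, Y2):
    """Logarithm product (Eq. (5)): exp(log(Y1) + log(Y2))."""
    return expm_spd(logm_spd(Y1) + logm_spd(Y2))

def log_inner(Y1, Y2):
    """Logarithm inner product (Eq. (6)): tr(log(Y1)^T log(Y2))."""
    return np.trace(logm_spd(Y1).T @ logm_spd(Y2))

def d_lem_sq(Y1, Y2):
    """Squared Log-Euclidean metric (Eq. (7)) via the flat (log) representation."""
    return np.linalg.norm(logm_spd(Y2) - logm_spd(Y1), "fro") ** 2

A = np.random.randn(4, 4); Y1 = A @ A.T + 4 * np.eye(4)
B = np.random.randn(4, 4); Y2 = B @ B.T + 4 * np.eye(4)
# Eq. (7): the metric equals the inner-product expansion on the log-mapped data.
lhs = d_lem_sq(Y1, Y2)
rhs = log_inner(Y1, Y1) - 2 * log_inner(Y1, Y2) + log_inner(Y2, Y2)
assert np.isclose(lhs, rhs)
```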

3. LRR on SPD matrices based on the Log-Euclidean metric

In this section, we extend the classic LRR model to the space of SPD matrices with the Log-Euclidean metric.

3.1. Problem statement

In Euclidean space, given a set of data of dimension $d$, $\{x_1, x_2, \ldots, x_N\} \in \mathbb{R}^{d\times N}$, where $N$ denotes the number of samples, the LRR model can be formulated as the following self-expression form with the nuclear norm constraint,

$$\min_{Z}\ \sum_{i=1}^{N} \Big\|x_i - \sum_{j=1}^{N} z_{ij} x_j\Big\|_2^2 + \lambda \|Z\|_*. \tag{8}$$

The first term $\|x_i - \sum_{j=1}^{N} z_{ij} x_j\|_2^2$ measures the distance between the $i$th sample $x_i$ and its representation, i.e., the linear combination $\sum_{j=1}^{N} z_{ij} x_j$ over all samples, which can also be written as $d^2(x_i, \sum_{j=1}^{N} z_{ij} x_j)$, where $d^2(\cdot,\cdot)$ is the Euclidean distance. Following this strategy, given the points $\{Y_1, Y_2, \ldots, Y_N\}$ in the SPD matrix space $S_+^n$, we formulate the 'new' low rank representation on the space of SPD matrices as

$$\min_{Z}\ \sum_{i=1}^{N} d_{LEM}^2\Big(Y_i,\ \sum_{j=1}^{N} z_{ij} \odot Y_j\Big) + \lambda \|Z\|_*, \tag{9}$$

where we adopt the Log-Euclidean metric $d_{LEM}^2(\cdot,\cdot)$ to measure the distance between the SPD sample $Y_i$ and its representation $\sum_{j=1}^{N} z_{ij} \odot Y_j$, which is at the moment an abstract "linear combination" of all SPD matrices. However, there are no defined scalar multiplication and sum operators on the space of SPD matrices, so how to construct this abstract "linear combination" is still a problem. To solve it, we adopt a method similar to that in [20], which constructs sparse representation on Grassmann manifolds by mapping them into Euclidean space; there, the abstract 'linear combination' of Grassmann points is implemented by the linear computation of the mapped points in Euclidean space. Following Harandi's work, we also map SPD matrices into Euclidean space by the logarithm map in (4), so the abstract 'linear combination' $\sum_{j=1}^{N} z_{ij} \odot Y_j$ on the space of SPD matrices can be translated into the linear combination of the mapped points $\sum_{j=1}^{N} z_{ij} \log(Y_j)$. Therefore, the difference term with the $d_{LEM}^2(\cdot,\cdot)$ metric in (9) can be represented as the Euclidean distance $\|\log(Y_i) - \sum_{j=1}^{N} z_{ij}\log(Y_j)\|_F^2$, and we obtain the final LRR model on SPD matrices as

$$\min_{Z}\ \sum_{i=1}^{N} \Big\|\log(Y_i) - \sum_{j=1}^{N} z_{ij} \log(Y_j)\Big\|_F^2 + \lambda \|Z\|_*, \tag{10}$$

where $\log(Y_i)$ represents the logarithmic mapping of the SPD matrix $Y_i$. We name this problem LogELRR.

3.2. Optimization

To solve the optimization problem (10), we decompose its first term as follows,

$$\Big\|\log(Y_i) - \sum_{j=1}^{N} z_{ij}\log(Y_j)\Big\|_F^2 = \mathrm{tr}(\log(Y_i)^T\log(Y_i)) - 2\sum_{j=1}^{N} z_{ij}\,\mathrm{tr}(\log(Y_i)^T\log(Y_j)) + \sum_{j_1=1}^{N}\sum_{j_2=1}^{N} z_{ij_1} z_{ij_2}\,\mathrm{tr}(\log(Y_{j_1})^T\log(Y_{j_2})). \tag{11}$$

For convenience, we define an auxiliary symbol $\delta_{ij} = \mathrm{tr}(\log(Y_i)^T\log(Y_j))$ to represent the inner product of logarithmic SPD matrices. Then a matrix reflecting the similarity of all logarithmic SPD matrices is obtained, $\Delta = [\delta_{ij}]_{i,j=1}^{N}$.


After some simple manipulation, the objective function (10) can be rewritten as

$$\min_{Z}\ -2\,\mathrm{tr}(Z\Delta) + \mathrm{tr}(Z\Delta Z^T) + \lambda\|Z\|_*. \tag{12}$$

It has been demonstrated that $\Delta$ is a positive semi-definite matrix [47]. Consequently, we have a spectral decomposition of $\Delta$ given by

$$\Delta = U D U^T,$$

where $U^T U = I$ and $D = \mathrm{diag}(\sigma_i)$ with non-negative eigenvalues $\sigma_i$. So, the optimization problem (10) can be converted to

$$\min_{Z}\ \big\|Z\Delta^{\frac{1}{2}} - \Delta^{\frac{1}{2}}\big\|_F^2 + \lambda\|Z\|_*. \tag{13}$$

There exists a closed-form solution to the optimization problem (13) following Favaro et al. [13], which is given by the following proposition.

Proposition 1. Given $\Delta = U D U^T$ as defined above, the solution to (13) is given by

$$Z^* = U D_\lambda U^T,$$

where $D_\lambda$ is the diagonal matrix whose $i$th diagonal element is defined by

$$D_\lambda(i, i) = \begin{cases} 1 - \dfrac{\lambda}{\sigma_i} & \text{if } \sigma_i > \lambda,\\ 0 & \text{otherwise.}\end{cases}$$
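A minimal sketch of the closed-form solution of Proposition 1, under our own variable names: Δ is assembled from the log-mapped SPD matrices and its spectrum is shrunk to give Z*.

```python
# Minimal sketch of Proposition 1 (our own names): closed-form LogELRR solution.
import numpy as np

def logelrr_closed_form(logYs, lam):
    """logYs: list of log-mapped SPD matrices log(Y_i); lam: penalty lambda."""
    # Delta_{ij} = tr(log(Y_i)^T log(Y_j)) -- the similarity matrix of Eq. (12).
    flat = np.stack([L.ravel() for L in logYs])           # N x n^2
    Delta = flat @ flat.T                                  # positive semi-definite
    sigma, U = np.linalg.eigh(Delta)                       # Delta = U diag(sigma) U^T
    d_lam = np.where(sigma > lam, 1.0 - lam / np.maximum(sigma, 1e-12), 0.0)
    return (U * d_lam) @ U.T                               # Z* = U D_lambda U^T

# The affinity for the spectral clustering step is W = (|Z*| + |Z*|^T) / 2.
```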

Now we have obtained the optimal representation $Z^*$, which can be used to construct an affinity matrix via $W = \frac{|Z^*| + |Z^{*T}|}{2}$. Such an affinity matrix can be fed to a spectral clustering algorithm to obtain the final clustering result. Since NCut is widely employed in many spectral clustering methods [17,29,37,49], we also employ it in this paper.

3.3. Log-Euclidean data representation method

In this paper, we mainly consider image-related clustering problems. First we explain how to represent an image-related object as a point on the SPD manifold. Let each image be of size $a \times b$. If $f_i \in \mathbb{R}^n$ denotes the feature vector of the $i$th pixel, we obtain a feature matrix $F = \{f_1, \ldots, f_{(a\times b)}\} \in \mathbb{R}^{n\times(a\times b)}$ for each image and the corresponding SPD (covariance) matrix $Y \in \mathbb{R}^{n\times n}$. We apply Eigen-decomposition to the SPD matrix $Y$, i.e., $Y = U D U^T$. Then the logarithm of the diagonal matrix $D$ can be used to construct the logarithmically mapped data by $\log(Y) = U\log(D)U^T$, where $\log(D) = \mathrm{diag}(\log(\sigma_i))$ for each eigenvalue $\sigma_i$. Since $a \times b$ is much larger than $n$, $Y$ is in general close to full rank, which means that no $\sigma_i$ will be zero.
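As an illustration of the representation in Section 3.3, the sketch below (our own helper names) builds the covariance descriptor from a per-pixel feature matrix and log-maps it; the small ridge added to the covariance is an assumption of the sketch, not part of the paper.

```python
# Minimal sketch: covariance descriptor of Section 3.3 plus its log map (Eq. (4)).
import numpy as np

def covariance_descriptor(F, eps=1e-6):
    """F: n x m matrix whose columns are the per-pixel feature vectors f_i."""
    Y = np.cov(F)                        # n x n covariance of the raw features
    Y += eps * np.eye(Y.shape[0])        # small ridge keeps Y strictly positive definite (our choice)
    return Y

def log_map(Y):
    sigma, U = np.linalg.eigh(Y)         # Y = U diag(sigma) U^T
    return (U * np.log(sigma)) @ U.T     # log(Y) = U log(D) U^T

# Example: 18-dimensional features over a 64 x 64 image -> one 18 x 18 SPD descriptor.
F = np.random.randn(18, 64 * 64)
logY = log_map(covariance_descriptor(F))
```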

4. Kernelized LRR on SPD matrices based on Log-Euclidean metrics

In this section, we consider the kernelization of the LogELRR model and give several feasible kernelized forms.

4.1. Kernels on SPD matrices based on the Log-Euclidean metric

The above LogELRR method can be regarded as a typical kernel method for processing the non-linear data of SPD matrices with the Log-Euclidean metric. To further explore appropriate metrics on the space of SPD matrices, we extend the LogELRR method to a generalized kernel framework. An effective way to do so is to map (via a feature mapping) the data into a feature space, normally a Hilbert (inner-product) space, where the mapped data may be "linearly" separable. For any given feature mapping $\phi: \mathbb{R}^d \rightarrow \mathcal{H}$ into a Hilbert space, a kernel function can be defined as the inner product of the mapped data, $\kappa(x_1, x_2) = \langle\phi(x_1), \phi(x_2)\rangle_{\mathcal{H}}$, where $\langle\cdot,\cdot\rangle_{\mathcal{H}}$ is the inner product in the Hilbert space. The kernel function satisfies a number of nice properties, see [36,41]. Knowing the kernel function expression avoids the explicit and expensive computation of mapping the data into the high-dimensional feature space. It is easier to construct kernel functions with the Mercer property [41], so the feature mapping $\phi$ can be implicitly determined by a kernel function. Commonly used kernel functions include the polynomial kernel ($\kappa_{pn}(x_1, x_2) = (\langle x_1, x_2\rangle + c)^n$, $c \ge 0$), the exponential kernel ($\kappa_{en}(x_1, x_2) = \exp(\langle x_1, x_2\rangle + c)^n$) and the Gaussian kernel ($\kappa_{g}(x_1, x_2) = \exp(-\frac{\|x_1 - x_2\|_2^2}{c})$).

For the SPD matrices forming a Riemannian manifold, it is natural to map them into a new feature space to seek the desired linear properties. The key issues are to find a proper mapping and to define the corresponding inner product. If we let the mapping $\phi$ be the logarithm operator in (4), we can define a kernel function based on the inner product definition in (6) as

$$\kappa(Y_1, Y_2) = \langle\phi(Y_1), \phi(Y_2)\rangle_{\mathcal{H}} = \langle\log(Y_1), \log(Y_2)\rangle_{\mathcal{H}} = \mathrm{tr}(\log(Y_1)^T\log(Y_2)).$$

From this view, we can regard the proposed LogELRR method as a type of kernel method. In addition, inspired by the work of [28], we further explore the generalized kernel method for SPD matrices and extend it to different forms based on the Log-Euclidean metric for different applications, as shown in the following definition.

Definition 3. Let $Y_1, Y_2 \in S_+^n$, let $\beta$ be the variance and let $p_n$ be a polynomial of degree $n \ge 1$ with positive coefficients. Three kinds of SPD matrix kernels based on Log-Euclidean metrics are defined as follows:

1) Log-E polynomial kernel: $\kappa_{pn}(Y_1, Y_2) = p_n(\mathrm{tr}(\log(Y_1)^T\log(Y_2)))$;
2) Log-E exponential kernel: $\kappa_{en}(Y_1, Y_2) = \exp(p_n(\mathrm{tr}(\log(Y_1)^T\log(Y_2))))$;
3) Log-E Gaussian kernel: $\kappa_{g}(Y_1, Y_2) = \exp(-\beta\|\log(Y_1) - \log(Y_2)\|_F^2)$.

For better flow of the paper, we defer the proof of positive definiteness of the above kernel functions to the Appendix.
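A hedged sketch of the three kernels in Definition 3 (our own code); the concrete polynomial $p_n$ and the value of $\beta$ follow the settings reported in Section 5.1 and are otherwise assumptions.

```python
# Minimal sketch of the Log-Euclidean kernels in Definition 3.
import numpy as np

def _logm(Y):
    w, U = np.linalg.eigh(Y)
    return (U * np.log(w)) @ U.T

def k_log_poly(Y1, Y2, n=2):
    """Log-E polynomial kernel: p_n(tr(log(Y1)^T log(Y2))) with p_n(x) = x^n (assumed form)."""
    return np.trace(_logm(Y1).T @ _logm(Y2)) ** n

def k_log_exp(Y1, Y2, n=2):
    """Log-E exponential kernel: exp(p_n(tr(log(Y1)^T log(Y2))))."""
    return np.exp(k_log_poly(Y1, Y2, n))

def k_log_gauss(Y1, Y2, beta=2e-2):
    """Log-E Gaussian kernel: exp(-beta * ||log(Y1) - log(Y2)||_F^2)."""
    return np.exp(-beta * np.linalg.norm(_logm(Y1) - _logm(Y2), "fro") ** 2)
```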

4.2. Kernelized LRR on SPD matrices based on Log-Euclidean metrics

We take a specific mapping $\phi(\cdot)$ which maps an SPD matrix into a Hilbert space. Then we can construct the low rank representation of SPD matrices in the Hilbert space, formulated as the following problem:

$$\min_{Z}\ \sum_{i=1}^{N} \Big\|\phi(Y_i) - \sum_{j=1}^{N} z_{ij}\phi(Y_j)\Big\|_2^2 + \lambda\|Z\|_*. \tag{14}$$

By a derivation similar to that of LogELRR in (11), we can easily derive the following generalized kernel method for SPD matrices with respect to the model (14),

$$\min_{Z}\ -2\,\mathrm{tr}(ZK) + \mathrm{tr}(ZKZ^T) + \lambda\|Z\|_*, \tag{15}$$

where $K$ is an $N\times N$ kernel matrix, $K = (\kappa_{ij})_{i,j=1}^{N}$ with elements $\kappa_{ij} = \kappa(Y_i, Y_j)$. If we take the exponential kernel, Gaussian kernel and polynomial kernel of Definition 3, we obtain a set of new kernelized models, denoted by LogExpLRR, LogGaussLRR and LogPloyLRR, respectively.


Clearly the symmetric matrix $K$ is positive semi-definite. Therefore, problem (15) can also be rewritten as

$$\min_{Z}\ \big\|ZK^{\frac{1}{2}} - K^{\frac{1}{2}}\big\|_F^2 + \lambda\|Z\|_*, \tag{16}$$

where $K^{\frac{1}{2}}$ is the square root matrix of the kernel matrix $K$. A closed-form solution to problem (16) is obtained in the same way as that of problem (13). The whole procedure of the kernelized LRR on SPD matrices is summarized in Algorithm 1.

Algorithm 1 The kernelized LRR on SPD matrices.
Input: A set of samples $\{Y_i\}_{i=1}^{N}$, where each SPD matrix $Y_i \in S_+^n$ and $N$ denotes the number of samples; the balancing penalty parameter $\lambda$; a kernel function $\kappa(\cdot,\cdot)$.
Output: The low-rank representation $Z$.
1:  Logarithmic mapping:
2:  for i = 1 : N do
3:      $UDU^T \leftarrow \mathrm{SVD}(Y_i)$
4:      $\log(Y_i) \leftarrow U\log(D)U^T$
5:  end for
6:
7:  Constructing kernels on SPD matrices:
8:  for i = 1 : N do
9:      for j = 1 : N do
10:         $\kappa_{ij} = \kappa(\log(Y_i), \log(Y_j))$
11:     end for
12: end for
13:
14: Performing SVD on the kernel matrix K: $UDU^T \leftarrow \mathrm{SVD}(K)$
15: Calculating the coefficient matrix Z by $Z \leftarrow UD_\lambda U^T$
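The sketch below (ours) strings Algorithm 1 together in Python; scikit-learn's spectral clustering is used as a stand-in for the NCut step, and all function names are assumptions.

```python
# Minimal sketch of Algorithm 1: log map -> kernel matrix -> spectral shrinkage -> clustering.
import numpy as np
from sklearn.cluster import SpectralClustering

def kernelized_lrr_spd(Ys, kernel, lam, n_clusters):
    # Steps 2-5: logarithmic mapping of each SPD matrix.
    logYs = []
    for Y in Ys:
        w, U = np.linalg.eigh(Y)
        logYs.append((U * np.log(w)) @ U.T)
    # Steps 8-12: kernel matrix on the log-mapped data.
    N = len(Ys)
    K = np.array([[kernel(logYs[i], logYs[j]) for j in range(N)] for i in range(N)])
    # Steps 14-15: spectral decomposition of K and closed-form coefficient matrix.
    sigma, U = np.linalg.eigh(K)
    d_lam = np.where(sigma > lam, 1.0 - lam / np.maximum(sigma, 1e-12), 0.0)
    Z = (U * d_lam) @ U.T
    # Affinity W = (|Z| + |Z|^T)/2 fed to a spectral clustering (NCut-style) step.
    W = (np.abs(Z) + np.abs(Z).T) / 2
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed").fit_predict(W)
    return Z, labels

# Example kernel: the Log-E linear kernel tr(L1^T L2) applied to the log-mapped inputs.
linear_log_kernel = lambda L1, L2: np.trace(L1.T @ L2)
```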

4.3. Computational complexity

For the complexity analysis, let $r$ denote the rank of the coefficient matrix $Z$, $n \times n$ the size of each SPD matrix $Y_i$, and $N$ the number of samples. Throughout Algorithm 1, the main computational cost of our proposed method can be divided into three parts: the logarithmic mapping $\log(\cdot)$ (steps 2–5), the construction of the Log-Euclidean kernel matrix $K$ (steps 8–12), and the solution of the model (steps 14–15). In the first part, we decompose each SPD matrix $Y_i \in \mathbb{R}^{n\times n}$ by SVD to construct the logarithmic data, with complexity $O(Nn^3)$. In the second part, neglecting the cost of the kernel function itself, generating the kernel matrix $K$ costs $O(N^2)$. In the last part, we perform a partial SVD to solve for the final coefficient matrix $Z$, which costs $O(rN^2)$. Overall, the computational complexity of our proposed method (14) is $O(Nn^3) + O((r+1)N^2)$.

5. Experiments

In this section, we conduct clustering experiments on several widely used datasets to evaluate the effectiveness of our proposed methods. The datasets used in our experiments are: the ETH80 dataset (https://www.mpi-inf.mpg.de/departments), the Feret face dataset (http://www.nist.gov/itl/iad/ig/feret.cfm), the RGB-D 3D object dataset (http://rgbd-dataset.cs.washington.edu/), the Brodatz texture dataset (http://www.ux.uis.no/~tranden/brodatz.html), the KTH-TIPS texture dataset (http://www.nada.kth.se/cvap/databases/kth-tips/), and the Virus dataset (http://www.cb.uu.se/~gustaf/virustexture/).


5.1. Experimental setups

To fairly compare the performance of our proposed methods, we implement two groups of state-of-the-art clustering methods.

(1) Linear clustering methods:
• Standard Low Rank Representation (LRR) [29]
• Robust Subspace Segmentation by Simultaneously Learning Data Representations (RSS) [17]
• K-means
• Normalized Cut (NCut) [42]

(2) Nonlinear clustering methods:
• Sparse Manifold Clustering and Embedding (SMCE) [10]
• Latent Space Sparse Subspace Clustering (LS3C) [38]
• Riemannian LRR based on the Affine Invariant Metric (AIM-RLRR) [15]
• Riemannian LRR based on the kernel defined by the Jeffrey divergence (J-RLRR), following Harandi et al. [18]
• Riemannian LRR based on the kernel defined by the Stein divergence (S-RLRR), following Harandi et al. [18]

LRR serves as a baseline to demonstrate the performance of LRR in solving clustering problems. RSS simultaneously learns the representations of data and an affinity matrix from those representations, both of which are used for clustering; we report the best experimental result for each case for comparison. K-means and NCut are the most popular clustering methods. As a manifold-based clustering method, SMCE utilizes the local manifold structure to find a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. LS3C learns a projection of the data and finds the sparse coefficients in the low-dimensional latent space. AIM-RLRR, another important baseline, is similar to our proposed LogELRR but employs AIM to measure the distance between SPD matrices.

The clustering performance is evaluated by four standard measurements: Accuracy (ACC), Normalized Mutual Information (NMI) [27], Rand index (RI) [40] and Purity (PUR) [52]. They assess different aspects of a given clustering algorithm: Accuracy reflects the percentage of correctly labeled samples; NMI measures the mutual dependence of the predicted clusters and the ground-truth partitions from an information-theoretic perspective; RI evaluates true positives within clusters and true negatives between clusters; PUR measures the accuracy of the dominating class in each cluster.

The parameters of our proposed models should be set properly. Among them, $\lambda$ is the most important penalty parameter, balancing the nuclear norm term and the reconstruction term. Based on a few preliminary experiments, we observed that the optimal value of $\lambda$ depends on the properties of the data; e.g., the favored $\lambda$ tends to be smaller when the noise level in the data is lower and larger when the noise level is higher. Thus we empirically tune it within the range from 0.1 to 20. Following the work in [28], the kernel parameters in all our experiments are fixed to $p_n(x) = x^{50}$ for the Log-E polynomial and Log-E exponential kernels, and $\beta = 2\times 10^{-2}$ for the Log-E Gaussian kernel. All the algorithms are coded in Matlab 2014a and run on an Intel Core i7-4600M 2.9GHz CPU machine with 8G RAM.
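For reference, the following sketch (ours) computes ACC, NMI and PUR with standard SciPy/scikit-learn utilities; it is an illustration of the measurements, not the authors' evaluation code.

```python
# Minimal sketch of the clustering measurements ACC, NMI and PUR.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, confusion_matrix

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one matching between predicted clusters and labels (Hungarian)."""
    C = confusion_matrix(y_true, y_pred)
    row, col = linear_sum_assignment(-C)          # maximize the number of matched samples
    return C[row, col].sum() / len(y_true)

def purity(y_true, y_pred):
    """PUR: fraction of samples belonging to the dominating class of their cluster."""
    C = confusion_matrix(y_true, y_pred)
    return C.max(axis=0).sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])
print(clustering_accuracy(y_true, y_pred),
      normalized_mutual_info_score(y_true, y_pred),
      purity(y_true, y_pred))
```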


5.2. Object dataset clustering

We evaluate the proposed methods on the object clustering problem using the ETH80 dataset. This dataset contains 80 objects from 8 categories, with 41 images of each object obtained from different viewpoints (see Fig. 1). It is a challenging dataset due to the view and pose changes within the intra-class objects.


Table 2
Running time (seconds) for AIM-RLRR and LogELRR algorithms.

Run Time    ETH-80   Feret 10   Feret 20   Feret 30
AIM-RLRR    101.91   9.51       50.15      93.40
LogELRR     0.49     0.37       0.66       0.77

Fig. 1. Some samples (apple, cup, cow, car, dog, horse, pear, tomato) from ETH80 dataset.

Table 1
Clustering results (in %) on the ETH-80 dataset. The figures in boldface are the best performance among all the compared methods.

Measurements   ACC     NMI     RI      PUR
LRR            72.87   72.17   89.62   75.30
RSS            71.65   76.69   91.78   74.70
K-means        92.07   88.48   96.49   92.07
NCut           75.61   78.50   93.17   79.88
SMCE           82.01   91.67   95.77   86.89
LS3C           78.96   74.85   92.01   78.96
AIM-RLRR       61.89   52.65   86.95   61.89
J-RLRR         85.06   80.56   93.39   85.06
S-RLRR         80.79   83.37   94.13   82.62
LogELRR        93.29   90.69   97.10   93.29
LogExpLRR      96.34   93.12   98.26   96.34
LogGaussLRR    97.26   94.97   98.69   97.26
LogPloyLRR     95.44   91.83   97.92   95.43

In this experiment, we choose a subset containing 8 objects (1 object from each of the 8 categories) with full sampling (41 views per object) from the ETH-80 dataset. Following the principle of Cherian and Sra [6], we extract texture features, color features and gradient features to construct the covariance matrices describing the objects. To generate texture features, three common filters are chosen: $H_1 = [1\ 2\ 1]^T$, $H_2 = [-1\ 0\ 1]^T$ and $H_3 = [-1\ 2\ -1]^T$. These three filters are combined into a filter bank:

$$L_{bank} = \big[H_1H_1^T, H_1H_2^T, H_1H_3^T, H_2H_1^T, H_2H_2^T, H_2H_3^T, H_3H_1^T, H_3H_2^T, H_3H_3^T\big].$$

After applying the filter bank $L_{bank}$ to the pixel at position $(x, y)$, we obtain a 9-dimensional vector $F_{banks}$ which encodes the texture information. Then we describe each pixel by

$$F_{ETH80}(x, y) = \Big[F_{banks},\ x,\ y,\ r,\ g,\ b,\ |I_x|,\ |I_y|,\ I_{LoG},\ \sqrt{I_x^2 + I_y^2}\Big]^T,$$

where $x$ and $y$ are the spatial coordinates; $r$, $g$ and $b$ denote the color information at position $(x, y)$; $|I_x|$ and $|I_y|$ are the magnitudes of the first-order gradients along the $x$ and $y$ directions; and $I_{LoG}(x, y)$, the Laplacian of Gaussian filter response, is used for edge detection. Each pixel is thus denoted by an 18-dimensional vector. These common texture, color and gradient feature extraction methods will be employed repeatedly in the following experiments. So each object image is described by an 18 × 18 covariance matrix.
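A small sketch (ours) of the separable filter bank described above: the nine filters $H_iH_j^T$ are applied to a toy grayscale image with SciPy to produce the 9 texture channels that enter the 18-dimensional per-pixel vector.

```python
# Minimal sketch of the 9-filter texture bank used for the ETH-80 covariance descriptor.
import numpy as np
from scipy.signal import convolve2d

H = [np.array([1, 2, 1]), np.array([-1, 0, 1]), np.array([-1, 2, -1])]
bank = [np.outer(h1, h2) for h1 in H for h2 in H]     # nine 3x3 filters H_i H_j^T

def texture_features(gray):
    """Returns a 9 x a x b stack of filter responses (the F_banks channels)."""
    return np.stack([convolve2d(gray, f, mode="same") for f in bank])

gray = np.random.rand(41, 41)                         # toy grayscale image
F_banks = texture_features(gray)                      # 9 texture channels per pixel
# Stacking F_banks with x, y, r, g, b and the gradient terms gives the 18-dimensional
# per-pixel vector whose covariance is the 18 x 18 SPD descriptor.
```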

Table 1 lists the comparison results on the ETH80 dataset against state-of-the-art clustering methods. The bold number in each column represents the best result for the corresponding measurement. Classic LRR performs unsatisfactorily, while our proposed LogELRR achieves a notable performance improvement, which we attribute to the use of the Log-Euclidean metric on SPD matrices. The three proposed kernelized extensions, especially the Gaussian-kernel LogGaussLRR, obtain even better clustering results. It is worth mentioning that the kernelized scheme shows its advantage over the plain Log-Euclidean metric on SPD matrices, and we also pay attention to which kernel is more useful for different tasks. From Table 2, we can see that LogELRR is more than 200 times faster than AIM-RLRR.

Fig. 2. Some samples from Feret face dataset.

5.3. Feret face dataset

Face recognition is one of the hottest topics in the computer vision and pattern recognition area. We choose the Feret face dataset for the clustering experiment. This dataset is also challenging due to its rich expression and view changes. As suggested in the work [19], the 'b' subset of the Feret facial dataset is used to test the clustering performance of our proposed methods; it includes 1400 images from 200 subjects. Each subject has 7 images captured with different directions and expressions, shown in Fig. 2. The size of the facial images is 64 × 64. To generate a covariance matrix describing the face data, we define the following 43-dimensional feature vector:

$$F_{Feret}(x, y) = \big[I(x, y),\ x,\ y,\ |G_{0,0}(x, y)|,\ \cdots,\ |G_{4,7}(x, y)|\big]^T,$$

where $|G_{u,v}(x, y)|$ is the magnitude of a 2D Gabor wavelet feature at position $(x, y)$ with orientation $u$ and direction $v$; here we use 5 orientations and 8 directions. Finally, we obtain a 43 × 43 covariance matrix for each image.

We select 10, 20 and 30 clusters as sample sets, which contain 70, 140 and 210 face images, respectively. The overall clustering results are presented in Table 3. The case of 30 classes is quite difficult for any standard clustering algorithm, yet our proposed methods maintain comparable experimental results. From Table 2, we can also see that our proposed LogELRR is at least 30 times faster than AIM-RLRR for the different numbers of classes.

5.4. 3D object dataset

Different from the above experiments, we assess the performance of the proposed methods on an object clustering task over 3D point cloud data, i.e., the RGB-D 3D object dataset. This dataset was recorded with a Kinect device and is composed of 300 common household objects organized into 51 categories, with 15K frames in total. Some samples from the RGB-D dataset are shown in Fig. 3. This dataset is challenging due to its viewpoint changes and its 3D coordinate representation.



Table 3
Clustering results (in %) on the Feret face dataset. The figures in boldface are the best performance among all the compared methods.

               |        10 clusters        |        20 clusters        |        30 clusters
Method         | ACC    NMI    RI     PUR  | ACC    NMI    RI     PUR  | ACC    NMI    RI     PUR
LRR            | 45.71  46.07  85.71  48.57 | 35.00  48.51  91.78  37.86 | 29.05  50.52  94.02  30.48
RSS            | 65.71  68.34  91.18  67.14 | 51.43  64.93  94.08  54.29 | 46.19  63.98  95.57  48.57
K-means        | 72.86  74.48  92.63  72.86 | 60.71  74.78  95.23  61.43 | 55.71  73.90  96.18  56.67
NCut           | 65.71  74.56  91.55  68.57 | 70.00  80.15  96.07  71.43 | 62.38  78.83  96.84  66.19
SMCE           | 74.29  76.09  92.96  74.29 | 77.86  83.47  96.75  77.86 | 71.90  81.09  97.32  73.33
LS3C           | 61.43  68.46  89.23  64.29 | 50.00  64.51  92.66  52.14 | 38.57  61.53  93.49  40.95
AIM-RLRR       | 72.86  80.96  92.67  75.71 | 71.43  83.10  95.99  72.86 | 68.57  85.43  97.43  71.90
J-RLRR         | 94.29  93.55  97.93  94.29 | 90.00  94.84  98.67  90.71 | 81.43  89.57  98.35  82.86
S-RLRR         | 88.57  93.08  97.18  88.57 | 84.29  91.88  98.21  86.43 | 80.00  89.87  98.32  82.38
LogELRR        | 100    100    100    100   | 94.29  97.16  99.24  94.29 | 88.10  93.85  98.97  88.57
LogExpLRR      | 88.57  85.22  94.82  88.57 | 80.00  86.54  96.26  81.43 | 76.67  88.04  97.33  78.10
LogGaussLRR    | 80.00  83.29  94.91  80.00 | 83.57  87.98  97.73  83.57 | 75.24  84.70  97.73  76.19
LogPloyLRR     | 91.43  88.45  96.15  91.43 | 87.14  90.42  97.64  87.86 | 82.38  87.85  97.94  82.86

Table 4
Clustering results (in %) on the RGBD dataset. The figures in boldface give the best performance among all the compared methods.

               |        10 clusters        |        15 clusters
Method         | ACC    NMI    RI     PUR  | ACC    NMI    RI     PUR
LRR            | 45.00  45.07  84.59  51.33 | 38.95  43.33  88.59  45.32
RSS            | 44.40  33.26  63.04  48.48 | 41.87  55.53  90.67  47.97
K-means        | 35.00  38.58  82.59  41.33 | 32.53  40.89  86.78  40.13
NCut           | 41.33  42.84  86.12  47.67 | 38.58  44.75  90.11  46.50
SMCE           | 41.13  50.49  82.76  50.17 | 40.97  55.30  86.59  50.34
LS3C           | 30.79  30.83  81.66  37.92 | 28.92  34.36  87.70  35.03
J-RLRR         | 33.50  33.10  72.42  40.92 | 28.03  35.06  81.07  37.32
S-RLRR         | 60.92  68.53  89.06  70.00 | 52.68  64.37  91.36  62.66
LogELRR        | 51.42  51.28  83.87  55.33 | 46.95  55.54  90.66  55.79
LogExpLRR      | 67.00  74.05  91.64  73.29 | 60.42  71.40  93.34  69.39
LogGaussLRR    | 56.50  69.79  90.04  67.50 | 55.74  67.24  92.61  64.03
LogPloyLRR     | 63.67  72.58  90.90  71.71 | 60.45  70.52  93.10  68.45

Fig. 3. Some samples from RGB-D dataset.

Following the experimental setting suggested by Cherian and Sra [6], we choose subsets of 10 and 15 categories, and each category collects about 250 to 350 frames. For each frame, the object is segmented out and each 3D cloud point (analogous to a pixel in an image) is mapped into an 18-dimensional feature vector as follows:

$$F_{RGBD}(x, y) = \big[x, y, z, r, g, b, I_x, I_y, I_{xx}, I_{yy}, I_{xy}, I_m, D_x, D_y, D_m, n_x, n_y, n_z\big]^T.$$

In the above feature, $(x, y, z)$ is the 3-dimensional spatial coordinate; the second-order gradients at position $(x, y)$ are $I_{xx} = \frac{\partial^2 I}{\partial x^2}$, $I_{yy} = \frac{\partial^2 I}{\partial y^2}$ and $I_{xy} = \frac{\partial^2 I}{\partial x \partial y}$; $I_m$ is the magnitude of the intensity gradient; $D_x$, $D_y$ and $D_m$ represent gradients over the depth maps; and finally $n_x$, $n_y$ and $n_z$ represent the surface normal at the given 3D point. We merge all features in each frame to form a covariance matrix of size 18 × 18.

Table 4 shows the performance of all the studied methods on the task of 3D object clustering, evaluated with different numbers of subjects (10 and 15). The classic LRR model beats the other compared methods, which demonstrates its capability in revealing the subspace structures of data. The strong performance of our proposed LogELRR and its kernel extensions verifies the key roles of the logarithmic SPD matrices and of the SPD matrix kernels based on the Log-Euclidean metric, respectively. It should also be noted that the exponential-kernel-based LogExpLRR achieves the highest performance in this experiment.

5.5. Texture dataset

In this experiment, two texture datasets, Brodatz and KTH-TIPS2-b, are used to test the texture clustering performance of our proposed methods. Both datasets contain rich textural content, which is challenging for clustering methods.

1) Brodatz texture dataset. This dataset includes 112 texture images with different background intensities. Following the work of Harandi et al. [19], we select 16 images with distinct texture distributions as 16 categories and sample 64 patches of size 32 × 32 from each original image of size 256 × 256, obtaining 1024 patches for our experiment. Some original images from the Brodatz dataset are shown in the first row of Fig. 4. We define a 5-dimensional feature descriptor for each pixel, as done by Cherian and Sra [6]:

$$F_{Brodatz}(x, y) = \big[x, y, I(x, y), I_x, I_y\big]^T,$$

where $I(x, y)$ represents the corresponding intensity information; thus a 5 × 5 covariance matrix can easily be generated for each texture patch. Table 5 compares the performance of the proposed methods against the state-of-the-art clustering methods. In contrast to the object datasets, this texture dataset is more challenging due to the fact that there are fewer differences among images.


Fig. 4. Some texture samples: first row from the Brodatz dataset and second row from the KTH-TIPS2-b dataset.

Table 5
Clustering results (in %) on the Brodatz texture dataset. The figures in boldface are the best performance among all the compared methods.

Measurements   ACC     NMI     RI      PUR
LRR            67.09   66.04   94.48   67.48
RSS            43.65   54.79   91.24   50.49
K-means        59.96   65.64   93.17   62.89
NCut           60.35   68.44   93.81   65.04
SMCE           59.98   68.91   93.66   66.31
LS3C           35.16   40.38   87.49   36.33
J-RLRR         73.05   71.40   95.21   74.12
S-RLRR         63.67   70.47   94.14   69.24
LogELRR        68.36   67.64   94.68   69.53
LogExpLRR      73.54   75.14   95.75   74.41
LogGaussLRR    74.02   74.96   95.34   75.00
LogPloyLRR     73.24   73.70   95.73   74.02

Table 6
Clustering results (in %) on the KTH-TIPS2-b dataset. The figures in boldface are the best performance among all the compared methods.

Measurements   ACC     NMI     RI      PUR
LRR            79.55   78.93   94.62   79.55
RSS            48.65   55.78   88.72   50.67
K-means        49.07   63.32   87.43   55.89
NCut           71.30   70.63   92.67   71.30
SMCE           77.44   81.83   94.64   81.57
LS3C           47.81   49.82   86.82   47.98
J-RLRR         87.54   85.23   96.20   87.54
S-RLRR         82.41   83.72   95.28   84.85
LogELRR        91.75   88.36   97.32   91.75
LogExpLRR      83.59   83.92   95.48   83.59
LogGaussLRR    81.99   84.45   95.77   84.93
LogPloyLRR     85.69   87.84   96.41   86.45

Obviously, our proposed LogELRR and the exponential-kernel-based LogExpLRR methods are superior to all compared methods, which again demonstrates the positive effect of the Log-Euclidean metric and shows that choosing a proper kernel for the task at hand is very important.

2) KTH-TIPS2-b texture dataset. This dataset has 11 clusters with 4 objects per cluster, 4,397 images in total. Each object is photographed at 9 scales, under 4 illumination conditions and in 3 poses (9 × 4 × 3 = 108 images). We resize each image to 128 × 128 pixels. Fig. 4 shows some samples from the KTH-TIPS2-b texture dataset. In our experiment, we choose one object with all 108 images from each cluster; the sample set is therefore 108 × 11 = 1188 images. Following the setting of Harandi et al. [18], we extract the 23-dimensional feature vector

$$F_{KTH}(x, y) = \big[r, g, b, |G_{0,0}(x, y)|, \cdots, |G_{3,4}(x, y)|\big]^T,$$

where $|G_{u,v}(x, y)|$ is the 2D Gabor filter response defined as before, and we generate 20 Gabor filters with 4 orientations and 5 directions. With these settings, we obtain a 23 × 23 covariance matrix for each image.

In Table 6, the overall performance of our proposed methods is compared against the other clustering methods. This texture dataset, with more distinctive features, is somewhat easier than the previous Brodatz dataset (cf. Fig. 4); thus the experimental results as a whole are better than those on the Brodatz texture dataset. Although all our proposed methods, including the kernel extensions, outperform the other compared methods in every standard measurement, LRR is also excellent among the compared methods. An interesting phenomenon is that all kernel extensions of LogELRR fall behind LogELRR itself; a more appropriate kernel should be sought for this dataset, but the Log-Euclidean metric itself has again proven to be a 'good' solution.

Fig. 5. Some samples from Virus dataset.

Table 7
Clustering results (in %) on the Virus dataset. The figures in boldface are the best performance among all the compared methods.

Measurements   ACC     NMI     RI      PUR
LRR            24.13   22.53   86.62   29.40
RSS            28.60   28.66   88.76   31.13
K-means        27.73   28.73   88.57   31.67
NCut           33.53   32.90   89.47   36.53
SMCE           38.53   36.13   89.66   40.93
LS3C           31.53   31.01   89.01   32.80
J-RLRR         33.80   31.27   89.44   37.33
S-RLRR         40.67   38.36   90.18   42.40
LogELRR        42.87   40.48   90.81   45.93
LogExpLRR      44.40   42.20   90.48   47.87
LogGaussLRR    42.07   40.91   90.38   44.60
LogPloyLRR     44.13   41.86   90.48   47.53

5.6. Medical dataset

We further test our proposed methods on medical images, namely a virus dataset. This dataset is composed of transmission electron microscopy (TEM) images of 15 virus types; each class contains 100 unique virus images of size 41 × 41. Some virus images are shown in Fig. 5. This is also a very challenging dataset, since the virus textures and structures are very similar. Following the setting of Harandi et al. [18], we compute the 25-dimensional feature vector at each pixel $(x, y)$ of an image,

$$F_{Virus}(x, y) = \big[I(x, y), I_x, I_y, I_{xx}, I_{yy}, |G_{0,0}(x, y)|, \cdots, |G_{4,5}(x, y)|\big]^T,$$

so that each virus image is represented by a 25 × 25 covariance matrix. In Table 7, the overall performance of our proposed methods is compared against the other clustering methods. Similar to the previous experimental results, our proposed methods outperform all compared methods, and the exponential-kernel-based LogExpLRR method is the best.



6. Conclusion

In this paper, we propose a novel LRR model on SPD matrices based on the Log-Euclidean metric, namely LogELRR, which overcomes the high computational cost of the existing AIM-based method and achieves a dramatic gain in efficiency. Additionally, we embed SPD matrices into a Hilbert space based on the Log-Euclidean metric, so the LogELRR model is generalized to a kernelized version, and three different kernels (Gaussian, exponential and polynomial) are employed. The optimization problem of the generalized LogELRR model is formulated in a unified framework and an effective closed-form solution is obtained. The experimental results, compared to state-of-the-art clustering algorithms, demonstrate the clear advantages of the proposed methods.

Acknowledgments

The research project is supported by the Australian Research Council (ARC) through grant DP140102270, and is also partially supported by the National Natural Science Foundation of China under Grants No. 61390510, 61672071, 61632006, 61370119, Beijing Natural Science Foundation No. 4172003, 4162010, 4152009, the Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality No. IDHT20150504, and the Jing-Hua Talents Project of Beijing University of Technology.

Appendix

Before proving that the kernel functions presented in Definition 3 are positive definite, we first review some general properties of positive and negative definite kernels.

Definition 4 [25]. Let $\mathcal{X}$ be a nonempty set. A kernel $f: \mathcal{X}\times\mathcal{X} \rightarrow \mathbb{R}$ is called positive definite if it is symmetric and

$$\sum_{i,j=1}^{m} c_i c_j f(x_i, x_j) \ge 0$$

for all $m \in \mathbb{N}$, $\{x_1, \ldots, x_m\} \subseteq \mathcal{X}$ and $\{c_1, \ldots, c_m\} \subseteq \mathbb{R}$. The kernel $f$ is called negative definite if it is symmetric and

$$\sum_{i,j=1}^{m} c_i c_j f(x_i, x_j) \le 0$$

for all $m \in \mathbb{N}$, $\{x_1, \ldots, x_m\} \subseteq \mathcal{X}$ and $\{c_1, \ldots, c_m\} \subseteq \mathbb{R}$ with $\sum_{i=1}^{m} c_i = 0$.

Lemma 1. Let $\mathcal{X}$ be a nonempty set and $f: \mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ be a kernel. The kernel $\exp(-\gamma f(x, y))$ is positive definite for all $\gamma > 0$ if and only if $f$ is negative definite.
Proof. Please refer to Theorem 5.2 of [25].

Lemma 2. Let $\mathcal{X}$ be a nonempty set and $f: \mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ be an arbitrary positive definite kernel. If $g(z) = \sum_{n=0}^{\infty} a_n z^n$ is holomorphic (analytic) in its domain and $a_n \ge 0$ for all $n \ge 0$, then the composed function $g\circ f$ is a positive definite kernel.
Proof. Please refer to Proposition 1 of [28].

Now, let us prove that the kernel functions presented in Definition 3 are positive definite.

1) Log-E polynomial kernel $\kappa_{pn}(Y_1, Y_2) = p_n(\mathrm{tr}(\log(Y_1)^T\log(Y_2)))$. Based on Lemma 2, we only need to prove that $\mathrm{tr}(\log(Y_1)^T\log(Y_2))$ is positive definite, that is, $\sum_{i,j=1}^{m} c_i c_j \mathrm{tr}(\log(Y_i)^T\log(Y_j)) \ge 0$ for all $m\in\mathbb{N}$, $\{Y_1, \ldots, Y_m\} \subseteq \mathcal{X}$ and $\{c_1, \ldots, c_m\} \subseteq \mathbb{R}$. Now,

$$\sum_{i,j=1}^{m} c_i c_j \mathrm{tr}(\log(Y_i)^T\log(Y_j)) = \Big\langle \sum_{i=1}^{m} c_i Y_i,\ \sum_{j=1}^{m} c_j Y_j \Big\rangle_{\log} = \Big\|\sum_{i=1}^{m} c_i Y_i\Big\|_{\log}^2 \ge 0.$$

So, the Log-E polynomial kernel is positive definite.

2) Log-E exponential kernel $\kappa_{en}(Y_1, Y_2) = \exp(p_n(\mathrm{tr}(\log(Y_1)^T\log(Y_2))))$. By Lemma 1 and the fact that $p_n(\mathrm{tr}(\log(Y_1)^T\log(Y_2)))$ is positive definite, we can easily see that the Log-E exponential kernel is positive definite.

3) Log-E Gaussian kernel $\kappa_{g}(Y_1, Y_2) = \exp(-\beta\|\log(Y_1) - \log(Y_2)\|_F^2)$. According to Lemma 1, we just need to prove that $\sum_{i,j=1}^{m} c_i c_j\|\log(Y_i) - \log(Y_j)\|_F^2 \le 0$ for all $m\in\mathbb{N}$, $\{Y_1,\ldots,Y_m\}\subseteq\mathcal{X}$ and $\{c_1,\ldots,c_m\}\subseteq\mathbb{R}$ with $\sum_{i=1}^{m} c_i = 0$. Now,

$$\sum_{i,j=1}^{m} c_i c_j\|\log(Y_i)-\log(Y_j)\|_F^2 = \sum_{j=1}^{m} c_j\sum_{i=1}^{m} c_i\langle Y_i, Y_i\rangle_{\log} - 2\sum_{i,j=1}^{m} c_i c_j\langle Y_i, Y_j\rangle_{\log} + \sum_{i=1}^{m} c_i\sum_{j=1}^{m} c_j\langle Y_j, Y_j\rangle_{\log} = -2\sum_{i,j=1}^{m} c_i c_j\langle Y_i, Y_j\rangle_{\log} = -2\Big\|\sum_{i=1}^{m} c_i Y_i\Big\|_{\log}^2 \le 0.$$

Therefore, the Log-E Gaussian kernel is positive definite.

References

Please cite this article as: B. Wang et al., Low Rank Representation on SPD matrices with Log-Euclidean metric, Pattern Recognition (2017), http://dx.doi.org/10.1016/j.patcog.2017.07.009


[12] M. Faraki, M. Harandi, F. Porikli, Material classification on symmetric positive definite manifolds, in: IEEE Winter Conference on Applications of Computer Vision, 2015, pp. 749–756.
[13] P. Favaro, R. Vidal, A. Ravichandran, A closed form solution to robust subspace estimation and clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 1801–1807.
[14] J. Feng, Z. Lin, H. Xu, S. Yan, Robust subspace segmentation with block-diagonal prior, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[15] Y. Fu, J. Gao, X. Hong, D. Tien, Low rank representation on Riemannian manifold of symmetrical positive definite matrices, in: SIAM Conference on Data Mining (SDM), 2015, pp. 316–324.
[16] A. Gruber, Y. Weiss, Multibody factorization with uncertainty and missing data using the EM algorithm, in: IEEE Conference on Computer Vision and Pattern Recognition, I, 2004, pp. 707–714.
[17] X. Guo, Robust subspace segmentation by simultaneously learning data representations and their affinity matrix, in: International Joint Conference on Artificial Intelligence, 2015.
[18] M. Harandi, M. Salzmann, F. Porikli, Bregman divergences for infinite dimensional covariance matrices, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[19] M. Harandi, R. Hartley, B. Lovell, C. Sanderson, Sparse coding on symmetric positive definite manifolds using Bregman divergences, IEEE Trans. Neural Networks Learn. Syst. 27 (6) (2015) 1294–1306.
[20] M.T. Harandi, C. Sanderson, C. Shen, B. Lovell, Dictionary learning and sparse coding on Grassmann manifolds: an extrinsic solution, in: International Conference on Computer Vision, 2013, pp. 3120–3127.
[21] J. Ho, M.H. Yang, J. Lim, K. Lee, D. Kriegman, Clustering appearances of objects under varying illumination conditions, in: IEEE Conference on Computer Vision and Pattern Recognition, 1, 2003, pp. 11–18.
[22] H. Hu, Z. Lin, J. Feng, J. Zhou, Smooth representation clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[23] Z. Huang, R. Wang, S. Shan, X. Li, X. Chen, Log-Euclidean metric learning on symmetric positive definite manifold with application to image set classification, in: International Conference on Machine Learning, 2015.
[24] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, M. Harandi, Kernel methods on the Riemannian manifold of symmetric positive definite matrices, in: IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[25] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, M. Harandi, Kernel methods on Riemannian manifolds with Gaussian RBF kernels, IEEE Trans. Pattern Anal. Mach. Intell. 37 (12) (2015) 2464–2477.
[26] K. Kanatani, Motion segmentation by subspace separation and model selection, in: IEEE International Conference on Computer Vision, 2, 2001, pp. 586–591.
[27] T.O. Kvalseth, Entropy and correlation: some comments, IEEE Trans. Syst. Man Cybern. 17 (3) (1987) 517–519.
[28] P. Li, Q. Wang, W. Zuo, L. Zhang, Log-Euclidean kernel for sparse representation and dictionary learning, in: IEEE International Conference on Computer Vision, 2013, pp. 1601–1608.
[29] G. Liu, Z. Lin, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell. 35 (1) (2013) 171–184.
[30] C. Lu, H. Min, Z. Zhao, L. Zhu, D. Huang, S. Yan, Robust and efficient subspace segmentation via least squares regression, in: European Conference on Computer Vision, 2012.
[31] C. Lu, J. Feng, Z. Lin, S. Yan, Correlation adaptive subspace segmentation by trace lasso, in: IEEE International Conference on Computer Vision, 2013.
[32] Y. Lu, Z. Lai, Z. Fan, J. Cui, Q. Zhu, Manifold discriminant regression learning for image classification, Neurocomputing 166 (2015) 475–486.
[33] Z.-M. Lu, B. Li, Q.-G. Ji, Z.-F. Tan, Y. Zhang, Robust video identification approach based on local non-negative matrix factorization, Int. J. Electron. Commun. 69 (1) (2015) 82–89.
[34] L. Ma, C. Wang, B. Xiao, W. Zhou, Sparse representation for face recognition based on discriminative low-rank dictionary learning, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2586–2593.
[35] Y. Ma, A. Yang, H. Derksen, R. Fossum, Estimation of subspace arrangements with applications in modeling and segmenting mixed data, SIAM Rev. 50 (3) (2008) 413–458.
[36] A. Nachman, Theory of reproducing kernels, Trans. Am. Math. Soc. 68 (3) (1950) 337–404.
[37] F. Nie, X. Wang, H. Huang, Clustering and projected clustering with adaptive neighbors, in: Knowledge Discovery and Data Mining, 2014.
[38] V.M. Patel, H.V. Nguyen, R. Vidal, Latent space sparse subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 691–701.
[39] X. Pennec, P. Fillard, N. Ayache, A Riemannian framework for tensor computing, Int. J. Comput. Vis. 66 (1) (2006) 41–66.
[40] W.M. Rand, Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc. 66 (336) (1971) 846–850.
[41] B. Scholkopf, A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, USA, 2001.
[42] J. Shi, J. Malik, Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 22 (1) (2000) 888–905.
[43] S. Sra, A. Cherian, Generalized dictionary learning for symmetric positive definite matrices with application to nearest neighbor retrieval, in: Machine Learning and Knowledge Discovery in Databases, 2011, pp. 318–332.
[44] P. Tseng, Nearest q-flat to m points, J. Optim. Theory Appl. 105 (1) (2000) 249–252.
[45] R. Vemulapalli, J.K. Pillai, R. Chellappa, Kernel learning for extrinsic classification of manifold features, in: IEEE Conference on Computer Vision and Pattern Recognition, 2013.
[46] R. Vidal, Subspace clustering, IEEE Signal Process. Mag. 28 (2) (2011) 52–68.
[47] B. Wang, Y. Hu, J. Gao, Y. Sun, B. Yin, Low rank representation on Grassmann manifolds, in: Asian Conference on Computer Vision, 2014.
[48] B. Wang, Y. Hu, J. Gao, Y. Sun, B. Yin, Low rank representation on Grassmann manifolds: an extrinsic perspective, CoRR abs/1504.01807 (2015).
[49] B. Wang, Y. Hu, J. Gao, Y. Sun, B. Yin, Product Grassmann manifold representation and its LRR models, in: American Association for Artificial Intelligence, 2016.
[50] B. Wang, Y. Hu, J. Gao, Y. Sun, B. Yin, Laplacian LRR on product Grassmann manifolds for human activity clustering in multi-camera video surveillance, IEEE Trans. Circuits Syst. Video Technol. 27 (2016).
[51] R. Wang, H. Guo, L. Davis, Q. Dai, Covariance discriminative learning: a natural and efficient approach to image set classification, in: IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2496–2503.
[52] L. Xu, J. Neufeld, B. Larson, D. Schuurmans, Maximum margin clustering, in: Neural Information Processing Systems, 2004.
[53] C. Yuan, W. Hu, X. Li, S. Maybank, G. Luo, Human action recognition under log-Euclidean Riemannian metric, in: Asian Conference on Computer Vision, 2009.
[54] X. Zhang, C. Xu, M. Li, X. Sun, Sparse and low-rank coupling image segmentation model via nonconvex regularization, Int. J. Pattern Recogn. Artif. Intell. 29 (2) (2015) 1555004.



Boyue Wang received the B.Sc. degree from Hebei University of Technology, Tianjin, China, in 2012. He is currently pursuing the Ph.D. degree in the Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing. His current research interests include computer vision, pattern recognition, manifold learning and kernel methods.

Yongli Hu received his Ph.D. degree from Beijing University of Technology in 2005. He is a professor in the College of Metropolitan Transportation at Beijing University of Technology and a researcher at the Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology. His research interests include computer graphics, pattern recognition and multimedia technology.

Junbin Gao graduated from Huazhong University of Science and Technology (HUST), China, in 1982 with a B.Sc. degree in Computational Mathematics and obtained his Ph.D. from Dalian University of Technology, China, in 1991. He is a Professor of Big Data Analytics in the University of Sydney Business School at the University of Sydney, and was previously a Professor in Computer Science in the School of Computing and Mathematics at Charles Sturt University, Australia. From 2001 to 2005 he was a lecturer and senior lecturer in Computer Science at the University of New England, Australia, and from 1982 to 2001 he was an associate lecturer, lecturer, associate professor and professor in the Department of Mathematics at HUST. His main research interests include machine learning, data analytics, Bayesian learning and inference, and image analysis.

Muhammad Ali received his Master of Science in Pure Mathematics from the University of Dera Ismail Khan, Pakistan, in 2000. He then received the Master of Science (M.Sc.) degree in Applied Mathematics, with specialization in Mathematical Modelling and Scientific Computing, from the University of Kaiserslautern, Germany, in 2005. Since 2013 he has been working towards the Ph.D. degree at the School of Computing and Mathematics (Faculty of Business), Charles Sturt University, Bathurst, Australia. His research interests include statistical estimation techniques, with a focus on optimization techniques for directional distributions on Riemannian manifolds, with applications in computer vision.

David Tien received the bachelor's degree in computer science from Heilongjiang University, China, master's degrees in computing from the Chinese Academy of Sciences and in pure mathematics from Ohio State University, USA, and the Ph.D. degree in electrical engineering from The University of Sydney, Australia. Dr. Tien currently teaches computer science at Charles Sturt University, Australia. His current research interests include image and signal processing, artificial intelligence, telecommunication coding theory, and biomedical engineering. He served as the Chairman of the IEEE NSW Section.

Yanfeng Sun received her Ph.D. degree from Dalian University of Technology in 1993. She is a professor in the College of Metropolitan Transportation at Beijing University of Technology and a researcher at the Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology. She is a member of the China Computer Federation. Her research interests are multi-functional perception and image processing.



Baocai Yin received his Ph.D. from Dalian University of Technology in 1993. He is a Professor in the College of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology. He is a researcher at the Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology and a member of the China Computer Federation. His research interests include multimedia, multi-functional perception, virtual reality, and computer graphics.
