
Diversity and Consistency Learning Guided Spectral Embedding for Multi-view Clustering

To appear in: Neurocomputing. DOI: https://doi.org/10.1016/j.neucom.2019.08.002
Received 22 March 2019; revised 11 July 2019; accepted 1 August 2019.

Zhenglai Li (a), Chang Tang (a,*), Jiajia Chen (b,*), Cheng Wan (a), Weiqing Yan (c), Xinwang Liu (d)

(a) School of Computer Science, China University of Geosciences, Wuhan 430074, P. R. China
(b) Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University, Huai'an 223002, P. R. China
(c) School of Computer and Control Engineering, Yantai University, Yantai 264005, P. R. China
(d) College of Computer, National University of Defense Technology, Changsha 410073, P. R. China

Abstract: Multi-view clustering aims to group data points into their underlying classes. Exploiting the complementary information of multiple views to improve clustering performance is one of the central topics of multi-view clustering. Most existing multi-view clustering methods only constrain diversity and consistency in the data space, and do not consider diversity and consistency in the learned label space. However, it is natural to take the diversity of the learned label matrices into consideration, because different views generate different clustering label matrices, in which some labels are consistent and some are diverse. To address this issue, we propose a novel multi-view clustering method (DCMSC) that constrains diversity and consistency in both the learned clustering label matrix and the data space. Specifically, in the learned label space, we relax the learned common label matrix into a consistent part and a diverse part. Meanwhile, by constraining the diverse part with an introduced row-aware diversity representation and an l2,1-norm, wrong labels and the influence of noises on the consistent part are reduced. In the data space, we weight each view with a self-weighting strategy. Furthermore, we conduct clustering in spectral embedded spaces instead of the original data spaces, which suppresses the effect of noises and reduces redundant information. An optimization solution based on the augmented Lagrangian multiplier with alternating direction minimization (ALM-ADM) guarantees the convergence of our method. Extensive experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our method.

* Corresponding authors.
Email addresses: [email protected] (Zhenglai Li), [email protected] (Chang Tang), [email protected] (Jiajia Chen), [email protected] (Cheng Wan), [email protected] (Weiqing Yan), [email protected] (Xinwang Liu)


Keywords: multi-view clustering, spectral embedding, diversity and consistency learning
2010 MSC: 00-01, 99-00

1. Introduction

As one of the most fundamental techniques in pattern recognition [1, 2, 3], computer vision [4, 5, 6], and machine learning [7, 8, 9], clustering aims to partition a group of data points into k clusters by establishing similarities among them. In the past few years, a great number of classical clustering algorithms (e.g., K-means [10] and spectral clustering [11]) have been developed [10, 11, 12, 13]. However, these methods mainly focus on the single-view case, which makes it hard for them to find good clusters in multi-view data. In practice, directly applying them to multi-view data often fails to obtain ideal performance.

With the development of information technology, the data we face in many practical applications can usually be described from multiple views. The most common examples are images and videos, which can be represented by SIFT [14], HOG [15], LBP [16] and GIST [17] descriptors. In multi-view data, different views usually have different statistical properties. Therefore, an important challenge of multi-view clustering is to integrate the strengths of heterogeneous features by investigating the structure shared among views. Focusing on this issue, a variety of unsupervised multi-view clustering algorithms have been proposed [18, 19, 20, 21, 22, 23]. The work in [18] extends K-means and expectation-maximization (EM) to the multi-view setting. Chaudhuri et al. [19] constructed a projection for multi-view data so that the clustering labels of different views are correlated. The work in [20] proposes a nonnegative matrix factorization (NMF) based multi-view clustering method that searches for a cooperative factorization across views. The approach in [21] forces the similarity matrices of different views to be as similar as possible via a co-regularized framework. However, these methods often do not handle well the diversities between different views, which considerably degrades clustering performance. By diversities between multiple views we mean the part that is unique to each view and does not exist in the others.

In recent years, a large number of unsupervised multi-view clustering methods have been proposed to constrain the diversities between different views, and they have achieved reasonable success. A K-means based multi-view clustering method for large-scale data is proposed in [24]; it combines the different features of multiple views by using view weights, and its l2,1-norm constraint makes the model robust to outliers. Xu et al. [25] integrated feature selection into multi-view clustering for high-dimensional data. Wang et al. [26] proposed a multi-view affinity propagation method for clustering, in which messages are passed both within individual views and across different views, which helps to guarantee both clustering quality and clustering consistency. Xu et al. [27] proposed a multi-view K-means clustering method based on a least-absolute residual model and used discriminative feature embedding to handle high-dimensional problems; the self-weighted parameter of each view makes good use of the complementary information between views. Nie et al. [28] proposed an adaptively weighted Procrustes (AWP) approach that aligns multi-view spectral embeddings and weights views by their clustering capacities. Cao et al. [29] proposed a self-representation based method to learn a diversity-induced similarity graph; it uses the Hilbert-Schmidt independence criterion (HSIC) [30] as a diversity term to capture the diversity of multi-view representations, and a Laplacian regularizer on each representation makes the objective function smooth. The approach in [31] is a self-representation based method which captures the diversity of multi-view representations by an introduced position-aware exclusivity term and obtains consistent information through a global clustering indicator. The method in [32] learns a unified Laplacian-rank-constrained similarity graph by introducing a self-weighted parameter. Tang et al. [33] learned a joint affinity graph for multi-view subspace clustering, in which a diversity regularization and a graph rank constraint are integrated into a low-rank self-representation model. Huang et al. [34] constructed a latent intact space from multiple insufficient views and simultaneously captured the cluster structure from the intact space. Zhan et al. [35] learned a consensus graph by decreasing the disagreement among views and imposing a Laplacian rank constraint. Zhang et al. [36] captured the underlying complementary information to seek an optimal latent representation for the multi-view clustering task.

Although these methods provide better results than single-view methods in a variety of scenarios, there are still at least two issues to be addressed. Many previous methods only constrain diversity and consistency in the data space, but ignore these properties in the learned label space. Ignoring the impacts of diversities and noises on the learned clustering matrix can result in wrong labels. Furthermore, consistency always exists among the clustering labels of different views, which makes it feasible to capture consistency and diversity in the label space. To overcome this issue, we propose a novel multi-view clustering method that considers consistency and diversity in the data space and in the learned clustering label matrix at the same time. Low-dimensional feature spaces are also exploited to obtain promising results. In general, the contributions of our method are summarized in the following four aspects:

(1) We propose a novel multi-view clustering method which considers the diversity and consistency in the data space and the learned label space at the same time, so as to learn a pure and robust label matrix for the multi-view clustering task.

(2) A row-aware diversity representation is proposed to describe the diversities in the clustering labels and the spectral embeddings, respectively. By using this diversity representation together with an l2,1-norm, the diversities and noises in the learned clustering label are restricted efficiently.

(3) Our method conducts clustering on spectral embedded spaces instead of the original data spaces, which helps to suppress noises and redundant information.

(4) An efficient optimization algorithm based on an augmented Lagrangian multiplier with alternating direction minimization (ALM-ADM) guarantees the convergence of our method, and extensive experiments on both synthetic and real-world datasets demonstrate its effectiveness.

Throughout this paper, matrices are written as boldface capital letters and vectors as boldface lowercase letters. For an arbitrary matrix X \in R^{d \times n}, X_{pq} denotes its (p, q)-th entry, x^p and x_q denote the p-th row and q-th column of X, and X^T denotes the transpose of X. \mathrm{Tr}(X) = \sum_{p} X_{pp}, \|X\|_F = \sqrt{\mathrm{Tr}(X^T X)}, and \|X\|_{2,1} = \sum_{p=1}^{d} \|x^p\|_2 = \sum_{p=1}^{d} \sqrt{\sum_{q=1}^{n} X_{pq}^2}. 1 denotes the all-one vector, and Ind := \{Y \in \{0,1\}^{n \times k} \mid Y1 = 1\} denotes the set of indicator matrices.

The rest of this paper is organized as follows. Section 2 reviews the works most related to ours. Section 3 presents the proposed method and its optimization algorithm, together with the complexity and convergence analysis. Section 4 reports our experimental results and discussions. Section 5 concludes the paper.

2. Related Work

In this section, we mainly focus on reviewing the existing work most relevant to our proposed method.

2.1. Spectral Embedding

Suppose we have a set of data points x_p \in R^{d}, p = 1, ..., n, and let S \in R^{n \times n} be a similarity matrix that describes the relationships between the data points. The similarity matrix can be constructed by a kernel function [37]:

S_{pq} = \exp( - \|x_p - x_q\|_2^2 / (2\sigma^2) ),    (1)

where the parameter \sigma adjusts the neighborhood size. Next, the spectral embedding of the original data can be obtained by solving the following eigenvalue problem [38]:

\min_{F} \mathrm{Tr}(F^T L F)  \quad \text{s.t.} \quad F^T F = I,    (2)

where L is the graph Laplacian of the similarity matrix S, obtained as L = D - S with D a diagonal degree matrix whose entries are D_{pp} = \sum_{q=1}^{n} S_{pq}. F is the spectral embedding, and the optimal F consists of the eigenvectors corresponding to the k smallest eigenvalues of L. The obtained F is a relaxed solution compared with a discrete label matrix; spectral rotation and K-means are commonly used to convert the relaxed solution F into discrete labels.
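To make the construction in Eqs. (1)-(2) concrete, the following sketch builds the Gaussian similarity graph and extracts the spectral embedding with NumPy. The data layout (rows of X as samples) and the fixed bandwidth sigma are assumptions made only for this illustration.

```python
import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    """Sketch of Eqs. (1)-(2): Gaussian similarity, graph Laplacian,
    and the k eigenvectors with smallest eigenvalues as embedding F."""
    # Pairwise squared Euclidean distances between the rows of X (n x d)
    sq = np.sum(X**2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    S = np.exp(-dist2 / (2.0 * sigma**2))      # Eq. (1)
    D = np.diag(S.sum(axis=1))                 # diagonal degree matrix
    L = D - S                                  # unnormalized Laplacian
    # Eigenvectors of the k smallest eigenvalues solve the relaxed Eq. (2)
    eigvals, eigvecs = np.linalg.eigh(L)
    F = eigvecs[:, :k]                         # n x k spectral embedding
    return F
```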

2.2. Multiview Clustering via Adaptively Weighted Procrustes

By extending spectral embedding, Nie et al. [28] proposed an adaptively weighted Procrustes (AWP) approach to recover discrete labels from spectral embeddings. In AWP, a view weight is applied to distinguish the clustering capacities of different views so as to improve clustering performance. AWP solves the following optimization problem:

\min_{Y, R^{(i)}} \sum_{i=1}^{v} \frac{1}{p_i} \|Y - F^{(i)} R^{(i)}\|_F^2
\quad \text{s.t.} \quad Y \in Ind, \ (R^{(i)})^T R^{(i)} = I, \ p_i := \|Y^* - F^{(i)} R^{*(i)}\|_F,    (3)

where R^{(i)} is the rotation of the i-th view spectral embedding F^{(i)}, and p_i denotes the clustering capacity of the i-th view, which depends on the optimal clustering label Y^* and rotation R^{*(i)}. The symbol := indicates that a variable is defined in this way, and Ind := \{Y \in \{0,1\}^{n \times k} \mid Y1 = 1\} denotes the set of indicator matrices.
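For reference, the inner rotation problem in Eq. (3), \min_R \|Y - F^{(i)} R\|_F^2 subject to R^T R = I, has the classical orthogonal Procrustes solution. The sketch below shows only that single step; it is not the full AWP algorithm, whose view weights p_i are defined above.

```python
import numpy as np

def procrustes_rotation(F, Y):
    """Orthogonal Procrustes step: R = argmin ||Y - F R||_F s.t. R^T R = I.
    Classical closed form via the SVD of F^T Y."""
    U, _, Vt = np.linalg.svd(F.T @ Y)
    return U @ Vt
```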

2.3. Re-Weighted Discriminatively Embedded K-Means for Multi-View Clustering

K-means is a very efficient algorithm and has been widely used. According to [27], for a data set X, K-means can be written as a matrix factorization X \approx U V^T, where U denotes the clustering centroids and V denotes the clustering indicators. In particular, the clustering centroid matrix U and the clustering indicator matrix V can be obtained by solving the following optimization problem:

\min_{U, V} \|X - U V^T\|_F^2 \quad \text{s.t.} \quad V \in Ind.    (4)

To cluster data described by multi-view features, the usual strategy is to integrate the heterogeneous features by exploring the complementary information among them. Therefore, K-means can be extended to cope with the multi-view clustering problem:

\min_{U^{(i)}, V} \sum_{i=1}^{m} \|X^{(i)} - U^{(i)} V^T\|_F^2 \quad \text{s.t.} \quad V \in Ind,    (5)

where X^{(i)} represents the features of the i-th view, U^{(i)} represents the clustering centroids of the i-th view, and V is the consistent clustering indicator matrix. To cluster high-dimensional data efficiently and exploit the complementary information between different views, Xu et al. [27] put forward re-weighted discriminatively embedded K-means for multi-view clustering (RDEKM). In RDEKM, a least-absolute residual model is applied to reduce the impact of outliers and to perform dimension reduction. In brief, the RDEKM model is formulated as follows:

\min_{W^{(i)}, U^{(i)}, V} \sum_{i=1}^{m} \alpha^{(i)} \|W^{(i)T} X^{(i)} - U^{(i)} V^T\|_F^2
\quad \text{s.t.} \quad W^{(i)T} W^{(i)} = I, \ V \in Ind, \ \alpha^{(i)} = \frac{1}{2\|W^{(i)T} X^{(i)} - U^{(i)} V^T\|_F},    (6)

where W^{(i)} is the projection matrix that performs dimension reduction for the i-th view in this method. In addition, the self-weighted parameter \alpha^{(i)} makes the objective function smooth.
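As a small illustration of the factorization view of K-means in Eq. (4) (not of RDEKM itself), one can alternate between updating the centroid matrix U and the indicator matrix V. A minimal sketch, assuming X stores one sample per column:

```python
import numpy as np

def kmeans_factorization(X, k, iters=50, seed=0):
    """Alternating minimization of ||X - U V^T||_F^2 with V an indicator matrix (Eq. (4)).
    X is d x n; U is d x k (centroids); V is n x k (one-hot cluster indicators)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    labels = rng.integers(0, k, size=n)
    for _ in range(iters):
        V = np.eye(k)[labels]                      # n x k one-hot indicator matrix
        counts = np.maximum(V.sum(axis=0), 1.0)
        U = (X @ V) / counts                       # centroid update: U = X V (V^T V)^{-1}
        dists = ((X[:, :, None] - U[:, None, :])**2).sum(axis=0)   # n x k squared distances
        new_labels = dists.argmin(axis=1)          # indicator update (nearest centroid)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return U, labels
```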

2.4. Diversity-induced Multiview Subspace Clustering

The self-representation model X = XZ is widely used to learn a similarity graph S = (|Z| + |Z|^T)/2. Cao et al. [29] exploit the Hilbert-Schmidt independence criterion (HSIC) to capture the diversity among multiple representations. The proposed diversity-induced multiview subspace clustering (DiMSC) can be formulated as the following optimization problem:

\min_{Z^{(i)}} \sum_{i=1}^{m} \|X^{(i)} - X^{(i)} Z^{(i)}\|_F^2 + \lambda_s \sum_{i=1}^{m} \mathrm{Tr}(Z^{(i)} L^{(i)} Z^{(i)T}) + \lambda_v \sum_{j \neq i} \mathrm{HSIC}(Z^{(i)}, Z^{(j)}),    (7)

where the second term is a local structure constraint that enforces the representations to remain as similar as they are in the original data space.
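DiMSC relies on HSIC [30] to measure the dependence between two representations. For completeness, below is a sketch of the usual empirical HSIC estimator; the linear kernel used here is an assumption made for illustration and is not necessarily the kernel used in [29].

```python
import numpy as np

def hsic(Zi, Zj):
    """Empirical HSIC estimate between two representations (columns = samples):
    HSIC(Zi, Zj) = (n-1)^{-2} * tr(Ki H Kj H), with linear kernels K = Z^T Z."""
    n = Zi.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Ki = Zi.T @ Zi                        # linear-kernel Gram matrices
    Kj = Zj.T @ Zj
    return np.trace(Ki @ H @ Kj @ H) / (n - 1) ** 2
```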

These previous works handle the diversity and consistency information in the data space well and achieve appreciable clustering results. However, they do not take the diversity and consistency in the learned label space into consideration, which leads to a non-pure and suboptimal label matrix due to the diversities and noises in it. Different from these previous works, we separate the learned clustering label matrix into a consistent part and a diverse part, and make use of a diversity representation and an l2,1-norm constraint to capture diversities and noises in the clustering label and the data space at the same time.

3. Proposed method

3.1. Method

As mentioned above, we aim to constrain the diversity and consistency in the data space and the label space at the same time, and then learn an optimal label matrix for multi-view clustering. In our model, we relax the learned clustering label matrix into a consistent part and a diverse part, and integrate them into a multi-view K-means based model. Consequently, we formulate the objective function as follows:

\min_{U^{(i)}, Y, R^{(i)}} \sum_{i=1}^{m} \alpha^{(i)} \|F^{(i)T} - U^{(i)}(Y + R^{(i)})^T\|_F^2
\quad \text{s.t.} \quad Y \in Ind, \ \alpha^{(i)} = \frac{1}{2\|F^{(i)T} - U^{(i)}(Y + R^{(i)})^T\|_F},    (8)

where \alpha^{(i)} is a self-weighting parameter that indicates the clustering capability of the i-th view, Y is regarded as the consistent part of the label matrix, and R^{(i)} is treated as the diverse part of the i-th label matrix. F^{(i)} is a low-dimensional feature embedding of the original data X^{(i)}. Compared with the original data X^{(i)}, F^{(i)} suppresses the redundant information of the i-th view and the impact of noises. Thus, we conduct clustering on the low-dimensional features F^{(i)} instead of the original data X^{(i)}.

For the diverse part R^{(i)}, a number of criteria for constraining similarity and diversity could be used, such as the Kullback-Leibler (KL) divergence [39], the Hilbert-Schmidt independence criterion (HSIC) [30] and the maximum mean discrepancy [40]. However, they are not intuitive and efficient enough to describe the diversities among the spectral embeddings and clustering labels of different views. Here, we introduce a row-aware diversity representation term to describe the diversities lying in the clustering labels of different views.

Definition 1: Given two matrices R^{(i)} = [r_1^{(i)}; r_2^{(i)}; ...; r_p^{(i)}; ...; r_n^{(i)}] and R^{(j)} = [r_1^{(j)}; r_2^{(j)}; ...; r_p^{(j)}; ...; r_n^{(j)}], the diversity between R^{(i)} and R^{(j)} is defined as \sum_{p=1}^{n} r_p^{(i)} r_p^{(j)T}.

In this definition, the diversity representation is based on a row-wise inner product: the larger the value of the inner product, the higher the similarity between the two rows. Therefore, we call the defined diversity representation row-aware. In addition, we predefine a parameter \beta_{ijp} as prior information to control the diversity between r_p^{(i)} and r_p^{(j)} more precisely. Specifically, if the diversity between the p-th rows of the spectral embeddings of the i-th and j-th views is larger, we impose a larger weight \beta_{ijp} to obtain a larger diversity between r_p^{(i)} and r_p^{(j)}. Here, we use a log function [41] to obtain \beta_{ijp}; it limits the value of \beta_{ijp} within [0, 1] and is a decreasing function of f_p^{(i)} f_p^{(j)T}:

\beta_{ijp} = \frac{1}{1 + \log( f_p^{(i)} f_p^{(j)T} )},    (9)

where f_p^{(i)} is the p-th row of the low-dimensional spectral embedding F^{(i)} of the i-th view.

U(i) ,Y,R(i)

+ λ1

m X i=1

2 α(i) F(i)T − U(i) (Y + R(i) )T F

m m n X X X

(j)T βijp r(i) + λ2 p rp

i=1 j6=i,j=1 p=1

s.t. Y ∈ Ind, α(i) =

m X

(i)

R 2,1

(10)

i=1

1 2kF

(i)T

(i)

− U (Y + R(i) )T kF

135

where λ1 , λ2 are two balance parameters. The third term l2,1 -norm constraint is used to suppress

136

the noises in the learned label matrix. 8

137

As we can see from Eq.(10), compared with previous work, we simultaneously constrain the

138

diversity and consistency in the data space and learned label space. A self-weighted approach is

139

utilized to measure the clustering capability of each view. Then, we divide the learned label matrix

140

into a consistent part and a diverse part. A row-aware diversity constraint captures diversities lying

141

in label space. finally, we can learn a pure and optimal label matrix for multi-view clustering task.

142

3.2. Optimization

143

In this section, we employ the augmented Lagrange multiplier with alternating direction mini-

144

mizing (ALM-ADM) strategy [42] to optimize our objection function. By separating the objective

145

function, we have three subproblems here, Y-subproblem, R-subproblem and U-subproblem. Y-subproblem: when U(1) , U(2) , ..., U(m) , R(1) , R(2) , ..., R(m) are fixed, Eq.(10) can be rewritten as: m X

min Y

i=1

2 α(i) F(i)T − U(i) (Y + R(i) )T F

(11)

s.t. Y ∈ Ind

It is worth noting that, each row of Y is independent. By separating F(i) , Y and R(i) into independent vectors. Then the function can be converted as follow: min Y

m X

α(i)

i=1

n X

(i)T

T 2

fp − U(i) (yp + r(i) p ) 2 p=1

s.t. Y ∈ Ind, yp ∈ Y, Yp,c ∈ {0, 1},

C X

(12)

Yp,c = 1

c=1

In addition, each row of identify matrix IC is a candidate of yp . IC = {e1 , e2 , ..., eC }, yp ∈ {e1 , e2 , ..., eC }

(13)

Thus, the optimal solution of Eq.(12) is one of the row of IC that can make the objective function Eq.(14) reach the minimum. c∗ = arg min c

m X i=1

α(i)

n X

(iT )

T 2

fp − U(i) (ec + r(i) p ) 2

(14)

p=1

R-subproblem when Y, U(1) , U(2) , ..., U(m) are fixed, Eq.(10) can be rewritten as:

2 min α(i) F(i)T − U(i) (Y + R(i) )T F R(i)

+ λ1

m n X X

rp(i) r(j)T βijp + λ2 T r(R(i)T D(i) R(i) ) p

j6=i,j=1 p=1

9

(15)

where Di is a diagonal matrix corresponding to the ith view [43]. Thus, it is defined as: 1 D(i) pp = (i) 2

2 rp 2

(16)

Intuitively, each row of R(i) can be updated independently. Thus, by separating X(i) , Y and R(i) into independent vectors, taking the derivative of Eq.(15), and setting it to zero, the Eq.(15) is set as: (i)

α(i)(−2f(i)T U(i) + 2yp U(i)T U(i) + 2rP U(i)T U(i) ) p + 2λ1

n m X X

(j)

(i) rP βijp + 2λ2 r(i) p Dpp = 0

(17)

p=1 j6=i,j=1

U-subproblem when Y, R(1) , R(2) , ..., R(m) are fixed, Eq.(10) can be rewritten as: min U(i)

m X

(i)T

2

F − U(i) (Y + R(i) )T F

(18)

i=1

Taking the derivative of Eq.(18), and setting it to zero.

− 2F(i)T Y − 2F(i)T R(i) + 2U(i) YT Y + 2U(i) YT R(i) + 2U(i) R(i)T Y + 2U(i) R(i)T R(i) = 0

146

(19)

3.3. Complexity and Convergence

Convergence Analysis: The work in [43] and [44] has proved the stable convergence of ALM-ADM with two subproblems, while it is difficult to give a joint theoretical proof of the convergence of ALM-ADM with more than two subproblems. There are three subproblems in Algorithm 1, i.e., the Y-subproblem, the R-subproblem and the U-subproblem. In the Y-subproblem, Y obtains its optimal solution by searching over the set of candidate rows e_c. The convergence of the R-subproblem is guaranteed in [43]. The U-subproblem solves a least-absolute residual model to obtain its optimal solution. Thus, the convergence of each subproblem can be guaranteed. In addition, experimental results on synthetic and real-world datasets show that our method converges strongly and stably.

Computational Complexity: Our method consists of three subproblems. The complexity of updating Y is O(nk^2), where n and k are the numbers of samples and clusters, respectively. The complexity of updating each row of R^{(i)} is O(k^3), so the complexity of updating the entire R^{(i)} is O(nk^3). For updating U^{(i)}, the complexity is O(k^3). In all, the total complexity of our method is O(nk^2 + nk^3 + k^3) per iteration.

Algorithm 1: Diversity and Consistency Learning Guided Spectral Embedding for Multi-view Clustering (DCMSC)
Input: initial label matrix Y, spectral embeddings F^{(i)}, the number of clusters k, \beta via Eq. (9), maximum number of iterations T_max, threshold \epsilon.
Output: label matrix Y, clustering centroid matrices U^{(i)}, diverse parts R^{(i)}.
1: Initialize R^{(i)}, U^{(i)}, \alpha^{(i)} = 1/m;
2: repeat
3:   Update variable D^{(i)} via Eq. (16);
4:   Update variable R^{(i)} via Eq. (17);
5:   Update variable U^{(i)} via Eq. (19);
6:   Update variable Y via Eq. (14);
7:   Update variable \alpha^{(i)} via Eq. (8);
8: until |obj(t) - obj(t-1)| < \epsilon or t > T_max
9: return R^{(i)}, U^{(i)}, Y;

4. Experiment

4.1. Datasets

We experiment with synthetic datasets and six widely used real-world datasets to evaluate our method. Table 1 summarizes these datasets. The real-world datasets are as follows:

HandWritten [45] consists of 2000 binary images of handwritten digits (0-9), with 200 images per digit. We use the six published views for multi-view clustering: 76 Fourier coefficients of the character shapes (FOU), 216 profile correlations (FAC), 64 Karhunen-Loève coefficients (KAR), 240 pixel averages in 2x3 windows (PIX), 74 Zernike moments (ZER) and 6 morphological (MOR) features.

Caltech101-7 [46] contains 8677 object recognition images belonging to 101 classes. We choose 7 widely used categories and 6 feature types, i.e., 48-dimensional Gabor features, 40-dimensional

Table 1: Information of the multi-view datasets

Dataset             Samples    Views    Clusters
Synthetic data 1    1000       3        2
Synthetic data 2    200        2        2
Synthetic data 3    300        2        3
handwritten         2000       6        10
Caltech101-7        1474       6        7
BBCsport            544        2        5
Yale                165        3        15
3sources            169        3        6
LandUse-21          2100       3        21

wavelet moments (WM), 254-dimensional CENTRIST features, 1984-dimensional HOG features, 512-dimensional GIST features, and 928-dimensional LBP features.

BBCSport [47] contains documents from the BBC Sport website corresponding to sports news in 5 topical areas, described by two views with 3183 and 3203 dimensions, respectively.

Yale [48] consists of 165 grayscale face images belonging to 15 subjects, each captured with different facial expressions and configurations.

3Sources is a multi-view text dataset collected from three well-known online news sources: BBC, Reuters and The Guardian.

LandUse-21 [49] consists of 2100 satellite images belonging to 21 classes. We use three types of features for clustering.

The synthetic datasets are used to study the robustness of our method to noise:

Synthetic data 1: this dataset consists of 1000 data points with 3 views, generated by a two-component Gaussian mixture model as in [21].

Synthetic data 2: a two-moon dataset consisting of 200 data points with 2 views belonging to 2 clusters, obtained by adding noise at levels of 0.12 and 0.14, respectively.

Synthetic data 3: 300 data points belonging to three clusters, where the two views are corrupted with noise at levels of 0.14 and 0.16, respectively. Synthetic data 2 and 3 are obtained as in [50].

4.2. Compared Methods

We compare our method with 2 single-view clustering methods and 7 multi-view clustering methods to show its superiority.

K-means and spectral clustering (SC) are classical single-view clustering methods, widely used due to their simplicity and efficiency.

Diversity-induced multi-view subspace clustering (DiMSC) [29] uses the Hilbert-Schmidt independence criterion (HSIC) [30] as a diversity term to capture the complementary information of multi-view representations.

Parameter-free auto-weighted multiple graph learning (AMGL) [51] proposes a framework for multi-view clustering and semi-supervised tasks via a reformulation of the standard spectral clustering model.

Exclusivity-consistency regularized multi-view subspace clustering (ECMSC) [31] captures the complementary information of multi-view representations by introducing a position-aware exclusivity term.

Self-weighted multiview clustering with multiple graphs (SwMC) [32] learns a unified similarity graph by introducing a self-weighted parameter.

Adaptive structure concept factorization for multiview clustering (MVCF) [52] proposes a concept factorization based multi-view clustering method that jointly optimizes the graph matrix to make full use of the correlated information between multiple views.

Graph learning for multiview clustering (MVGL) [50] proposes a graph learning based method without a predefined graph; its initial graphs are learned from the data points of each view.

Multiview clustering via adaptively weighted procrustes (AWP) [28] uses an adaptively weighted Procrustes formulation to recover discrete labels from the spectral embeddings.

4.3. Experiment Setting

To ensure a fair comparison, the parameters of all comparison methods follow the settings in their original papers. In addition, for methods that need K-means clustering to obtain the final clustering indicators, we run K-means 20 times to weaken the influence of random centroid initialization and report the mean performance. For our method, we search the number of adaptive neighbors used to construct the similarity graph from 5 to 50 with a step of 5, and tune the parameters \lambda_1 and \lambda_2 over {0.001, 0.01, 0.1, 1, 10, 100, 1000} for all datasets.
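A minimal sketch of this search protocol is shown below; the scoring callback is a hypothetical placeholder standing in for training DCMSC with a given parameter setting and returning a metric such as NMI.

```python
from itertools import product

def grid_search(score_fn):
    """Sketch of the parameter search described above. `score_fn(lam1, lam2, n_neighbors)`
    is a caller-supplied function that fits the model and returns a score (e.g., NMI)."""
    lambda_grid = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
    neighbor_grid = range(5, 55, 5)            # 5, 10, ..., 50 adaptive neighbors
    best_score, best_params = -1.0, None
    for lam1, lam2, nn in product(lambda_grid, lambda_grid, neighbor_grid):
        s = score_fn(lam1, lam2, nn)
        if s > best_score:
            best_score, best_params = s, (lam1, lam2, nn)
    return best_score, best_params
```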

For the evaluation metrics, we use normalized mutual information (NMI), accuracy (ACC), adjusted Rand index (AR), F-score, Precision, Recall and Purity to comprehensively evaluate the performance. These metrics are commonly used in previous works [53, 54, 55, 56, 57, 58]; for all of them, a higher value indicates better performance.
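For reference, ACC and Purity can be computed as sketched below with scikit-learn and SciPy (NMI is available directly as normalized_mutual_info_score). This is a common way to compute these metrics, not necessarily the exact implementation used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, confusion_matrix

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one mapping between clusters and classes (Hungarian algorithm)."""
    C = confusion_matrix(y_true, y_pred)
    row, col = linear_sum_assignment(-C)     # maximize the matched counts
    return C[row, col].sum() / C.sum()

def purity(y_true, y_pred):
    """Purity: each predicted cluster votes for its majority true class."""
    C = confusion_matrix(y_true, y_pred)
    return C.max(axis=0).sum() / C.sum()

# Example usage with two integer label vectors y and y_hat:
# acc = clustering_accuracy(y, y_hat)
# nmi = normalized_mutual_info_score(y, y_hat)
# pur = purity(y, y_hat)
```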


Table 2: Clustering results on synthetic datasets

Dataset

Synthetic Data 1

Synthetic Data 2

Synthetic Data 3

Method

ACC

NMI

F-score

Precision

Recall

AR

Purity

K-means

0.9880

0.9121

0.9767

0.9794

0.9740

0.9525

0.9880

SC

0.5760

0.0025

0.6759

0.5109

0.9983

0.0007

0.5760

DiMSC

0.7990

0.2671

0.6895

0.6804

0.6996

0.3563

0.7990

AMGL

0.9940

0.9468

0.9883

0.9889

0.9877

0.9761

0.9940

SwMC

0.9980

0.9809

0.9961

0.9967

0.9955

0.9920

0.9980

ECMSC

0.9890

0.9114

0.9787

0.9784

0.9790

0.9564

0.9890

AWP

0.9970

0.9732

0.9941

0.9950

0.9933

0.9880

0.9970

MVCF

0.5760

0.0025

0.6759

0.5109

0.9983

0.0007

0.5760

MVGL

0.9370

0.7116

0.8831

0.8927

0.8737

0.7636

0.9370

DCMSC

1.0000

1.0000

1.0000

1.0000

1.0000

1.0000

1.0000

K-means

0.3345

0.0000

0.3267

0.3266

0.3267

0.0000

0.3345

SC

0.3367

0.0129

0.4958

0.3311

0.9867

0.0000

0.3400

DiMSC

0.4180

0.0244

0.3514

0.3441

0.3636

0.0201

0.4198

AMGL

0.3500

0.0013

0.3280

0.3275

0.3284

0.0000

0.3533

SwMC

0.5607

0.5097

0.5900

0.4811

0.7648

0.3049

0.6367

ECMSC

0.3467

0.0006

0.3278

0.3271

0.3286

0.0000

0.3467

AWP

0.3567

0.0016

0.4975

0.3311

1.0000

0.0000

0.3567

MVCF

0.3367

0.0129

0.4958

0.3311

0.9867

0.0000

0.3400

MVGL

0.3333

0.0000

0.3317

0.3268

0.3368

0.0000

0.3333

DCMSC

0.5233

0.1183

0.4133

0.4104

0.4949

0.1198

0.5233

K-means

0.7700

0.2234

0.6429

0.6417

0.6440

0.2880

0.7700

SC

0.5050

0.0096

0.6622

0.4975

0.9900

0.0000

0.5050

DiMSC

0.7200

0.1449

0.5931

0.5927

0.5935

0.1895

0.7200

AMGL

0.8750

0.5616

0.7858

0.7625

0.8106

0.5604

0.8750

SwMC

1.0000

1.0000

1.0000

1.0000

1.0000

1.0000

1.0000

ECMSC

0.5150

0.0006

0.4956

0.4954

0.4963

0.0000

0.5150

AWP

0.8850

0.5837

0.7997

0.7795

1.0000

0.5910

0.8850

MVCF

0.5050

0.0096

0.6622

0.4975

0.9900

0.0000

0.5050

MVGL

0.5000

0.0000

0.6644

0.4975

1.0000

0.0000

0.5000

DCMSC

0.9200

0.670515 0.8532

0.8425

0.8642

0.7042

0.9200

Table 3: Clustering results on real world datasets

Dataset

handwritten

Yale

Caltech101-7

bbcsport

Method

ACC

NMI

F-score

Precision

Recall

AR

Purity

K-means

0.8364

0.7761

0.7439

0.7376

0.7504

0.7153

0.8402

SC

0.9400

0.8990

0.8922

0.8882

0.9029

0.8802

0.9404

DiMSC

0.7232

0.6319

0.5972

0.5937

0.6007

0.5523

0.7232

AMGL

0.8674

0.8872

0.8469

0.8300

0.8658

0.8295

0.8734

SwMC

0.8022

0.8859

0.8087

0.7327

0.9074

0.7846

0.8183

ECMSC

0.8580

0.8476

0.8146

0.8103

0.8188

0.7939

0.8580

AWP

0.9615

0.9161

0.9248

0.9242

0.9353

0.9165

0.9615

MVCF

0.8753

0.7878

0.7730

0.7690

0.7770

0.7477

0.8753

MVGL

0.9195

0.8659

0.8315

0.8113

0.8527

0.8124

0.9195

DCMSC

0.9720

0.9381

0.9451

0.9448

0.9990

0.9390

0.9720

K-means

0.5642

0.6425

0.4381

0.3944

0.4954

0.3970

0.5755

SC

0.6515

0.7039

0.5125

0.4659

0.5716

0.4773

0.6688

DiMSC

0.7088

0.7398

0.5713

0.5416

0.6058

0.5418

0.7155

AMGL

0.5712

0.6306

0.3845

0.3288

0.4648

0.3370

0.5803

SwMC

0.6370

0.6721

0.4348

0.3768

0.5158

0.3919

0.6430

ECMSC

0.6606

0.6886

0.4555

0.4030

0.5236

0.4152

0.6667

AWP

0.6424

0.7101

0.5378

0.4964

0.5867

0.5051

0.6485

MVCF

0.5148

0.5683

0.3589

0.3298

0.3944

0.3132

0.5279

MVGL

0.6182

0.6263

0.3919

0.3196

0.5067

0.3428

0.6182

DCMSC

0.7576

0.7473

0.5896

0.5642

0.9297

0.5616

0.7576

K-means

0.4797

0.5563

0.5139

0.8354

0.3738

0.3599

0.8712

SC

0.7840

0.7008

0.8017

0.8835

0.7338

0.6950

0.8931

DiMSC

0.5030

0.5074

0.5171

0.8527

0.3711

0.3698

0.8298

AMGL

0.6650

0.5354

0.6193

0.6449

0.5962

0.3958

0.8479

SwMC

0.6466

0.4731

0.6160

0.5953

0.6562

0.3391

0.7905

ECMSC

0.6716

0.5496

0.6539

0.8043

0.5508

0.4959

0.8229

AWP

0.7062

0.6910

0.7497

0.9035

0.6407

0.6320

0.8915

MVCF

0.3703

0.3256

0.3955

0.6453

0.2852

0.2085

0.7837

MVGL

0.6900

0.5505

0.6417

0.6609

0.6236

0.4275

0.8555

DCMSC

0.8372

0.7134

0.8291

0.9566

1.0000

0.7175

0.9281

K-means

0.8420

0.7311

0.7802

0.7516

0.8141

0.7070

0.8523

SC

0.9081

0.7948

0.8519

0.7974

0.9144

0.8012

0.9081

DiMSC

0.8828

0.7454

0.8425

0.8576

0.8615

0.7917

0.8828

AMGL

0.9724

0.9044

0.9507

0.9388

0.9630

0.9350

0.9724

SwMC

0.6502

0.6211

0.6108

0.4570

0.9398

0.4220

0.6712

ECMSC

0.3879

0.0728

0.2480

0.7967

0.0226

0.3971

AWP

0.9779

0.9266

0.3782 16 0.9574

0.9675

0.9798

0.9443

0.9779

MVCF

0.8640

0.7336

0.8123

0.7732

0.8555

0.7495

0.8640

MVGL

0.7059

0.7279

0.6868

0.5413

0.9395

0.5509

0.7426

DCMSC

0.9853

0.9502

0.9694

0.9756

0.9837

0.9598

0.9853

K-means

0.5775

0.4991

0.5308

0.5727

0.5020

0.4027

0.7080

SC

0.6133

0.5714

0.5777

0.5420

0.6243

0.4369

0.7290

226

4.4. Clustering Results

4.4.1. Results on Synthetic Data

The clustering results of the different methods on the three synthetic datasets are given in Table 2, where the best value of each metric is indicated in bold. As can be seen from Table 2, our method obtains better and more accurate results than the other methods, which indicates that our method is robust to noise.

4.4.2. Results on Real Data

Table 3 gives the clustering results of the different methods on the six real-world datasets. From the comparison results, we make the following observations:

(1) In general, our method outperforms all compared methods. For example, on the Yale dataset our method outperforms the second best performer (DiMSC) by about 5, 1 and 4 percentage points in terms of ACC, NMI and Purity, respectively. The previous methods only capture diversities in the data space, and therefore obtain a suboptimal label matrix due to the noises and diversities in it. In addition, some of them conduct clustering on the original data, whose redundant information decreases the clustering performance considerably.

(2) Overall, the multi-view clustering methods outperform the single-view methods, which intuitively demonstrates the superiority of using multiple views: the underlying complementary knowledge benefits clustering performance.

(3) AWP uses an adaptively weighted Procrustes formulation to recover discrete labels from the spectral embeddings. However, assigning view weights only captures diversity in the data space; the noises and diversity in the label space are not taken into consideration in AWP. Thus, our method achieves better results than AWP.

4.5. Ablation Study of The Proposed Model

In this subsection, we present an ablation study of the proposed model. Specifically, we analyze the results without capturing diversity and consistency in the data space, the results without capturing diversity and consistency in the label space, and the results without using spectral embedding.

Effectiveness of capturing diversity and consistency in data space: In order to validate the effectiveness of capturing diversity and consistency in the data space, we use Eq. (10) without the self-weighting

Table 4: Ablation study of the proposed model

Dataset

handwritten

Yale

Caltech101-7

bbcsport

3sources

LandUse-21

Method

ACC

NMI

F-score

Precision

Recall

AR

Purity

DCMSC-no-SE

0.9090

0.8374

0.8316

0.8261

1.0000

0.8129

0.9090

DCMSC-no-DCDS

0.9735

0.9407

0.9479

0.9475

1.0000

0.9421

0.9735

DCMSC-no-DCLS

0.9640

0.9213

0.9296

0.9288

0.9304

0.9218

0.9640

DCMSC

0.9720

0.9381

0.9451

0.9448

0.9990

0.9390

0.9720

DCMSC-no-SE

0.5333

0.6000

0.3984

0.3784

0.7285

0.3571

0.5455

DCMSC-no-DCDS

0.7212

0.7380

0.5802

0.5554

0.9515

0.5516

0.7212

DCMSC-no-DCLS

0.6242

0.6877

0.4903

0.4418

1.0000

0.4532

0.6424

DCMSC

0.7576

0.7473

0.5896

0.5642

0.9297

0.5616

0.7576

DCMSC-no-SE

0.8331

0.7044

0.8545

0.9340

0.9980

0.7705

0.9009

DCMSC-no-DCDS

0.8358

0.7094

0.8208

0.9563

0.9981

0.7037

0.9261

DCMSC-no-DCLS

0.5414

0.6241

0.6158

0.8892

1.0000

0.4754

0.8820

DCMSC

0.8372

0.7134

0.8291

0.9566

1.0000

0.7175

0.9281

DCMSC-no-SE

0.7574

0.6201

0.7398

0.7383

0.7413

0.6580

0.8033

DCMSC-no-DCDS

0.9816

0.9373

0.9607

0.9652

1.0000

0.9485

0.9816

DCMSC-no-DCLS

0.9835

0.9436

0.9668

0.9739

0.9598

0.9565

0.9835

DCMSC

0.9853

0.9502

0.9694

0.9756

0.9837

0.9598

0.9853

DCMSC-no-SE

0.5266

0.3919

0.4802

0.5541

0.4238

0.3492

0.6213

DCMSC-no-DCDS

0.8462

0.7587

0.7859

0.8476

0.9967

0.7261

0.8698

DCMSC-no-DCLS

0.7396

0.7157

0.7063

0.7957

0.6350

0.6300

0.8166

DCMSC

0.8402

0.7574

0.7809

0.8456

0.9939

0.7197

0.8698

DCMSC-no-SE

0.2671

0.3235

0.1697

0.1525

1.0000

0.1238

0.3052

DCMSC-no-DCDS

0.3067

0.3571

0.1949

0.1895

0.9896

0.1538

0.3286

DCMSC-no-DCLS

0.2990

0.3557

0.1948

0.1853

1.0000

0.1528

0.3133

DCMSC

0.3081

0.3634

0.1936

0.1851

0.9425

0.1517

0.3276

18

256

parameters \alpha^{(i)}. We represent this variant as "DCMSC-no-DCDS"; the corresponding results on the six datasets are shown in Table 4. On handwritten and 3sources, DCMSC-no-DCDS achieves better performance than DCMSC. The reason may be that the self-weighting strategy sometimes does not capture the diversity and consistency in the data space well, and that the diverse part R may disturb the effectiveness of the self-weighting strategy. On the remaining datasets, capturing diversity and consistency in the data space improves the final results.

Effectiveness of capturing diversity and consistency in label space: In order to validate the effectiveness of capturing diversity and consistency in the label space, we remove R from Eq. (10) and denote the resulting method as "DCMSC-no-DCLS". The results obtained by DCMSC-no-DCLS on the six datasets are shown in Table 4. As can be seen from the seven metrics on the six datasets, capturing diversity and consistency in the label space distinctly improves the final results, which demonstrates that it helps learn a more robust label matrix for the multi-view clustering task.

Effectiveness of using spectral embedding: In our model as formulated in Eq. (10), the low-dimensional spectral embedded features are used to suppress noises and redundant information. In order to validate the effectiveness of using spectral embedding, we replace the spectral embedding F with the original data X in Eq. (10) and denote the resulting method as "DCMSC-no-SE". The results obtained by DCMSC-no-SE on the six datasets are shown in Table 4. As can be seen from the seven metrics on the six datasets, the low-dimensional spectral embedded features distinctly improve the final results, which demonstrates that the spectral embedding suppresses noises and redundant information for the multi-view clustering task.

4.6. Parameter Sensitivity Analysis and Convergence Study

4.6.1. Parameter Sensitivity Analysis

There are three parameters in our method: \lambda_1, \lambda_2 and the number of adaptive neighbors. Figures 1-3 show the performance of the proposed method on the six real-world datasets for different combinations of \lambda_1 and \lambda_2 in terms of ACC, NMI and Purity, respectively. In addition, the performance for different numbers of adaptive neighbors is shown in Figure 4. From these figures, we make the following observations:

(1) Our method is sensitive to the parameter \lambda_2, but less sensitive to \lambda_1, on the six real-world datasets. When \lambda_2 varies from 1 to 1000, our method achieves stable and relatively good results.

(2) On Caltech101-7, Yale and 3sources, the performance is quite sensitive to the number of adaptive neighbors. In contrast, on the other datasets high performance is achieved over a large range of neighbor numbers.

4.6.2. Convergence Study

As discussed in the previous section, our method converges theoretically. In this subsection, we conduct an experimental study of the convergence of our method on the six real-world datasets. The convergence curves in Figure 5 indicate that the objective function values decrease very quickly and reach steady values within about 5 iterations on these datasets. Therefore, our method converges stably.

Figure 1: Performance of the proposed method on six real-world datasets with different combinations of \lambda_1 and \lambda_2 values in terms of ACC. Panels: (a) handwritten, (b) Yale, (c) Caltech101-7, (d) BBCsport, (e) 3sources, (f) LandUse-21.

Figure 2: Performance of the proposed method on six real-world datasets with different combinations of \lambda_1 and \lambda_2 values in terms of NMI. Panels: (a) handwritten, (b) Yale, (c) Caltech101-7, (d) BBCsport, (e) 3sources, (f) LandUse-21.

Figure 3: Performance of the proposed method on six real-world datasets with different combinations of \lambda_1 and \lambda_2 values in terms of Purity. Panels: (a) handwritten, (b) Yale, (c) Caltech101-7, (d) BBCsport, (e) 3sources, (f) LandUse-21.

Figure 4: Performance of the proposed method with different numbers of adaptive neighbors in terms of ACC, NMI and Purity, respectively. Panels: (a) ACC, (b) NMI, (c) Purity.

Figure 5: The convergence curves of the proposed method on handwritten, Yale, Caltech101-7, BBCsport, 3sources and LandUse-21, respectively.

5. Conclusions

In this paper, we propose a novel diversity and consistency learning guided spectral embedding method for multi-view clustering. Compared with previous methods, we learn an optimal label matrix by capturing the diversity and consistency in the data space and the learned label space at the same time. We employ a self-weighting strategy to weight each view in the data space. In the learned label space, we relax the common clustering label into a consistent part and a diverse part; a row-aware diversity representation together with an l2,1-norm captures the diversities lying in the diverse part of the labels well, so as to make better use of the complementary information of multiple views. Furthermore, we conduct clustering on low-dimensional embedded features instead of the original data to suppress noises and redundant information. An algorithm based on the augmented Lagrangian multiplier with alternating direction minimization finds the optimal solution quickly. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of our method.

6. Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) under Grant No. CUG170654, in part by the National Natural Science Foundation of China under Grants No. 61701451, 61773392 and 61801414, and in part by the Shandong Province Natural Science Foundation under Grant No. ZR2017QF006.

314

References

315

[1] R. J. Schalkoff, Pattern recognition, Wiley Encyclopedia of Computer Science and Engineering.

316

[2] A. Ben-Dor, R. Shamir, Z. Yakhini, Clustering gene expression patterns, Journal of Computa-

317

tional Biology 6 (3-4) (1999) 281–297.

318

[3] C. Tang, X. Liu, P. Wang, C. Zhang, M. Li, L. Wang, Adaptive hypergraph embedded semi-

319

supervised multi-label image annotation, IEEE Transactions on Multimedia (2019) 1–1doi:

320

10.1109/TMM.2019.2909860.

321

322

[4] X. Zhu, C. Tang, P. Wang, H. Xu, M. Wang, J. Tian, Saliency detection via affinity graph learning and weighted manifold ranking, Neurocomputing 312 (2018) 239–250. 24

323

324

[5] L. Lu, R. Vidal, Combined central and subspace clustering for computer vision applications, in: International Conference on Machine Learning, ACM, 2006, pp. 593–600.

325

[6] C. Tang, X. Zhu, X. Liu, L. Wang, Z. Albert, Defusionnet: Defocus blur detection via recur-

326

rently fusing and refining multi-scale deep features, in: IEEE Computer Society Conference on

327

Computer Vision and Pattern Recognition, 2019, pp. 2700–2709.

328

[7] A. McGregor, M. Hall, P. Lorier, J. Brunskill, Flow clustering using machine learning tech-

329

niques, in: International workshop on passive and active network measurement, Springer, 2004,

330

pp. 205–214.

331

332

[8] H. T. Nguyen, A. Smeulders, Active learning using pre-clustering, in: International Conference on Machine Learning, ACM, 2004, p. 79.

333

[9] C. Tang, X. Liu, X. Zhu, J. Xiong, M. Li, J. Xia, X. Wang, L. Wang, Feature selective

334

projection with low-rank embedding and dual laplacian regularization, IEEE Transactions on

335

Knowledge and Data Engineering (2019) 1–1doi:10.1109/TKDE.2019.2911946.

336

337

338

339

[10] J. A. Hartigan, M. A. Wong, Algorithm as 136: A k-means clustering algorithm, Journal of the Royal Statistical Society. Series C (Applied Statistics) 28 (1) (1979) 100–108. [11] U. V. Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (4) (2007) 395–416.

340

[12] X. Chen, J. Zhexue Haung, F. Nie, R. Chen, Q. Wu, A self-balanced min-cut algorithm for

341

image clustering, in: IEEE International Conference on Computer Vision, 2017, pp. 2061–2069.

342

[13] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, A. Y. Wu, An

343

efficient k-means clustering algorithm: Analysis and implementation, IEEE Transactions on

344

Pattern Analysis and Machine Intelligence 24 (7) (2002) 881–892.

345

346

347

348

[14] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110. [15] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, IEEE, 2005, pp. 886–893.

25

349

[16] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture

350

classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine

351

Intelligence 24 (7) (2002) 971–987.

352

353

354

355

[17] A. Oliva, A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision 42 (3) (2001) 145–175. [18] S. Bickel, T. Scheffer, Multi-view clustering, in: IEEE International Conference on Data Mining, 2004, pp. 19–26.

356

[19] K. Chaudhuri, S. M. Kakade, K. Livescu, K. Sridharan, Multi-view clustering via canonical

357

correlation analysis, in: Annual International Conference on Machine Learning, ACM, 2009,

358

pp. 129–136.

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

[20] J. Liu, C. Wang, J. Gao, J. Han, Multi-view clustering via joint nonnegative matrix factorization, in: SIAM International Conference on Data Mining, SIAM, 2013, pp. 252–260. [21] A. Kumar, P. Rai, H. Daum, Co-regularized multi-view spectral clustering, in: International Conference on Neural Information Processing Systems, 2011, pp. 1413–1421. [22] Y. Furukawa, B. Curless, S. M. Seitz, R. Szeliski, Towards internet-scale multi-view stereo, in: IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2010, pp. 1434–1441. [23] Q. Yin, S. Wu, R. He, L. Wang, Multi-view clustering via pairwise sparse subspace representation, Neurocomputing 156 (2015) 12–21. [24] C. Xiao, F. Nie, H. Huang, Multi-view k-means clustering on big data, in: International Joint Conference on Artificial Intelligence, 2013, pp. 2598–2604. [25] Y.-M. Xu, C.-D. Wang, J.-H. Lai, Weighted multi-view clustering with feature selection, Pattern Recognition 53 (2016) 25–35. [26] C.-D. Wang, J.-H. Lai, S. Y. Philip, Multi-view clustering based on belief propagation, IEEE Transactions on Knowledge and Data Engineering 28 (4) (2015) 1007–1021. [27] J. Xu, J. Han, F. Nie, X. Li, Re-weighted discriminatively embedded k-means for multi-view clustering, IEEE Transactions on Image Processing 26 (6) (2017) 3016–3027.

26

375

376

377

378

[28] F. Nie, L. Tian, X. Li, Multiview clustering via adaptively weighted procrustes, in: International Conference on Knowledge Discovery and Data Mining, ACM, 2018, pp. 2022–2030. [29] X. Cao, C. Zhang, H. Fu, S. Liu, H. Zhang, Diversity-induced multi-view subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 586–594.

379

[30] A. Gretton, O. Bousquet, A. Smola, B. Schlkopf, Measuring statistical dependence with hilbert-

380

schmidt norms, in: International Conference on Algorithmic Learning Theory, 2005, pp. 63–77.

381

[31] X. Wang, X. Guo, Z. Lei, C. Zhang, S. Z. Li, Exclusivity-consistency regularized multi-view

382

subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, 2017,

383

pp. 1–9.

384

385

[32] F. Nie, J. Li, X. Li, Self-weighted multiview clustering with multiple graphs, in: International Joint Conference on Artificial Intelligence, 2017, pp. 2564–2570.

386

[33] C. Tang, X. Zhu, X. Liu, M. Li, P. Wang, C. Zhang, L. Wang, Learning a joint affinity graph

387

for multiview subspace clustering, IEEE Transactions on Multimedia 21 (7) (2019) 1724–1736.

388

[34] L. Huang, H.-Y. Chao, C.-D. Wang, Multi-view intact space clustering, Pattern Recognition

389

390

391

392

393

86 (2019) 344–353. [35] K. Zhan, F. Nie, J. Wang, Y. Yang, Multiview consensus graph clustering, IEEE Transactions on Image Processing 28 (3) (2018) 1261–1270. [36] C. Zhang, H. Fu, Q. Hu, X. Cao, Y. Xie, D. Tao, D. Xu, Generalized latent multi-view subspace clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence.

394

[37] C. A. R. D. Sousa, An overview on the gaussian fields and harmonic functions method for

395

semi-supervised learning, in: International Joint Conference on Neural Networks, 2015, pp.

396

1–8.

397

[38] R. K. C. Fan, Spectral graph theory, American Mathematical Society, 1997.

398

[39] F. Topsoe, Some inequalities for information divergence and related measures of discrimination,

399

IEEE Transactions on Information Theory 46 (4) (2000) 1602–1609.

27

400

[40] A. Gretton, K. M. Borgwardt, M. Rasch, B. Sch¨ olkopf, A. J. Smola, A kernel method for

401

the two-sample-problem, in: Advances in Neural Information Processing Systems, 2007, pp.

402

513–520.

403

404

[41] J. Tang, X. Hu, H. Gao, H. Liu, Exploiting local and global social context for recommendation, in: International Joint Conference on Artificial Intelligence, 2013, pp. 2712–2718.

405

[42] Z. Lin, R. Liu, Z. Su, Linearized alternating direction method with adaptive penalty for low-

406

rank representation, in: J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, K. Q. Wein-

407

berger (Eds.), Advances in Neural Information Processing Systems 24, Curran Associates, Inc.,

408

2011, pp. 612–620.

409

[43] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-

410

rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1)

411

(2013) 171–184.

412

413

414

415

416

417

418

419

[44] Z. Lin, M. Chen, Y. Ma, The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices, arXiv preprint arXiv:1009.5055. [45] M.Lichman, Uci machine learning repository - census+income dataset, URL http://archive. ics. uci. edu/ml/datasets/Census+ Income. [46] D. Dueck, B. J. Frey, Non-metric affinity propagation for unsupervised image categorization, in: IEEE International Conference on Computer Vision, 2007, pp. 1–8. [47] R. Xia, Y. Pan, L. Du, J. Yin, Robust multi-view spectral clustering via low-rank and sparse decomposition, in: AAAI Conference on Artificial Intelligence, 2014, pp. 2149–2155.

420

[48] A. S. Georghiades, P. N. Belhumeur, D. J. Kriegman, From few to many: Generative models

421

for recognition under variable pose and illumination, in: IEEE International Conference on

422

Automatic Face and Gesture Recognition, 2000, p. 277.

423

[49] Y. Yang, S. Newsam, Bag-of-visual-words and spatial extensions for land-use classification, in:

424

Sigspatial International Conference on Advances in Geographic Information Systems, 2010,

425

pp. 270–279.

28

426

427

[50] K. Zhan, C. Zhang, J. Guan, J. Wang, Graph learning for multiview clustering, IEEE Transactions on Cybernetics 48 (10) (2018) 2887–2895.

428

[51] F. Nie, J. Li, X. Li, Parameter-free auto-weighted multiple graph learning: a framework for

429

multiview clustering and semi-supervised classification, in: International Joint Conference on

430

Artificial Intelligence, 2016, pp. 1881–1887.

431

432

[52] K. Zhan, J. Shi, J. Wang, H. Wang, Y. Xie, Adaptive structure concept factorization for multiview clustering, Neural Computation 30 (4) (2018) 1.

433

[53] C. Tang, X. Liu, M. Li, P. Wang, J. Chen, L. Wang, W. Li, Robust unsupervised feature

434

selection via dual self-representation and manifold regularization, Knowledge-Based Systems

435

145 (2018) 109–120.

436

437

438

439

440

441

[54] C. Tang, X. Zhu, J. Chen, P. Wang, X. Liu, J. Tian, Robust graph regularized unsupervised feature selection, Expert Systems with Applications 96 (2018) 64–76. [55] X. Cao, C. Zhang, C. Zhou, H. Fu, H. Foroosh, Constrained multi-view video face clustering, IEEE Transactions on Image Processing 24 (11) (2015) 4381–4393. [56] X. Liu, Y. Dou, J. Yin, L. Wang, E. Zhu, Multiple kernel k-means clustering with matrixinduced regularization, in: AAAI Conference on Artificial Intelligence, 2016, pp. 1888–1894.

442

[57] X. Liu, S. Zhou, Y. Wang, M. Li, Y. Dou, E. Zhu, J. Yin, Optimal neighborhood kernel

443

clustering with multiple kernels, in: AAAI Conference on Artificial Intelligence, 2017, pp.

444

2266–2272.

445

[58] C. Tang, X. Zhu, X. Liu, L. Wang, Cross-view local structure preserved diversity and consen-

446

sus learning for multi-view unsupervised feature selection, in: AAAI Conference on Artificial

447

Intelligence, 2019, pp. 595–604.

29

448

Zhenglai Li received the BE degree from China University of Geosciences, Wuhan, China, in 2018. Currently, he is a now pursuing the master degree in China University of Geosciences, Wuhan, China. His research interests include multi-view clustering.

Chang Tang received his Ph.D. degree from Tianjin University, Tianjin, China in 2016. He joined the AMRL Lab of the University of Wollongong between Sep. 2014 and Sep. 2015. He is now an associate professor at the School of Computer Science, China University of Geosciences, Wuhan, China. Dr. Tang has published 20+ peer-reviewed papers, including those in highly regarded journals and conferences such as IEEE T-HMS, IEEE SPL, ICCV, CVPR, ACMM, etc. He served on the Technical Program Committees of IJCAI 2018, ICME 2018, AAAI 2019, ICME 2019, IJCAI 2019 and CVPR 2019. His current research interests include machine learning and data mining.

Jiajia Chen received the MS degree from the school of Pharmacy, Nanjing Medical University in 2014. She works as a pharmacist at the Department of Pharmacy, Huai'an Second People's Hospital Affiliated to Xuzhou Medical College. Her research interests include the medical data analysis and medical image process.

449

Cheng Wan is a graduate student in China University of Geosciences, Wuhan. Her current research interests include machine learning, data mining and its applications.

Weiqing Yan received the Ph.D. degree in information and communication engineering from Tianjin University, Tianjin, China, in 2017. She was a visiting student at visual spatial perceived lab, University of California, Berkeley, CA, USA, from September 2015 to September 2016. She is currently a lecture with the School of Computer and Control Engineering, Yantai University, Yantai, Shandong Province, China. Her research interests include 3D image editing, computer graphic, and computer vision.

Xinwang Liu received his PhD degree from National University of Defense Technology (NUDT), China. He is now Assistant Researcher of School of Computer Science, NUDT. His current research interests include kernel learning and unsupervised feature learning. Dr. Liu has published 40+ peer-reviewed papers, including those in highly regarded journals and conferences such as IEEE T-IP, IEEE T-NNLS, ICCV, AAAI, IJCAI, etc. He served on the Technical Program Committees of IJCAI 2016/2017/2018 and AAAI 2016/2017/2018.

450

Conflict of Interest

451

None.

32