Diversity and Consistency Learning Guided Spectral embedding for Multi-view Clustering
Zhenglai Li^a, Chang Tang^a,*, Jiajia Chen^b,*, Cheng Wan^a, Weiqing Yan^c, Xinwang Liu^d

^a School of Computer Science, China University of Geosciences, Wuhan 430074, P. R. China
^b Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University, Huai'an 223002, P. R. China
^c School of Computer and Control Engineering, Yantai University, Yantai 264005, P. R. China
^d College of Computer, National University of Defense Technology, Changsha 410073, P. R. China
Abstract

Multi-view clustering aims to group data points into their underlying classes. Exploiting the complementary information carried by multiple views to improve clustering performance is one of the central topics of multi-view clustering. Most existing multi-view clustering methods only constrain diversity and consistency in the data space and do not consider diversity and consistency in the learned label space. However, it is natural to take the impact of diversity in the learned label matrix into consideration, because different views generate different clustering label matrices, in which some labels are consistent and some are diverse. To overcome this issue, we propose a novel multi-view clustering method (DCMSC) that constrains diversity and consistency in both the learned clustering label matrix and the data space. Specifically, in the learned label space, we relax the learned common label matrix into a consistent part and a diverse part. Meanwhile, by constraining the diverse part with an introduced row-aware diversity representation and the l2,1-norm, wrong labels and the influence of noise on the consistent part are reduced. In the data space, we weight each view with a self-weighting strategy. Furthermore, we conduct clustering in spectral embedded spaces instead of the original data spaces, which suppresses the effect of noise and reduces redundant information. An optimization procedure based on the augmented Lagrangian multiplier with alternating direction minimization (ALM-ADM) guarantees the convergence of our method. Extensive experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our method.

* Corresponding authors.
Email addresses: [email protected] (Zhenglai Li), [email protected] (Chang Tang), [email protected] (Jiajia Chen), [email protected] (Cheng Wan), [email protected] (Weiqing Yan), [email protected] (Xinwang Liu)
Keywords: Multi-view clustering, spectral embedding, diversity and consistency learning
2010 MSC: 00-01, 99-00
1. Introduction

As one of the most fundamental techniques in pattern recognition [1, 2, 3], computer vision [4, 5, 6], and machine learning [7, 8, 9], subspace clustering aims to partition a group of data points into k clusters by establishing similarities among them. In the past few years, a great number of classical clustering algorithms (e.g., K-means [10] and spectral clustering [11]) have been developed [11, 12, 10, 13]. However, these methods mainly focus on single-view data, which makes it hard for them to find good clusters in multi-view data; in practice, directly applying them to multi-view data often does not yield ideal performance.
With the development of information technology, the data we face in many practical applications can usually be described from multiple views. The most common examples are images and videos, which can be represented by SIFT [14], HOG [15], LBP [16], and GIST [17] descriptors. In multi-view data, different views usually have different statistical properties. Therefore, an important challenge in multi-view clustering is to integrate the strengths of heterogeneous features by investigating the structure shared among the views. Focusing on this issue, a variety of unsupervised multi-view clustering algorithms have been proposed [18, 19, 20, 21, 22, 23]. The work in [18] proposes multi-view versions of K-means and expectation-maximization (EM) for multi-view clustering. Chaudhuri et al. [19] constructed a projection for multi-view data via canonical correlation analysis, yielding correlated clustering labels. The work in [20] proposes a nonnegative matrix factorization (NMF) based multi-view clustering method that searches for a cooperative factorization across multiple views. The approach in [21] forces the similarity matrices of different views to be as similar as possible via a co-regularized framework. However, these methods often do not handle well the diversity between different views, which considerably degrades clustering performance. Here, the diversity between multiple views means that each view contains its own unique information that the other views do not.
In recent years, a large number of unsupervised multi-view clustering methods have been proposed to constrain the diversity between different views and have achieved reasonable success. A K-means based multi-view clustering method for large-scale data is proposed in [24]; it combines the heterogeneous features of multiple views through view weights, and an l2,1-norm constraint makes the model robust to outliers. Xu et al. [25] integrated feature selection into multi-view clustering for high-dimensional data. Wang et al. [26] proposed a multi-view clustering method based on belief propagation, in which messages are passed both within individual views and across different views, which helps to guarantee both clustering quality and clustering consistency. Xu et al. [27] proposed a multi-view K-means clustering method based on a least-absolute residual model and used discriminative feature embedding to handle high-dimensional data; the self-weighted parameter for each view makes good use of the complementary information between different views. Nie et al. [28] proposed an adaptively weighted Procrustes (AWP) approach that recovers discrete labels from multi-view spectral embeddings and weights views by their clustering capacities. Cao et al. [29] proposed a self-representation based method to learn a diversity-induced similarity graph; it uses the Hilbert-Schmidt independence criterion (HSIC) [30] as a diversity term to capture the diversity among multi-view representations, and a graph Laplacian regularizer on each representation smooths the objective function. The approach in [31] is a self-representation based method that captures the diversity among multi-view representations through an introduced position-aware exclusivity term and obtains consistent information through a global clustering indicator. The method in [32] learns a unified Laplacian-rank-constrained similarity graph by introducing a self-weighted parameter. Tang et al. [33] learned a joint affinity graph for multi-view subspace clustering, in which a diversity regularization and a graph rank constraint are integrated into a low-rank self-representation model. Huang et al. [34] constructed a latent intact space from multiple insufficient views and captured the cluster structure from the intact space at the same time. Zhan et al. [35] learned a consensus graph by decreasing the disagreement among views and imposing a Laplacian rank constraint. Zhang et al. [36] captured the underlying complementary information to seek an optimal latent representation for the multi-view clustering task.
Although these methods provide better results than single-view methods in a variety of scenarios, there are still at least two issues to be addressed. Many previous methods tend to constrain diversity and consistency only in the data space and ignore these properties in the learned label space. Ignoring the impact of diversity and noise on the learned clustering label matrix leads to wrong labels. Furthermore, consistency always exists in the clustering labels across different views, which makes it feasible to capture consistency and diversity in the label space. To overcome this issue, we propose a novel multi-view clustering method that considers the consistency and diversity in the data space and the learned clustering label matrix at the same time. Low-dimensional feature spaces are also exploited to obtain promising results. In general, the contributions of our method are summarized in the following four aspects:
(1) We propose a novel multi-view clustering method that considers the diversity and consistency in the data space and the learned label space at the same time, so as to learn a pure and robust label matrix for the multi-view clustering task.

(2) A row-aware diversity representation is proposed to describe the diversities lying in the clustering labels and the spectral embeddings, respectively. By using this diversity representation together with the l2,1-norm, the diversities and noises in the learned clustering label matrix are restricted efficiently.

(3) Our method conducts clustering in spectral embedded spaces instead of the original data spaces, which helps to suppress noise and redundant information.

(4) An efficient optimization algorithm based on the augmented Lagrangian multiplier with alternating direction minimization (ALM-ADM) guarantees the convergence of our method, and extensive experiments on both synthetic and real-world datasets demonstrate its effectiveness.
Throughout this paper, matrices are written as boldface capital letters and vectors as boldface lowercase letters. For an arbitrary matrix $\mathbf{X} \in \mathbb{R}^{d \times n}$, $X_{pq}$ denotes its $(p,q)$-th entry, $\mathbf{x}^p$ and $\mathbf{x}_q$ denote the $p$-th row and the $q$-th column of $\mathbf{X}$, and $\mathbf{X}^T$ denotes the transpose of $\mathbf{X}$. $\mathrm{Tr}(\mathbf{X}) = \sum_{p=1}^{d} X_{pp}$, $\|\mathbf{X}\|_F = \sqrt{\mathrm{Tr}(\mathbf{X}^T\mathbf{X})}$, and $\|\mathbf{X}\|_{2,1} = \sum_{p=1}^{d} \|\mathbf{x}^p\|_2 = \sum_{p=1}^{d} \sqrt{\sum_{q=1}^{n} X_{pq}^2}$. $\mathbf{1}$ denotes the vector of all ones, and $\mathrm{Ind} \overset{\mathrm{def}}{=} \{\mathbf{Y} \in \{0,1\}^{n \times k} \mid \mathbf{Y}\mathbf{1} = \mathbf{1}\}$ denotes the set of indicator matrices.
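As a small, self-contained illustration of this notation (our own example, not part of the original text), the following NumPy snippet computes the Frobenius norm and the l2,1-norm of a matrix and checks membership in the indicator set Ind:

```python
import numpy as np

def frobenius_norm(X):
    # ||X||_F = sqrt(Tr(X^T X)), i.e., the square root of the sum of squared entries
    return np.sqrt(np.trace(X.T @ X))

def l21_norm(X):
    # ||X||_{2,1} = sum of the l2 norms of the rows of X
    return np.sum(np.linalg.norm(X, axis=1))

def is_indicator(Y):
    # Y in Ind: binary entries with exactly one 1 per row (Y 1 = 1)
    return set(np.unique(Y)) <= {0, 1} and np.all(Y.sum(axis=1) == 1)

X = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 2.0]])
print(frobenius_norm(X))        # sqrt(30)
print(l21_norm(X))              # 5 + 0 + sqrt(5)
print(is_indicator(np.eye(3)))  # True: each row is a one-hot label
```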
The rest of this paper is organized as follows. We first review the most related works in Section 2. Section 3 presents our proposed method and its optimization algorithm, together with the complexity and convergence analysis. Section 4 reports our experimental results and discussions. Section 5 concludes this paper.
2. Related Work

In this section, we mainly focus on reviewing existing work which is most relevant to our proposed method.
2.1. Spectral Embedding

Suppose there is a set of data points denoted as $\mathbf{x}_p \in \mathbb{R}^{d}$, $p = 1, \ldots, n$, and let $\mathbf{S} \in \mathbb{R}^{n \times n}$ be a similarity matrix that describes the relationships between the data points. The similarity matrix can be constructed by a kernel function [37]:
$$S_{pq} = \exp\Big(-\frac{\|\mathbf{x}_p - \mathbf{x}_q\|_2^2}{2\sigma^2}\Big), \qquad (1)$$
where the parameter $\sigma$ is used to adjust the neighborhood size. Next, the spectral embedding of the original data can be obtained by solving the following eigenvalue decomposition problem [38]:
$$\min_{\mathbf{F}} \ \mathrm{Tr}(\mathbf{F}^T \mathbf{L} \mathbf{F}) \quad \text{s.t.} \ \mathbf{F}^T \mathbf{F} = \mathbf{I}, \qquad (2)$$
where $\mathbf{L}$ is the graph Laplacian of the similarity matrix $\mathbf{S}$, obtained as $\mathbf{L} = \mathbf{D} - \mathbf{S}$ with $\mathbf{D}$ a diagonal degree matrix whose entries are $D_{pp} = \sum_{q=1}^{n} S_{pq}$. $\mathbf{F}$ is the spectral embedding, and the optimal $\mathbf{F}$ consists of the eigenvectors corresponding to the $k$ smallest eigenvalues of $\mathbf{L}$. The obtained $\mathbf{F}$ is a relaxed solution compared with the discrete label matrix; spectral rotation and K-means are commonly used to convert the relaxed solution $\mathbf{F}$ into discrete labels.
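For concreteness, the following sketch (our own illustrative code, not the authors' implementation; it uses a dense Gaussian similarity as in Eq. (1), though a k-nearest-neighbor graph could be substituted) computes the spectral embedding of Eq. (2) with NumPy:

```python
import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    """X: n x d data matrix; returns the n x k spectral embedding F of Eq. (2)."""
    # Pairwise squared Euclidean distances and Gaussian similarity, Eq. (1)
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    S = np.exp(-dist2 / (2 * sigma ** 2))
    # Degree matrix and unnormalized graph Laplacian L = D - S
    L = np.diag(S.sum(axis=1)) - S
    # The eigenvectors of the k smallest eigenvalues solve min Tr(F^T L F), F^T F = I
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]

# Example: six points forming two well-separated groups, embedded into k = 2 dimensions
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
print(spectral_embedding(X, k=2).shape)  # (6, 2)
```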
2.2. Multiview Clustering via Adaptively Weighted Procrustes

By extending spectral embedding, Nie et al. [28] proposed an adaptively weighted Procrustes (AWP) approach to recover discrete labels from spectral embeddings. In AWP, a view weight is applied to distinguish the clustering capacities of different views and improve the clustering performance. AWP solves the following optimization problem:
$$\min_{\mathbf{Y}, \mathbf{R}^{(i)}} \sum_{i=1}^{v} \frac{1}{p_i}\big\|\mathbf{Y} - \mathbf{F}^{(i)}\mathbf{R}^{(i)}\big\|_F^2 \quad \text{s.t.} \ \mathbf{Y} \in \mathrm{Ind},\ (\mathbf{R}^{(i)})^T\mathbf{R}^{(i)} = \mathbf{I},\ p_i \overset{\mathrm{def}}{=} \|\mathbf{Y}^* - \mathbf{F}^{(i)}\mathbf{R}^{(i)*}\|_F, \qquad (3)$$
where $\mathbf{R}^{(i)}$ represents the rotation of the $i$-th view spectral embedding $\mathbf{F}^{(i)}$, and $p_i$ denotes the clustering capacity of the $i$-th view, which depends on the optimal clustering label $\mathbf{Y}^*$ and rotation $\mathbf{R}^{(i)*}$. The notation $\overset{\mathrm{def}}{=}$ means that a variable is defined as the expression on the right, and $\mathrm{Ind} \overset{\mathrm{def}}{=} \{\mathbf{Y} \in \{0,1\}^{n \times k} \mid \mathbf{Y}\mathbf{1} = \mathbf{1}\}$ denotes the set of indicator matrices.
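The core of AWP is an orthogonal Procrustes step per view. The following hedged sketch (our own illustration, not the authors' code) solves for the rotation of one view with the label matrix fixed, using the standard SVD-based closed form, and computes the residual-based view weight 1/p_i of Eq. (3):

```python
import numpy as np

def procrustes_rotation(F, Y):
    """Solve min_R ||Y - F R||_F^2  s.t. R^T R = I  via the SVD of F^T Y."""
    U, _, Vt = np.linalg.svd(F.T @ Y)
    return U @ Vt

def view_weights(F_list, Y, R_list, eps=1e-12):
    """p_i = ||Y - F^(i) R^(i)||_F; each view is weighted by 1 / p_i as in Eq. (3)."""
    return [1.0 / (np.linalg.norm(Y - F @ R) + eps) for F, R in zip(F_list, R_list)]
```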
2.3. Re-Weighted Discriminatively Embedded K-Means for Multi-View Clustering

K-means is a very efficient algorithm and has been widely used. Following [27], for a data set $\mathbf{X} \in \mathbb{R}^{d \times n}$, let $\mathbf{U}$ denote the clustering centroids and $\mathbf{V}$ the clustering indicators. The clustering centroid matrix $\mathbf{U}$ and the clustering indicator matrix $\mathbf{V}$ can be obtained by solving the following optimization problem:
$$\min_{\mathbf{U}, \mathbf{V}} \big\|\mathbf{X} - \mathbf{U}\mathbf{V}^T\big\|_F^2 \quad \text{s.t.} \ \mathbf{V} \in \mathrm{Ind}. \qquad (4)$$
To cluster data described by multi-view features, the usual approach is to integrate the heterogeneous features by exploiting the complementary information among them. K-means can therefore be extended to the multi-view clustering problem:
$$\min_{\mathbf{U}^{(i)}, \mathbf{V}} \sum_{i=1}^{m} \big\|\mathbf{X}^{(i)} - \mathbf{U}^{(i)}\mathbf{V}^T\big\|_F^2 \quad \text{s.t.} \ \mathbf{V} \in \mathrm{Ind}, \qquad (5)$$
where $\mathbf{X}^{(i)}$ represents the features of the $i$-th view, $\mathbf{U}^{(i)}$ represents the clustering centroids of the $i$-th view, and $\mathbf{V}$ is the consistent clustering indicator matrix. To cluster high-dimensional data efficiently and exploit the complementary information between different views, Xu et al. [27] put forward re-weighted discriminatively embedded K-means for multi-view clustering (RDEKM). In RDEKM, a least-absolute residual model is applied to reduce the impact of outliers and to perform dimension reduction. In brief, the model of RDEKM is formulated as follows:
$$\min_{\mathbf{W}^{(i)}, \mathbf{U}^{(i)}, \mathbf{V}} \sum_{i=1}^{m} \alpha^{(i)} \big\|\mathbf{W}^{(i)T}\mathbf{X}^{(i)} - \mathbf{U}^{(i)}\mathbf{V}^T\big\|_F^2 \quad \text{s.t.} \ \mathbf{W}^{(i)T}\mathbf{W}^{(i)} = \mathbf{I},\ \mathbf{V} \in \mathrm{Ind},\ \alpha^{(i)} = \frac{1}{2\|\mathbf{W}^{(i)T}\mathbf{X}^{(i)} - \mathbf{U}^{(i)}\mathbf{V}^T\|_F}, \qquad (6)$$
where $\mathbf{W}^{(i)}$ is the projection matrix that performs dimension reduction for the $i$-th view. In addition, the self-weighted parameter $\alpha^{(i)}$ makes the objective function smooth.
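A minimal sketch of the alternating updates for the multi-view K-means objective in Eq. (5) is given below. This is our own illustration written for clarity rather than efficiency; RDEKM in Eq. (6) additionally learns the projections W^(i) and the self-weights α^(i), which are omitted here.

```python
import numpy as np

def multiview_kmeans(X_list, k, n_iter=50, seed=0):
    """X_list: per-view d_i x n feature matrices; returns a one-hot indicator V (n x k)."""
    rng = np.random.default_rng(seed)
    n = X_list[0].shape[1]
    V = np.eye(k)[rng.integers(0, k, size=n)]          # random initial one-hot indicator
    for _ in range(n_iter):
        # Fix V: centroids of each view are the per-cluster means, U^(i) = X^(i) V (V^T V)^(-1)
        counts = np.maximum(V.sum(axis=0), 1.0)
        U_list = [(X @ V) / counts for X in X_list]
        # Fix U^(i): assign each sample to the cluster minimizing the residual summed over views
        cost = np.zeros((n, k))
        for X, U in zip(X_list, U_list):
            cost += ((X.T[:, None, :] - U.T[None, :, :]) ** 2).sum(axis=2)
        V = np.eye(k)[np.argmin(cost, axis=1)]
    return V
```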
2.4. Diversity-induced Multiview Subspace Clustering

The self-representation model $\mathbf{X} = \mathbf{X}\mathbf{Z}$ is commonly used to learn a similarity graph $\mathbf{S} = \frac{|\mathbf{Z}| + |\mathbf{Z}|^T}{2}$. Cao et al. [29] exploit the Hilbert-Schmidt independence criterion (HSIC) to capture the diversity among multiple representations. The proposed diversity-induced multiview subspace clustering (DiMSC) can be formulated as the following optimization problem:
$$\min_{\mathbf{Z}^{(i)}} \sum_{i=1}^{m} \big\|\mathbf{X}^{(i)} - \mathbf{X}^{(i)}\mathbf{Z}^{(i)}\big\|_F^2 + \lambda_s \sum_{i=1}^{m} \mathrm{tr}\big(\mathbf{Z}^{(i)}\mathbf{L}^{(i)}\mathbf{Z}^{(i)T}\big) + \lambda_v \sum_{j \neq i} \mathrm{HSIC}\big(\mathbf{Z}^{(i)}, \mathbf{Z}^{(j)}\big), \qquad (7)$$
where the second term of the objective function is a local structure constraint that enforces the representation to preserve the similarity structure of the original data space.
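For reference, the empirical HSIC used as the diversity term in Eq. (7) can be computed as a trace of centered kernel matrices. The snippet below is a generic sketch with linear (inner-product) kernels, not the DiMSC authors' implementation:

```python
import numpy as np

def hsic(Z1, Z2):
    """Empirical HSIC between two representations of the same n samples (rows)."""
    n = Z1.shape[0]
    K1 = Z1 @ Z1.T                        # linear kernel of the first representation
    K2 = Z2 @ Z2.T                        # linear kernel of the second representation
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K1 @ H @ K2 @ H) / (n - 1) ** 2
```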
These previous works handle the diversity and consistency information in the data space well and achieve appreciable clustering results. However, they do not take the diversity and consistency in the learned label space into consideration, which can lead to a non-pure and suboptimal label matrix due to the diversities and noises in it. Different from these previous works, we consider separating the learned clustering label matrix into a consistent part and a diverse part, and we make use of a diversity representation and an l2,1-norm constraint to capture the diversities and noises in the clustering labels and the data space at the same time.
3. Proposed method

3.1. Method

As aforementioned, we aim to constrain the diversity and consistency in the data space and the label space at the same time, and then learn an optimal label matrix for multi-view clustering. In our model, we relax the learned clustering label matrix into a consistent part and a diverse part, and then integrate them into a multi-view K-means based model. Consequently, we formulate the objective function as follows:
$$\min_{\mathbf{U}^{(i)}, \mathbf{Y}, \mathbf{R}^{(i)}} \sum_{i=1}^{m} \alpha^{(i)} \big\|\mathbf{F}^{(i)T} - \mathbf{U}^{(i)}(\mathbf{Y} + \mathbf{R}^{(i)})^T\big\|_F^2 \quad \text{s.t.} \ \mathbf{Y} \in \mathrm{Ind},\ \alpha^{(i)} = \frac{1}{2\|\mathbf{F}^{(i)T} - \mathbf{U}^{(i)}(\mathbf{Y} + \mathbf{R}^{(i)})^T\|_F}, \qquad (8)$$
where $\alpha^{(i)}$ is a self-weighted parameter that indicates the clustering capability of the $i$-th view, $\mathbf{Y}$ is regarded as the consistent part of the label matrix, and $\mathbf{R}^{(i)}$ is treated as the diverse part of the $i$-th label matrix. $\mathbf{F}^{(i)}$ is a low-dimensional feature embedding of the original data $\mathbf{X}^{(i)}$; compared with $\mathbf{X}^{(i)}$, $\mathbf{F}^{(i)}$ suppresses the redundant information of the $i$-th view and the impact of noise. Thus, we conduct clustering on the low-dimensional features $\mathbf{F}^{(i)}$ instead of the original data $\mathbf{X}^{(i)}$.
For the diverse part $\mathbf{R}^{(i)}$, a number of criteria for constraining similarity and diversity can be used, such as the Kullback-Leibler (KL) divergence [39], the Hilbert-Schmidt independence criterion (HSIC) [30], and the maximum mean discrepancy [40]. However, they are not intuitive and efficient enough to describe the diversities in different spectral embeddings and clustering labels, respectively. Here, we introduce a row-aware diversity representation term to describe the diversities lying in the clustering labels of different views.

Definition 1: Given two matrices $\mathbf{R}^{(i)} = [\mathbf{r}_1^{(i)}; \mathbf{r}_2^{(i)}; \ldots; \mathbf{r}_p^{(i)}; \ldots; \mathbf{r}_n^{(i)}]$ and $\mathbf{R}^{(j)} = [\mathbf{r}_1^{(j)}; \mathbf{r}_2^{(j)}; \ldots; \mathbf{r}_p^{(j)}; \ldots; \mathbf{r}_n^{(j)}]$, the diversity between $\mathbf{R}^{(i)}$ and $\mathbf{R}^{(j)}$ is defined as $\sum_{p=1}^{n} \mathbf{r}_p^{(i)} \mathbf{r}_p^{(j)T}$.

In this definition, the diversity representation is based on an inner product in each row: the larger the value of the inner product, the higher the similarity between the two rows. Therefore, we call the defined diversity representation row-aware.

In addition, we predefine a parameter $\beta_{ijp}$ as prior information to control the diversity between $\mathbf{r}_p^{(i)}$ and $\mathbf{r}_p^{(j)}$ more precisely. Specifically, if the diversity between the $p$-th rows of the spectral embeddings of the $i$-th and $j$-th views is larger, we should impose a larger weight $\beta_{ijp}$ to obtain a larger diversity between $\mathbf{r}_p^{(i)}$ and $\mathbf{r}_p^{(j)}$. Here, we use a log function [41] to obtain $\beta_{ijp}$; it limits the value of $\beta_{ijp}$ to $[0, 1]$ and is a decreasing function of $\mathbf{f}_p^{(i)} \mathbf{f}_p^{(j)T}$:
$$\beta_{ijp} = \frac{1}{1 + \log\big(\mathbf{f}_p^{(i)} \mathbf{f}_p^{(j)T}\big)}, \qquad (9)$$
where $\mathbf{f}_p^{(i)}$ is the $p$-th row of the low-dimensional spectral embedded features $\mathbf{F}^{(i)}$ of the $i$-th view. By integrating these constraints into Eq. (8), the final optimization problem of our method is formulated as follows:
$$\min_{\mathbf{U}^{(i)}, \mathbf{Y}, \mathbf{R}^{(i)}} \sum_{i=1}^{m} \alpha^{(i)} \big\|\mathbf{F}^{(i)T} - \mathbf{U}^{(i)}(\mathbf{Y} + \mathbf{R}^{(i)})^T\big\|_F^2 + \lambda_1 \sum_{i=1}^{m} \sum_{j \neq i, j=1}^{m} \sum_{p=1}^{n} \beta_{ijp}\, \mathbf{r}_p^{(i)} \mathbf{r}_p^{(j)T} + \lambda_2 \sum_{i=1}^{m} \big\|\mathbf{R}^{(i)}\big\|_{2,1}$$
$$\text{s.t.} \ \mathbf{Y} \in \mathrm{Ind},\ \alpha^{(i)} = \frac{1}{2\|\mathbf{F}^{(i)T} - \mathbf{U}^{(i)}(\mathbf{Y} + \mathbf{R}^{(i)})^T\|_F}, \qquad (10)$$
where $\lambda_1$ and $\lambda_2$ are two balance parameters. The third term, the l2,1-norm constraint, is used to suppress the noise in the learned label matrix.
As we can see from Eq. (10), compared with previous work, we simultaneously constrain the diversity and consistency in the data space and the learned label space. A self-weighting approach is utilized to measure the clustering capability of each view. Then, we divide the learned label matrix into a consistent part and a diverse part, and a row-aware diversity constraint captures the diversities lying in the label space. Finally, we can learn a pure and optimal label matrix for the multi-view clustering task.
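To make the roles of the three terms in Eq. (10) concrete, the sketch below evaluates the objective for a given set of variables. It is our own illustrative code written from the stated definitions (Eq. (9) for β, Definition 1 for the row-aware diversity term, and the l2,1-norm); the variable names and the clipping of the inner products inside the logarithm are our assumptions, not part of the paper.

```python
import numpy as np

def beta_weights(F_list):
    """beta[i, j, p] = 1 / (1 + log(f_p^(i) f_p^(j)^T)), following Eq. (9)."""
    m, n = len(F_list), F_list[0].shape[0]
    beta = np.zeros((m, m, n))
    for i in range(m):
        for j in range(m):
            if i != j:
                inner = np.sum(F_list[i] * F_list[j], axis=1)               # row-wise inner products
                beta[i, j] = 1.0 / (1.0 + np.log(np.maximum(inner, 1.0)))   # clipped (our assumption)
    return beta

def dcmsc_objective(F_list, U_list, Y, R_list, alpha, beta, lam1, lam2):
    """Value of the objective in Eq. (10) for the current variables."""
    m, obj = len(F_list), 0.0
    for i in range(m):
        # weighted reconstruction term: alpha^(i) ||F^(i)T - U^(i)(Y + R^(i))^T||_F^2
        res = F_list[i].T - U_list[i] @ (Y + R_list[i]).T
        obj += alpha[i] * np.sum(res ** 2)
        # row-aware diversity term (Definition 1), weighted by beta
        for j in range(m):
            if j != i:
                obj += lam1 * np.sum(beta[i, j] * np.sum(R_list[i] * R_list[j], axis=1))
        # l2,1-norm term suppressing noise in the diverse part
        obj += lam2 * np.sum(np.linalg.norm(R_list[i], axis=1))
    return obj
```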
3.2. Optimization

In this section, we employ the augmented Lagrange multiplier with alternating direction minimization (ALM-ADM) strategy [42] to optimize our objective function. By separating the objective function, we obtain three subproblems: the Y-subproblem, the R-subproblem, and the U-subproblem.

Y-subproblem: with $\mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(m)}, \mathbf{R}^{(1)}, \ldots, \mathbf{R}^{(m)}$ fixed, Eq. (10) can be rewritten as
$$\min_{\mathbf{Y}} \sum_{i=1}^{m} \alpha^{(i)} \big\|\mathbf{F}^{(i)T} - \mathbf{U}^{(i)}(\mathbf{Y} + \mathbf{R}^{(i)})^T\big\|_F^2 \quad \text{s.t.} \ \mathbf{Y} \in \mathrm{Ind}. \qquad (11)$$
It is worth noting that each row of $\mathbf{Y}$ is independent. By separating $\mathbf{F}^{(i)}$, $\mathbf{Y}$, and $\mathbf{R}^{(i)}$ into independent vectors, the problem can be converted into
$$\min_{\mathbf{Y}} \sum_{i=1}^{m} \alpha^{(i)} \sum_{p=1}^{n} \big\|\mathbf{f}_p^{(i)T} - \mathbf{U}^{(i)}(\mathbf{y}_p + \mathbf{r}_p^{(i)})^T\big\|_2^2 \quad \text{s.t.} \ \mathbf{Y} \in \mathrm{Ind},\ \mathbf{y}_p \in \mathbf{Y},\ Y_{p,c} \in \{0,1\},\ \sum_{c=1}^{C} Y_{p,c} = 1. \qquad (12)$$
In addition, each row of the identity matrix $\mathbf{I}_C$ is a candidate for $\mathbf{y}_p$:
$$\mathbf{I}_C = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_C\}, \quad \mathbf{y}_p \in \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_C\}. \qquad (13)$$
Thus, for each row $p$, the optimal solution of Eq. (12) is the row of $\mathbf{I}_C$ that minimizes the objective in Eq. (14):
$$c_p^* = \arg\min_{c} \sum_{i=1}^{m} \alpha^{(i)} \big\|\mathbf{f}_p^{(i)T} - \mathbf{U}^{(i)}(\mathbf{e}_c + \mathbf{r}_p^{(i)})^T\big\|_2^2. \qquad (14)$$
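A direct implementation of the row-wise search in Eq. (14) might look like the following sketch (our own illustration): for each row p, the candidate one-hot label e_c that minimizes the weighted residual summed over all views is selected independently.

```python
import numpy as np

def update_Y(F_list, U_list, R_list, alpha):
    """Row-wise update of the consistent label matrix Y via Eq. (14)."""
    n, C = R_list[0].shape
    E = np.eye(C)                         # candidate one-hot rows e_1, ..., e_C
    Y = np.zeros((n, C))
    for p in range(n):
        cost = np.zeros(C)
        for F, U, R, a in zip(F_list, U_list, R_list, alpha):
            cand = U @ (E + R[p]).T       # column c holds U^(i)(e_c + r_p^(i))^T
            cost += a * np.sum((F[p][:, None] - cand) ** 2, axis=0)
        Y[p, np.argmin(cost)] = 1.0
    return Y
```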
R-subproblem: with $\mathbf{Y}, \mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(m)}$ fixed, Eq. (10) can be rewritten as
$$\min_{\mathbf{R}^{(i)}} \alpha^{(i)} \big\|\mathbf{F}^{(i)T} - \mathbf{U}^{(i)}(\mathbf{Y} + \mathbf{R}^{(i)})^T\big\|_F^2 + \lambda_1 \sum_{j \neq i, j=1}^{m} \sum_{p=1}^{n} \beta_{ijp}\, \mathbf{r}_p^{(i)} \mathbf{r}_p^{(j)T} + \lambda_2\, \mathrm{Tr}\big(\mathbf{R}^{(i)T}\mathbf{D}^{(i)}\mathbf{R}^{(i)}\big), \qquad (15)$$
where $\mathbf{D}^{(i)}$ is a diagonal matrix corresponding to the $i$-th view [43], defined as
$$D_{pp}^{(i)} = \frac{1}{2\|\mathbf{r}_p^{(i)}\|_2}. \qquad (16)$$
Intuitively, each row of $\mathbf{R}^{(i)}$ can be updated independently. Thus, by separating $\mathbf{F}^{(i)}$, $\mathbf{Y}$, and $\mathbf{R}^{(i)}$ into independent vectors, taking the derivative of Eq. (15) with respect to $\mathbf{r}_p^{(i)}$, and setting it to zero, we obtain
$$\alpha^{(i)}\big(-2\mathbf{f}_p^{(i)}\mathbf{U}^{(i)} + 2\mathbf{y}_p\mathbf{U}^{(i)T}\mathbf{U}^{(i)} + 2\mathbf{r}_p^{(i)}\mathbf{U}^{(i)T}\mathbf{U}^{(i)}\big) + 2\lambda_1 \sum_{j \neq i, j=1}^{m} \beta_{ijp}\, \mathbf{r}_p^{(j)} + 2\lambda_2\, \mathbf{r}_p^{(i)} D_{pp}^{(i)} = 0. \qquad (17)$$
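Rearranging Eq. (17) yields a small C x C linear system for each row r_p^(i). The following hedged sketch (our own rearrangement of Eq. (17), with D_pp^(i) computed from the previous iterate as in Eq. (16)) solves it with NumPy:

```python
import numpy as np

def update_R_view(i, F_list, U_list, Y, R_list, alpha, beta, lam1, lam2, eps=1e-8):
    """Row-wise update of the diverse part R^(i), obtained by rearranging Eq. (17)."""
    F, U, R_old = F_list[i], U_list[i], R_list[i]
    n, C = R_old.shape
    G = U.T @ U                                             # C x C Gram matrix of the centroids
    R_new = np.zeros_like(R_old)
    for p in range(n):
        Dpp = 1.0 / (2.0 * np.linalg.norm(R_old[p]) + eps)  # Eq. (16), reweighted l2,1-norm
        A = alpha[i] * G + lam2 * Dpp * np.eye(C)
        b = alpha[i] * (F[p] @ U) - alpha[i] * (Y[p] @ G)
        for j in range(len(R_list)):
            if j != i:
                b -= lam1 * beta[i, j, p] * R_list[j][p]
        R_new[p] = np.linalg.solve(A, b)                    # A is symmetric, so r_p A = b gives r_p = A^(-1) b
    return R_new
```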
U-subproblem: with $\mathbf{Y}, \mathbf{R}^{(1)}, \ldots, \mathbf{R}^{(m)}$ fixed, Eq. (10) can be rewritten as
$$\min_{\mathbf{U}^{(i)}} \sum_{i=1}^{m} \big\|\mathbf{F}^{(i)T} - \mathbf{U}^{(i)}(\mathbf{Y} + \mathbf{R}^{(i)})^T\big\|_F^2. \qquad (18)$$
Taking the derivative of Eq. (18) and setting it to zero gives
$$-2\mathbf{F}^{(i)T}\mathbf{Y} - 2\mathbf{F}^{(i)T}\mathbf{R}^{(i)} + 2\mathbf{U}^{(i)}\mathbf{Y}^T\mathbf{Y} + 2\mathbf{U}^{(i)}\mathbf{Y}^T\mathbf{R}^{(i)} + 2\mathbf{U}^{(i)}\mathbf{R}^{(i)T}\mathbf{Y} + 2\mathbf{U}^{(i)}\mathbf{R}^{(i)T}\mathbf{R}^{(i)} = 0. \qquad (19)$$
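Eq. (19) leads to the closed-form update U^(i) = F^(i)T (Y + R^(i)) [(Y + R^(i))^T (Y + R^(i))]^(-1). The sketch below (our own illustrative code) adds a small ridge term, which is our assumption, to keep the C x C system well conditioned:

```python
import numpy as np

def update_U_view(F, Y, R, ridge=1e-8):
    """Closed-form solution of Eq. (19): U = F^T M (M^T M)^(-1) with M = Y + R."""
    M = Y + R
    G = M.T @ M + ridge * np.eye(M.shape[1])   # C x C, regularized for invertibility (assumption)
    B = F.T @ M                                # k x C
    # U = B G^(-1); since G is symmetric, solve G U^T = B^T and transpose back
    return np.linalg.solve(G, B.T).T
```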
3.3. Complexity and Convergence

Convergence Analysis: The works in [43] and [44] have proved the stable convergence of ALM-ADM with two subproblems, while it is difficult to give a joint theoretical proof of the convergence of ALM-ADM with more than two subproblems. There are three subproblems in Algorithm 1, i.e., the Y-subproblem, the R-subproblem, and the U-subproblem. In the Y-subproblem, Y obtains its optimal solution by searching over the set of candidates e_c. The convergence of the R-subproblem is guaranteed in [43]. The U-subproblem solves a least-absolute residual model to obtain its optimal solution. Thus, the convergence of each subproblem can be guaranteed. In addition, experimental results on synthetic data and real-world datasets show that our method converges strongly and stably.
Algorithm 1: Diversity and Consistency Learning Guided Spectral Embedding for Multi-view Clustering (DCMSC)
Input: initial label matrix Y, spectral embeddings F^(i), the number of clusters k, β computed via Eq. (9), maximum number of iterations T_max, threshold ε.
Output: label matrix Y, clustering centroid matrices U^(i), diverse parts R^(i).
1: Initialize R^(i), U^(i), α^(i) = 1/m;
2: repeat
3:   Update variable D^(i) via Eq. (16);
4:   Update variable R^(i) via Eq. (17);
5:   Update variable U^(i) via Eq. (19);
6:   Update variable Y via Eq. (14);
7:   Update variable α^(i) via Eq. (8);
8: until |obj(t) − obj(t − 1)| < ε or t > T_max
9: return R^(i), U^(i), Y.

Computational Complexity: Our method consists of three subproblems. The complexity of updating Y is O(nk^2), where n and k are the numbers of samples and clusters, respectively. The complexity of updating each row of R^(i) is O(k^3), so the complexity of updating the entire R^(i) is O(nk^3). For updating U^(i), the complexity is O(k^3). In all, the total complexity of our method is O(nk^2 + nk^3 + k^3) per iteration.
4. Experiment
4.1. Datasets

We experiment with synthetic datasets and six widely used real-world datasets to evaluate our method. Table 1 summarizes these datasets.

Table 1: Information of the multi-view datasets

| Dataset | Samples | Views | Clusters |
|---|---|---|---|
| Synthetic data 1 | 1000 | 3 | 2 |
| Synthetic data 2 | 200 | 2 | 2 |
| Synthetic data 3 | 300 | 2 | 3 |
| handwritten | 2000 | 6 | 10 |
| Caltech101-7 | 1474 | 6 | 7 |
| BBCsport | 544 | 2 | 5 |
| Yale | 165 | 3 | 15 |
| 3sources | 169 | 3 | 6 |
| LandUse-21 | 2100 | 3 | 21 |

The real-world datasets are as follows:

HandWritten [45] consists of 2000 binary images of handwritten digits (0–9), with 200 images per digit. We use the six published views for multi-view clustering: 76 Fourier coefficients of the character shapes (FOU), 216 profile correlations (FAC), 64 Karhunen-Loève coefficients (KAR), 240 pixel averages in 2×3 windows (PIX), 74 Zernike moments (ZER), and 6 morphological (MOR) features.

Caltech101-7 [46] contains 8677 object recognition images belonging to 101 classes. We choose 7 widely used categories and 6 types of features: 48-dimensional Gabor features, 40-dimensional wavelet moments (WM), 254-dimensional CENTRIST features, 1984-dimensional HOG features, 512-dimensional GIST features, and 928-dimensional LBP features.

BBCSport [47] contains documents from the BBC Sport website corresponding to sports news in 5 topical areas, with 3183-dimensional and 3203-dimensional features for the two views, respectively.

Yale [48] consists of 165 grayscale face images belonging to 15 categories, each with different facial expressions and configurations.

3Sources is a multi-view text dataset collected from three well-known online news sources: BBC, Reuters, and The Guardian.

LandUse-21 [49] consists of 2100 satellite images belonging to 21 classes. We use three types of features for clustering.

The synthetic datasets are used to study the robustness of our proposed method to noise. They are as follows:

Synthetic data 1: this dataset is composed of 3 views of 1000 data points generated by a two-component Gaussian mixture model, given by [21].

Synthetic data 2: this dataset is a two-moon dataset consisting of 2 views of 200 data points belonging to 2 clusters, obtained by adding noise levels of 0.12 and 0.14, respectively.

Synthetic data 3: this dataset contains 300 data points belonging to three clusters; the two views are corrupted with noise levels of 0.14 and 0.16, respectively. Synthetic data 2 and 3 are obtained from [50].
4.2. Compared Methods

We compare our method with 2 single-view clustering methods and 7 multi-view clustering methods to show its superiority. K-means and spectral clustering (SC) are classical clustering methods that are widely used due to their simplicity and efficiency.

Diversity-induced multi-view subspace clustering (DiMSC) [29] uses the Hilbert-Schmidt independence criterion (HSIC) [30] as a diversity term to capture the complementary information of multi-view representations.

Parameter-free auto-weighted multiple graph learning (AMGL) [51] proposes a framework for multi-view clustering and semi-supervised tasks via a reformulation of the standard spectral clustering model.

Exclusivity-consistency regularized multi-view subspace clustering (ECMSC) [31] captures the complementary information of multi-view representations by introducing a position-aware exclusivity term.

Self-weighted multiview clustering with multiple graphs (SwMC) [32] learns a unified similarity graph by introducing a self-weighted parameter.

Adaptive structure concept factorization for multiview clustering (MVCF) [52] proposes a concept factorization-based multi-view clustering method that jointly optimizes the graph matrix to make full use of the correlated information between multiple views.

Graph learning for multiview clustering (MVGL) [50] proposes a graph learning-based method without a predefined graph; its initial graphs are learned from the multi-view data points.

Multiview clustering via adaptively weighted Procrustes (AWP) [28] uses a self-weighted Procrustes model to recover discrete labels from the spectral embeddings.
4.3. Experiment Setting

For a fair comparison, the parameters of all the compared methods follow the settings in their papers. In addition, for methods that need K-means clustering to obtain the final clustering indicators, we run K-means 20 times to weaken the influence of the random initialization of cluster centroids and report the mean performance. For our method, we search the number of adaptive neighbors used to construct the similarity graph from 5 to 50 with a step of 5, and tune the parameters λ1 and λ2 over {0.001, 0.01, 0.1, 1, 10, 100, 1000} for all datasets.

For the evaluation metrics, we use normalized mutual information (NMI), accuracy (ACC), adjusted Rand index (AR), F-score, Precision, Recall, and Purity to comprehensively evaluate the performance. These evaluation metrics are commonly used in previous works [53, 54, 55, 56, 57, 58]; a higher value indicates better performance.
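For completeness, clustering accuracy (ACC) is typically computed by matching predicted clusters to ground-truth classes with the Hungarian algorithm, while NMI and AR have standard library implementations. The snippet below is our own sketch using SciPy and scikit-learn and is not tied to this paper's code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one mapping between predicted clusters and true classes."""
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((len(clusters), len(classes)), dtype=int)
    for a, c in enumerate(clusters):
        for b, k in enumerate(classes):
            cost[a, b] = np.sum((y_pred == c) & (y_true == k))
    row, col = linear_sum_assignment(-cost)          # maximize the total agreement
    return cost[row, col].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])                # the same partition, relabeled
print(clustering_accuracy(y_true, y_pred))           # 1.0
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0
print(adjusted_rand_score(y_true, y_pred))           # 1.0
```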
Table 2: Clustering results on synthetic datasets

| Dataset | Method | ACC | NMI | F-score | Precision | Recall | AR | Purity |
|---|---|---|---|---|---|---|---|---|
| Synthetic Data 1 | K-means | 0.9880 | 0.9121 | 0.9767 | 0.9794 | 0.9740 | 0.9525 | 0.9880 |
| Synthetic Data 1 | SC | 0.5760 | 0.0025 | 0.6759 | 0.5109 | 0.9983 | 0.0007 | 0.5760 |
| Synthetic Data 1 | DiMSC | 0.7990 | 0.2671 | 0.6895 | 0.6804 | 0.6996 | 0.3563 | 0.7990 |
| Synthetic Data 1 | AMGL | 0.9940 | 0.9468 | 0.9883 | 0.9889 | 0.9877 | 0.9761 | 0.9940 |
| Synthetic Data 1 | SwMC | 0.9980 | 0.9809 | 0.9961 | 0.9967 | 0.9955 | 0.9920 | 0.9980 |
| Synthetic Data 1 | ECMSC | 0.9890 | 0.9114 | 0.9787 | 0.9784 | 0.9790 | 0.9564 | 0.9890 |
| Synthetic Data 1 | AWP | 0.9970 | 0.9732 | 0.9941 | 0.9950 | 0.9933 | 0.9880 | 0.9970 |
| Synthetic Data 1 | MVCF | 0.5760 | 0.0025 | 0.6759 | 0.5109 | 0.9983 | 0.0007 | 0.5760 |
| Synthetic Data 1 | MVGL | 0.9370 | 0.7116 | 0.8831 | 0.8927 | 0.8737 | 0.7636 | 0.9370 |
| Synthetic Data 1 | DCMSC | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Synthetic Data 2 | K-means | 0.3345 | 0.0000 | 0.3267 | 0.3266 | 0.3267 | 0.0000 | 0.3345 |
| Synthetic Data 2 | SC | 0.3367 | 0.0129 | 0.4958 | 0.3311 | 0.9867 | 0.0000 | 0.3400 |
| Synthetic Data 2 | DiMSC | 0.4180 | 0.0244 | 0.3514 | 0.3441 | 0.3636 | 0.0201 | 0.4198 |
| Synthetic Data 2 | AMGL | 0.3500 | 0.0013 | 0.3280 | 0.3275 | 0.3284 | 0.0000 | 0.3533 |
| Synthetic Data 2 | SwMC | 0.5607 | 0.5097 | 0.5900 | 0.4811 | 0.7648 | 0.3049 | 0.6367 |
| Synthetic Data 2 | ECMSC | 0.3467 | 0.0006 | 0.3278 | 0.3271 | 0.3286 | 0.0000 | 0.3467 |
| Synthetic Data 2 | AWP | 0.3567 | 0.0016 | 0.4975 | 0.3311 | 1.0000 | 0.0000 | 0.3567 |
| Synthetic Data 2 | MVCF | 0.3367 | 0.0129 | 0.4958 | 0.3311 | 0.9867 | 0.0000 | 0.3400 |
| Synthetic Data 2 | MVGL | 0.3333 | 0.0000 | 0.3317 | 0.3268 | 0.3368 | 0.0000 | 0.3333 |
| Synthetic Data 2 | DCMSC | 0.5233 | 0.1183 | 0.4133 | 0.4104 | 0.4949 | 0.1198 | 0.5233 |
| Synthetic Data 3 | K-means | 0.7700 | 0.2234 | 0.6429 | 0.6417 | 0.6440 | 0.2880 | 0.7700 |
| Synthetic Data 3 | SC | 0.5050 | 0.0096 | 0.6622 | 0.4975 | 0.9900 | 0.0000 | 0.5050 |
| Synthetic Data 3 | DiMSC | 0.7200 | 0.1449 | 0.5931 | 0.5927 | 0.5935 | 0.1895 | 0.7200 |
| Synthetic Data 3 | AMGL | 0.8750 | 0.5616 | 0.7858 | 0.7625 | 0.8106 | 0.5604 | 0.8750 |
| Synthetic Data 3 | SwMC | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Synthetic Data 3 | ECMSC | 0.5150 | 0.0006 | 0.4956 | 0.4954 | 0.4963 | 0.0000 | 0.5150 |
| Synthetic Data 3 | AWP | 0.8850 | 0.5837 | 0.7997 | 0.7795 | 1.0000 | 0.5910 | 0.8850 |
| Synthetic Data 3 | MVCF | 0.5050 | 0.0096 | 0.6622 | 0.4975 | 0.9900 | 0.0000 | 0.5050 |
| Synthetic Data 3 | MVGL | 0.5000 | 0.0000 | 0.6644 | 0.4975 | 1.0000 | 0.0000 | 0.5000 |
| Synthetic Data 3 | DCMSC | 0.9200 | 0.6705 | 0.8532 | 0.8425 | 0.8642 | 0.7042 | 0.9200 |
Table 3: Clustering results on real world datasets

| Dataset | Method | ACC | NMI | F-score | Precision | Recall | AR | Purity |
|---|---|---|---|---|---|---|---|---|
| handwritten | K-means | 0.8364 | 0.7761 | 0.7439 | 0.7376 | 0.7504 | 0.7153 | 0.8402 |
| handwritten | SC | 0.9400 | 0.8990 | 0.8922 | 0.8882 | 0.9029 | 0.8802 | 0.9404 |
| handwritten | DiMSC | 0.7232 | 0.6319 | 0.5972 | 0.5937 | 0.6007 | 0.5523 | 0.7232 |
| handwritten | AMGL | 0.8674 | 0.8872 | 0.8469 | 0.8300 | 0.8658 | 0.8295 | 0.8734 |
| handwritten | SwMC | 0.8022 | 0.8859 | 0.8087 | 0.7327 | 0.9074 | 0.7846 | 0.8183 |
| handwritten | ECMSC | 0.8580 | 0.8476 | 0.8146 | 0.8103 | 0.8188 | 0.7939 | 0.8580 |
| handwritten | AWP | 0.9615 | 0.9161 | 0.9248 | 0.9242 | 0.9353 | 0.9165 | 0.9615 |
| handwritten | MVCF | 0.8753 | 0.7878 | 0.7730 | 0.7690 | 0.7770 | 0.7477 | 0.8753 |
| handwritten | MVGL | 0.9195 | 0.8659 | 0.8315 | 0.8113 | 0.8527 | 0.8124 | 0.9195 |
| handwritten | DCMSC | 0.9720 | 0.9381 | 0.9451 | 0.9448 | 0.9990 | 0.9390 | 0.9720 |
| Yale | K-means | 0.5642 | 0.6425 | 0.4381 | 0.3944 | 0.4954 | 0.3970 | 0.5755 |
| Yale | SC | 0.6515 | 0.7039 | 0.5125 | 0.4659 | 0.5716 | 0.4773 | 0.6688 |
| Yale | DiMSC | 0.7088 | 0.7398 | 0.5713 | 0.5416 | 0.6058 | 0.5418 | 0.7155 |
| Yale | AMGL | 0.5712 | 0.6306 | 0.3845 | 0.3288 | 0.4648 | 0.3370 | 0.5803 |
| Yale | SwMC | 0.6370 | 0.6721 | 0.4348 | 0.3768 | 0.5158 | 0.3919 | 0.6430 |
| Yale | ECMSC | 0.6606 | 0.6886 | 0.4555 | 0.4030 | 0.5236 | 0.4152 | 0.6667 |
| Yale | AWP | 0.6424 | 0.7101 | 0.5378 | 0.4964 | 0.5867 | 0.5051 | 0.6485 |
| Yale | MVCF | 0.5148 | 0.5683 | 0.3589 | 0.3298 | 0.3944 | 0.3132 | 0.5279 |
| Yale | MVGL | 0.6182 | 0.6263 | 0.3919 | 0.3196 | 0.5067 | 0.3428 | 0.6182 |
| Yale | DCMSC | 0.7576 | 0.7473 | 0.5896 | 0.5642 | 0.9297 | 0.5616 | 0.7576 |
| Caltech101-7 | K-means | 0.4797 | 0.5563 | 0.5139 | 0.8354 | 0.3738 | 0.3599 | 0.8712 |
| Caltech101-7 | SC | 0.7840 | 0.7008 | 0.8017 | 0.8835 | 0.7338 | 0.6950 | 0.8931 |
| Caltech101-7 | DiMSC | 0.5030 | 0.5074 | 0.5171 | 0.8527 | 0.3711 | 0.3698 | 0.8298 |
| Caltech101-7 | AMGL | 0.6650 | 0.5354 | 0.6193 | 0.6449 | 0.5962 | 0.3958 | 0.8479 |
| Caltech101-7 | SwMC | 0.6466 | 0.4731 | 0.6160 | 0.5953 | 0.6562 | 0.3391 | 0.7905 |
| Caltech101-7 | ECMSC | 0.6716 | 0.5496 | 0.6539 | 0.8043 | 0.5508 | 0.4959 | 0.8229 |
| Caltech101-7 | AWP | 0.7062 | 0.6910 | 0.7497 | 0.9035 | 0.6407 | 0.6320 | 0.8915 |
| Caltech101-7 | MVCF | 0.3703 | 0.3256 | 0.3955 | 0.6453 | 0.2852 | 0.2085 | 0.7837 |
| Caltech101-7 | MVGL | 0.6900 | 0.5505 | 0.6417 | 0.6609 | 0.6236 | 0.4275 | 0.8555 |
| Caltech101-7 | DCMSC | 0.8372 | 0.7134 | 0.8291 | 0.9566 | 1.0000 | 0.7175 | 0.9281 |
| bbcsport | K-means | 0.8420 | 0.7311 | 0.7802 | 0.7516 | 0.8141 | 0.7070 | 0.8523 |
| bbcsport | SC | 0.9081 | 0.7948 | 0.8519 | 0.7974 | 0.9144 | 0.8012 | 0.9081 |
| bbcsport | DiMSC | 0.8828 | 0.7454 | 0.8425 | 0.8576 | 0.8615 | 0.7917 | 0.8828 |
| bbcsport | AMGL | 0.9724 | 0.9044 | 0.9507 | 0.9388 | 0.9630 | 0.9350 | 0.9724 |
| bbcsport | SwMC | 0.6502 | 0.6211 | 0.6108 | 0.4570 | 0.9398 | 0.4220 | 0.6712 |
| bbcsport | ECMSC | 0.3879 | 0.0728 | 0.3782 | 0.2480 | 0.7967 | 0.0226 | 0.3971 |
| bbcsport | AWP | 0.9779 | 0.9266 | 0.9574 | 0.9675 | 0.9798 | 0.9443 | 0.9779 |
| bbcsport | MVCF | 0.8640 | 0.7336 | 0.8123 | 0.7732 | 0.8555 | 0.7495 | 0.8640 |
| bbcsport | MVGL | 0.7059 | 0.7279 | 0.6868 | 0.5413 | 0.9395 | 0.5509 | 0.7426 |
| bbcsport | DCMSC | 0.9853 | 0.9502 | 0.9694 | 0.9756 | 0.9837 | 0.9598 | 0.9853 |
| 3sources | K-means | 0.5775 | 0.4991 | 0.5308 | 0.5727 | 0.5020 | 0.4027 | 0.7080 |
| 3sources | SC | 0.6133 | 0.5714 | 0.5777 | 0.5420 | 0.6243 | 0.4369 | 0.7290 |
4.4. Clustering Results

4.4.1. Results on Synthetic Data

The clustering results of the different methods on the three synthetic datasets are given in Table 2, where the best value of each metric is indicated in bold. As can be seen from Table 2, our method obtains better and more accurate results than the other methods, which indicates that our method is robust to noise.

4.4.2. Results on Real Data

Table 3 gives the clustering results of the different methods on the six real-world datasets. From the comparison results, we make the following observations:

(1) In general, our method outperforms all compared methods. For example, on the Yale dataset our method outperforms the second-best performer (DiMSC) by about 5, 1, and 4 percentage points in terms of ACC, NMI, and Purity, respectively. The previous methods only capture diversity in the data space and therefore obtain a suboptimal label matrix due to the noise and diversity in it. In addition, some previous methods conduct clustering on the original data, where the redundant information considerably decreases the clustering performance.

(2) Overall, the clustering results of the multi-view clustering methods outperform those of the single-view methods. This intuitively demonstrates the superiority of multiple views: in the multi-view case, the underlying complementary knowledge benefits the clustering performance.

(3) AWP uses a self-weighted Procrustes model to recover discrete labels from the spectral embeddings. However, assigning view weights only captures diversity in the data space; the noise and diversity in the label space are not taken into consideration in AWP. Thus, our method achieves better results than AWP.
4.5. Ablation Study of the Proposed Model

In this subsection, we present an ablation study of our proposed model. Specifically, we analyze the results without capturing diversity and consistency in the data space, the results without capturing diversity and consistency in the label space, and the results without using spectral embedding.

Effectiveness of capturing diversity and consistency in the data space: To validate the effectiveness of capturing diversity and consistency in the data space, we use Eq. (10) without the self-weighting parameters α^(i).
Table 4: Ablation study of the proposed model

| Dataset | Method | ACC | NMI | F-score | Precision | Recall | AR | Purity |
|---|---|---|---|---|---|---|---|---|
| handwritten | DCMSC-no-SE | 0.9090 | 0.8374 | 0.8316 | 0.8261 | 1.0000 | 0.8129 | 0.9090 |
| handwritten | DCMSC-no-DCDS | 0.9735 | 0.9407 | 0.9479 | 0.9475 | 1.0000 | 0.9421 | 0.9735 |
| handwritten | DCMSC-no-DCLS | 0.9640 | 0.9213 | 0.9296 | 0.9288 | 0.9304 | 0.9218 | 0.9640 |
| handwritten | DCMSC | 0.9720 | 0.9381 | 0.9451 | 0.9448 | 0.9990 | 0.9390 | 0.9720 |
| Yale | DCMSC-no-SE | 0.5333 | 0.6000 | 0.3984 | 0.3784 | 0.7285 | 0.3571 | 0.5455 |
| Yale | DCMSC-no-DCDS | 0.7212 | 0.7380 | 0.5802 | 0.5554 | 0.9515 | 0.5516 | 0.7212 |
| Yale | DCMSC-no-DCLS | 0.6242 | 0.6877 | 0.4903 | 0.4418 | 1.0000 | 0.4532 | 0.6424 |
| Yale | DCMSC | 0.7576 | 0.7473 | 0.5896 | 0.5642 | 0.9297 | 0.5616 | 0.7576 |
| Caltech101-7 | DCMSC-no-SE | 0.8331 | 0.7044 | 0.8545 | 0.9340 | 0.9980 | 0.7705 | 0.9009 |
| Caltech101-7 | DCMSC-no-DCDS | 0.8358 | 0.7094 | 0.8208 | 0.9563 | 0.9981 | 0.7037 | 0.9261 |
| Caltech101-7 | DCMSC-no-DCLS | 0.5414 | 0.6241 | 0.6158 | 0.8892 | 1.0000 | 0.4754 | 0.8820 |
| Caltech101-7 | DCMSC | 0.8372 | 0.7134 | 0.8291 | 0.9566 | 1.0000 | 0.7175 | 0.9281 |
| bbcsport | DCMSC-no-SE | 0.7574 | 0.6201 | 0.7398 | 0.7383 | 0.7413 | 0.6580 | 0.8033 |
| bbcsport | DCMSC-no-DCDS | 0.9816 | 0.9373 | 0.9607 | 0.9652 | 1.0000 | 0.9485 | 0.9816 |
| bbcsport | DCMSC-no-DCLS | 0.9835 | 0.9436 | 0.9668 | 0.9739 | 0.9598 | 0.9565 | 0.9835 |
| bbcsport | DCMSC | 0.9853 | 0.9502 | 0.9694 | 0.9756 | 0.9837 | 0.9598 | 0.9853 |
| 3sources | DCMSC-no-SE | 0.5266 | 0.3919 | 0.4802 | 0.5541 | 0.4238 | 0.3492 | 0.6213 |
| 3sources | DCMSC-no-DCDS | 0.8462 | 0.7587 | 0.7859 | 0.8476 | 0.9967 | 0.7261 | 0.8698 |
| 3sources | DCMSC-no-DCLS | 0.7396 | 0.7157 | 0.7063 | 0.7957 | 0.6350 | 0.6300 | 0.8166 |
| 3sources | DCMSC | 0.8402 | 0.7574 | 0.7809 | 0.8456 | 0.9939 | 0.7197 | 0.8698 |
| LandUse-21 | DCMSC-no-SE | 0.2671 | 0.3235 | 0.1697 | 0.1525 | 1.0000 | 0.1238 | 0.3052 |
| LandUse-21 | DCMSC-no-DCDS | 0.3067 | 0.3571 | 0.1949 | 0.1895 | 0.9896 | 0.1538 | 0.3286 |
| LandUse-21 | DCMSC-no-DCLS | 0.2990 | 0.3557 | 0.1948 | 0.1853 | 1.0000 | 0.1528 | 0.3133 |
| LandUse-21 | DCMSC | 0.3081 | 0.3634 | 0.1936 | 0.1851 | 0.9425 | 0.1517 | 0.3276 |
We denote this variant as "DCMSC-no-DCDS"; the corresponding results on the six datasets are shown in Table 4. On handwritten and 3sources, DCMSC-no-DCDS achieves better performance than DCMSC. The reason may be that the self-weighting strategy sometimes does not capture diversity and consistency in the data space well; besides, the diverse part R may disturb the effectiveness of the self-weighting strategy. On the remaining datasets, capturing diversity and consistency in the data space improves the final results.

Effectiveness of capturing diversity and consistency in the label space: To validate the effectiveness of capturing diversity and consistency in the label space, we remove R from Eq. (10) and denote the resulting method as "DCMSC-no-DCLS". The results obtained by DCMSC-no-DCLS on the six datasets are shown in Table 4. As can be seen from the seven metrics on the six datasets, capturing diversity and consistency in the label space distinctly improves the final results, which demonstrates that it helps to learn a more robust label matrix for the multi-view clustering task.

Effectiveness of using spectral embedding: In our proposed model, as formulated in Eq. (10), the low-dimensional spectral embedded features are used to suppress noise and redundant information. To validate the effectiveness of using spectral embedding, we replace the spectral embedding F with the original data X in Eq. (10) and denote the resulting method as "DCMSC-no-SE". The results obtained by DCMSC-no-SE on the six datasets are shown in Table 4. As can be seen from the seven metrics on the six datasets, the low-dimensional spectral embedded features distinctly improve the final results, which demonstrates that spectral embedding suppresses noise and redundant information for the multi-view clustering task.
4.6. Parameter Sensitivity Analysis and Convergence Study

4.6.1. Parameter Sensitivity Analysis

There are three parameters in our method: λ1, λ2, and the number of adaptive neighbors. We show the performance of the proposed method on the six real-world datasets with different combinations of λ1 and λ2 in terms of ACC, NMI, and Purity in Figures 1–3. In addition, the performance of the proposed method on the six real-world datasets with different numbers of adaptive neighbors is shown in Figure 4. From these figures, we make the following observations:

(1) Our method is sensitive to the parameter λ2 but less sensitive to the parameter λ1 on the six real-world datasets. As can be seen, when λ2 varies from 1 to 1000, our method achieves stable and relatively good results.

(2) The Caltech101-7, Yale, and 3sources datasets are quite sensitive to the number of adaptive neighbors. In contrast, the other datasets achieve high performance over a large range of the number of adaptive neighbors.

4.6.2. Convergence Study

As discussed in the previous section, our method converges theoretically. In this subsection, we conduct an experimental study of the convergence of our method on the six real-world datasets. The convergence curves given in Figure 5 show that the objective function values decrease rapidly and become steady within about 5 iterations on these datasets. Therefore, our method converges stably.
[Figure 1: Performance of the proposed method on six real-world datasets ((a) handwritten, (b) Yale, (c) Caltech101-7, (d) BBCsport, (e) 3sources, (f) LandUse-21) with different combinations of λ1 and λ2 values in terms of ACC.]

[Figure 2: Performance of the proposed method on six real-world datasets ((a) handwritten, (b) Yale, (c) Caltech101-7, (d) BBCsport, (e) 3sources, (f) LandUse-21) with different combinations of λ1 and λ2 values in terms of NMI.]

[Figure 3: Performance of the proposed method on six real-world datasets ((a) handwritten, (b) Yale, (c) Caltech101-7, (d) BBCsport, (e) 3sources, (f) LandUse-21) with different combinations of λ1 and λ2 values in terms of Purity.]
[Figure 4: Performance of the proposed method with different numbers of adaptive neighbors (5 to 50) in terms of (a) ACC, (b) NMI, and (c) Purity on handwritten, Yale, Caltech101-7, BBCsport, 3sources, and LandUse-21.]
[Figure 5: Convergence curves (objective function value versus the number of iterations) of the proposed method on handwritten, Yale, Caltech101-7, BBCsport, 3sources, and LandUse-21, respectively.]
5. Conclusions

In this paper, we propose a novel diversity and consistency learning guided spectral embedding method for multi-view clustering. Compared with previous methods, we learn an optimal label matrix by capturing the diversity and consistency in the data space and the learned label space at the same time. We employ a self-weighting strategy to weight each view in the data space. In the learned label space, we relax the common clustering label matrix into a consistent part and a diverse part; moreover, a row-aware diversity representation and the l2,1-norm capture the diversities lying in the diverse part of the labels well, so as to make better use of the complementary information of multiple views. Furthermore, we conduct clustering on low-dimensional embedded features instead of the original data to suppress noise and redundant information. An algorithm based on the augmented Lagrangian multiplier with alternating direction minimization finds the optimal solution quickly. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of our method.
6. Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) under Grant No. CUG170654, in part by the National Natural Science Foundation of China under Grants No. 61701451, 61773392, and 61801414, and in part by the Shandong Province Natural Science Foundation under Grant No. ZR2017QF006.
References

[1] R. J. Schalkoff, Pattern recognition, Wiley Encyclopedia of Computer Science and Engineering.
[2] A. Ben-Dor, R. Shamir, Z. Yakhini, Clustering gene expression patterns, Journal of Computational Biology 6 (3-4) (1999) 281–297.
[3] C. Tang, X. Liu, P. Wang, C. Zhang, M. Li, L. Wang, Adaptive hypergraph embedded semi-supervised multi-label image annotation, IEEE Transactions on Multimedia (2019), doi:10.1109/TMM.2019.2909860.
[4] X. Zhu, C. Tang, P. Wang, H. Xu, M. Wang, J. Tian, Saliency detection via affinity graph learning and weighted manifold ranking, Neurocomputing 312 (2018) 239–250.
[5] L. Lu, R. Vidal, Combined central and subspace clustering for computer vision applications, in: International Conference on Machine Learning, ACM, 2006, pp. 593–600.
[6] C. Tang, X. Zhu, X. Liu, L. Wang, Z. Albert, DeFusionNET: Defocus blur detection via recurrently fusing and refining multi-scale deep features, in: IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2700–2709.
[7] A. McGregor, M. Hall, P. Lorier, J. Brunskill, Flow clustering using machine learning techniques, in: International Workshop on Passive and Active Network Measurement, Springer, 2004, pp. 205–214.
[8] H. T. Nguyen, A. Smeulders, Active learning using pre-clustering, in: International Conference on Machine Learning, ACM, 2004, p. 79.
[9] C. Tang, X. Liu, X. Zhu, J. Xiong, M. Li, J. Xia, X. Wang, L. Wang, Feature selective projection with low-rank embedding and dual Laplacian regularization, IEEE Transactions on Knowledge and Data Engineering (2019), doi:10.1109/TKDE.2019.2911946.
[10] J. A. Hartigan, M. A. Wong, Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics) 28 (1) (1979) 100–108.
[11] U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17 (4) (2007) 395–416.
[12] X. Chen, J. Z. Huang, F. Nie, R. Chen, Q. Wu, A self-balanced min-cut algorithm for image clustering, in: IEEE International Conference on Computer Vision, 2017, pp. 2061–2069.
[13] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, A. Y. Wu, An efficient k-means clustering algorithm: Analysis and implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 881–892.
[14] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.
[15] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, IEEE, 2005, pp. 886–893.
[16] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 971–987.
[17] A. Oliva, A. Torralba, Modeling the shape of the scene: A holistic representation of the spatial envelope, International Journal of Computer Vision 42 (3) (2001) 145–175.
[18] S. Bickel, T. Scheffer, Multi-view clustering, in: IEEE International Conference on Data Mining, 2004, pp. 19–26.
[19] K. Chaudhuri, S. M. Kakade, K. Livescu, K. Sridharan, Multi-view clustering via canonical correlation analysis, in: Annual International Conference on Machine Learning, ACM, 2009, pp. 129–136.
[20] J. Liu, C. Wang, J. Gao, J. Han, Multi-view clustering via joint nonnegative matrix factorization, in: SIAM International Conference on Data Mining, SIAM, 2013, pp. 252–260.
[21] A. Kumar, P. Rai, H. Daumé III, Co-regularized multi-view spectral clustering, in: International Conference on Neural Information Processing Systems, 2011, pp. 1413–1421.
[22] Y. Furukawa, B. Curless, S. M. Seitz, R. Szeliski, Towards internet-scale multi-view stereo, in: IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2010, pp. 1434–1441.
[23] Q. Yin, S. Wu, R. He, L. Wang, Multi-view clustering via pairwise sparse subspace representation, Neurocomputing 156 (2015) 12–21.
[24] C. Xiao, F. Nie, H. Huang, Multi-view k-means clustering on big data, in: International Joint Conference on Artificial Intelligence, 2013, pp. 2598–2604.
[25] Y.-M. Xu, C.-D. Wang, J.-H. Lai, Weighted multi-view clustering with feature selection, Pattern Recognition 53 (2016) 25–35.
[26] C.-D. Wang, J.-H. Lai, P. S. Yu, Multi-view clustering based on belief propagation, IEEE Transactions on Knowledge and Data Engineering 28 (4) (2015) 1007–1021.
[27] J. Xu, J. Han, F. Nie, X. Li, Re-weighted discriminatively embedded k-means for multi-view clustering, IEEE Transactions on Image Processing 26 (6) (2017) 3016–3027.
[28] F. Nie, L. Tian, X. Li, Multiview clustering via adaptively weighted procrustes, in: International Conference on Knowledge Discovery and Data Mining, ACM, 2018, pp. 2022–2030.
[29] X. Cao, C. Zhang, H. Fu, S. Liu, H. Zhang, Diversity-induced multi-view subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 586–594.
[30] A. Gretton, O. Bousquet, A. Smola, B. Schölkopf, Measuring statistical dependence with Hilbert-Schmidt norms, in: International Conference on Algorithmic Learning Theory, 2005, pp. 63–77.
[31] X. Wang, X. Guo, Z. Lei, C. Zhang, S. Z. Li, Exclusivity-consistency regularized multi-view subspace clustering, in: IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1–9.
[32] F. Nie, J. Li, X. Li, Self-weighted multiview clustering with multiple graphs, in: International Joint Conference on Artificial Intelligence, 2017, pp. 2564–2570.
[33] C. Tang, X. Zhu, X. Liu, M. Li, P. Wang, C. Zhang, L. Wang, Learning a joint affinity graph for multiview subspace clustering, IEEE Transactions on Multimedia 21 (7) (2019) 1724–1736.
[34] L. Huang, H.-Y. Chao, C.-D. Wang, Multi-view intact space clustering, Pattern Recognition 86 (2019) 344–353.
[35] K. Zhan, F. Nie, J. Wang, Y. Yang, Multiview consensus graph clustering, IEEE Transactions on Image Processing 28 (3) (2018) 1261–1270.
[36] C. Zhang, H. Fu, Q. Hu, X. Cao, Y. Xie, D. Tao, D. Xu, Generalized latent multi-view subspace clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] C. A. R. de Sousa, An overview on the Gaussian fields and harmonic functions method for semi-supervised learning, in: International Joint Conference on Neural Networks, 2015, pp. 1–8.
[38] F. R. K. Chung, Spectral Graph Theory, American Mathematical Society, 1997.
[39] F. Topsoe, Some inequalities for information divergence and related measures of discrimination, IEEE Transactions on Information Theory 46 (4) (2000) 1602–1609.
[40] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, A. J. Smola, A kernel method for the two-sample-problem, in: Advances in Neural Information Processing Systems, 2007, pp. 513–520.
[41] J. Tang, X. Hu, H. Gao, H. Liu, Exploiting local and global social context for recommendation, in: International Joint Conference on Artificial Intelligence, 2013, pp. 2712–2718.
[42] Z. Lin, R. Liu, Z. Su, Linearized alternating direction method with adaptive penalty for low-rank representation, in: J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24, Curran Associates, Inc., 2011, pp. 612–620.
[43] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1) (2013) 171–184.
[44] Z. Lin, M. Chen, Y. Ma, The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices, arXiv preprint arXiv:1009.5055.
[45] M. Lichman, UCI machine learning repository – Census+Income dataset, URL http://archive.ics.uci.edu/ml/datasets/Census+Income.
[46] D. Dueck, B. J. Frey, Non-metric affinity propagation for unsupervised image categorization, in: IEEE International Conference on Computer Vision, 2007, pp. 1–8.
[47] R. Xia, Y. Pan, L. Du, J. Yin, Robust multi-view spectral clustering via low-rank and sparse decomposition, in: AAAI Conference on Artificial Intelligence, 2014, pp. 2149–2155.
[48] A. S. Georghiades, P. N. Belhumeur, D. J. Kriegman, From few to many: Generative models for recognition under variable pose and illumination, in: IEEE International Conference on Automatic Face and Gesture Recognition, 2000, p. 277.
[49] Y. Yang, S. Newsam, Bag-of-visual-words and spatial extensions for land-use classification, in: SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2010, pp. 270–279.
[50] K. Zhan, C. Zhang, J. Guan, J. Wang, Graph learning for multiview clustering, IEEE Transactions on Cybernetics 48 (10) (2018) 2887–2895.
[51] F. Nie, J. Li, X. Li, Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification, in: International Joint Conference on Artificial Intelligence, 2016, pp. 1881–1887.
[52] K. Zhan, J. Shi, J. Wang, H. Wang, Y. Xie, Adaptive structure concept factorization for multiview clustering, Neural Computation 30 (4) (2018) 1.
[53] C. Tang, X. Liu, M. Li, P. Wang, J. Chen, L. Wang, W. Li, Robust unsupervised feature selection via dual self-representation and manifold regularization, Knowledge-Based Systems 145 (2018) 109–120.
[54] C. Tang, X. Zhu, J. Chen, P. Wang, X. Liu, J. Tian, Robust graph regularized unsupervised feature selection, Expert Systems with Applications 96 (2018) 64–76.
[55] X. Cao, C. Zhang, C. Zhou, H. Fu, H. Foroosh, Constrained multi-view video face clustering, IEEE Transactions on Image Processing 24 (11) (2015) 4381–4393.
[56] X. Liu, Y. Dou, J. Yin, L. Wang, E. Zhu, Multiple kernel k-means clustering with matrix-induced regularization, in: AAAI Conference on Artificial Intelligence, 2016, pp. 1888–1894.
[57] X. Liu, S. Zhou, Y. Wang, M. Li, Y. Dou, E. Zhu, J. Yin, Optimal neighborhood kernel clustering with multiple kernels, in: AAAI Conference on Artificial Intelligence, 2017, pp. 2266–2272.
[58] C. Tang, X. Zhu, X. Liu, L. Wang, Cross-view local structure preserved diversity and consensus learning for multi-view unsupervised feature selection, in: AAAI Conference on Artificial Intelligence, 2019, pp. 595–604.
Zhenglai Li received the BE degree from China University of Geosciences, Wuhan, China, in 2018. He is now pursuing the master's degree at China University of Geosciences, Wuhan, China. His research interests include multi-view clustering.

Chang Tang received his Ph.D. degree from Tianjin University, Tianjin, China, in 2016. He joined the AMRL Lab of the University of Wollongong between Sep. 2014 and Sep. 2015. He is now an associate professor at the School of Computer Science, China University of Geosciences, Wuhan, China. Dr. Tang has published 20+ peer-reviewed papers, including those in highly regarded journals and conferences such as IEEE T-HMS, IEEE SPL, ICCV, CVPR, ACM MM, etc. He served on the Technical Program Committees of IJCAI 2018, ICME 2018, AAAI 2019, ICME 2019, IJCAI 2019, and CVPR 2019. His current research interests include machine learning and data mining.

Jiajia Chen received the MS degree from the School of Pharmacy, Nanjing Medical University, in 2014. She works as a pharmacist at the Department of Pharmacy, Huai'an Second People's Hospital Affiliated to Xuzhou Medical College. Her research interests include medical data analysis and medical image processing.

Cheng Wan is a graduate student at China University of Geosciences, Wuhan. Her current research interests include machine learning, data mining, and their applications.

Weiqing Yan received the Ph.D. degree in information and communication engineering from Tianjin University, Tianjin, China, in 2017. She was a visiting student at the Visual Spatial Perception Lab, University of California, Berkeley, CA, USA, from September 2015 to September 2016. She is currently a lecturer with the School of Computer and Control Engineering, Yantai University, Yantai, Shandong Province, China. Her research interests include 3D image editing, computer graphics, and computer vision.

Xinwang Liu received his PhD degree from the National University of Defense Technology (NUDT), China. He is now an Assistant Researcher at the School of Computer Science, NUDT. His current research interests include kernel learning and unsupervised feature learning. Dr. Liu has published 40+ peer-reviewed papers, including those in highly regarded journals and conferences such as IEEE T-IP, IEEE T-NNLS, ICCV, AAAI, IJCAI, etc. He served on the Technical Program Committees of IJCAI 2016/2017/2018 and AAAI 2016/2017/2018.
Conflict of Interest

None.