Low-rank Nonnegative Matrix Factorization on Stiefel Manifold
Journal Pre-proof
Low-rank Nonnegative Matrix Factorization on Stiefel Manifold Ping He, Xiaohua Xu, Jie Ding, Baichuan Fan PII: DOI: Reference:
S0020-0255(19)31112-0 https://doi.org/10.1016/j.ins.2019.12.004 INS 15047
To appear in:
Information Sciences
Received date: Revised date: Accepted date:
5 July 2019 28 November 2019 2 December 2019
Please cite this article as: Ping He, Xiaohua Xu, Jie Ding, Baichuan Fan, Low-rank Nonnegative Matrix Factorization on Stiefel Manifold, Information Sciences (2019), doi: https://doi.org/10.1016/j.ins.2019.12.004
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Inc.
Low-rank Nonnegative Matrix Factorization on Stiefel Manifold Ping Hea , Xiaohua Xua,∗, Jie Dingb , Baichuan Fana a
b
Department of Computer Science, Yangzhou University, Yangzhou 225009, China China Institute of FTZ Supply Chain, Shanghai Maritime University, Shanghai 201306, China
Abstract Low rank is an important but ill-posed problem in the development of nonnegative matrix factorization (NMF) algorithms because the essential information is often encoded in a low-rank intrinsic data matrix, whereas noise and outliers are contained in a residue matrix. Most existing NMF approaches achieve low rank by directly specifying the dimensions of the factor matrices. A few others impose the low rank constraint on the factor matrix and use the alternating direction method of multipliers to solve the optimization problem. In contrast to previous approaches, this paper proposes a novel method for low-rank nonnegative matrix factorization on a Stiefel manifold (LNMFS), which utilizes the low rank structure of intrinsic data and transforms it into a Frobenius norm of the latent factors. To obtain orthogonal factors as distinct patterns, we further impose orthogonality constraints by assuming that the basis matrix lies on a Stiefel manifold. In addition, to improve the robustness of the data in a manifold structure, we incorporate the ∗
Corresponding author: Department of Computer Science, Yangzhou University, Yangzhou 225009, e-mail:
[email protected]
Preprint submitted to Elsevier
December 3, 2019
graph smoothness of the coefficient matrix. Finally, we develop an efficient alternative iterative algorithm to solve the optimization problem and provide proof of its convergence. Extensive experiments on real-world datasets demonstrate the superiority of the proposed method compared with other representative NMF-based algorithms. Keywords: Nonnegative matrix factorization, Low rank, Orthogonal Constraints, Stiefel manifold, Graph smoothness
1
2
1. Introduction With the rapid development of data collection techniques, high-dimensional
3
data, such as online documents, electronic books, photographs, video series in
4
social medias, and health data, are becoming increasingly common. Accord-
5
ingly, generalized computational methods are required for extracting intrinsic
6
low-dimensional information. Although great success has been achieved by
7
various dimension-reduction methods [10], such as singular value decomposi-
8
tion (SVD) [21], principal component analysis [19], and independent compo-
9
nent analysis [9], few of them can provide a straightforward physical meaning
10
for the decomposition results.
11
In recent years, nonnegative matrix factorization (NMF) [42, 49] has at-
12
tracted increasing attention owing to its straightforward interpretability of
13
nonnegative results [22]. Technically, NMF seeks to identify a product of
14
two nonnegative matrices that provides a good approximation to the original
15
matrix. Therefore, the main difference of NMF from the other dimension re-
16
duction methods (e.g., SVD) is that NMF allows only non-subtractive combi-
17
nations of nonnegative components. This nonnegativity constraint eventually 2
18
leads to the parts-based representation of NMF.
19
Various methods have been proposed to extend the standard NMF by con-
20
sidering different aspects in additional constraints. One of the most widely
21
used constrained NMF methods is sparse NMF (SNMF) [14, 17, 31]. SNMF
22
introduces a weighted penalty term to enhance sparsity in the factor ma-
23
trices, leading to a local-based representation. In addition to sparsity con-
24
straints, another method for achieving sparsity is orthogonal NMF (ONMF)
25
[12, 23, 25]. ONMF removes the redundancy in data representation by im-
26
posing the orthogonality constraints on the factor matrices because under
27
the condition of non-negativity, orthogonality constraints necessarily result
28
in sparseness. As the orthogonal constraint set is a submanifold of the Stiefel
29
manifold, Choi et al. [8, 16] perform ONMF by applying the natural gra-
30
dient method in a Stiefel submanifold to achieve lower computational cost.
31
To promote exact orthogonality, Zhang et. al. [45] exploit the sparsity of
32
non-negative orthogonal solutions to divide the global problem into a series
33
of local optimizations. As semi-NMF becomes popular, Zhang et. al. [48]
34
remove the non-negativity constraints on the basis matrix and develop an ef-
35
ficient orthogonal semi-NMF algorithm over a Stiefel manifold. To maintain
36
the local geometrical structure in the low-dimensional data representation,
37
manifold NMF models [39, 7, 47, 44] incorporate the manifold structure into
38
the NMF framework. A representative manifold NMF model is graph reg-
39
ularized NMF (GNMF) [7], which penalizes the graph smoothness term in
40
the standard NMF model. In addition to these non-probabilistic NMF mod-
41
els, there are also probabilistic NMF approaches, which are based on, for
42
example, variational Bayesian inference [4] and Gibbs sampling. However, as
3
43
real-world data are often contaminated, different algorithms have been de-
44
veloped to address the problem of noise and outliers. Robust NMF [46][50]
45
(RNMF) assumes that the corrupted entries are sparse and enforces a spar-
46
sity constraint on the residual matrix. Based on RNMF, F´evotte et al. [15]
47
adopt a group-sparse outlier residual term to handle possible nonlinear ef-
48
fects (GRNMF). In recent years, low rank constraint has received growing
49
attention in the development of robust matrix decomposition methods [41]
50
because the essential data information is often encoded in the low-rank in-
51
trinsic data matrix, whereas unexpected noise and outliers are encoded in
52
a residue matrix. However, existing low-rank NMF models [28] impose the
53
low rank constraint on the outer product of the basis matrix rather than the
54
intrinsic data, which is less intuitive. Additionally, they adopt the alternat-
55
ing direction method of multipliers (ADMM) to solve the objective function,
56
which introduces higher computational cost into the optimization of NMF.
57
In this study, we propose a novel low-rank NMF learning method on the
58
Stiefel manifold (LNMFS) that imposes three additional constraints on the
59
standard NMF. First, LNMFS adds a low rank constraint on the intrinsic
60
data by penalizing its nuclear norm. To simplify the optimization, we trans-
61
form the nuclear norm of the intrinsic data matrix into a convex Frobenius
62
norm of the latent factors based on a well-known theorem [33]. Second, for
63
obtaining distinct patterns for easier interpretation, LNMFS assumes the
64
basis matrix lying on a Stiefel manifold, implying that different factors are
65
orthogonal to each other. Third, to improve the robustness for the data in a
66
manifold structure, LNMFS incorporates the graph smoothness constraint of
67
the coefficient matrix. The proposed method is formulated as an optimization
4
68
problem solved by an iterative multiplicative algorithm. We analyze the time
69
complexity of the proposed optimization algorithm and prove the theoretical
70
convergence of the optimization scheme. Extensive experimental results on
71
clustering problems demonstrate the effectiveness of the proposed approach
72
in comparison with nine representative NMF-based algorithms. The superior
73
performance of LNMFS on noisy datasets indicates its noise robustness. In
74
addition, the empirical convergence performance of LNMFS is demonstrated
75
by its rapid converge on the experimental datasets. Furthermore, we dis-
76
cuss the influence of the regularization parameters as well as the embedding
77
dimension on the clustering performance of LNMFS.
78
79
There are several aspects of the proposed approach that should be mentioned:
80
1. In real-world applications, the collected data are often contaminated by
81
noise and outliers. To reduce the interference of noise and discover the
82
underlying factors, LNMFS utilizes the low-rank structure of the intrin-
83
sic data matrix and transforms it into a Frobenius-norm regularization
84
of the latent factors.
85
2. LNMFS incorporates the low rank constraint with the graph smooth-
86
ness constraint on the Stiefel manifold to utilize the local geometry of
87
the data manifold and reduce the redundancy of the data representa-
88
tion.
89
3. LNMFS is formulated as an optimization problem with a well-defined
90
objective function. Two efficient multiplicative updating rules are pro-
91
vided for the optimization of the basis and coefficient matrices. Both
92
the theoretical proof and empirical analysis are provided to demon5
93
strate the convergence of the optimization scheme.
94
4. Experimental results demonstrate that LNMFS outperforms nine rep-
95
resentative NMF-based algorithms in clustering tasks on various real-
96
world datasets. The comparison on noisy datasets further demonstrates
97
the noise robustness of LNMFS.
98
The remainder of this paper is organized as follows. Section 2 provides a
99
brief introduction to the Stiefel manifold and representative NMF algorithms.
100
Section 3 elaborates the formulation of the proposed LNMFS framework and
101
provides an efficient algorithm for solving the optimization problem. Section
102
4 presents extensive experimental results by LNMFS in comparison with nine
103
representative NMF-based methods. Finally, Section 5 concludes this paper
104
and provides possible directions for future work.
105
2. Preliminaries
106
Herein, we provide preliminaries regarding optimization on the Stiefel
107
manifold. For clarity, Table 1 summarizes important notations used in this
108
paper.
109
2.1. Stiefel Manifold
110
Let S denote the set of orthonormal matrices, S , {U ∈ Rm×d : UT U = Id }
111
112
(1)
where Id ∈ Rd×d is the identity matrix. The set S is called (orthogonal) Stiefel manifold, whose dimension is dim(S) = md − 21 d(d + 1). The tangent 6
Table 1: Summary of notations
Notation
113
Description
X
observed data matrix
Y
intrinsic data matrix without noise in X
E
error matrix introduced by noise and outliers
U
basis matrix
V
coefficient matrix
L
Laplacian matrix of graph
W
similarity matrix
Q
objective function
xi
ith column vector in X
vi
ith column vector in V
vi
ith row vector in V
n
number of data points in X
m
dimension of data points in X
r
rank of Y
d
embedding dimension of matrix factorization
λ
weight of the graph smoothness constraint
β
weight of the low-rank constraint
space to S at the point U is obtained by differentiating UT U = I, yielding TU S = {Z ∈ Rm×d : UT Z + ZT U = 0}
7
(2)
114
where UT Z ∈ Rd×d is a skew-symmetric matrix. On the tangent space TU S
115
for any U ∈ S, we introduce the canonical inner product [13] as its metric, 1 hZ1 , Z2 ic = tr(ZT1 (I − UUT )Z2 ) 2
116
whereas the Euclidean inner product is hZ1 , Z2 ie = tr(ZT1 Z2 )
117
118
119
120
(3)
(4)
Then, we can view S as a sub-manifold of a Riemannian manifold. Let Q be an objective function defined on the Stiefel manifold. We use
e U Q to represent the gradient of Q at U in the Euclidean space ∇U Q and ∇ D E e U Q, Z ; and on the Stiefel manifold, respectively. Let h∇U Q, Zie = ∇ c
121
then, this is equivalent to
e U Q)T (I − 1 UUT )Z) tr((∇U Q)T Z) = tr((∇ 2
122
123
124
125
e U Q: By solving Eq. (5), we can obtain the representation of ∇
2.2. Orthogonal NMF
e U Q = ∇U Q − U(∇U Q)T U ∇
(5)
(6)
Given a nonnegative observed data matrix X = [x1 , x2 , · · · , xn ] ∈ Rm×n , +
th xi ∈ Rm observed data point, NMF seeks to factorize X + represents the i
126
n×d into two nonnegative matrices, that is, U ∈ Rm×d and V ∈ R+ (d +
127
min(m, n)). X ≈ UVT 8
(7)
128
Owing to the non-negativity constraints, NMF provides a parts-based data
129
representation, which is easy to interpret in many application scenarios. In
130
Eq. (7), U is regarded as the basis matrix, whereas V is regarded as the
131
coefficient matrix and is used for tasks such as clustering. Both U and V can
132
be obtained by minimizing a cost function that quantifies the approximation
133
error to X. The most commonly used cost function is the square of the
134
Euclidean distance.
2 min X − UVT F
(8)
U≥0,V≥0 135
We note that the cost function (8) is non-convex in both variables, but it is
136
convex in U or V only. Therefore, the objective function can be solved by
137
an iterative algorithm [22] with the following multiplicative rules: U←U
XV , UVT V
V ←U
XT U VUT U
(9)
138
where and the fraction bar denote elementwise matrix product and divi-
139
sion, respectively.
140
Based on standard NMF, Ding et al. [12] proposed ONMF models, which
141
impose orthogonality constraints on the basis and coefficient matrices. We
142
refer to the orthogonal NMF with orthogonality constraints on U as ONMF-
143
U, that is,
2 min X − UVT F
U≥0,V≥0 UT U=I
(10)
144
and to the orthogonal NMF with orthogonality constraints on V as ONMF-
145
V. There are two major approaches for solving the objective functions of
146
ONMF. One is proposed by Ding et al. and considers additive orthogonality
9
147
constraints based on the Lagrangian multipliers method. Another is devel-
148
oped by Choi [8] and directly uses the true gradient (natural gradient) on
149
the Stiefel manifold. Comparatively, the latter has lower computational com-
150
plexity [39] than Ding’s method. For ONMF-U on the Stiefel manifold, the
151
multiplicative update rules are as follows: U←U
152
XV XT U , V ← V UVT XT U VUT U
(11)
3. Proposed Method
153
In this section, we propose the novel LNMFS algorithm, which incorpo-
154
rates the low-rank constraint on the intrinsic data, the orthogonality con-
155
straint on the basis matrix, and the graph smoothness constraint on the
156
coefficient matrix into the standard NMF framework.
157
3.1. Objective Function
158
159
160
Without loss of generality, the observed data matrix (X ∈ Rm×n ) can be +
considered the sum of two parts: the intrinsic data matrix (Y ∈ Rm×n ) and + the error matrix E ∈ Rm×n .
161
We first assume that the intrinsic data lie on a low-dimensional manifold,
162
which implies that Y has a low rank. Low rank is an important but ill-posed
163
problem in the development of NMF-based algorithms because the essential
164
information is often encoded in the low-rank intrinsic data matrix, whereas
165
noise and outliers are contained in the error matrix. Formally, the initial
10
166
objective function is min 12 kX − Yk2F + λ kYk∗
(12)
s.t. Y = UVT , U ≥ 0, V ≥ 0 167
168
In Eq. (12), kYk∗ is the nuclear norm of Y that plays the role of regularizing the rank of Y. By replacing Y with UVT , we can rewrite Eq. (12) as min
U≥0,V≥0
169
1
X − UVT 2 + λ UVT F ∗ 2
According to the matrix H¨older inequality,
UVT ≤ kUk kVk F F ∗
170
(13)
(14)
As kUkF kVkF ≤ 21 (kUk2F + kVk2F ), we have
UVT ≤ 1 (kUk2 + kVk2 ) F F ∗ 2
(15)
171
Therefore, the nuclear-norm regularized objective function (Eq. (13)) is
172
equivalent to the following non-convex problem: min
U≥0,V≥0
1
X − UVT 2 + λ (kUk2 + kVk2 ) F F F 2 2
(16)
173
Secondly, in several real-world applications, non-overlapped bases are of-
174
ten preferable in parts-based interpretation (e.g., image segmentation). For
175
obtaining distinct parts-based patterns, we suppose that the basis matrix lies
11
176
on the Stiefel manifold, leading to min
U≥0,V≥0 UT U=I
1
X − UVT 2 + λ (kUk2 + kVk2 ) F F F 2 2
(17)
177
We note that the Frobenius norm of U is fixed under the orthogonality
178
constraint (kUk2F = tr(UT U) = d). Therefore, Eq. (17) can be written in a
179
more concise form as follows: min
U≥0,V≥0 UT U=I
1
X − UVT 2 + λ kVk2 F F 2 2
(18)
180
Finally, to preserve the local geometric structure of the underlying data
181
manifold, LNMFS also incorporates the graph smoothness regularization
182
term into the objective function. min
U≥0,V≥0 UT U=I
Q=
1
X − UVT 2 + λ kVk2 + β tr(VT LV) F F 2 2 2 1 2
P
(19)
2
183
In Eq. (19), tr(VT LV) =
184
regularizer, L = D − W is the Laplacian matrix, v i represents the coefficient
i,j
wij kv i − v j k is the graph smoothness
185
of the ith data point, W = [wij ]n×n is the pairwise similarity matrix, and
186
D = diag(W1) is the diagonal degree matrix. The parameters λ and β
187
control the weight of the low rank and graph smoothness regularization terms,
188
respectively.
12
189
190
191
3.2. Optimization We minimize the objective function Eq. (19) by first computing the derivative of Q with respect to U and V in the Euclidean space. ∇U Q = UVT V − XV
(20)
∇V Q = (VUT U + λV + βDV) − (XT U + βWV)
(21)
192
193
194
e U Q) As U lies on a Stiefel manifold, we compute the natural gradient of U (∇ on the Stiefel manifold.
e U Q = ∇U Q − U(∇U Q)T U ∇
= (UVT V − XV) − U(UVT V − XV)T U
(22)
= UVT XT U − XV 195
Using the KKT conditions and KKT complementary slackness, that is, e U Q = 0, U ∇
196
V ∇V Q = 0
(23)
we obtain the multiplicative update rules of U and V as follows: XVt Ut Vt T XT Ut
(24)
XT Ut + βWVt Vt Ut T Ut + λVt + βDVt
(25)
Ut+1 ← Ut 197
Vt+1 ← Vt
13
Algorithm 1 Algorithm of LNMFS Input: Nonnegative observed data matrix X Pairwise similarity matrix W Parameters λ and β Output: Nonnegative factor matrices U and V 1: t ← 0, randomly initialize U0 on Stiefel manifold and V0 2: Compute D ← diag(W1) 3: repeat XVt 4: Ut+1 ← Ut t t T T t UV X U Vt+1 ← Vt
5:
XT Ut + βWVt Vt Ut T Ut + λVt + βDVt
t←t+1 7: until the convergence criterion is satisfied 8: return Ut+1 , Vt+1 6:
198
Algorithm 1 summarizes the procedures of the proposed LNMFS algorithm.
199
Theorem 1 provides the convergence proof of the multiplicative updating
200
rules.
201
3.3. Convergence
202
To prove the convergence of the two updating rules in Eqs. (24) and
203
(25), we start from the definition of the auxiliary function G. Let QU (V)
204
represent the objective function Q(U, V) when U is fixed.
205
˜ is an auxiliary function to QU (V) if and only if the Definition 1. G(V, V)
206
207
208
˜ ≥ QU (V), G(V, V) = QU (V) are satisfied. conditions G(V, V)
˜ is an upper bound of QU (V) and is As the auxiliary function G(V, V)
˜ the optimization of QU (V) can be replaced by the iterative tight for V = V, 14
209
update V(t+1) = arg min G(V, V(t) ) V
210
(26)
because by combining Eq. (26) with Definition 1, we have
QU (V(t+1) ) ≤ G(V(t+1) , V(t) ) ≤ G(V(t) , V(t) )
(27)
= QU (V(t) ) 211
Lemma 1. Given a semi-definite similarity matrix W, ˜ T WV) ˜ − 2tr(VT WV) ˜ −tr(VT WV) ≤ tr(V
212
(28)
Proof. As W is semi-definite, ˜ T W(V − V)) ˜ 0 ≤ tr((V − V)
(29)
˜ T WV) ˜ − 2tr(VT WV) ˜ = tr(VT WV) + tr(V 213
we obtain ˜ T WV) ˜ − 2tr(VT WV) ˜ −tr(VT WV) ≤ tr(V
(30)
214
215
Theorem 1. The objective function in Eq. (19) is nonincreasing under the
216
updating rules in Eqs. (24) and (25).
15
217
Proof. For all k ∈ {1, 2, . . . , d}, we define uik v˜jk η˜ijk = P , uil v˜jl
(31)
l
218
which satisfies
P
k
η˜ijk = 1. QU (V) can be decomposed into two parts, QU (V) = EU (V) + RU (V)
219
where EU (V) =
220
RU (V) =
(32)
1
X − UVT 2 F 2
(33)
β λ kVk2F + tr(VT LV) 2 2
(34)
221
EU (V) evaluates the approximation error, whereas RU (V) summarizes the
222
regularization terms related to V. As EU (V) is convex, by using Jensen’s
223
inequality, we have
EU (V) =
1 2
=
1 2
P P (xij − uik vjk )2 i,j
k
P P u vjk 2 (xij − η˜ijk ηik ) ˜ijk i,j
k
PPP 1
≤
2
i
j
k
η˜ijk (xij −
(35)
uik vjk 2 ) η˜ijk
˜ , GE (V, V) 224
˜ = Therefore, GE (V, V)
225
tion to EU (V).
1 2
P P P i
j
k
η˜ijk (xij −
16
uik vjk 2 ) η˜ijk
is the auxiliary func-
226
Moreover, based on Lemma 1, whose element-wise formulation is −
227
X i,j,k
X
wij vik vjk ≤
X
wij v˜ik v˜jk − 2
i,j,k
wij v˜ik vjk
(36)
i,j,k
we have RU (V) =
λ 2
=
λ 2
≤
λ 2
P j,k
P j,k
P j,k
−β
2 vjk +
β 4
2 vjk +
β 2
2 + vjk
β 2
P
P
i,j,k
P j,k
P j,k
wij (vik − vjk )2
2 dj vjk −
β 2
2 + dj vjk
β 2
P
wij vik vjk
i,j,k
P
wij v˜ik v˜jk
(37)
i,j,k
wij v˜ik vjk
i,j,k
˜ , GR (V, V) 228
229
230
˜ = Hence, GR (V, V)
λ 2
P j,k
β 2
2 vjk +
P j,k
2 dj vjk +
β 2
P
i,j,k
the auxiliary function to RU (V).
wij v˜ik v˜jk −β
P
wij v˜ik vjk is
i,j,k
˜ as an auxiliary function to QU (V). We now define G(V, V) ˜ = GE (V, V) ˜ + GR (V, V) ˜ G(V, V) =
1 2
P
i,j,k
+
β 2
−β
η˜ijk (xij −
P j,k
2 dj vjk +
P
uik vjk 2 ) η˜ijk β 2
wij v˜ik vjk
i,j,k
≥ QU (V)
17
P
i,j,k
+
λ 2
P j,k
wij v˜ik v˜jk
2 vjk
(38)
231
232
We note that G(V, V) = QU (V) is trivially met. By taking the derivative
˜ with respect to vjk , we have of G(V, V)
˜ = P(P uil v˜jl )uik vjk − P xij uik + λvjk ∇vjk G(V, V) v˜jk i
i
l
+β
P j,k
=
dj vjk − β
v
−β
P
j,k
v
jk dj v˜jk v˜jk
wij v˜ik
i
h
i h i T V ˜ TU − X U jk VU ˜ V jk
˜ jk + λ[V]
h i
V ˜ V jk
h i ˜ − β WV
jk
h i h i V ˜ + β DV ˜ V jk
(40) jk
jk
˜ be equal to 0, we have the update rule of vjk : By letting ∇vjk G(V, V)
vjk = v˜jk h
236
P
Eq. (39) can be rewritten in the following matrix form. ˜ = ∇vjk G(V, V)
235
(39)
i
l
+β + λ˜ vjk v˜jk jk
234
wij v˜ik
i
PP P v ( uil v˜jl )uik v˜jk − xij uik jk i
233
P
h
˜ XT U + βWV
i
jk
˜ T U + λV ˜ + βDV ˜ VU
i
(41) jk
which is equivalent to the update rule of V in matrix form in Eq. (25). ˜ is an auxiliary function, QU (V) is nonincreasing under this As G(V, V) 18
237
238
update rule. The update rule for U can be obtained similarly by using the natural
239
gradient in the Stiefel manifold.
240
3.4. Time Complexity
241
The time complexity of the update rule for U (Eq. (24)) should be
242
analyzed in two parts: the numerator and the denominator. As n and m
243
represent the number and dimension of the data points in X, respectively,
244
the time complexity of computing the numerator (XV) is O(mnd), and the
245
complexity of the denominator (UVT XT U) is O(mnd). Therefore, if the
246
updates converge after τ iterations, the computational time cost of updating
247
U is O(mndτ ).
248
The time complexity of the update rule for V (Eq. (25)) is analyzed
249
similarly. The computation of the numerator (XT U+βWV) costs O(mnd)+
250
O(n2 d) = O(nd·max(n, m)). The computation of the denominator (VUT U+
251
λV + βDV) costs O(mnd) + O(nd) = O(mnd). Therefore, if the updates
252
converge after τ iterations, the time complexity of updating V is O(ndτ ·
253
max(n, m)).
254
To summarize, the overall time complexity of LNMFS is O(mndτ ) +
255
O(ndτ ·max(n, m)) = O(ndτ ·max(n, m)), where τ is the number of iterations
256
before convergence.
257
4. Experiments
258
In this section, we evaluate the performance of the proposed LNMFS
259
algorithm in comparison with nine widely used nonnegative matrix decom-
260
position methods on real-world datasets. First, we compare the clustering 19
Table 2: Summary of experimental datasets
Dataset
# Instances
# Dimensions
# Clusters
Yale
165
4096
15
Wine
178
13
3
Pendigits
477
16
3
Breast
683
9
2
COIL20
1440
1024
20
Optical digits
5620
16
10
TDT2
10212
36771
95
NC vs AD
120
90
2
NC vs MCI
120
90
2
NC vs MCI vs AD
180
90
3
261
performance and runtime performance of the ten NMF-based methods, in-
262
cluding the proposed LNMFS, on various clustering tasks. Subsequently, we
263
compare the clustering performance of the ten NMF-based algorithms on in-
264
creasingly noisy datasets. Then, we illustrate the empirical convergence of
265
LNMFS on the experimental datasets, and we discuss the effect of the regu-
266
larization parameters on the clustering performance of LNMFS. Finally, we
267
demonstrate the relationship of the embedding dimension and the clustering
268
performance of LNMFS in comparison with other NMF-based algorithms.
269
4.1. Experimental Settings
270
We evaluate the performance of LNMFS in comparison with nine repre-
271
sentative nonnegative matrix decomposition methods: standard NMF (NMF),
272
orthogonal NMF (ONMF-U and ONMF-V), SNMF, GNMF, RNMF, GRNMF,
273
probabilistic NMF (PNMF) and greedy orthogonal pivoting algorithm for 20
274
NMF (GOPA). Among the compared algorithms, ONMF-U imposes orthog-
275
onality constraints on the basis matrix to produce distinctive parts. ONMF-
276
V imposes orthogonality constraints on the coefficient matrix to produce
277
distinctive cluster indicating vectors. SNMF regularizes the sparsity of the
278
coefficient matrix to reduce redundancy. GNMF incorporates a manifold reg-
279
ularization term to maintain the local geometrical structure. RNMF enforces
280
a sparsity constraint on the residual matrix to improve model robustness.
281
GRNMF adopts a group-sparse outlier residual term to handle possible non-
282
linear effects. PNMF introduces variational Bayesian inference to achieve
283
deterministic convergence to the solution of NMF instead of relying on ran-
284
dom draws [4]. GOPA fully exploits the sparsity of non-negative orthogonal
285
solutions to promotes exact orthogonality [45].
286
287
288
289
290
291
We compare the ten NMF-based algorithms on the real-world experimental datasets summarized in Table 2. • Yale is a face dataset composed of 165 grayscale images from 15 individuals. • Wine is a chemical analysis dataset that quantifies 13 constituents found in three types of wines.
292
• Pendigits is a pen-based recognition dataset of handwritten digits from
293
UCI. We adopt a subset of the original dataset, which is composed of
294
3, 8 and 9 three digits.
295
• Breast dataset comes from Breast Cancer Wisconsin Dataset in UCI.
296
The missing data and the first attribute (ID number) of the dataset
297
are removed. 21
298
299
300
301
• COIL20 consists of 1440 32 × 32 grayscale images for 20 objects at different angles in a 360◦ rotation.
• Optical digits is an optical recognition dataset of handwritten digits from 43 people.
302
• TDT2 is a collection of text documents that include six months of
303
material drawn on a daily basis from six English news sources. The
304
documents appearing in two or more categories are removed.
305
Furthermore, we apply the compared algorithms to the Alzheimer’s disease
306
image dataset (Figure 1) from Alzheimer’s Disease Neuroimaging Initiative
307
(ADNI) database. Alzheimer’s disease (AD) [40, 18] is an irreversible, pro-
308
gressive neurological brain disorder that slowly destroys brain cells, causes
309
memory and thinking skill losses, and ultimately loss of the ability to carry
310
out simple tasks. In this experiment, we adopt 180 magnetic resonance imag-
311
ing (MRI) images, including 60 normal control (NC), 60 mild cognitive im-
312
pairment (MCI, a prodromal stage of AD) and 60 AD patients. From each
313
MRI scan, we extract 90 cortical thickness features because cortical thickness
314
is known to be associated with memory performance, cognitive decline and
315
progression to dementia [5], and it has been a widely used predictor for AD
316
diagnosis [29].
317
In the preprocessing of the Alzheimer’s disease image dataset, we first per-
318
form anterior commissure–posterior commissure correction and skull-striping.
319
Then, we segment the brain images into gray matter, white matter, and CSF
320
three parts, where the gray matter is further divided into 90 regions of interest
321
(ROIs) after registration with the automated anatomical labeling template 22
(a) Axial view
(b) Sagittal view
(c) Coronal View
Figure 1: Illustration of a sample MRI scan from the Alzheimer’s disease image dataset
322
[36]. Subsequently, we calculate the cortical thickness of each ROI between
323
the inner and outer surface as the features of MRI images [11]. The entire
324
process is performed in CAT12 (Computational Anatomy Toolbox). With
325
the extracted cortical thickness features, we perform two binary clustering
326
tasks (NC vs AD, NC vs MCI) and one three-way clustering task (AD vs
327
MCI vs NC) on the Alzheimer’s disease dataset (Table 2).
328
For implementation, we adopt the standard NMF with multiplicative up-
329
date strategy in Matlab1 [3]. For ONMF-U, ONMF-V, SNMF, RNMF and
330
PNMF, we use their implementations in the comprehensive package NM-
331
Flibrary2 . The codes of GNMF3 , GRNMF4 and GOPA5 are provided by
332
their authors. As both GNMF and LNMFS are graph-dependent, we pro-
333
vide them with the same pairwise similarity matrix (W) on each dataset. For
334
the text document datasets, we compute their normalized tf-idf features and 1
https://ww2.mathworks.cn/help/stats/nnmf.html https://github.com/hiroyuki-kasai/NMFLibrary 3 http://www.cad.zju.edu.cn/home/dengcai/Data/GNMF.html 4 https://www.irit.fr/˜Cedric.Fevotte/publications.html 5 https://github.com/kzhang980/ORNMF 2
23
335
use cosine function to calculate the document similarities. For the non-text
336
datasets, we construct 0/1 neighbor graphs, where wij = 1 if the j th data
337
point is one of the k nearest neighbors of the ith data point or vice versa,
338
and wij = 0 otherwise.
339
For each dataset, we follow the existing NMF models [45, 48, 46] and set
340
the embedding dimension of the matrix decomposition to be the same as the
341
number of clusters, so that each dimension of the latent feature space corre-
342
sponds to a distinct cluster. As LNMFS assumes that the basis matrix lies
343
on a Stiefel manifold, we initialize its U as a random orthogonal matrix. The
344
initialization strategy is as follows. First, we generate a random nonnegative
345
matrix in the range (0, 1). Then, we keep only the maximal value in each
346
row unchanged and set the other elements to 0. Finally, we perform normal-
347
ization to ensure that U0 U0 = I. The initial V of LNMFS is randomly
348
initialized in the range (0, 1). For the other NMF-based algorithms, their
349
initial U and V are randomly initialized following their default settings. For
350
each algorithm, the maximal number of iterations is set to 500, and the con-
351
vergence tolerance to 10−5 . The regularization parameters of each algorithm,
352
if any, are optimized over the range 10(−5:0.5:5) . For the graph-dependent al-
353
gorithms (i.e., GNMF and LNMFS), we construct a 3-NN graph on Yale and
354
TDT2 datasets, and a 10-NN graph for the other datasets using the (stan-
355
dardized) Euclidean distance. After the nonnegative matrix decomposition,
356
the k-means algorithm is applied to the coefficient matrix (V) to produce
357
the clustering results.
T
24
Table 3: Comparison of clustering performance in terms of Rand index Algorithm
358
Yale
Wine
Pendigits
Breast
COIL20
NMF
0.9010±0.0091
0.6499±0.0127
0.7215±0.0726
0.9082±0.0213
0.9495±0.0051
ONMF-U
0.8948±0.0109
0.6751±0.0248
0.6984±0.0814
0.9113±0.0089
0.9514±0.0073
ONMF-V
0.7514±0.0808
0.6111±0.0590
0.7130±0.0742
0.7309±0.0817
0.9201±0.0335
SNMF
0.8892±0.0090
0.7034±0.0431
0.8054±0.0058
0.8085±0.0461
0.9199±0.0075
GNMF
0.9165±0.0073
0.7045±0.0135
0.8985±0.0658
0.9404±0.0006
0.9664±0.0032
RNMF
0.9158±0.0062
0.6667±0.0206
0.8300±0.0450
0.9253±0.0050
0.9566±0.0031
GRNMF
0.9117±0.0062
0.6990±0.0207
0.8352±0.0298
0.9267±0.0000
0.9555±0.0024
PNMF
0.8957±0.0070
0.7050±0.0132
0.7854±0.0000
0.9146±0.0024
0.9448±0.0073
GOPA
0.8922±0.0307
0.6922±0.0468
0.7145±0.0607
0.9307±0.0027
0.9512±0.0045
LNMFS
0.9248±0.0027
0.8971±0.0028
0.9300±0.0028
0.9421±0.0013
0.9733±0.0017
Algorithm
Optical Digits
TDT2
NC vs AD
NC vs MCI
NC vs MCI vs AD
NMF
0.9208±0.0081
0.9102±0.0014
0.4975±0.0026
0.4987±0.0059
0.5588±0.0052
ONMF-U
0.9111±0.0083
0.9101±0.0011
0.5849±0.0071
0.5453±0.0165
0.5777±0.0096
ONMF-V
0.9031±0.0273
0.8301±0.0266
0.5046±0.0087
0.5016±0.0068
0.5526±0.0057
SNMF
0.9359±0.0051
—
0.5314±0.0162
0.5179±0.0218
0.5529±0.0528
GNMF
0.9573±0.0067
0.9484±0.0047
0.6304±0.0000
0.5765±0.0001
0.5983±0.0007
RNMF
0.9283±0.0064
—
0.5399±0.0239
0.5182±0.0139
0.5616±0.0133
GRNMF
0.9232±0.0069
—
0.6052±0.0017
0.5464±0.0039
0.5907±0.0077
PNMF
0.9112±0.0066
—
0.4993±0.0000
0.4989±0.0001
0.5535±0.0051
GOPA
0.9003±0.0095
—
0.5905±0.0000
0.5566±0.0076
0.5787±0.0028
LNMFS
0.9649±0.0077
0.9546±0.0030
0.6392±0.0000
0.5905±0.0000
0.6019±0.0011
4.2. Performance Comparison
359
We evaluate the clustering performance of ten NMF-based algorithms us-
360
ing four criteria: Rand index [2], normalized mutual information (NMI) [34],
361
F-measure [30], and purity [27] (Tables 3–6). Among the four criteria, Rand
362
index focuses on the ratio of “good choices”, NMI is defined using mutual
363
information, F-measure considers both precision and recall rate, and purity
364
measures the extent to which clusters contain a single class. Twenty test runs
365
were conducted to compute the average clustering performance and standard
366
deviation for each criterion. “—” indicates failure to produce clustering re25
Table 4: Comparison of clustering performance in terms of NMI Algorithm
Yale
Wine
Pendigits
Breast
COIL20
NMF
0.5422±0.0409
0.4206±0.0259
0.5094±0.0712
0.7115±0.0439
0.7392±0.0157
ONMF-U
0.5522±0.0262
0.3404±0.0737
0.5003±0.0803
0.7166±0.0208
0.7553±0.0198
ONMF-V
0.4493±0.0454
0.2807±0.0523
0.5035±0.0599
0.3802±0.1349
0.6856±0.0443
SNMF
0.4620±0.0191
0.4239±0.0152
0.6120±0.0111
0.5228±0.0918
0.6894±0.0112
GNMF
0.6049±0.0204
0.4491±0.0100
0.7625±0.0951
0.7925±0.0016
0.8439±0.0044
RNMF
0.5934±0.0170
0.4268±0.0237
0.6341±0.0674
0.7514±0.0124
0.7666±0.0073
GRNMF
0.5840±0.0214
0.4058±0.0175
0.6314±0.0523
0.7546±0.0000
0.7476±0.0102
PNMF
0.4776±0.0203
0.4421±0.0202
0.5519±0.0000
0.7252±0.0056
0.7138±0.0109
GOPA
0.5343±0.0420
0.3812±0.1122
0.4702±0.0815
0.7649±0.0068
0.7556±0.0176
LNMFS
0.6369±0.0142
0.7856±0.0036
0.8166±0.0100
0.7965±0.0039
0.8588±0.0064
Algorithm
Optical Digits
TDT2
NC vs AD
NC vs MCI
NC vs MCI vs AD
NMF
0.6605±0.0228
0.6808±0.0063
0.0026±0.0039
0.0043±0.0087
0.0224±0.0065
ONMF-U
0.6323±0.0269
0.6367±0.0097
0.1334±0.0107
0.0726±0.0250
0.0779±0.0128
ONMF-V
0.6773±0.0248
0.6165±0.0184
0.0135±0.0140
0.0086±0.0102
0.0127±0.0049
SNMF
0.7263±0.0123
—
0.0567±0.0224
0.0405±0.0309
0.0531±0.0217
GNMF
0.8612±0.0091
0.7993±0.0080
0.2069±0.0000
0.1196±0.0002
0.0944±0.0009
RNMF
0.6862±0.0175
—
0.0705±0.0382
0.0371±0.0247
0.0382±0.0089
GRNMF
0.6619±0.0195
—
0.1643±0.0010
0.0751±0.0074
0.0879±0.0131
PNMF
0.6430±0.0132
—
0.0055±0.0000
0.0045±0.0001
0.0094±0.0025
GOPA
0.5561±0.0362
—
0.1423±0.0000
0.0896±0.0117
0.0804±0.0040
LNMFS
0.8770±0.0155
0.8112±0.0062
0.2229±0.0000
0.1403±0.0000
0.1005±0.0026
367
sults in reasonable time. Moreover, we report the average runtime and actual
368
number of iterations for each algorithm on the experimental datasets (Table
369
7). All the experiments are performed in MATLAB R2018a on a PC with
370
3.7 GHz Intel Core i7 and 16 GB RAM.
371
In Tables 3-6, it can be seen that the proposed LNMFS exhibits the
372
best clustering performance. This is particularly obvious on the Wine and
373
Pendigits datasets, where LNMFS achieves an improvement of Rand index
374
by 0.2 and 0.1, respectively, over the best among the other algorithms. This
375
demonstrates the effectiveness of LNMFS in leveraging the power of low-rank
26
Table 5: Comparison of clustering performance in terms of F-measure Algorithm
Yale
Wine
Pendigits
Breast
COIL20
NMF
0.3178±0.0403
0.5838±0.0228
0.6583±0.0598
0.9169±0.0180
0.5473±0.0306
ONMF-U
0.3343±0.0230
0.5662±0.0280
0.6354±0.0759
0.9193±0.0078
0.5732±0.0390
ONMF-V
0.1984±0.0377
0.5002±0.0407
0.6376±0.0528
0.7715±0.0756
0.4388±0.0823
SNMF
0.2477±0.0186
0.5784±0.0124
0.7212±0.0073
0.8354±0.0380
0.4355±0.0214
GNMF
0.4001±0.0284
0.6028±0.0161
0.8546±0.0824
0.9451±0.0005
0.6937±0.0196
RNMF
0.3852±0.0230
0.6141±0.0188
0.7559±0.0528
0.9319±0.0044
0.5946±0.0163
GRNMF
0.3680±0.0260
0.5929±0.0169
0.7593±0.0415
0.9331±0.0000
0.5815±0.0178
PNMF
0.2611±0.0180
0.5915±0.0175
0.6914±0.0000
0.9224±0.0021
0.5148±0.0267
GOPA
0.3170±0.0468
0.5649±0.0531
0.6225±0.0653
0.9367±0.0024
0.5646±0.0307
LNMFS
0.4447±0.0209
0.8459±0.0041
0.8951±0.0042
0.9467±0.0012
0.7406±0.0149
Algorithm
Optical Digits
TDT2
NC vs AD
NC vs MCI
NC vs MCI vs AD
NMF
0.6153±0.0344
0.2069±0.0123
0.4997±0.0113
0.5040±0.0082
0.3466±0.0064
ONMF-U
0.5769±0.0330
0.1820±0.0086
0.5833±0.0069
0.5422±0.0173
0.3919±0.0178
ONMF-V
0.5861±0.0578
0.2268±0.0269
0.5060±0.0123
0.5015±0.0091
0.3415±0.0108
SNMF
0.6889±0.0210
—
0.5483±0.0273
0.5705±0.0529
0.3891±0.0331
GNMF
0.8015±0.0246
0.6801±0.0454
0.6298±0.0000
0.5739±0.0002
0.3952±0.0006
RNMF
0.6514±0.0283
—
0.5504±0.0244
0.5331±0.0239
0.3585±0.0101
GRNMF
0.6244±0.0282
—
0.6331±0.0004
0.5451±0.0077
0.3907±0.0126
PNMF
0.5780±0.0220
—
0.5106±0.0000
0.4992±0.0022
0.3361±0.0067
GOPA
0.5149±0.0432
—
0.5891±0.0000
0.5541±0.0080
0.3793±0.0030
LNMFS
0.8330±0.0329
0.7246±0.0239
0.6394±0.0000
0.5872±0.0000
0.4027±0.0001
376
and graph smoothness on the latent Stiefel manifold. Moreover, the stan-
377
dard deviation of LNMFS is relatively low compared with that of the other
378
algorithms on most datasets. This suggests the stability of the proposed
379
approach. Among the other nine NMF-based algorithms, GNMF exhibits
380
the second best clustering performance on most experimental datasets. To-
381
gether with LNMFS, it illustrates the importance of preserving the local
382
manifold data structure in clustering tasks. In addition to GNMF, ONMF-U
383
and GOPA are another two NMF-based algorithms closely related to LN-
384
MFS because they both impose orthogonal constraints on U. Compared
27
Table 6: Comparison of clustering performance in terms of purity Algorithm
Yale
Wine
Pendigits
Breast
COIL20
NMF
0.5858±0.0343
0.8124±0.0466
0.8136±0.0325
0.9517±0.0124
0.6993±0.0225
ONMF-U
0.6518±0.0227
0.7222±0.0319
0.8274±0.0366
0.9535±0.0049
0.7310±0.0168
ONMF-V
0.6685±0.0549
0.6744±0.0954
0.8049±0.0248
0.7890±0.1218
0.6856±0.0214
SNMF
0.5148±0.0257
0.7202±0.0615
0.8420±0.0054
0.8920±0.0274
0.7188±0.0145
GNMF
0.6412±0.0289
0.7514±0.0446
0.9244±0.0398
0.9693±0.0003
0.8372±0.0134
RNMF
0.6394±0.0184
0.8562±0.0529
0.8651±0.0309
0.9612±0.0027
0.7231±0.0235
GRNMF
0.6267±0.0264
0.7522±0.0605
0.8625±0.0280
0.9619±0.0000
0.7108±0.0167
PNMF
0.5300±0.0235
0.7270±0.0547
0.8197±0.0000
0.9553±0.0013
0.6806±0.0180
GOPA
0.6121±0.0284
0.6983±0.0405
0.7845±0.0528
0.9641±0.0015
0.7174±0.0229
LNMFS
0.7151±0.0087
0.9149±0.0027
0.9439±0.0023
0.9702±0.0001
0.8452±0.0089
Algorithm
Optical Digits
TDT2
NC vs AD
NC vs MCI
NC vs MCI vs AD
NMF
0.7523±0.0266
0.4152±0.0156
0.5629±0.0539
0.5954±0.0364
0.4572±0.0184
ONMF-U
0.7227±0.0377
0.3664±0.0132
0.7100±0.0084
0.6546±0.0263
0.5414±0.0555
ONMF-V
0.7736±0.0194
0.5756±0.0390
0.5871±0.0313
0.5687±0.0302
0.4425±0.0382
SNMF
0.8123±0.0198
—
0.6762±0.0768
0.7409±0.1480
0.5664±0.1133
GNMF
0.9088±0.0156
0.7700±0.0384
0.7583±0.0000
0.7000±0.0002
0.5139±0.0085
RNMF
0.7845±0.0283
—
0.6629±0.0399
0.6500±0.0599
0.4758±0.0302
GRNMF
0.7551±0.0257
—
0.7329±0.0018
0.6538±0.0060
0.5233±0.0304
PNMF
0.7156±0.0213
—
0.6250±0.0000
0.5662±0.0122
0.4303±0.0217
GOPA
0.6590±0.0401
—
0.7167±0.0000
0.6733±0.0107
0.4764±0.0092
LNMFS
0.9207±0.0180
0.7985±0.0195
0.7667±0.0000
0.7417±0.0000
0.5722±0.0001
385
with standard NMF, ONMF-V and SNMF, they achieve superior clustering
386
results on most datasets. In contrast, ONMF-V (another orthogonal NMF
387
algorithm), has inferior clustering performance to that of ONMF-U and even
388
standard NMF on most datasets. PNMF has comparable clustering perfor-
389
mance with that of non-probabilistic standard NMF. Furthermore, the robust
390
NMF models (RNMF and GRNMF) rank approximately between GNMF
391
and ONMF-U. Their advantage lies in their model robustness by imposing
392
(group) sparsity constraints on the residue matrix. In fact, LNMFS also en-
393
hances the model robustness, but in a different manner, that is, by imposing
28
A
ORBsupmed. R ACG. R TPOsup. R
TPOsup. L
OLF. R
INS. L
DCG. R
DCG. L
STG. L
ORBsupmed. R
FFG. L
SMG. L
INS. L
DCG. L INS. L
ORBsupmed. R
LING. R
STG. L FFG. L
LING. R
SMG. L
ACG. R
SMG. L
TPOsup. L PCUN. L
Prefrontal
PCUN. L
PCUN. L
DCG. L
FFG. L
TPOsup. R
Frontal
TPOsup. L
TPOmid. L
TPOmid. L
B
Parietal
Temporal SFGmed. L ACG. L OLF. L ROL. L
ORBsup. R REC. R TPOmid. R ROL. R
ITG. L FFG. R
ACG. L
SMG. R ORBsup. R REC. R
SMG. R
SMG. R
SFGmed. L
INS. R
ROL. R
ROL. R INS. R
INS. R FFG. R
IOG. L
Occipital
SFGmed. L ACG. L ORBsup. R
FFG. R
IOG. L
ROL. L ITG. L
Subcortical
REC. R IOG. L
(I) Axial view
TPOmid. R
TPOmid. R
ITG. L
(II) Sagittal view
(III) Coronal view
Figure 2: (A) Significant brain ROIs in the first basis vector learnt by LNMFS from the Alzheimer’s disease dataset (NC vs AD). (B) Significant brain ROIs in the second basis vector learnt by LNMFS from the Alzheimer’s disease dataset (NC vs AD). The three different views (i.e., axial,Sagittal sagittal, brainView ROIs are depicted using Axial View View and coronal) ofCoronal BrainNet Viewer [43]. The size of each node (i.e., ROI) is proportional to its value in the basis vector. Different ROIs are rendered with different colors according to their anatomical locations, as suggested by [38].
394
the low-rank constraint on the intrinsic data matrix. This contributes to the
395
superior clustering performance of LNMFS as well. Finally, we notice that all
396
algorithms fail to produce as good clustering performance on the Alzheimer’s
397
disease dataset as on the other datasets. This may be because MRI images
398
contain a significant amount of noise, which may be introduced by the op-
399
erator, equipment and environment [26]. Nevertheless, LNMFS still exhibits
400
better clustering performance than the other NMF-based algorithms.
401
To explain the improved clustering performance of LNMFS, we take the
402
Alzheimer’s disease dataset (NC vs AD) as an example, and show its de-
29
403
composed basis matrix (U) as the extracted brain ROI features in Figure 2.
404
In Figure 2(A), ORBsupmed.R (right superior frontal gyrus, medial orbital)
405
and bilateral TPOsup, that is, TPOsup.L and TPOsup.R (temporal pole,
406
superior temporal gyrus), rank the top three significant brain ROIs in the
407
first basis vector of LNMFS. This agrees with the fact in neuroscience that
408
orbitofrontal cortex is involved in the cognitive process of decision-making,
409
and superior temporal gyrus is involved in the comprehension of language
410
and is among the earliest areas affected by Alzheimer’s disease [32]. There-
411
fore, these physically interpretable features extracted by the basis matrix
412
further improve the practicability of the proposed approach.
413
Regarding runtime performance (Table 7), NMF is the most efficient al-
414
gorithm among the ten NMF-based methods under comparison. LNMFS
415
costs less than 4 s on nine out of ten datasets, and it is the third most effi-
416
cient algorithm on the largest TDT2 dataset. However, half of the compared
417
algorithms, including SNMF, RNMF, GRNMF, PNMF and GOPA, fail to
418
produce any decomposition result on TDT2 dataset in reasonable time. Fur-
419
thermore, GOPA requires the smallest number of iterations (listed in the
420
parentheses), but it does not necessarily lead to the least runtime. Most of
421
the evaluated NMF-based algorithms (including LNMFS) reach the maxi-
422
mal number of iterations (500) before they satisfy the convergence tolerance
423
(10−5 ). However, this by no means indicates that they do not converge be-
424
cause the actual number of iterations depends on the magnitude of the spec-
425
ified convergence tolerance. Later (Section 4.4), we will see that LNMFS in
426
fact convergences quite rapidly.
30
Table 7: Comparison of averaged runtime performance (s). The numbers in the parentheses indicate the actual number of iterations. Algorithm
Yale
Wine
Pendigits
Breast
COIL20
NMF
3.02 (500)
0.02 (450)
0.02 (274)
0.01 (312)
4.18 (259)
ONMF-U
2.79 (500)
0.04 (500)
0.05 (500)
0.05 (500)
7.56 (500)
ONMF-V
3.20 (500)
0.04 (500)
0.05 (500)
0.04 (500)
8.06 (500)
SNMF
4.26 (500)
0.04 (500)
0.08 (500)
0.10 (500)
24.19 (500)
GNMF
0.57 (500)
0.02 (500)
0.04 (500)
0.06 (500)
3.42 (500)
RNMF
7.34 (500)
0.05 (500)
0.07 (500)
0.07 (500)
20.50 (500)
GRNMF
15.04 (500)
0.07 (500)
0.11 (499)
0.11 (160)
37.07 (500)
PNMF
162.54 (500)
0.25 (500)
0.46 (500)
0.34 (500)
533.12 (500)
GOPA
0.47 (89)
0.08 (18)
0.31 (31)
0.44 (23)
11.45 (55)
LNMFS
1.03 (500)
0.06 (500)
0.16 (500)
0.06 (386)
3.82 (500)
Optical Digits
TDT2
NC vs AD
Algorithm
427
NC vs MCI NC vs MCI vs AD
NMF
1.20 (485)
7.72 (2)
0.01 (242)
0.01 (236)
0.02 (242)
ONMF-U
1.25 (500)
2097.57 (500)
0.06 (500)
0.07 (500)
0.07 (500)
ONMF-V
1.14 (500)
2486.86 (500)
0.06 (500)
0.06 (500)
0.07 (500)
SNMF
32.96 (500)
—
0.07 (500)
0.07 (500)
0.09 (500)
GNMF
16.62 (500)
106.71 (500)
0.02 (500)
0.02 (500)
0.03 (500)
RNMF
9.57 (500)
—
0.09 (500)
0.09 (500)
0.11 (500)
GRNMF
6.77 (500)
—
0.06 (320)
0.07 (237)
0.18 (465)
PNMF
49.99 (500)
—
0.32 (500)
0.33 (500)
0.57 (500)
GOPA
121.90 (35)
—
0.03 (22)
0.03 (23)
0.07 (24)
LNMFS
2.04 (500)
876.71 (500)
0.06 (500)
0.06 (500)
0.10 (500)
4.3. Noise Robustness
428
In addition to the real-world noisy datasets such as Alzheimers’ disease
429
datasets (Section 4.2), herein, we evaluate LNMFS on Yale dataset with
430
added Salt & Pepper noise (Figure 3), in comparison with the other nine
431
NMF-based algorithms (Figure 4). Salt & Pepper noise is a common type of
432
corruption in images. Unfortunately, it is a challenging task to remove Salt &
433
Pepper noise in computer vision because it contaminates each pixel by zero
434
or the maximum pixel value, and its distribution violates the common noise 31
r = 0.05
r = 0.10
r = 0.15
r = 0.20
r = 0.25
r = 0.30
r = 0.35
r = 0.40
r = 0.45
r = 0.50
r = 0.05
r = 0.10
r = 0.15
r = 0.20
r = 0.25
r = 0.30
r = 0.35
r = 0.40
r = 0.45
r = 0.50
Figure 3: Example face images contaminated by Salt & Pepper noise with different noise ratios (from 5% to 50%).
(A) NMI
(B) Rand Index
(C) F-measure
(D) Purity
Figure 4: Clustering performance curves of ten NMF-based algorithms on Yale dataset with different noise ratios.
435
assumption in traditional learning methods. To demonstrate the robustness
436
of LNMFS, we vary the percentage of corrupted pixels from 5% to 50%. For
437
each case of additive Salt & Pepper noise, we conduct twenty test runs and
438
report the average clustering performance evaluated in terms of four different
439
criteria.
440
Figure 4 shows that LNMFS consistently exhibits superior performance
441
to that of the other algorithms despite the change of the noise ratios. This is
442
particularly obvious when the noise ratio is smaller than 0.2 or greater than
443
0.45. The two robust NMF models, that is, RNMF and GRNMF, surpris-
444
ingly exhibit comparable clustering performance with that of standard NMF.
445
This may be because their noise assumption violates the Salt & Pepper noise
446
distribution. ONMF-U exhibits considerably better clustering performance
32
447
than NMF and ONMF-V. Furthermore, GOPA has relatively stable clus-
448
tering performance when the noise ratio is smaller than or equal to 0.35,
449
but its performance starts to decline when the noise ratio grows larger than
450
0.4. Together with LNMFS, they demonstrate the effectiveness of orthogo-
451
nal constraints on U for handling Salt & Pepper noise. Finally, the Rand
452
index curve of ONMF-V is increasing as the noise ratio increases because its
453
clustering result becomes more balanced on noisier datasets.
454
4.4. Empirical Convergence
455
In Section 3.3, we theoretically proved the convergence of the multiplica-
456
tive updating rules of LNMFS. We will now investigate the empirical conver-
457
gence performance of LNMFS on the experimental datasets.
458
Figure 5 shows the convergence of the relative difference of U (Figure
459
5(A)) and V (Figure 5(B)), which are decomposed by LNMFS, as the number
460
of iterations increases. It can be seen that both U and V converge rapidly,
461
and U converges slightly faster than V. In the first 100 iterations, the relative
462
difference curves of U and V drop sharply, then decline slowly, and gradually
463
become stable. This demonstrates that LNMFS is a convergent algorithm.
464
LNMFS reaches the maximal number of iterations in the experiments (Table
465
7) because its maximal relative difference of U and V is close to but not
466
below the specified convergence tolerance (10−5 ) at the maximal number of
467
iterations.
468
4.5. Parameter Influence
469
The proposed LNMFS algorithm has two regularization parameters, the
470
weight of the low-rank regularization term λ and the weight of the graph 33
A
Yale
Wine
Pendigits
Breast
COIL20
Optical Digits
TDT2
NC vs AD
NC vs MCI
NC vs MCI vs AD
Yale
Wine
Pendigits
Breast
COIL20
Optical Digits
TDT2
NC vs AD
NC vs MCI
NC vs MCI vs AD
B
of LRNMF on the datasets.of(A) relative Figure 5: 5: Empirical Empiricalconvergence convergence of LNMFS. (A)experimental Relative difference U,The defined as ||Ut+1 Ut ||F t ||Ut+1 −Ut ||F t th di↵erence of U, defined as , with the increasing number of iterations, U , with increasing number iteration. t ||Ut ||F of iterations. U indicates U in the t ||U ||F
t+1
t
V ||F ||V relative −V ||Fdi↵erence of V, defined as ||V indicates U indifference the t thofiteration. (B)asThe , (B) Relative V, defined , with increasing number of iterations. ||Vt ||F ||Vt ||F t with the increasing number of iterations. V indicates V in the tth iteration. t+1
t
471 471
keep consistent with the previous setting, regularization smoothness regularization term β.experimental To discuss the effecteach of the regulariza-
472 472
parameter changesweover the range of 10[ 5:0.5:5]performance . tion parameters, consider the clustering of LNMFS under
474 474
In Fig. 6, we can thetypical same dataset, influence of the regdifferent settings of λ see andthat β onforfour datasetsthe (Figure 6). For consisularization parameters very similar patterns di↵erent evaluation tency with the previousshow experimental setting, eachacross regularization parameter
475 475
(−5:0.5:5) the influence of the regularization paramcriteria. over But for changes the di↵erent range 10datasets, .
473 473
34
A
B
C
D
(I) Wine
(II) Pendigits
(III) COIL20
(IV) NC vs AD
Figure 6: Influence of the regularization parameters (the low-rank regularization parameter λ and the graph smoothness parameter β) on the clustering performance of LNMFS in terms of four evaluation criteria. Each regularization parameter changes over the range 10(−5:0.5:5) . Parameter influence on (A) Rand index, (B) NMI, (C) F-measure and (D) purity.
476
In Fig. 6, it can be seen that for the same dataset, the influence of the
477
regularization parameters exhibit highly similar patterns in terms of different
478
evaluation criteria. However, for different datasets, the influence of the regu-
479
larization parameters varies. Despite the differences, the superior clustering
480
performance of LNMFS is achieved not only for a specific parameter setting
481
but also in a range of parameter settings. For instance, on the Wine dataset,
482
most of the parameter settings for LNMFS can lead to higher Rand index val-
35
483
ues than those by the other NMF-based algorithms. An empirical suggestion
484
for the regularization parameter setting of LNMFS is (β = 105 , λ = 101.5 ),
485
but better performance can be achieved if the regularization parameters are
486
optimized over the range 10(−5:0.5:5) .
487
4.6. Embedding Dimension
488
In the previous experiments, we set the embedding dimension of the NMF-
489
based algorithms to be the same as the number of clusters (Section 4.2).
490
We now study the clustering performance of LNMFS under different em-
491
bedding dimension settings, in comparison with the other nine NMF-based
492
algorithms on three typical datasets (Figure 7). For each algorithm, twenty
493
runs are conducted to obtain the average clustering performance. For each
494
embedding dimension, the regularization parameters (if any) are optimized
495
over the range 10(−5:0.5:5) .
496
In Figure 7, the ten NMF-based algorithms exhibit diverse clustering per-
497
formance curves on different datasets. LNMFS exhibits superior clustering
498
performance compared with the other algorithms for most embedding dimen-
499
sions and in terms of most evaluation criteria. An unexpected exception is on
500
the NC vs AD dataset, where the purity performance of SNMF is exception-
501
ally higher than that of the other algorithms when the embedding dimension
502
is larger than 3. The underlying reason is that SNMF partitions most of
503
the data points into a single cluster. The extraordinarily high purity and
504
F-measure of GRNMF when the embedding dimension equals 1 may be ex-
505
plained similarly. Therefore, purity and F-measure may be more susceptible
506
to unbalanced clustering results than Rand index and NMI.
507
On the same dataset, LNMFS exhibits similar performance curves with 36
A
B
C
D
(I) Wine
(II) COIL20
(III) NC vs AD
Figure 7: Clustering performance curves of ten NMF-based algorithms under different embedding dimension settings. (A) Rand index, (B) NMI, (C) F-measure, and (D) purity curves.
37
508
respect to different evaluation criteria. As the embedding dimension in-
509
creases, the clustering performance of LNMFS does not necessarily improve;
510
rather, LNMFS often achieves the optimal or near optimal clustering perfor-
511
mance when the embedding dimension equals the number of clusters (e.g.,
512
when d = 3 on Wine dataset, d = 20 on COIL20 dataset, and d = 2 on NC
513
vs AD dataset). This may be because when the embedding dimension co-
514
incides with the rank of the intrinsic data, the low-rank regularization term
515
can be maximally utilized. Nevertheless, this rule does not apply to the other
516
NMF-based algorithms. For instance, GNMF has the most stable clustering
517
performance curves under different embedding dimensions. In contrast with
518
LNMFS, standard NMF and RNMF exhibit inferior clustering performance
519
on NC vs AD dataset when the embedding dimension equals the cluster
520
number.
521
5. Conclusion and Future Study
522
The characteristic feature of NMF is its interpretable parts-based data
523
representation. In this study, we proposed a novel NMF-based algorithm
524
using three different types of additional constraints.
525
First, it addresses the problem of noise and outliers by incorporating
526
a low-rank constraint. This is different from existing robust NMF models
527
[46, 15, 20, 24], which assume that the introduced noise is sparse and impose
528
an l1 /l21 norm constraint on the noise matrix; rather, LNMFS penalizes the
529
nuclear norm of the intrinsic data matrix and transforms it into a sum of con-
530
vex Frobenius norms of factor matrices. This strategy was initially proposed
531
for low-rank matrix decomposition [33][6]. Although there are some NMF 38
532
models that also impose a low-rank constraint, they either impose it on a fac-
533
tor matrix [28] or aim for nonnegative matrix completion [35]. In addition,
534
they use ADMM to solve the optimization problem, which is considerably
535
slower and less concise than approximating with the Frobenius norm in a
536
unified objective function. Secondly, LNMFS incorporates the orthogonality
537
constraints on a Stiefel manifold into the model. The advantage of enhancing
538
orthogonality is that it can reduce the redundancy of the data representation
539
and improve the interpretability of the decomposition results. However, LN-
540
MFS does not use the retraction operator [1][37] because it remains an open
541
problem to ensure the convergence of NMF while retracting on the Stiefel
542
manifold. Finally, LNMFS integrates the graph smoothness constraint to
543
utilize the local geometry of the data manifold [7]. An alternative iterative
544
optimization algorithm was developed for LNMFS, and the proof of its con-
545
vergence was presented. Experiments on real-world datasets demonstrated
546
the superiority of LNMFS compared with nine representative NMF-based
547
algorithms on clustering tasks.
548
In future work, we will investigate incorporating user-provided information (in the form of “must-link” and “cannot-link” constraints) into the proposed LNMFS algorithm for semi-supervised NMF.
Acknowledgment

This work was supported in part by the Chinese National Natural Science Foundation under Grant Nos. 61402395, 61906100 and 61802336, the Natural Science Foundation of Jiangsu Province under contracts BK20151314, BK20140492 and BK20180822, the Jiangsu Overseas Research and Training Program for University Prominent Young and Middle-aged Teachers and Presidents, the Jiangsu Government Scholarship for Overseas Studies, and the Natural Science Foundation of the Education Department of Jiangsu Province under contract 18KJB520040.
References

[1] Absil, P. A., Mahony, R., Sepulchre, R., 2009. Optimization algorithms on matrix manifolds. Princeton University Press.

[2] Bagga, A., Baldwin, B., 1998. Entity-based cross-document coreferencing using the vector space model. In: Meeting of the Association for Computational Linguistics and International Conference on Computational Linguistics. pp. 79–85.

[3] Berry, M. W., Browne, M., Langville, A. N., Pauca, V. P., Plemmons, R. J., 2007. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics & Data Analysis 52 (1), 155–173.
[4] Brouwer, T., Frellsen, J., Lio, P., 2017. Comparative study of inference methods for Bayesian nonnegative matrix factorisation. In: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2017, Skopje, Macedonia, September 18-22, 2017, Proceedings, Part I. pp. 513–529.

[5] Busovaca, E., Zimmerman, M. E., Meier, I. B., Griffith, E. Y., Grieve, S. M., Korgaonkar, M. S., Williams, L. M., Brickman, A. M., 2016. Is the Alzheimer's disease cortical thickness signature a biological marker for memory? Brain Imaging & Behavior 10 (2), 1–7.

[6] Cabral, R., Torre, F. D. L., Costeira, J. P., Bernardino, A., 2013. Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition. In: Proceedings of the 2013 IEEE International Conference on Computer Vision. ICCV '13. IEEE Computer Society, Washington, DC, USA, pp. 2488–2495. URL http://dx.doi.org/10.1109/ICCV.2013.309
[7] Cai, D., He, X., Han, J., Huang, T. S., 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (8), 1548–1560.

[8] Choi, S., 2008. Algorithms for orthogonal nonnegative matrix factorization. In: IEEE International Joint Conference on Neural Networks. pp. 1828–1832.

[9] Comon, P., Jutten, C., 2010. Handbook of Blind Source Separation: Independent Component Analysis and Separation. Academic Press.

[10] Cunningham, J. P., Ghahramani, Z., 2015. Linear dimensionality reduction: Survey, insights, and generalizations. Journal of Machine Learning Research 16, 2859–2900. URL http://jmlr.org/papers/v16/cunningham15a.html

[11] Dahnke, R., Yotter, R. A., Gaser, C., 2013. Cortical thickness and central surface estimation. Neuroimage 65 (1), 336–348.
[12] Ding, C., Li, T., Peng, W., Park, H., 2006. Orthogonal nonnegative matrix t-factorizations for clustering. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '06. pp. 126–135.

[13] Edelman, A., Arias, T. A., Smith, S. T., 1998. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications 20 (2), 303–353.

[14] Eggert, J., Körner, E., 2004. Sparse coding and NMF. In: IEEE International Joint Conference on Neural Networks, 2004. Proceedings. Vol. 4, pp. 2529–2533.

[15] Févotte, C., Dobigeon, N., 2015. Nonlinear hyperspectral unmixing with robust nonnegative matrix factorization. IEEE Transactions on Image Processing 24 (12), 4810–4819.

[16] Févotte, C., Idier, J., 2011. Algorithms for nonnegative matrix factorization with the β-divergence. Neural Computation 23 (9), 2421–2456.

[17] Hoyer, P. O., 2004. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 5 (1), 1457–1469.
[18] Jalili, M., 2016. Graph theoretical analysis of Alzheimer's disease: discrimination of AD patients from healthy subjects. Information Sciences 384, S0020025516306193.

[19] Jolliffe, I. T., Cadima, J., 2016. Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374 (2065), 20150202.

[20] Kong, D., Ding, C., Huang, H., 2011. Robust nonnegative matrix factorization using l21-norm. In: ACM International Conference on Information and Knowledge Management. pp. 673–682.

[21] Lathauwer, L. D., Moor, B. D., Vandewalle, J., 2000. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 21 (4), 1253–1278.

[22] Lee, D. D., Seung, H. S., 1999. Learning the parts of objects by nonnegative matrix factorization. Nature 401 (6755), 788–791.
[23] Li, T., Ding, C., 2007. The relationships among various nonnegative matrix factorization methods for clustering. In: International Conference on Data Mining. pp. 362–371.

[24] Li, X., Cui, G., Dong, Y., 2017. Graph regularized non-negative low-rank matrix factorization for image clustering. IEEE Transactions on Cybernetics 47 (11), 3840–3853.

[25] Li, Z., Wu, X., Peng, H., 2010. Nonnegative matrix factorization on orthogonal subspace. Pattern Recognition Letters 31 (9), 905–911.

[26] Macovski, A., 1996. Noise in MRI. Magnetic Resonance in Medicine 36 (3), 494–497.

[27] Manning, C. D., Raghavan, P., Schütze, H., 2008. Introduction to Information Retrieval. Cambridge University Press.

[28] Pan, J., Ng, M. K., 2018. Orthogonal nonnegative matrix factorization by sparsity and nuclear norm optimization. SIAM Journal on Matrix Analysis and Applications 39 (2), 856–875.

[29] Racine, A. M., Brickhouse, M., Wolk, D. A., Dickerson, B. C., 2018. The personalized Alzheimer's disease cortical thickness index predicts likely pathology and clinical progression in mild cognitive impairment. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, S2352872918300150.

[30] Rijsbergen, C. J. V., 1974. Foundation of evaluation. Journal of Documentation 30 (4), 365–373.
[31] Schmidt, M. N., Larsen, J., Hsiao, F. T., 2007. Wind noise reduction using non-negative sparse coding. In: 2007 IEEE Workshop on Machine Learning for Signal Processing. pp. 431–436.

[32] Ding, S. L., Van Hoesen, G. W., Cassell, M. D., Poremba, A., 2010. Parcellation of human temporal polar cortex: A combined analysis of multiple cytoarchitectonic, chemoarchitectonic, and pathological markers. Journal of Comparative Neurology 514 (6), 595–623.

[33] Srebro, N., Shraibman, A., 2005. Rank, trace-norm and max-norm. In: Conference on Learning Theory. pp. 545–560.

[34] Strehl, A., Strehl, E., Ghosh, J., Mooney, R., 2000. Impact of similarity measures on web-page clustering. In: Workshop on Artificial Intelligence for Web Search (AAAI 2000). AAAI, pp. 58–64.

[35] Sun, D. L., Mazumder, R., 2013. Non-negative matrix completion for bandwidth extension: A convex optimization approach. In: IEEE International Workshop on Machine Learning for Signal Processing. pp. 1–10.

[36] Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M., 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15 (1), 273–289.

[37] Vandereycken, B., 2013. Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization 23 (2), 1214–1236.
[38] Wang, K., Liang, M., Wang, L., Tian, L., Zhang, X., Li, K., Jiang, T., 2007. Altered functional connectivity in early Alzheimer's disease: a resting-state fMRI study. Human Brain Mapping 28 (10), 967.

[39] Wang, Y. X., Zhang, Y. J., 2013. Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge & Data Engineering 25 (6), 1336–1353.

[40] Weller, J., Budson, A., 2018. Current understanding of Alzheimer's disease diagnosis and treatment. F1000Research 7, 1161.

[41] Wright, J., Ganesh, A., Rao, S., Peng, Y. G., Ma, Y., 2009. Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In: International Conference on Neural Information Processing Systems. pp. 2080–2088.

[42] Wu, W., Kwong, S., Yu, Z., Jia, Y., Wei, G., 2018. Nonnegative matrix factorization with mixed hypergraph regularization for community detection. Information Sciences 435, 263–281.

[43] Xia, M., Wang, J., Yong, H., 2013. BrainNet Viewer: A network visualization tool for human brain connectomics. PLoS One 8 (7), e68910.

[44] Yang, J., Yang, S., Fu, Y., Li, X., Huang, T., 2008. Non-negative graph embedding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008). pp. 1–8.

[45] Zhang, K., Zhang, S., Liu, J., Wang, J., Zhang, J., 2019. Greedy orthogonal pivoting algorithm for non-negative matrix factorization. In: Proceedings of the 36th International Conference on Machine Learning. pp. 7493–7501.
[46] Zhang, L., Chen, Z., Zheng, M., He, X., 2011. Robust non-negative matrix factorization. Frontiers of Electrical & Electronic Engineering in China 6 (2), 192–200.

[47] Zhang, T., Fang, B., Tang, Y. Y., He, G., Wen, J., 2008. Topology preserving non-negative matrix factorization for face recognition. IEEE Transactions on Image Processing 17 (4), 574–584.

[48] Zhang, W. E., Tan, M., Sheng, Q., Yao, L., Shi, Q., 2016. Efficient orthogonal non-negative matrix factorization over Stiefel manifold. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management. pp. 1743–1752.

[49] Zhang, X., Gao, H., Li, G., Zhao, J., Li, Z., 2018. Multi-view clustering based on graph-regularized nonnegative matrix factorization for object recognition. Information Sciences 432, 463–478.

[50] Zhao, R., Tan, V. Y. F., 2016. Online nonnegative matrix factorization with outliers. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 2662–2666.
Author Contributions

Ping He contributed to developing and implementing the method, conducting the experiments, visualizing and analyzing the results, and writing the manuscript. Xiaohua Xu contributed to developing and implementing the method, providing theoretical analysis of the method, and directing the work. Jie Ding contributed to the revision of the manuscript. Baichuan Fan contributed to conducting the experiments.
Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.