Information Sciences 483 (2019) 192–205
Possibility measure based fuzzy support function machine for set-based fuzzy classifications

Jiqiang Chen a,b,c, Qinghua Hu a, Xiaoping Xue b,∗, Minghu Ha c, Litao Ma c, Xuchang Zhang c, Zhipeng Yu c

a School of Computer Science and Technology, Tianjin University, Tianjin 300072, PR China
b Department of Mathematics, Harbin Institute of Technology, Harbin 150001, PR China
c School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, PR China

∗ Corresponding author. E-mail addresses: [email protected] (J. Chen), [email protected], [email protected] (X. Xue).

Article history: Received 31 July 2018; Revised 5 January 2019; Accepted 9 January 2019; Available online 11 January 2019.

Keywords: Support function machine; Possibility measure; Membership degree; Support function; Set-valued data

https://doi.org/10.1016/j.ins.2019.01.022
0020-0255/© 2019 Published by Elsevier Inc.
Abstract

In real-world applications, there are many set-based fuzzy classifications. However, current research has some limitations in solving such classifications. Therefore, a method called possibility measure based fuzzy support function machine (PMFSFM) is discussed in this work. Firstly, two notes are provided as improvements of SFM from theoretical and experimental perspectives. Secondly, a set-based fuzzy classification in the Euclidean space R^d is converted into a function-based task in the Banach space C(S) by means of the support function and the membership degree. Thirdly, a fuzzy optimization problem based on the possibility measure is derived and some of its properties are discussed. Subsequently, a PMFSFM for set-based fuzzy classification is constructed; it gives both the fuzzy class of a given input and the membership degree of the input to that class. Experimental results concerning water quality evaluation in a fuzzy environment show the effectiveness of PMFSFM.
1. Introduction

In real-world applications, there are many set-based fuzzy classification tasks, in which a sample is represented by a set of vectors and does not clearly belong to one of the decision classes, but belongs to each class with a certain membership degree. For example, in water quality evaluation, on the one hand, multiple repeated measurements are often used to reduce the value uncertainty [19], so a set of vectors describing the same sample is obtained if a feature vector is extracted from each measurement. On the other hand, the evaluation grades of water quality (such as Clean, Comparatively clean, etc.) are described by fuzzy language with imprecise boundaries, and the water quality data are not exactly assigned to any evaluation grade, but belong to each evaluation grade with a certain membership degree. Therefore, water quality evaluation is a set-based fuzzy classification. Set-based fuzzy classifications also arise in other applications [13,25], and this kind of classification is becoming an important research topic in the fields of machine learning and pattern recognition [3,15–17,23,27,34,35,38,39].

Currently, there are two kinds of methods to deal with these classifications. One is to compute statistics of the original data (such as the mean and median) and describe the original data with a vector, as in SVMs [7,11,18,32] and other
methods [4,12,36,39]. In this kind of method, a set-valued object is represented by a vector-valued sample, but some classification information (such as the variance, which reflects fluctuation information) may be lost in the preprocessing. The other is to make some assumptions about the datasets in advance, such as obeying a single Gaussian model [1], a Gaussian mixture model [2], a subspace model [14] or a manifold model [26,30], and then develop set-based classifiers directly. However, some classification tasks may satisfy these assumptions while others may not; in the latter case these methods do not work.

To overcome the shortcomings of the above methods, Chen et al. [3] proposed a support function machine (SFM) for set-based classification, and discussed the separability of set-valued data sets and the existence of support hyperplanes [5]. SFM represents the set-valued data by their support functions. The sets are thereby converted into functions that make up an infinite-dimensional Banach space C(S) (whose elements are the continuous functions defined on the unit ball S in R^d), and SFM is established in this new space. Consequently, SFM not only retains the classification information of the original set-valued data, but also needs no assumption in advance. Compared with other methods, SFM achieves higher accuracy in water quality assessment. However, since SFM does not consider the fuzziness of the water quality evaluation grades, it cannot deal with water quality evaluation effectively.

Therefore, in order to further improve the classification accuracy, in this paper a membership degree is introduced to describe the degree to which an input point belongs to a fuzzy class, and a possibility measure is introduced to describe the possibility that a fuzzy event occurs. Then, an extended SFM called possibility measure based fuzzy support function machine (PMFSFM) is established for set-based fuzzy classifications. The proposed PMFSFM can provide both the fuzzy class and the membership degree of a given input to the fuzzy class, and achieves the highest average accuracy compared with the other methods. In addition, the source code of SFM is optimized, so the time consumption of PMFSFM is much shorter than that of SFM. PMFSFM also achieves higher average accuracy than LPSVMfast and DT, which supports the view, discussed above, that it is better to represent the original data with a set-valued datum rather than a vector-valued datum.

This paper is structured as follows: Section 2 provides some preliminaries about SFM and the possibility measure. Section 3 provides two notes as improvements of SFM from theoretical and experimental perspectives. Sections 4 and 5 propose a hard margin and a soft margin PMFSFM, respectively, and discuss some of their properties. Section 6 provides experiments concerning water quality evaluation in a fuzzy environment. Finally, Section 7 draws conclusions and briefly discusses future research directions.

2. Preliminaries

In this section, we provide some preliminaries about SFM and the possibility measure to better understand the concepts presented in this paper.

Definition 1 [10]. The support function σA : R^d → R of a non-empty closed convex set A in R^d is given by σA(x) = sup_{y∈A} ⟨x, y⟩, x ∈ R^d.

It follows directly from Definition 1 that σA(x) is convex, positively homogeneous (σA(αx) = ασA(x), α ≥ 0, x ∈ R^d) and subadditive (σA(x + y) ≤ σA(x) + σA(y), x, y ∈ R^d). Moreover, σ_{αA}(x) = ασA(x), α ≥ 0, x ∈ R^d, and σ_{A+B}(x) = σA(x) + σB(x), x ∈ R^d. For simplicity, we also denote σA(x) by σA.
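To make Definition 1 concrete, the following minimal Python/NumPy sketch (the function name is ours) evaluates the support function of a finite sample set; for a finite A, the supremum over the closed convex hull co(A) is attained at one of the points of A itself, so it reduces to a maximum of inner products:

```python
import numpy as np

def support_function(A, x):
    """sigma_A(x) = sup_{y in A} <x, y> (Definition 1).

    For a finite sample set A, the supremum over co(A) is attained
    at one of the points of A, so a maximum suffices.
    """
    A = np.asarray(A, dtype=float)
    return float(np.max(A @ np.asarray(x, dtype=float)))

# A square in R^2: the support function at a unit direction gives the
# offset of the supporting hyperplane in that direction.
A = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(support_function(A, (1.0, 0.0)))               # 1.0
print(support_function(A, (np.sqrt(2)/2,) * 2))      # sqrt(2), at corner (1, 1)
```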
As the Banach space C(S) is not an inner product space, the hyperplane in SFM is defined via the following Riesz representation theorem in Banach space.

Theorem 1 [21]. Assume that X is a locally compact Hausdorff space. Then for any bounded linear functional Λ on C0(X), there is one and only one complex regular Borel measure μ such that
Λ(σ) = ∫_X σ dμ, σ ∈ C0(X), and ‖Λ‖ = |μ|(X),
where |μ|(X) = sup Σ_{i=1}^n |μ(Ai)| (the supremum being taken over all partitions {Ai, i = 1, 2, ..., n}, n ≥ 1, of X) is the total variation of μ, denoted simply by ‖μ‖.
Definition 2 [3]. We define M = {σ ∈ C(S) | ∫_S σ dμ = α}, α ∈ R, the hyperplane in C(S); M⊥ = {μ ∈ C∗(S) | ∫_S σ dμ = 0, σ ∈ M} the orthocomplement of M; and μ the normal of the hyperplane M. For simplicity, we denote ∫_S σ dμ by μ(σ).
Definition 3 [3]. If g(σ), σ ∈ C(S), in the decision function dividing C(S) into two parts is a hyperplane, then we say the training set is linearly separable. If g(σ), σ ∈ C(S), in the decision function dividing C(S) into two parts is a hypersurface, then we say the training set is linearly inseparable.

As the possibility measure describes the possibility that a fuzzy event occurs, some preliminaries about it are provided first in the following.

Definition 4 [37]. Assume that the universe of discourse Ω is a finite set and that all subsets are measurable. A possibility measure is a function Pos from 2^Ω to [0, 1] such that
Axiom 1: Pos{∅} = 0.
Axiom 2: Pos{Ω} = 1.
Axiom 3: Pos{U ∪ V} = max{Pos{U}, Pos{V}} for any disjoint subsets U and V.
When Ω is not finite, Axiom 3 can be replaced by the following: for all index sets I, if the subsets Ui, i ∈ I, are pairwise disjoint, then Pos{∪_{i∈I} Ui} = sup_{i∈I} Pos{Ui}.

Definition 5 [8]. Let ã and b̃ be two fuzzy numbers. Then the possibility measure of the fuzzy event {ã ≤ b̃} is defined by
Pos{ã ≤ b̃} = sup_{x≤y} min{μ_ã(x), μ_b̃(y)}.   (1)
In particular, when b̃ degenerates into a real number b, Eq. (1) becomes

Pos{ã ≤ b} = sup{μ_ã(x) | x ≤ b}.   (2)
Theorem 2 [33]. If ã = (r1, r2, r3) is a triangular fuzzy number, then

Pos{ã ≤ 0} =
  1,              if r2 ≤ 0,
  r1/(r1 − r2),   if r1 ≤ 0, r2 > 0,
  0,              if r1 > 0.   (3)
Theorem 3 [33]. If ã = (r1, r2, r3) is a triangular fuzzy number, then for any given confidence level λ (0 < λ ≤ 1), we have

Pos{ã ≤ 0} ≥ λ ⇔ (1 − λ)r1 + λr2 ≤ 0.   (4)
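As an illustration of Theorems 2 and 3, the possibility in (3) and the equivalence (4) can be checked numerically; a minimal sketch with function names of our own:

```python
def pos_leq_zero(r1, r2, r3):
    """Pos{a~ <= 0} for a triangular fuzzy number a~ = (r1, r2, r3), Eq. (3)."""
    if r2 <= 0:
        return 1.0
    if r1 <= 0:                 # r1 <= 0 < r2
        return r1 / (r1 - r2)
    return 0.0                  # r1 > 0

# Theorem 3: Pos{a~ <= 0} >= lam  <=>  (1 - lam)*r1 + lam*r2 <= 0.
r1, r2, r3, lam = -1.0, 1.0, 3.0, 0.5
print(pos_leq_zero(r1, r2, r3) >= lam)            # True: possibility is 0.5
print((1 - lam) * r1 + lam * r2 <= 0)             # True: the crisp test agrees
```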
In applications, the fuzzy chance-constrained programming

min f(x)
s.t. Pos{h(x)·ξ̃ + c ≤ 0} ≥ λ   (5)

is often used, where x denotes the decision variable, f(x) denotes the objective function without fuzzy parameters, h(x) is a function with respect to x, ξ̃ denotes a fuzzy number, c is a real number, h(x)·ξ̃ + c ≤ 0 denotes the constraint condition, and λ (0 < λ ≤ 1) denotes the given confidence level.

Theorem 4 [33]. If ξ̃ = (r1, r2, r3) in the fuzzy chance-constrained programming (5) is a triangular fuzzy number, then the following programming
min f(x)
s.t. (1 − λ)[r1·h⁺(x) − r3·h⁻(x)] + λ·r2·h(x) + c ≤ 0   (6)

is equivalent to (5), where

h⁺(x) = { h(x), if h(x) ≥ 0; 0, if h(x) < 0 }  and  h⁻(x) = { 0, if h(x) ≥ 0; −h(x), if h(x) < 0 }.
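Theorem 4 is what later turns the fuzzy chance constraints into crisp ones (Theorems 5 and 6). A minimal sketch of the deterministic left-hand side of (6), with a function name of our own:

```python
def crisp_lhs(h, r1, r2, r3, c, lam):
    """Left-hand side of the crisp constraint (6); the chance constraint
    in (5) holds at confidence level lam iff this value is <= 0."""
    h_plus = h if h >= 0 else 0.0       # h+(x)
    h_minus = -h if h < 0 else 0.0      # h-(x)
    return (1 - lam) * (r1 * h_plus - r3 * h_minus) + lam * r2 * h + c
```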
3. Two notes on SFM

In this section, we provide two notes on SFM [3] as improvements of SFM from theoretical and experimental perspectives, respectively.

Note 1. Theorem 2 in literature [3] aimed to discuss the Hausdorff distance between two parallel hyperplanes M1 and M2 in C(S) and claimed

H(M1, M2) = |α − β| / ‖μ‖.   (7)

However, it only proved

H(M1, M2) = max{h(M1, M2), h(M2, M1)} ≥ |α − β| / ‖μ‖,   (8)

but missed the converse bound

H(M1, M2) = max{h(M1, M2), h(M2, M1)} ≤ |α − β| / ‖μ‖;   (9)

therefore, we provide a proof as a supplement here.

Proof. For any ε > 0, select σε ∈ C(S) satisfying

∫_S σε dμ > ‖μ‖ − ε > 0,  ‖σε‖ = 1,   (10)

where ‖μ‖ = sup_{‖σ‖=1} |∫_S σ dμ|.
Then let

σ̂ = α·σε / ∫_S σε dμ ∈ M1,  τ̂ = β·σε / ∫_S σε dμ ∈ M2.
By inequality (10) we have

‖σ̂ − τ̂‖ = ‖α·σε / ∫_S σε dμ − β·σε / ∫_S σε dμ‖ = |α − β| / ∫_S σε dμ < |α − β| / (‖μ‖ − ε).   (11)

Let ker μ = {υ | ∫_S υ dμ = 0}. Then

M1 = σ̂ + ker μ,  M2 = τ̂ + ker μ.

For any σ ∈ M1, there exists υ̂ ∈ ker μ such that σ = σ̂ + υ̂. Let τ̃ = τ̂ + υ̂ ∈ M2; by inequality (11) we have
h(σ, M2) = inf_{τ∈M2} ‖σ − τ‖ ≤ ‖σ − τ̃‖ = ‖σ̂ − τ̂‖ < |α − β| / (‖μ‖ − ε).
Thus, sup_{σ∈M1} h(σ, M2) ≤ |α − β| / (‖μ‖ − ε), and by the arbitrariness of ε we have

h(M1, M2) = sup_{σ∈M1} h(σ, M2) ≤ |α − β| / ‖μ‖.   (12)

Similarly, we can prove that

h(M2, M1) = sup_{τ∈M2} h(τ, M1) ≤ |α − β| / ‖μ‖.   (13)
By inequalities (12) and (13) we have

H(M1, M2) = max{h(M1, M2), h(M2, M1)} ≤ |α − β| / ‖μ‖.   (14)

The proof is completed. □

Therefore, by inequalities (8) and (14), the Hausdorff distance between two parallel hyperplanes M1 and M2 is

H(M1, M2) = |α − β| / ‖μ‖.
Note 2. From Table 2 in literature [3], we can see that the feature DO of the water quality data was considered twice (namely in Column 4 and Column 11) in the experiment, so the experimental results there are inaccurate. As the measurements are subject to uncertainty, there are some deviations between the two measurements. Without loss of generality, we delete Column 11 and rerun the experiment. The results are provided in Table 1.

Table 1
Evaluation results with different methods.

Test number        SFM    SOCP   RHISCRC  SANP
First-fold         0.57   0.57   0.37     0
Second-fold        0.60   0.60   0.37     0.13
Third-fold         0.60   0.57   0.62     0
Fourth-fold        0.55   0.55   0.50     0.13
Fifth-fold         0.57   0.38   0.75     0
Sixth-fold         0.53   0.57   0.62     0.17
Seventh-fold       0.58   0.63   0.50     0
Eighth-fold        0.60   0.57   0.37     0.14
Ninth-fold         0.57   0.50   0.37     0.13
Tenth-fold         0.62   0.67   0.62     0
Average accuracy   0.58   0.56   0.51     0.07
Variance           0.03   0.08   0.14     0.07
H-value            –      1      1        1

From Table 1 we can see that the average accuracy with SFM is 0.58, which is lower than that in Table 3 of literature [3]. The reason is that feature DO is considered twice in literature [3], which leads to a larger number of features and a higher classification accuracy. Based on SFM, we discuss set-based fuzzy classifications in the following.
Table 2
Environmental quality standard for surface water (GB3838-2002) (unit: mg/L).

Grade  pH    DO    CODcr  NH3-N  TN   TP    CODmn
I      6∼9   7.5   15     0.015  0.2  0.02  2.0
II     6∼9   6     15     0.5    0.5  0.1   4.0
III    6∼9   5     20     1.0    1.0  0.2   6.0
IV     6∼9   3     30     1.5    1.5  0.3   10.0
V      6∼9   2     40     2.0    2.0  0.4   15.0
Table 3
Division of water quality grades.

Grade                        Division criteria
I "Clean"                    Most of the indicators were not detected, and the individual detected indicator is within the standard.
II "Comparatively clean"     All the detected indicators are within the standard.
III "Lightly polluted"       An individual detected indicator exceeds the standard.
IV "Moderately polluted"     At least two detected indicators exceed the standard.
V "Heavily polluted"         A considerable number of detected indicators exceed the standard.
VI "Severely polluted"       A considerable number of detected indicators exceed the standard several times.
4. Hard margin PMFSFM

In this part, we only discuss binary fuzzy classification; multiclass fuzzy tasks can be handled via the well-known one-vs-one decomposition strategy [9]. Let the fuzzy training set be

T = {(A1, δ1), (A2, δ2), ..., (Al, δl)},   (15)
where Aj denotes the set formulating the input data, and δj ∈ [−1, −0.5) ∪ (0.5, 1] (j = 1, 2, ..., l) is the label of input Aj: δj ∈ [−1, −0.5) implies that Aj belongs to the fuzzy negative class with membership degree |δj|, and δj ∈ (0.5, 1] implies that Aj belongs to the fuzzy positive class with membership degree δj. In order to describe the possibility of an input A belonging to the fuzzy positive or negative class, we introduce the triangular fuzzy number [33]
ỹ = (r1, r2, r3) =
  ( (2δ² + δ − 2)/δ,  2δ − 1,  (2δ² − 3δ + 2)/δ ),   0.5 < δ ≤ 1,
  ( (2δ² + 3δ + 2)/δ, 2δ + 1,  (2δ² − δ − 2)/δ ),    −1 ≤ δ < −0.5.   (16)
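A minimal Python sketch of conversion (16) (the function name is ours); for δ = 0.8 it reproduces the fuzzy output ỹ = (0.1, 0.6, 1.1) used for the grade-V example in Section 6.1:

```python
def label_to_tfn(delta):
    """Convert a membership-degree label delta into the triangular
    fuzzy number y~ = (r1, r2, r3) of formula (16)."""
    if 0.5 < delta <= 1:
        return ((2*delta**2 + delta - 2) / delta,
                2*delta - 1,
                (2*delta**2 - 3*delta + 2) / delta)
    if -1 <= delta < -0.5:
        return ((2*delta**2 + 3*delta + 2) / delta,
                2*delta + 1,
                (2*delta**2 - delta - 2) / delta)
    raise ValueError("labels with |delta| <= 0.5 are not considered (Remark 1)")

print(label_to_tfn(0.8))   # approximately (0.1, 0.6, 1.1)
```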
Then the fuzzy training set (15) is converted to

T = {(A1, ỹ1), (A2, ỹ2), ..., (Al, ỹl)},   (17)
where ỹj (j = 1, 2, ..., l) is the triangular fuzzy number representing the fuzzy class of Aj and its membership degree to that class.

Remark 1. If δ = 0.5 or δ = −0.5, then the possibilities of A belonging to the fuzzy positive class and to the fuzzy negative class are equal, and the label of input A is not clear. However, in this paper we focus on supervised learning rather than unsupervised learning. Therefore, we do not consider the case δ = 0.5 or δ = −0.5.

Definition 6. We call the problem of finding a rule according to fuzzy training set (17) and inferring the output ỹ with respect to any input A a set-based fuzzy classification.

Similar to the idea of SFM, the fuzzy training set (17) is mapped into the infinite-dimensional Banach space C(S), and the set-based fuzzy classification is thereby converted into a function-based fuzzy classification. For convenience, we re-rank the fuzzy training set (17) and obtain the following fuzzy training set
T = {(σ1, ỹ1), (σ2, ỹ2), ..., (σp, ỹp), (σ_{p+1}, ỹ_{p+1}), ..., (σl, ỹl)},   (18)
where (σt, ỹt) (t = 1, 2, ..., p) and (σi, ỹi) (i = p + 1, ..., l) represent the fuzzy training points labeled with the fuzzy positive class and the fuzzy negative class, respectively.

Definition 7. Let λ (0 < λ ≤ 1) be the given confidence level. For the fuzzy training set (18), if there exist a real number α and a Radon measure μ such that
Pos{ỹj(∫_S σj dμ + α) ≥ 1} ≥ λ,  j = 1, 2, ..., l,   (19)

we say that fuzzy training set (18) is fuzzy linearly separable with the confidence level λ.
Remark 2. In fuzzy training set (18), if all the outputs of the fuzzy training points are 1 or −1, then (18) degenerates into the classical training set

T = {(σ1, 1), ..., (σp, 1), (σ_{p+1}, −1), ..., (σl, −1)},   (20)
and the set-based fuzzy classification degenerates into a set-based classification.

Theorem 5. For the given confidence level λ (0 < λ ≤ 1), inequality (19) is equivalent to

[(1 − λ)rt3 + λrt2](∫_S σt dμ + α) ≥ 1,  t = 1, 2, ..., p,
[(1 − λ)ri1 + λri2](∫_S σi dμ + α) ≥ 1,  i = p + 1, ..., l.   (21)
Proof. The proof can be seen in Appendix A.
As the triangular fuzzy numbers ỹt = (rt1, rt2, rt3) and ỹi = (ri1, ri2, ri3) represent the fuzzy positive class and the fuzzy negative class, respectively, from formula (16) we have

(1 − λ)rt3 + λrt2 > 0,  (1 − λ)ri1 + λri2 < 0.

Therefore, inequality (21) can be represented as

∫_S σt dμ + α ≥ 1/[(1 − λ)rt3 + λrt2],  t = 1, 2, ..., p,
∫_S σi dμ + α ≤ 1/[(1 − λ)ri1 + λri2],  i = p + 1, ..., l.
Definition 8. For fuzzy training set (18), let k⁺ = min_{t=1,...,p} 1/[(1 − λ)rt3 + λrt2] and l⁻ = max_{i=p+1,...,l} 1/[(1 − λ)ri1 + λri2]. We call the two parallel hyperplanes ∫_S σt dμ + α = k⁺ and ∫_S σi dμ + α = l⁻ the λ-fuzzy support hyperplanes.

With this definition and Theorem 2 in literature [3], it is not difficult to infer that the Hausdorff distance between the two λ-fuzzy support hyperplanes ∫_S σ dμ + α = k⁺ and ∫_S σ dμ + α = l⁻ is

|k⁺ − l⁻| / ‖μ‖,   (22)

where k⁺ > 0 and l⁻ < 0 are two constants. According to the maximum margin algorithm (MMA), the aim is to find the optimal separating hyperplane ∫_S σ dμ + α = 0. By adjusting α, we can represent the two λ-fuzzy support hyperplanes by ∫_S σ dμ + ᾱ = β and ∫_S σ dμ + ᾱ = −β, and the Hausdorff distance (22) becomes 2/‖μ‖. With the given confidence level λ (0 < λ ≤ 1) and Definition 7, we can derive the following fuzzy chance-constrained programming for the fuzzy linearly separable training set (18):
min_{μ,α} (1/2)‖μ‖
s.t. Pos{ỹj(∫_S σj dμ + α) ≥ 1} ≥ λ,  j = 1, 2, ..., l,   (23)

where λ (0 < λ ≤ 1) is the given confidence level, ỹj is the triangular fuzzy number in fuzzy training set (18), and Pos{·} is the possibility measure of the fuzzy event {·}. The algorithm solving optimization problem (23) is called the hard margin PMFSFM.
5. Soft margin PMFSFM

In practice, the solution provided by optimization problem (23) is not always satisfactory. In addition, there are many fuzzy linearly inseparable problems. Therefore, we allow some classification errors on the fuzzy training set (18) and replace (23) with its soft margin version, i.e.,

min_{μ,α,ξ} (1/2)‖μ‖ + C Σ_{j=1}^l ξj
s.t. Pos{ỹj(∫_S σj dμ + α) ≥ 1 − ξj} ≥ λ,
     ξj ≥ 0,  j = 1, 2, ..., l,   (24)

where ξ = (ξ1, ξ2, ..., ξl)^T is the slack variable, C > 0 denotes a penalty parameter, λ (0 < λ ≤ 1) is the given confidence level, ỹj is the triangular fuzzy number in fuzzy training set (18), and Pos{·} is the possibility measure of the fuzzy event {·}, calculated by Definition 5.
Theorem 6. For the given confidence level λ (0 < λ ≤ 1), fuzzy chance-constrained programming (24) is equivalent to the following programming:

min_{μ,α,ξ} (1/2)‖μ‖ + C Σ_{j=1}^l ξj
s.t. [(1 − λ)rt3 + λ·rt2](∫_S σt dμ + α) ≥ 1 − ξt,  ξt ≥ 0,  t = 1, 2, ..., p,
     [(1 − λ)ri1 + λ·ri2](∫_S σi dμ + α) ≥ 1 − ξi,  ξi ≥ 0,  i = p + 1, ..., l.   (25)
Proof. Let ηt = 1 − ξt and ηi = 1 − ξi. Then it is not difficult to obtain this conclusion by substituting ηt and ηi for the right-hand side constants in Theorem 5, respectively. □

Theorem 7. For the fuzzy linearly inseparable training set (18), optimization problem (25) is convex.

Proof. The proof can be seen in Appendix B. □
Theorem 8. For the fuzzy linearly inseparable training set (18), an optimal solution (μ∗, α∗, ξ∗) of optimization problem (25) exists.
Proof. The proof can be seen in Appendix C. □

The algorithm solving optimization problem (25) is called the soft margin PMFSFM. As it is a generalization of the hard margin PMFSFM, we only discuss an algorithm for the soft margin PMFSFM in this paper. In order to solve the above optimization problem, we should first calculate the support function of the closed convex hull co(Ai) of each set Ai. As optimization problem (25) is proved to be convex, we can use the CVX tool in Matlab to solve it and obtain the decision function from the optimal solution. We formulate the above idea as the following algorithm.

Algorithm 5.1
Step 5.1 Calculate the support functions σ1, σ2, ..., σp, σ_{p+1}, σ_{p+2}, ..., σl of the closed convex hulls co(A1), co(A2), ..., co(Ap), co(A_{p+1}), co(A_{p+2}), ..., co(Al), respectively.
Step 5.2 Divide S into n regions G1, G2, ..., Gn, and estimate the Radon measures μ(G1), μ(G2), ..., μ(Gn), respectively.
Step 5.3 Choose a point xj in Gj (not on the boundary of Gj) arbitrarily, and calculate Σ_{j=1}^n σi(xj)·μ(Gj) as an approximate value of ∫_S σi(x) dμ (i = 1, 2, ..., l).
Step 5.4 Solve the following optimization problem
min_{μ,α,ξ} (1/2)‖μ‖ + C Σ_{j=1}^l ξj
s.t. [(1 − λ)rt3 + λ·rt2](Σ_{j=1}^n σt(xj)·μ(Gj) + α) ≥ 1 − ξt,  ξt ≥ 0,  t = 1, 2, ..., p,
     [(1 − λ)ri1 + λ·ri2](Σ_{j=1}^n σi(xj)·μ(Gj) + α) ≥ 1 − ξi,  ξi ≥ 0,  i = p + 1, ..., l,   (26)
with confidence level λ (0 < λ ≤ 1), and set its optimal solution (μ∗_n, α∗_n, ξ∗_n) as an approximation of the optimal solution (μ∗, α∗, ξ∗) of optimization problem (25).
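As a concrete illustration of Step 5.4, the following is a minimal Python sketch of the discretized problem (26) using the CVXPY modeling package (the paper itself uses the CVX tool in Matlab). The function name, the packing of the Theorem 6 coefficients into one vector, and the approximation of the total-variation norm ‖μ‖ by the ℓ1 norm of the region weights μ(Gj) are our assumptions:

```python
import numpy as np
import cvxpy as cp

def solve_soft_margin(sigma_vals, coef, C=1.0):
    """Sketch of the discretized soft margin problem (26).

    sigma_vals[i, j] = sigma_i(x_j): the support function of sample i
    at the representative point x_j of region G_j (Steps 5.1-5.3).
    coef[i] = (1 - lam)*r_{i3} + lam*r_{i2} for fuzzy-positive samples,
              (1 - lam)*r_{i1} + lam*r_{i2} for fuzzy-negative ones.
    """
    l, n = sigma_vals.shape
    mu = cp.Variable(n)                # region weights mu(G_1), ..., mu(G_n)
    alpha = cp.Variable()
    xi = cp.Variable(l, nonneg=True)   # slack variables
    # Total-variation norm ||mu|| approximated by sum_j |mu(G_j)|.
    objective = cp.Minimize(0.5 * cp.norm1(mu) + C * cp.sum(xi))
    constraints = [cp.multiply(coef, sigma_vals @ mu + alpha) >= 1 - xi]
    cp.Problem(objective, constraints).solve()
    return mu.value, alpha.value, xi.value
```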
Step 5.5 With the optimal solution (μ∗, α∗, ξ∗), construct the optimal separating hyperplane ∫_S σ dμ∗ + α∗ = 0 to obtain the decision function

f(σ) = sgn(g(σ)),  σ ∈ C(S),   (27)

where g(σ) = ∫_S σ dμ∗ + α∗. Therefore, for any given testing data σ̄, by equality (27) we can obtain its fuzzy class f(σ̄), where f(σ̄) = 1 or −1 implies that σ̄ belongs to the fuzzy positive or negative class, respectively. The membership degree can then be calculated with

δ(g(σ)) =
  φ₊(g(σ)),  0 < g(σ) ≤ φ₊⁻¹(1),
  1,         g(σ) > φ₊⁻¹(1),
  φ₋(g(σ)),  φ₋⁻¹(−1) ≤ g(σ) < 0,
  −1,        g(σ) < φ₋⁻¹(−1),   (28)

where φ₊ and φ₋ are the ε-SVR regression functions constructed in Steps 5.6–5.9 below.
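A small sketch of how (27) and (28) are evaluated for a test score, assuming the fitted ε-SVR regressors φ₊ and φ₋ are available as callables and the thresholds g₊ = φ₊⁻¹(1), g₋ = φ₋⁻¹(−1) have been found numerically from the fits; all names are ours:

```python
import numpy as np

def classify(g, g_plus, g_minus, phi_plus, phi_minus):
    """Decision (27) and membership degree (28) for a score g = g(sigma)."""
    f = np.sign(g)                 # fuzzy class, Eq. (27)
    if g > g_plus:                 # beyond phi_plus^{-1}(1): full member
        delta = 1.0
    elif 0 < g <= g_plus:
        delta = phi_plus(g)        # eps-SVR fit of Steps 5.6-5.7
    elif g_minus <= g < 0:
        delta = phi_minus(g)       # eps-SVR fit of Steps 5.8-5.9
    elif g < g_minus:
        delta = -1.0
    else:                          # g == 0: on the separating hyperplane,
        delta = 0.0                # not covered by (28); 0 is our convention
    return f, delta
```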
Specifically, φ₊(g(σ)) is the regression function obtained by the ε-SVR constructed with the following steps:

Step 5.6 Construct the training set
{(g(σ1), δ1), ..., (g(σp), δp)}.   (29)
Step 5.7 Select an appropriate ε > 0 and penalty parameter C > 0, and construct the ε-SVR.
Similarly, φ₋(g(σ)) is obtained with the ε-SVR constructed with the following steps:

Step 5.8 Construct the training set
{(g(σ_{p+1}), δ_{p+1}), ..., (g(σl), δl)}.   (30)
Step 5.9 Select an appropriate ε > 0 and penalty parameter C > 0, and construct the ε-SVR.

Here, φ₊⁻¹(1) is the value of the inverse function φ₊⁻¹(g(σ)) of φ₊(g(σ)) at g(σ) = 1, and φ₋⁻¹(−1) is the value of the inverse function φ₋⁻¹(g(σ)) of φ₋(g(σ)) at g(σ) = −1. For any given testing point with input σ̄, by using (27) and (28) we can obtain f(σ̄) and the membership degree δ(g(σ̄)). By formula (16), we can convert δ(g(σ̄)) into a triangular fuzzy number ỹ, which is the output of the given testing point.

Remark 3. In Algorithm 5.1, choosing the confidence level λ and the penalty parameter C can be formulated as a parameter selection problem, which can be solved via parameter selection methods such as k-fold cross validation [31] and the leave-one-out (LOO) error [28].

6. Experiments with water quality data in fuzzy environment

6.1. Water quality data in fuzzy environment

In order to evaluate the water quality of the Fuyang River in Handan, China (see Fig. 1), we set 84 sample points and measured each sample point 10 times to reduce the uncertainty. Each set-valued input Ai is composed of 10 vectors xi1, xi2, ..., xi10, that is, Ai = {xi1, xi2, ..., xi10} (i = 1, 2, ..., 84). Every vector xik (k = 1, 2, ..., 10) contains 10 indicators (or features), such as pH, DO, etc. DO is measured with the iodimetry method; the measurement units of DO, NH3-N, TN, NO3, NO2, PO4, TP and CODmn are all mg/L. The confidence level λ and the penalty parameter C are selected with 10-fold cross-validation [31] based on a grid search over the intervals [0.6, 1] and [2⁻¹⁰, 2⁷], respectively.

The evaluation of surface water (except for the indicators NO3, NO2 and PO4) generally adopts the environmental quality standard for surface water (GB3838-2002), because it has the advantages of timeliness and adaptability. Table 2 lists this standard. For a sample point Ai = {xi1, xi2, ..., xi10} (i = 1, 2, ..., 84), the jth indicator is assigned to grade G0 (G0 ∈ {I, II, ..., V}) if n_{G0} = max_{G=I,II,...,V} n_G, where n_G is the number of measurements xik (k = 1, 2, ..., 10) whose jth indicator belongs to grade G, satisfying Σ_{G=I}^{V} n_G = 10.

In real-life situations, measurements may be imprecise, and observations and empirical data are often formulated in terms of natural language. These formulations are often represented by L−R fuzzy numbers ã [6], whose membership functions are
μ_ã(x) =
  L((a − x)/α_ã),  x ≤ a,
  1,               a ≤ x ≤ b,
  R((x − b)/β_ã),  x ≥ b.   (31)

Shape functions L(x) and R(x) are non-negative, non-increasing functions defined on the positive real line [0, +∞) such that L(0) = R(0) = 1. Coefficients α_ã and β_ã are the left and right spreads, respectively [8]. Particular cases of L−R fuzzy numbers are the trapezoidal and triangular fuzzy numbers.
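A small sketch of evaluating the membership function (31), assuming the common linear shape functions L(t) = R(t) = max(1 − t, 0), which yield the trapezoidal case (triangular when a = b); all names are ours:

```python
def lr_membership(x, a, b, alpha, beta,
                  L=lambda t: max(1.0 - t, 0.0),
                  R=lambda t: max(1.0 - t, 0.0)):
    """Membership of x in the L-R fuzzy number of Eq. (31).

    [a, b] is the core; alpha and beta are the left/right spreads.
    The default linear shapes give a trapezoidal fuzzy number
    (a triangular one when a == b).
    """
    if x < a:
        return L((a - x) / alpha)
    if x > b:
        return R((x - b) / beta)
    return 1.0   # x lies in the core [a, b]

# Triangular fuzzy number (0.1, 0.6, 1.1): core at 0.6, spreads 0.5.
print(lr_membership(0.35, a=0.6, b=0.6, alpha=0.5, beta=0.5))  # 0.5
```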
With regard to water quality evaluation, measurements are imprecise, and the grades and the division of water quality grades are formulated in terms of natural language (see Table 3). With the aim of reflecting the actual situation and of ease of use in practice, domain experts take into account both the division criteria and the prior knowledge of the water quality grades, and determine the membership functions of the sample points Ai with L−R fuzzy numbers, respectively (see Figs. 2–4), where X1 is the number of detected indicators exceeding the standard, X2 is the number of detected indicators within the standard, and X3 is the number of detected indicators exceeding the standard several times. From Fig. 2 we can see that if X1 = 7, then sample point Ai is assigned to grade V or VI with membership degree (MD) δi = 0.8. After that, according to Fig. 4, if X3 = 5, then sample point Ai is assigned to grade V with MD δi = 0.6. Subsequently, we obtain that the sample point Ai is assigned to grade V with MD δi = 0.8, i.e., fuzzy output ỹ = (0.1, 0.6, 1.1). Therefore, we obtain 84 fuzzy set-valued data (Ai, δi) (i = 1, 2, ..., 84); some of the training fuzzy set-valued data are listed in Table 4. As the water quality data are divided into six different grades, we extend the proposed PMFSFM for binary cases to this multiclass task via the well-known one-vs-one decomposition strategy [9].
Fig. 1. Fuyang River in Handan city [29].
6.2. Experiment results

In order to show the importance of introducing membership degrees into water quality evaluation, we compare the proposed PMFSFM with some other methods, including SFM [3], second order cone programming (SOCP) approaches for handling uncertain data [24], and regularized hull based image set collaborative representation and classification (RHISCRC) [39]. Meanwhile, in order to compare with methods that represent the input sets with a statistic (such as SVM), we compute the mean x̄i = (1/10) Σ_{j=1}^{10} xij of the 10 measurements xij and obtain the datum (x̄i, yi) for each sample point, where yi is the grade that x̄i belongs to. Then we can obtain the evaluation results with LPSVMfast [22] and decision tree (DT) [20]. For the testing data, 10-fold cross validation is conducted. The accuracy of every fold, the average accuracies, the variances and the time consumption (seconds) are shown in Table 5. With significance level 1 − α = 0.95, a t-test is used to assess statistical significance, and the h-values ("1" implies that there is a significant difference between PMFSFM and SFM, SOCP, RHISCRC, LPSVMfast, or DT, respectively) are also provided in Table 5.
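The h-values come from paired t-tests on the per-fold accuracies. A minimal sketch of this significance check, assuming SciPy's paired t-test and the per-fold accuracies of Table 5 (the variable names are ours):

```python
from scipy import stats

# Per-fold accuracies from Table 5.
pmfsfm = [0.55, 0.73, 0.60, 0.52, 0.68, 0.77, 0.73, 0.68, 0.58, 0.72]
sfm    = [0.57, 0.60, 0.60, 0.55, 0.57, 0.53, 0.58, 0.60, 0.57, 0.62]

# Paired t-test over the 10 folds; h = 1 when the difference is
# significant at the 0.05 level (significance level 1 - alpha = 0.95).
t_stat, p_value = stats.ttest_rel(pmfsfm, sfm)
h = int(p_value < 0.05)
print(t_stat, p_value, h)
```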
Fig. 2. Membership functions.
Fig. 3. Membership functions of grades I and II.
Fig. 4. Membership functions of grades V and VI.

Table 4
Some of the fuzzy water quality data for training.

Set  Vec.  pH    DO    CODcr  NH3-N  TN     NO3   NO2   PO4   TP    CODmn  Grade  Fuzzy output
A1   x11   6.50  1.41  76.92  16.29  15.41  5.13  0.37  0.64  0.60  5.00   VI     (0.1, 0.6, 1.1)
     x12   6.52  1.44  77.80  16.72  15.96  4.80  0.51  0.62  0.61  4.78
     x13   6.55  1.22  76.17  16.60  15.55  4.93  0.53  0.65  0.55  4.79
     x14   6.53  1.03  75.60  16.61  15.70  4.77  0.38  0.64  0.60  4.63
     x15   6.56  1.50  76.94  16.44  15.53  4.91  0.36  0.63  0.58  4.92
     x16   6.55  1.36  77.15  16.50  15.24  4.97  0.64  0.61  0.58  4.72
     x17   6.56  1.47  75.84  16.68  15.46  4.93  0.24  0.61  0.59  4.69
     x18   6.52  1.27  77.01  16.48  15.59  4.99  0.40  0.59  0.54  4.38
     x19   6.52  1.41  75.91  16.43  15.51  5.02  0.46  0.64  0.55  4.91
     x110  6.53  1.25  74.55  16.27  15.46  5.04  0.28  0.65  0.61  4.80
A2   x21   6.32  2.75  19.34  1.77   5.32   0.24  3.22  0.05  0.62  40.34  IV     (−1.13, 0.2, 1.53)
     x22   6.31  2.67  22.75  1.82   5.19   0.63  3.38  0.05  0.70  40.31
     x23   6.26  2.78  21.89  1.68   5.21   0.41  3.29  0.03  0.61  40.28
     x24   6.29  2.64  22.38  2.19   4.95   0.53  3.39  0.03  0.62  40.39
     x25   6.26  2.71  21.60  1.98   5.28   0.60  3.20  0.02  0.62  40.33
     x26   6.28  2.57  19.47  2.27   5.05   0.40  3.29  0.07  0.63  40.40
     x27   6.31  2.51  19.19  2.15   5.25   0.28  3.32  0.06  0.63  40.24
     x28   6.28  2.60  20.80  1.88   5.07   0.46  3.25  0.04  0.61  40.31
     x29   6.28  2.97  20.79  1.88   5.12   0.44  3.29  0.05  0.62  40.37
     x210  6.28  2.53  19.92  2.29   5.07   0.52  3.33  0.03  0.66  40.21
Table 5
Evaluation results with different methods.

Test number        PMFSFM  SFM      SOCP   RHISCRC  LPSVMfast  DT
First-fold         0.55    0.57     0.57   0.37     0.55       0.52
Second-fold        0.73    0.60     0.60   0.37     0.61       0.54
Third-fold         0.60    0.60     0.57   0.62     0.57       0.48
Fourth-fold        0.52    0.55     0.55   0.50     0.58       0.51
Fifth-fold         0.68    0.57     0.38   0.75     0.54       0.54
Sixth-fold         0.77    0.53     0.57   0.62     0.58       0.56
Seventh-fold       0.73    0.58     0.63   0.50     0.64       0.47
Eighth-fold        0.68    0.60     0.57   0.37     0.49       0.53
Ninth-fold         0.58    0.57     0.50   0.37     0.50       0.55
Tenth-fold         0.72    0.62     0.67   0.62     0.54       0.57
Average accuracy   0.66    0.58     0.56   0.51     0.56       0.53
Variance           0.08    0.03     0.08   0.14     0.05       0.02
H-value            –       1        1      1        1          1
Time consumption   558.85  1018.38  11.85  4.26     3.81       27.31
6.3. Comparative analysis

From Table 5, we find that the proposed PMFSFM achieves the highest average accuracy compared with the other methods, which shows that PMFSFM can significantly improve the evaluation accuracy of water quality after introducing MDs and the possibility measure. The h-values are all 1, which implies that there are significant differences between PMFSFM and the other methods. Additionally, the time consumption of PMFSFM is much shorter than that of SFM, because the source code of PMFSFM is optimized.

In particular, PMFSFM achieves higher average accuracy than SFM. The reason is that water quality evaluation is in essence a set-based fuzzy classification and the MD plays an important role in obtaining the optimal separating hyperplane, whereas SFM does not consider the fuzziness in the evaluation. PMFSFM also achieves higher average accuracy than LPSVMfast and DT, which supports the view that it is better to represent the original data with a set-valued datum rather than a vector-valued datum.

However, the accuracies of the six different methods are all relatively low. This is because the water quality data of the Fuyang River are extremely unbalanced: almost all of them belong to grade IV or V. Additionally, PMFSFM converts the sets in R^d into functions in the infinite-dimensional Banach space C(S), which increases the dimensionality of the problem, and hence the time consumption of PMFSFM remains relatively long.

7. Conclusions and future research directions

Based on SFM, we formulate a new learning framework called PMFSFM for set-based fuzzy classifications. In this work, we introduce the membership degree to formulate the degree to which an input point belongs to a fuzzy class, so that a set-based fuzzy classification is converted into a function-based fuzzy classification. Based on the possibility measure, we finally construct PMFSFM. PMFSFM is an extension of SFM to fuzzy environments, which can provide both the fuzzy class and the membership degree of a given input to the fuzzy class. Experimental results on water quality evaluation show the effectiveness and superiority of PMFSFM; PMFSFM can also be applied to other set-based fuzzy classifications.

As PMFSFM converts the input sets in R^d into the Banach space C(S) via support functions, and the classifications are converted into a function-based fuzzy representation after the mapping, it can also perform function- (or distribution-) based fuzzy classification tasks. In addition, as a vector can be represented by a set with a single point, vector-based fuzzy classifications can also be handled with the proposed PMFSFM. Moreover, the improvement of PMFSFM to handle unbalanced fuzzy set-valued data deserves further study.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (11671109 and 61732011) and a project funded by the China Postdoctoral Science Foundation (2018M640234).

Appendix A. Proof of Theorem 5

Proof. For any j = 1, 2, ..., l, let
h_j(μ, α) = −(∫_S σj dμ + α),

h_j⁺(μ, α) = { 0, if ∫_S σj dμ + α > 0; −(∫_S σj dμ + α), if ∫_S σj dμ + α ≤ 0 },

h_j⁻(μ, α) = { ∫_S σj dμ + α, if ∫_S σj dμ + α > 0; 0, if ∫_S σj dμ + α ≤ 0 }.
From Theorem 4, we have that inequality (19),

Pos{ỹj(∫_S σj dμ + α) ≥ 1} = Pos{1 − ỹj(∫_S σj dμ + α) ≤ 0} ≥ λ,
is equivalent to

(1 − λ)[r_{j1}·h_j⁺(μ, α) − r_{j3}·h_j⁻(μ, α)] + λ·r_{j2}·h_j(μ, α) + 1 ≤ 0,  j = 1, 2, ..., l.   (A.1)

For the given confidence level λ, inequality (A.1) can be represented as

[(1 − λ)rt3 + λ·rt2](∫_S σt dμ + α) ≥ 1,  t = 1, 2, ..., p,
[(1 − λ)ri1 + λ·ri2](∫_S σi dμ + α) ≥ 1,  i = p + 1, ..., l.   (A.2)

The proof is completed. □
Appendix B. Proof of Theorem 7
Proof. Let D be the feasible domain of optimization problem (25). As optimization problem (25) is fuzzy nonlinearly separable, there exists a feasible solution (μ′, α′, ξ′) satisfying the constraints of optimization problem (25). Thus, the feasible domain D is nonempty. By Theorem 4 in literature [3], it is not difficult to prove that the objective function (1/2)‖μ‖ + C Σ_{j=1}^l ξj is convex.
For any (μ1, α1, ξ1), (μ2, α2, ξ2) ∈ D, by the first constraint of optimization problem (25) we have

[(1 − λ)rt3 + λ·rt2](∫_S σt dμ_j + α_j) + ξ_{jt} ≥ 1,  t = 1, 2, ..., p,  j = 1, 2.

Then, for any λ̄ ∈ [0, 1] and t ∈ {1, 2, ..., p}, we have

[(1 − λ)rt3 + λ·rt2](∫_S σt d[λ̄μ1 + (1 − λ̄)μ2] + [λ̄α1 + (1 − λ̄)α2]) + [λ̄ξ_{1t} + (1 − λ̄)ξ_{2t}]
= λ̄{[(1 − λ)rt3 + λ·rt2](∫_S σt dμ1 + α1) + ξ_{1t}} + (1 − λ̄){[(1 − λ)rt3 + λ·rt2](∫_S σt dμ2 + α2) + ξ_{2t}}
≥ λ̄·1 + (1 − λ̄)·1 = 1.

Similarly, by the second constraint of optimization problem (25), for any λ̄ ∈ [0, 1] and i ∈ {p + 1, ..., l}, the inequality
[(1 − λ)ri1 + λ·ri2](∫_S σi d[λ̄μ1 + (1 − λ̄)μ2] + [λ̄α1 + (1 − λ̄)α2]) + [λ̄ξ_{1i} + (1 − λ̄)ξ_{2i}] ≥ 1

holds. Further,

λ̄ξ_{1j} + (1 − λ̄)ξ_{2j} ≥ 0,  j ∈ {1, 2, ..., l}.   (B.1)

Thus,

λ̄(μ1, α1, ξ1) + (1 − λ̄)(μ2, α2, ξ2) ∈ D.   (B.2)

Therefore, for the linearly inseparable training set, optimization problem (25) is convex. □
Appendix C. Proof of Theorem 8
Proof. As the triangular fuzzy numbers ỹt = (rt1, rt2, rt3) and ỹi = (ri1, ri2, ri3) represent a fuzzy positive class and a fuzzy negative class, respectively, from formula (16) we have

kt = (1 − λ)rt3 + λ·rt2 > 0,  t = 1, 2, ..., p,

and

li = (1 − λ)ri1 + λ·ri2 < 0,  i = p + 1, ..., l.

Considering that optimization problem (25) is fuzzy nonlinearly separable, there exists a feasible solution (μ′, α′, ξ′) satisfying the constraints of optimization problem (25); thus, the feasible domain D of optimization problem (25) is nonempty. As (1/2)‖μ‖ + C Σ_{j=1}^l ξj ≥ 0, the infimum
inf_{(μ,α,ξ)∈D} { (1/2)‖μ‖ + C Σ_{j=1}^l ξj }

exists. Further, there exists a convergent sequence (μn, ξ̄n) such that

lim_{n→+∞} { (1/2)‖μn‖ + C Σ_{j=1}^l ξ_{nj} } = β,

where ξ̄n = (ξ_{n1}, ξ_{n2}, ..., ξ_{nl}) and β = inf_{(μ,α,ξ)∈D} { (1/2)‖μ‖ + C Σ_{j=1}^l ξj }.

Let B be a Banach space and B∗ its conjugate space. Then C∗(S) is a B∗ space, and there is a convergent subsequence (μ_{n_k}, ξ̄_{n_k}) of (μn, ξ̄n) and a limit (μ0, ξ̄0) such that

lim_{k→+∞} (μ_{n_k}(σ), ξ̄_{n_k}) = (μ0(σ), ξ̄0),  σ ∈ C(S).
Thus, for any ε > 0 there exists N0 ∈ N such that for all n_k > N0 we have

|μ_{n_k}(σ) − μ0(σ)| < ε  and  |ξ_{n_k j} − ξ_{0j}| < ε,  j = 1, 2, ..., l.

Setting ε = ε0, we have

−ε0 + μ0(σ) < μ_{n_k}(σ) < ε0 + μ0(σ),  σ ∈ C(S),

and

−ε0 + ξ_{0j} < ξ_{n_k j} < ε0 + ξ_{0j},  j = 1, 2, ..., l.
For any (μn, αn, ξ̄n) ∈ D and any positive sample σt⁺ with output ỹt = (rt1, rt2, rt3), from the first constraint of optimization problem (25) we have

kt(μn(σt⁺) + αn) + ξ_{nt} ≥ 1,  t = 1, 2, ..., p.

Then, the subsequence (μ_{n_k}, α_{n_k}, ξ̄_{n_k}) satisfies

kt(μ_{n_k}(σt⁺) + α_{n_k}) + ξ_{n_k t} ≥ 1,  t = 1, 2, ..., p.

Therefore,

kt·α_{n_k} ≥ 1 − (kt + 1)ε0 − kt·μ0(σt⁺) − ξ_{0t}.

Let M1 = min_{1≤t≤p} {1 − (kt + 1)ε0 − kt·μ0(σt⁺) − ξ_{0t}}. Then, for all n_k > N0, we have

α_{n_k} ≥ M1 / kt.   (C.1)
Similar to the proof of inequality (C.1), for a negative sample σi⁻, let

N1 = min_{p+1≤i≤l} {1 + (li − 1)ε0 − li·μ0(σi⁻) + ξ_{0i}}.

Then for all n_k > N0, we have

α_{n_k} ≤ N1 / li.

Therefore, for both the positive and negative samples σ and all n_k > N0, we have

M1 / kt ≤ α_{n_k} ≤ N1 / li.

Let M2 = min_{1≤n_k≤N0} kt·α_{n_k}, N2 = max_{1≤n_k≤N0} li·α_{n_k}, and M = max{|M1/kt|, |M2|, |N1/li|, |N2|}. Then, for any n_k ∈ N, we have |α_{n_k}| ≤ M.
Thus, there exist α0 ∈ R and a convergent subsequence (α_{n_{k_q}}) of (α_{n_k}) such that lim_{q→+∞} α_{n_{k_q}} = α0. Then, for any positive sample σt⁺, we have

kt(μ0(σt⁺) + α0) + ξ_{0t} = lim_{q→+∞} [kt(μ_{n_{k_q}}(σt⁺) + α_{n_{k_q}}) + ξ_{n_{k_q} t}] ≥ 1.   (C.2)
Similar to the proof of (C.2), for any negative sample σi⁻, we have

li(μ0(σi⁻) + α0) + ξ_{0i} = lim_{q→+∞} [li(μ_{n_{k_q}}(σi⁻) + α_{n_{k_q}}) + ξ_{n_{k_q} i}] ≥ 1.

Therefore, (μ0, α0, ξ̄0) satisfies the constraints of optimization problem (25), that is, (μ0, α0, ξ̄0) ∈ D. With the help of Theorem 4 in literature [3], it is not difficult to obtain that

(1/2)‖μ0‖ + C Σ_{j=1}^l ξ_{0j} = inf_{(μ,α,ξ)∈D} { (1/2)‖μ‖ + C Σ_{j=1}^l ξj }.

Therefore, for the fuzzy linearly inseparable training set (18), the optimal solution of optimization problem (25) exists. □
References

[1] Y. Bazi, L. Bruzzone, F. Melgani, Image thresholding based on the EM algorithm and the generalized Gaussian distribution, Pattern Recognit. 40 (2) (2007) 619–634.
[2] J. Camarena, V. Gregori, S. Morillas, A. Sapena, A simple fuzzy method to remove mixed Gaussian-impulsive noise from color images, IEEE Trans. Fuzzy Syst. 21 (5) (2013) 971–978.
[3] J. Chen, Q. Hu, X. Xue, M. Ha, L. Ma, Support function machine for set-based classification with application to water quality evaluation, Inf. Sci. 388–389 (2017) 48–61.
[4] J. Chen, W. Pedrycz, M. Ha, L. Ma, Set-valued samples based support vector regression and its applications, Expert Syst. Appl. 42 (5) (2015) 2502–2509.
[5] J. Chen, X. Xue, M. Ha, L. Ma, Separability of set-valued data sets and existence of support hyperplanes in the support function machine, Inf. Sci. 430–431 (2018) 432–443.
[6] R. Coppi, P. D'Urso, P. Giordani, Fuzzy and possibilistic clustering for fuzzy data, Comput. Stat. Data Anal. 56 (4) (2012) 915–927.
[7] C. Cortes, V. Vapnik, Support vector networks, Mach. Learn. 20 (1995) 273–297.
[8] D. Dubois, H. Prade, Possibility Theory, Plenum Press, New York, 1988.
[9] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes, Pattern Recognit. 44 (8) (2011) 1761–1776.
[10] R. Gardner, Geometric Tomography, second ed., Cambridge University Press, New York, 2006.
[11] C. Ho, C. Lin, Large-scale linear support vector regression, J. Mach. Learn. Res. 13 (2012) 3323–3348.
[12] A. Kalantari, A. Kamsin, S. Shamshirband, A. Gani, H. Alinejad-Rokny, A.T. Chronopoulos, Computational intelligence approaches for classification of medical data: state-of-the-art, future challenges and research directions, Neurocomputing 276 (2018) 2–22.
[13] A. Karambelas, T. Holloway, G. Kiesewetter, C.M. Heyes, Constraining the uncertainty in emissions over India with a regional air quality model evaluation, Atmos. Environ. 174 (2018) 194–203.
[14] Z. Li, J. Liu, J. Tang, H. Lu, Robust structured subspace learning for data representation, IEEE Trans. Pattern Anal. Mach. Intell. 37 (10) (2015) 2085–2098.
[15] D. Liu, T. Li, J. Zhang, A rough set-based incremental approach for learning knowledge in dynamic incomplete information systems, Int. J. Approx. Reason. 55 (8) (2014) 1764–1786.
[16] P. Liu, J. Guo, K. Chamnongthai, H. Prasetyo, Fusion of color histogram and LBP-based features for texture image retrieval and classification, Inf. Sci. 390 (2017) 95–111.
[17] C. Luo, T. Li, H. Chen, H. Fujita, Z. Yi, Incremental rough set approach for hierarchical multicriteria classification, Inf. Sci. 429 (2018) 72–87.
[18] O. Mangasarian, D. Musicant, Lagrangian support vector machines, J. Mach. Learn. Res. 1 (2001) 161–177.
[19] K. Nödler, M. Tsakiri, M. Aloupi, G. Gatidou, A. Stasinakis, T. Licha, Evaluation of polar organic micropollutants as indicators for wastewater-related coastal water quality impairment, Environ. Pollut. 211 (2016) 282–290.
[20] E.F.D. Oliveira, M.E. de Lima Tostes, C.A.O. de Freitas, J.C. Leite, Voltage THD analysis using knowledge discovery in databases with a decision tree classifier, IEEE Access 6 (2018) 1177–1188.
[21] W. Rudin, Real and Complex Analysis, third ed., McGraw-Hill, New York, 1987.
[22] S. Schlag, M. Schmitt, C. Schulz, Faster support vector machines, arXiv:1808.06394, 2018.
[23] S. Shamshirband, A. Patel, N. Anuar, M. Kiah, A. Abraham, Cooperative game theoretic approach using fuzzy Q-learning for detecting and preventing intrusions in wireless sensor networks, Eng. Appl. Artif. Intell. 32 (2014) 228–241.
[24] P. Shivaswamy, C. Bhattacharyya, A. Smola, Second order cone programming approaches for handling missing and uncertain data, J. Mach. Learn. Res. 7 (2006) 1283–1314.
[25] S. Sione, M.G. Wilson, M. Lado, A. González, Evaluation of soil degradation produced by rice crop systems in a vertisol, using a soil quality index, Catena 150 (2017) 79–86.
[26] H. Tan, Z. Ma, S. Zhang, Z. Zhan, B. Zhang, C. Zhang, Grassmann manifold for nearest points image set classification, Pattern Recognit. Lett. 68 (2015) 190–196.
[27] S. Tsang, B. Kao, K. Yip, W. Ho, S. Lee, Decision trees for uncertain data, IEEE Trans. Knowl. Data Eng. 23 (1) (2011) 64–78.
[28] T.F.Y. Vicente, M. Hoai, D. Samaras, Leave-one-out kernel optimization for shadow detection and removal, IEEE Trans. Pattern Anal. Mach. Intell. 40 (3) (2018) 682–695.
[29] L. Wang, Study on the Water Environmental Capacity of Fuyang River in Handan (in Chinese), Master's thesis, Hebei University of Science and Technology, 2014.
[30] R. Wang, S. Shan, X. Chen, Q. Dai, W. Gao, Manifold–manifold distance and its application to face recognition with image sets, IEEE Trans. Image Process. 21 (10) (2012) 4466–4479.
[31] T. Wong, Parametric methods for comparing the performance of two classification algorithms evaluated by k-fold cross validation on multiple data sets, Pattern Recognit. 65 (2017) 97–107.
[32] P. Xu, F. Davoine, H. Zha, T. Denoeux, Evidential calibration of binary SVM classifiers, Int. J. Approx. Reason. 72 (2016) 55–70.
[33] Z. Yang, N. Deng, Fuzzy support vector classification based on possibility theory, Pattern Recognit. Artif. Intell. 20 (1) (2007) 7–14.
[34] J. Yoneyama, Robust sampled-data stabilization of uncertain fuzzy systems via input delay approach, Inf. Sci. 198 (2012) 169–176.
[35] Y. Yu, W. Pedrycz, D. Miao, Neighborhood rough sets based multi-label classification for automatic image annotation, Int. J. Approx. Reason. 54 (9) (2013) 1373–1387.
[36] X. Yue, Y. Chen, D. Miao, J. Qian, Tri-partition neighborhood covering reduction for robust classification, Int. J. Approx. Reason. 83 (2017) 371–384.
[37] L. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets Syst. 1 (1) (1999) 9–34.
[38] P. Zheng, Z. Zhao, J. Gao, X. Wu, A set-level joint sparse representation for image set classification, Inf. Sci. 448 (2018) 75–90.
[39] P. Zhu, W. Zuo, L. Zhang, S. Shiu, D. Zhang, Image set based collaborative representation for face recognition, IEEE Trans. Inf. Forensics Secur. 9 (7) (2014) 1120–1132.