Information Sciences 483 (2019) 192–205
Possibility measure based fuzzy support function machine for set-based fuzzy classifications

Jiqiang Chen a,b,c, Qinghua Hu a, Xiaoping Xue b,∗, Minghu Ha c, Litao Ma c, Xuchang Zhang c, Zhipeng Yu c

a School of Computer Science and Technology, Tianjin University, Tianjin 300072, PR China
b Department of Mathematics, Harbin Institute of Technology, Harbin 150001, PR China
c School of Mathematics and Physics, Hebei University of Engineering, Handan 056038, PR China

∗ Corresponding author. E-mail addresses: [email protected] (J. Chen), [email protected], [email protected] (X. Xue).

Article history: Received 31 July 2018; Revised 5 January 2019; Accepted 9 January 2019; Available online 11 January 2019.

Keywords: Support function machine; Possibility measure; Membership degree; Support function; Set-valued data

https://doi.org/10.1016/j.ins.2019.01.022
0020-0255/© 2019 Published by Elsevier Inc.
Abstract

In real-world applications, there are many set-based fuzzy classifications. However, current research has some limitations in solving such classifications. Therefore, a method called possibility measure based fuzzy support function machine (PMFSFM) is discussed in this work. Firstly, two notes are provided as improvements of SFM from theoretical and experimental perspectives. Secondly, a set-based fuzzy classification in the Euclidean space R^d is converted into a function-based task in the Banach space C(S) by means of the support function and the membership degree. Thirdly, a fuzzy optimization problem based on the possibility measure is derived and some of its properties are discussed. Subsequently, a PMFSFM for set-based fuzzy classification is constructed; it gives both the fuzzy class of a given input and the membership degree of the input to that class. Experimental results concerning water quality evaluation in a fuzzy environment show the effectiveness of PMFSFM.
1. Introduction

In real-world applications, there are many set-based fuzzy classification tasks, in which a sample is represented by a set of vectors and does not clearly belong to one of the decision classes, but belongs to each class with a certain membership degree. For example, in water quality evaluation, on the one hand, multiple repeated measurements are often used to reduce the value uncertainty [19], so a set of vectors describing the same sample is obtained if a feature vector is extracted from each measurement. On the other hand, the evaluation grades of water quality (such as Clean, Comparatively clean, etc.) are described by fuzzy language with imprecise boundaries, and the water quality data are not exactly assigned to any evaluation grade, but belong to each evaluation grade with a certain membership degree. Therefore, water quality evaluation is a set-based fuzzy classification. Set-based fuzzy classifications also arise in other applications [13,25], and this kind of classification is becoming an important research topic in the fields of machine learning and pattern recognition [3,15–17,23,27,34,35,38,39].

Currently, there are two kinds of methods to deal with these classifications. One is to compute statistics of the original data (such as the mean and median) and describe the original data with a vector, as in SVMs [7,11,18,32] and other
methods [4,12,36,39]. In this kind of method, a set-valued object is represented by a vector-valued sample, but some classification information (such as the variance, which reflects fluctuation information) may be lost in the preprocessing. The other is to make some assumptions about the datasets in advance, such as obeying a single Gaussian model [1], a Gaussian mixture model [2], a subspace model [14] or a manifold model [26,30], and then develop set-based classifiers directly. However, some classification tasks may satisfy these assumptions while others may not; in the latter case these methods do not work.

To overcome the shortcomings of the above methods, Chen et al. [3] proposed a support function machine (SFM) for set-based classification, and discussed the separability of set-valued data sets and the existence of support hyperplanes [5]. SFM represents the set-valued data by their support functions. The sets are thereby converted into functions that make up an infinite-dimensional Banach space C(S) (whose elements are the continuous functions defined on the unit ball S in R^d), and SFM is established in this new space. Consequently, SFM not only retains the classification information of the original set-valued data, but also needs no assumption in advance. Compared with other methods, SFM achieves higher accuracy in water quality assessment. However, since SFM does not consider the fuzziness of the water quality evaluation grades, it cannot deal with water quality evaluation effectively.

Therefore, in order to further improve the classification accuracy, in this paper a membership degree is introduced to describe the degree to which an input point belongs to a fuzzy class, and a possibility measure is introduced to describe the possibility that a fuzzy event occurs. Then, an extended SFM called possibility measure based fuzzy support function machine (PMFSFM) is established for set-based fuzzy classifications. The proposed PMFSFM can provide both the fuzzy class and the membership degree of a given input to the fuzzy class, and achieves the highest average accuracy compared with the other methods. In addition, the source code of SFM is optimized, so the time consumption of PMFSFM is much shorter than that of SFM. PMFSFM also achieves higher average accuracy than LPSVMfast and DT, which supports the view, discussed above, that it is better to represent the original data with a set-valued datum rather than a vector-valued datum.

This paper is structured as follows: Section 2 provides some preliminaries about SFM and the possibility measure. Section 3 provides two notes as improvements of SFM from theoretical and experimental perspectives. Sections 4 and 5 propose a hard margin and a soft margin PMFSFM, respectively, and discuss some of their properties. Section 6 provides experiments concerning water quality evaluation in a fuzzy environment. Finally, Section 7 draws conclusions and briefly discusses future research directions.

2. Preliminaries

In this section, we provide some preliminaries about SFM and the possibility measure to better understand the concepts presented in this paper.

Definition 1 [10]. The support function σA : R^d → R of a non-empty closed convex set A in R^d is given by σA(x) = sup_{y∈A} ⟨x, y⟩, x ∈ R^d.

It follows directly from Definition 1 that σA(x) is convex, positively homogeneous (σA(αx) = ασA(x), α ≥ 0, x ∈ R^d) and subadditive (σA(x + y) ≤ σA(x) + σA(y), x, y ∈ R^d). Moreover, σ_{αA}(x) = ασA(x), α ≥ 0, x ∈ R^d, and σ_{A+B}(x) = σA(x) + σB(x), x ∈ R^d. For simplicity, we also denote σA(x) by σA.
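To make Definition 1 concrete, the following minimal Python/NumPy sketch (the function name is ours) evaluates the support function of a finite sample set; for a finite A, the supremum over the closed convex hull co(A) is attained at one of the points of A itself, so it reduces to a maximum of inner products:

```python
import numpy as np

def support_function(A, x):
    """sigma_A(x) = sup_{y in A} <x, y> (Definition 1).

    For a finite sample set A, the supremum over co(A) is attained
    at one of the points of A, so a maximum suffices.
    """
    A = np.asarray(A, dtype=float)
    return float(np.max(A @ np.asarray(x, dtype=float)))

# A square in R^2: the support function at a unit direction gives the
# offset of the supporting hyperplane in that direction.
A = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(support_function(A, (1.0, 0.0)))               # 1.0
print(support_function(A, (np.sqrt(2)/2,) * 2))      # sqrt(2), at corner (1, 1)
```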
As the Banach space C(S) is not an inner product space, the hyperplane in SFM is defined via the following Riesz representation theorem in Banach space.

Theorem 1 [21]. Assume that X is a locally compact Hausdorff space. Then for any bounded linear functional Λ on C0(X), there is one and only one complex regular Borel measure μ such that
Λ(σ) = ∫_X σ dμ, σ ∈ C0(X), and ‖Λ‖ = |μ|(X),
where |μ|(X) = sup Σ_{i=1}^n |μ(Ai)| (the supremum being taken over all partitions {Ai, i = 1, 2, ..., n}, n ≥ 1, of X) is the total variation of μ, denoted simply by ‖μ‖.
Definition 2 [3]. We define M = {σ ∈ C(S) | ∫_S σ dμ = α}, α ∈ R, the hyperplane in C(S); M⊥ = {μ ∈ C∗(S) | ∫_S σ dμ = 0, σ ∈ M} the orthocomplement of M; and μ the normal of the hyperplane M. For simplicity, we denote ∫_S σ dμ by μ(σ).
Definition 3 [3]. If g(σ), σ ∈ C(S), in the decision function dividing C(S) into two parts is a hyperplane, then we say the training set is linearly separable. If g(σ), σ ∈ C(S), in the decision function dividing C(S) into two parts is a hypersurface, then we say the training set is linearly inseparable.

As the possibility measure describes the possibility that a fuzzy event occurs, some preliminaries about it are provided first in the following.

Definition 4 [37]. Assume that the universe of discourse Ω is a finite set and that all subsets are measurable. A possibility measure is a function Pos from 2^Ω to [0, 1] such that
Axiom 1: Pos{∅} = 0.
Axiom 2: Pos{Ω} = 1.
Axiom 3: Pos{U ∪ V} = max{Pos{U}, Pos{V}} for any disjoint subsets U and V.
When Ω is not finite, Axiom 3 can be replaced by the following: for all index sets I, if the subsets Ui, i ∈ I, are pairwise disjoint, then Pos{∪_{i∈I} Ui} = sup_{i∈I} Pos{Ui}.

Definition 5 [8]. Let ã and b̃ be two fuzzy numbers. Then the possibility measure of the fuzzy event {ã ≤ b̃} is defined by
Pos{ã ≤ b̃} = sup_{x≤y} min{μ_ã(x), μ_b̃(y)}.   (1)
In particular, when b̃ degenerates into a real number b, Eq. (1) becomes

Pos{ã ≤ b} = sup{μ_ã(x) | x ≤ b}.   (2)
Theorem 2 [33]. If ã = (r1, r2, r3) is a triangular fuzzy number, then

Pos{ã ≤ 0} =
  1,              if r2 ≤ 0,
  r1/(r1 − r2),   if r1 ≤ 0, r2 > 0,
  0,              if r1 > 0.   (3)
Theorem 3 [33]. If ã = (r1, r2, r3) is a triangular fuzzy number, then for any given confidence level λ (0 < λ ≤ 1), we have

Pos{ã ≤ 0} ≥ λ ⇔ (1 − λ)r1 + λr2 ≤ 0.   (4)
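As an illustration of Theorems 2 and 3, the possibility in (3) and the equivalence (4) can be checked numerically; a minimal sketch with function names of our own:

```python
def pos_leq_zero(r1, r2, r3):
    """Pos{a~ <= 0} for a triangular fuzzy number a~ = (r1, r2, r3), Eq. (3)."""
    if r2 <= 0:
        return 1.0
    if r1 <= 0:                 # r1 <= 0 < r2
        return r1 / (r1 - r2)
    return 0.0                  # r1 > 0

# Theorem 3: Pos{a~ <= 0} >= lam  <=>  (1 - lam)*r1 + lam*r2 <= 0.
r1, r2, r3, lam = -1.0, 1.0, 3.0, 0.5
print(pos_leq_zero(r1, r2, r3) >= lam)            # True: possibility is 0.5
print((1 - lam) * r1 + lam * r2 <= 0)             # True: the crisp test agrees
```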
In applications, the fuzzy chance-constrained programming

min f(x)
s.t. Pos{h(x)·ξ̃ + c ≤ 0} ≥ λ   (5)

is often used, where x denotes the decision variable, f(x) denotes the objective function without fuzzy parameters, h(x) is a function with respect to x, ξ̃ denotes a fuzzy number, c is a real number, h(x)·ξ̃ + c ≤ 0 denotes the constraint condition, and λ (0 < λ ≤ 1) denotes the given confidence level.

Theorem 4 [33]. If ξ̃ = (r1, r2, r3) in the fuzzy chance-constrained programming (5) is a triangular fuzzy number, then the following programming
min f(x)
s.t. (1 − λ)[r1·h⁺(x) − r3·h⁻(x)] + λ·r2·h(x) + c ≤ 0   (6)

is equivalent to (5), where

h⁺(x) = { h(x), if h(x) ≥ 0; 0, if h(x) < 0 }  and  h⁻(x) = { 0, if h(x) ≥ 0; −h(x), if h(x) < 0 }.
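Theorem 4 is what later turns the fuzzy chance constraints into crisp ones (Theorems 5 and 6). A minimal sketch of the deterministic left-hand side of (6), with a function name of our own:

```python
def crisp_lhs(h, r1, r2, r3, c, lam):
    """Left-hand side of the crisp constraint (6); the chance constraint
    in (5) holds at confidence level lam iff this value is <= 0."""
    h_plus = h if h >= 0 else 0.0       # h+(x)
    h_minus = -h if h < 0 else 0.0      # h-(x)
    return (1 - lam) * (r1 * h_plus - r3 * h_minus) + lam * r2 * h + c
```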
3. Two notes on SFM

In this section, we provide two notes on SFM [3] as improvements of SFM from theoretical and experimental perspectives, respectively.

Note 1. Theorem 2 in literature [3] aimed to discuss the Hausdorff distance between two parallel hyperplanes M1 and M2 in C(S) and claimed

H(M1, M2) = |α − β| / ‖μ‖.   (7)

However, it only proved

H(M1, M2) = max{h(M1, M2), h(M2, M1)} ≥ |α − β| / ‖μ‖,   (8)

but missed the converse bound

H(M1, M2) = max{h(M1, M2), h(M2, M1)} ≤ |α − β| / ‖μ‖;   (9)

therefore, we provide a proof as a supplement here.

Proof. For any ε > 0, select σε ∈ C(S) satisfying

∫_S σε dμ > ‖μ‖ − ε > 0,  ‖σε‖ = 1,   (10)

where ‖μ‖ = sup_{‖σ‖=1} |∫_S σ dμ|.
Then let

σ̂ = α·σε / ∫_S σε dμ ∈ M1,  τ̂ = β·σε / ∫_S σε dμ ∈ M2.
By inequality (10) we have

‖σ̂ − τ̂‖ = ‖α·σε / ∫_S σε dμ − β·σε / ∫_S σε dμ‖ = |α − β| / ∫_S σε dμ < |α − β| / (‖μ‖ − ε).   (11)

Let ker μ = {υ | ∫_S υ dμ = 0}. Then

M1 = σ̂ + ker μ,  M2 = τ̂ + ker μ.

For any σ ∈ M1, there exists υ̂ ∈ ker μ such that σ = σ̂ + υ̂. Let τ̃ = τ̂ + υ̂ ∈ M2; by inequality (11) we have
h(σ, M2) = inf_{τ∈M2} ‖σ − τ‖ ≤ ‖σ − τ̃‖ = ‖σ̂ − τ̂‖ < |α − β| / (‖μ‖ − ε).
Thus, sup_{σ∈M1} h(σ, M2) ≤ |α − β| / (‖μ‖ − ε), and by the arbitrariness of ε we have

h(M1, M2) = sup_{σ∈M1} h(σ, M2) ≤ |α − β| / ‖μ‖.   (12)

Similarly, we can prove that

h(M2, M1) = sup_{τ∈M2} h(τ, M1) ≤ |α − β| / ‖μ‖.   (13)
By inequalities (12) and (13) we have

H(M1, M2) = max{h(M1, M2), h(M2, M1)} ≤ |α − β| / ‖μ‖.   (14)

The proof is completed. □

Therefore, by inequalities (8) and (14), the Hausdorff distance between two parallel hyperplanes M1 and M2 is

H(M1, M2) = |α − β| / ‖μ‖.
Note 2. From Table 2 in literature [3], we can see that the feature DO of the water quality data was considered twice (namely in Column 4 and Column 11) in the experiment, so the experimental results there are inaccurate. As the measurements are subject to uncertainty, there are some deviations between the two measurements. Without loss of generality, we delete Column 11 and rerun the experiment. The results are provided in Table 1.

Table 1
Evaluation results with different methods.

Test number        SFM    SOCP   RHISCRC  SANP
First-fold         0.57   0.57   0.37     0
Second-fold        0.60   0.60   0.37     0.13
Third-fold         0.60   0.57   0.62     0
Fourth-fold        0.55   0.55   0.50     0.13
Fifth-fold         0.57   0.38   0.75     0
Sixth-fold         0.53   0.57   0.62     0.17
Seventh-fold       0.58   0.63   0.50     0
Eighth-fold        0.60   0.57   0.37     0.14
Ninth-fold         0.57   0.50   0.37     0.13
Tenth-fold         0.62   0.67   0.62     0
Average accuracy   0.58   0.56   0.51     0.07
Variance           0.03   0.08   0.14     0.07
H-value            –      1      1        1

From Table 1 we can see that the average accuracy with SFM is 0.58, which is lower than that in Table 3 of literature [3]. The reason is that feature DO is considered twice in literature [3], which leads to a larger number of features and a higher classification accuracy. Based on SFM, we discuss set-based fuzzy classifications in the following.
Table 2
Environmental quality standard for surface water (GB3838-2002) (unit: mg/L).

Grade  pH    DO    CODcr  NH3-N  TN   TP    CODmn
I      6∼9   7.5   15     0.015  0.2  0.02  2.0
II     6∼9   6     15     0.5    0.5  0.1   4.0
III    6∼9   5     20     1.0    1.0  0.2   6.0
IV     6∼9   3     30     1.5    1.5  0.3   10.0
V      6∼9   2     40     2.0    2.0  0.4   15.0
Table 3
Division of water quality grades.

Grade                        Division criteria
I "Clean"                    Most of the indicators were not detected, and the individual detected indicator is within the standard.
II "Comparatively clean"     All the detected indicators are within the standard.
III "Lightly polluted"       An individual detected indicator exceeds the standard.
IV "Moderately polluted"     At least two detected indicators exceed the standard.
V "Heavily polluted"         A considerable number of detected indicators exceed the standard.
VI "Severely polluted"       A considerable number of detected indicators exceed the standard several times.
4. Hard margin PMFSFM

In this part, we only discuss binary fuzzy classification; multiclass fuzzy tasks can be handled via the well-known one-vs-one decomposition strategy [9]. Let the fuzzy training set be

T = {(A1, δ1), (A2, δ2), ..., (Al, δl)},   (15)
where Aj denotes the set formulating the input data, and δj ∈ [−1, −0.5) ∪ (0.5, 1] (j = 1, 2, ..., l) is the label of input Aj: δj ∈ [−1, −0.5) implies that Aj belongs to the fuzzy negative class with membership degree |δj|, and δj ∈ (0.5, 1] implies that Aj belongs to the fuzzy positive class with membership degree δj. In order to describe the possibility of an input A belonging to the fuzzy positive or negative class, we introduce the triangular fuzzy number [33]
ỹ = (r1, r2, r3) =
  ( (2δ² + δ − 2)/δ,  2δ − 1,  (2δ² − 3δ + 2)/δ ),   0.5 < δ ≤ 1,
  ( (2δ² + 3δ + 2)/δ, 2δ + 1,  (2δ² − δ − 2)/δ ),    −1 ≤ δ < −0.5.   (16)
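A minimal Python sketch of conversion (16) (the function name is ours); for δ = 0.8 it reproduces the fuzzy output ỹ = (0.1, 0.6, 1.1) used for the grade-V example in Section 6.1:

```python
def label_to_tfn(delta):
    """Convert a membership-degree label delta into the triangular
    fuzzy number y~ = (r1, r2, r3) of formula (16)."""
    if 0.5 < delta <= 1:
        return ((2*delta**2 + delta - 2) / delta,
                2*delta - 1,
                (2*delta**2 - 3*delta + 2) / delta)
    if -1 <= delta < -0.5:
        return ((2*delta**2 + 3*delta + 2) / delta,
                2*delta + 1,
                (2*delta**2 - delta - 2) / delta)
    raise ValueError("labels with |delta| <= 0.5 are not considered (Remark 1)")

print(label_to_tfn(0.8))   # approximately (0.1, 0.6, 1.1)
```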
Then the fuzzy training set (15) is converted to

T = {(A1, ỹ1), (A2, ỹ2), ..., (Al, ỹl)},   (17)
where ỹj (j = 1, 2, ..., l) is the triangular fuzzy number representing the fuzzy class of Aj and its membership degree to that class.

Remark 1. If δ = 0.5 or δ = −0.5, then the possibilities of A belonging to the fuzzy positive class and to the fuzzy negative class are equal, and the label of input A is not clear. However, in this paper we focus on supervised learning rather than unsupervised learning. Therefore, we do not consider the case δ = 0.5 or δ = −0.5.

Definition 6. We call the problem of finding a rule according to fuzzy training set (17) and inferring the output ỹ with respect to any input A a set-based fuzzy classification.

Similar to the idea of SFM, the fuzzy training set (17) is mapped into the infinite-dimensional Banach space C(S), and the set-based fuzzy classification is thereby converted into a function-based fuzzy classification. For convenience, we re-rank the fuzzy training set (17) and obtain the following fuzzy training set
T = {(σ1, ỹ1), (σ2, ỹ2), ..., (σp, ỹp), (σ_{p+1}, ỹ_{p+1}), ..., (σl, ỹl)},   (18)
where (σt, ỹt) (t = 1, 2, ..., p) and (σi, ỹi) (i = p + 1, ..., l) represent the fuzzy training points labeled with the fuzzy positive class and the fuzzy negative class, respectively.

Definition 7. Let λ (0 < λ ≤ 1) be the given confidence level. For the fuzzy training set (18), if there exist a real number α and a Radon measure μ such that
Pos{ỹj(∫_S σj dμ + α) ≥ 1} ≥ λ,  j = 1, 2, ..., l,   (19)

we say that fuzzy training set (18) is fuzzy linearly separable with the confidence level λ.
Remark 2. In fuzzy training set (18), if all the outputs of the fuzzy training points are 1 or −1, then (18) degenerates into the classical training set

T = {(σ1, 1), ..., (σp, 1), (σ_{p+1}, −1), ..., (σl, −1)},   (20)
and the set-based fuzzy classification degenerates into a set-based classification.

Theorem 5. For the given confidence level λ (0 < λ ≤ 1), inequality (19) is equivalent to

[(1 − λ)rt3 + λrt2](∫_S σt dμ + α) ≥ 1,  t = 1, 2, ..., p,
[(1 − λ)ri1 + λri2](∫_S σi dμ + α) ≥ 1,  i = p + 1, ..., l.   (21)
Proof. The proof can be seen in Appendix A.
As the triangular fuzzy numbers ỹt = (rt1, rt2, rt3) and ỹi = (ri1, ri2, ri3) represent the fuzzy positive class and the fuzzy negative class, respectively, from formula (16) we have

(1 − λ)rt3 + λrt2 > 0,  (1 − λ)ri1 + λri2 < 0.

Therefore, inequality (21) can be represented as

∫_S σt dμ + α ≥ 1/[(1 − λ)rt3 + λrt2],  t = 1, 2, ..., p,
∫_S σi dμ + α ≤ 1/[(1 − λ)ri1 + λri2],  i = p + 1, ..., l.
Definition 8. For fuzzy training set (18), let k⁺ = min_{t=1,...,p} 1/[(1 − λ)rt3 + λrt2] and l⁻ = max_{i=p+1,...,l} 1/[(1 − λ)ri1 + λri2]. We call the two parallel hyperplanes ∫_S σt dμ + α = k⁺ and ∫_S σi dμ + α = l⁻ the λ-fuzzy support hyperplanes.

With this definition and Theorem 2 in literature [3], it is not difficult to infer that the Hausdorff distance between the two λ-fuzzy support hyperplanes ∫_S σ dμ + α = k⁺ and ∫_S σ dμ + α = l⁻ is

|k⁺ − l⁻| / ‖μ‖,   (22)

where k⁺ > 0 and l⁻ < 0 are two constants. According to the maximum margin algorithm (MMA), the aim is to find the optimal separating hyperplane ∫_S σ dμ + α = 0. By adjusting α, we can represent the two λ-fuzzy support hyperplanes by ∫_S σ dμ + ᾱ = β and ∫_S σ dμ + ᾱ = −β, and the Hausdorff distance (22) becomes 2/‖μ‖. With the given confidence level λ (0 < λ ≤ 1) and Definition 7, we can derive the following fuzzy chance-constrained programming for the fuzzy linearly separable training set (18):
min_{μ,α} (1/2)‖μ‖
s.t. Pos{ỹj(∫_S σj dμ + α) ≥ 1} ≥ λ,  j = 1, 2, ..., l,   (23)

where λ (0 < λ ≤ 1) is the given confidence level, ỹj is the triangular fuzzy number in fuzzy training set (18), and Pos{·} is the possibility measure of the fuzzy event {·}. The algorithm solving optimization problem (23) is called the hard margin PMFSFM.
5. Soft margin PMFSFM

In practice, the solution provided by optimization problem (23) is not always satisfactory. In addition, there are many fuzzy linearly inseparable problems. Therefore, we allow some classification errors on the fuzzy training set (18) and replace (23) with its soft margin version, i.e.,

min_{μ,α,ξ} (1/2)‖μ‖ + C Σ_{j=1}^l ξj
s.t. Pos{ỹj(∫_S σj dμ + α) ≥ 1 − ξj} ≥ λ,
     ξj ≥ 0,  j = 1, 2, ..., l,   (24)

where ξ = (ξ1, ξ2, ..., ξl)^T is the slack variable, C > 0 denotes a penalty parameter, λ (0 < λ ≤ 1) is the given confidence level, ỹj is the triangular fuzzy number in fuzzy training set (18), and Pos{·} is the possibility measure of the fuzzy event {·}, calculated by Definition 5.
Theorem 6. For the given confidence level λ (0 < λ ≤ 1), fuzzy chance-constrained programming (24) is equivalent to the following programming:

min_{μ,α,ξ} (1/2)‖μ‖ + C Σ_{j=1}^l ξj
s.t. [(1 − λ)rt3 + λ·rt2](∫_S σt dμ + α) ≥ 1 − ξt,  ξt ≥ 0,  t = 1, 2, ..., p,
     [(1 − λ)ri1 + λ·ri2](∫_S σi dμ + α) ≥ 1 − ξi,  ξi ≥ 0,  i = p + 1, ..., l.   (25)
Proof. Let ηt = 1 − ξt and ηi = 1 − ξi. Then it is not difficult to obtain this conclusion by substituting ηt and ηi for the right-hand side constants in Theorem 5, respectively. □

Theorem 7. For the fuzzy linearly inseparable training set (18), optimization problem (25) is convex.

Proof. The proof can be seen in Appendix B. □
Theorem 8. For the fuzzy linearly inseparable training set (18), an optimal solution (μ∗, α∗, ξ∗) of optimization problem (25) exists.
Proof. The proof can be seen in Appendix C. □

The algorithm solving optimization problem (25) is called the soft margin PMFSFM. As it is a generalization of the hard margin PMFSFM, we only discuss an algorithm for the soft margin PMFSFM in this paper. In order to solve the above optimization problem, we should first calculate the support function of the closed convex hull co(Ai) of each set Ai. As optimization problem (25) is proved to be convex, we can use the CVX tool in Matlab to solve it and obtain the decision function from the optimal solution. We formulate the above idea as the following algorithm.

Algorithm 5.1
Step 5.1 Calculate the support functions σ1, σ2, ..., σp, σ_{p+1}, σ_{p+2}, ..., σl of the closed convex hulls co(A1), co(A2), ..., co(Ap), co(A_{p+1}), co(A_{p+2}), ..., co(Al), respectively.
Step 5.2 Divide S into n regions G1, G2, ..., Gn, and estimate the Radon measures μ(G1), μ(G2), ..., μ(Gn), respectively.
Step 5.3 Choose a point xj in Gj (not on the boundary of Gj) arbitrarily, and calculate Σ_{j=1}^n σi(xj)·μ(Gj) as an approximate value of ∫_S σi(x) dμ (i = 1, 2, ..., l).
Step 5.4 Solve the following optimization problem
min_{μ,α,ξ} (1/2)‖μ‖ + C Σ_{j=1}^l ξj
s.t. [(1 − λ)rt3 + λ·rt2](Σ_{j=1}^n σt(xj)·μ(Gj) + α) ≥ 1 − ξt,  ξt ≥ 0,  t = 1, 2, ..., p,
     [(1 − λ)ri1 + λ·ri2](Σ_{j=1}^n σi(xj)·μ(Gj) + α) ≥ 1 − ξi,  ξi ≥ 0,  i = p + 1, ..., l,   (26)
with confidence level λ (0 < λ ≤ 1), and set its optimal solution (μ∗_n, α∗_n, ξ∗_n) as an approximation of the optimal solution (μ∗, α∗, ξ∗) of optimization problem (25).
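As a concrete illustration of Step 5.4, the following is a minimal Python sketch of the discretized problem (26) using the CVXPY modeling package (the paper itself uses the CVX tool in Matlab). The function name, the packing of the Theorem 6 coefficients into one vector, and the approximation of the total-variation norm ‖μ‖ by the ℓ1 norm of the region weights μ(Gj) are our assumptions:

```python
import numpy as np
import cvxpy as cp

def solve_soft_margin(sigma_vals, coef, C=1.0):
    """Sketch of the discretized soft margin problem (26).

    sigma_vals[i, j] = sigma_i(x_j): the support function of sample i
    at the representative point x_j of region G_j (Steps 5.1-5.3).
    coef[i] = (1 - lam)*r_{i3} + lam*r_{i2} for fuzzy-positive samples,
              (1 - lam)*r_{i1} + lam*r_{i2} for fuzzy-negative ones.
    """
    l, n = sigma_vals.shape
    mu = cp.Variable(n)                # region weights mu(G_1), ..., mu(G_n)
    alpha = cp.Variable()
    xi = cp.Variable(l, nonneg=True)   # slack variables
    # Total-variation norm ||mu|| approximated by sum_j |mu(G_j)|.
    objective = cp.Minimize(0.5 * cp.norm1(mu) + C * cp.sum(xi))
    constraints = [cp.multiply(coef, sigma_vals @ mu + alpha) >= 1 - xi]
    cp.Problem(objective, constraints).solve()
    return mu.value, alpha.value, xi.value
```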
Step 5.5 With the optimal solution (μ∗, α∗, ξ∗), construct the optimal separating hyperplane ∫_S σ dμ∗ + α∗ = 0 to obtain the decision function

f(σ) = sgn(g(σ)),  σ ∈ C(S),   (27)

where g(σ) = ∫_S σ dμ∗ + α∗. Therefore, for any given testing data σ̄, by equality (27) we can obtain its fuzzy class f(σ̄), where f(σ̄) = 1 or −1 implies that σ̄ belongs to the fuzzy positive or negative class, respectively. The membership degree can then be calculated with

δ(g(σ)) =
  φ₊(g(σ)),  0 < g(σ) ≤ φ₊⁻¹(1),
  1,         g(σ) > φ₊⁻¹(1),
  φ₋(g(σ)),  φ₋⁻¹(−1) ≤ g(σ) < 0,
  −1,        g(σ) < φ₋⁻¹(−1),   (28)

where φ₊ and φ₋ are the ε-SVR regression functions constructed in Steps 5.6–5.9 below.
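A small sketch of how (27) and (28) are evaluated for a test score, assuming the fitted ε-SVR regressors φ₊ and φ₋ are available as callables and the thresholds g₊ = φ₊⁻¹(1), g₋ = φ₋⁻¹(−1) have been found numerically from the fits; all names are ours:

```python
import numpy as np

def classify(g, g_plus, g_minus, phi_plus, phi_minus):
    """Decision (27) and membership degree (28) for a score g = g(sigma)."""
    f = np.sign(g)                 # fuzzy class, Eq. (27)
    if g > g_plus:                 # beyond phi_plus^{-1}(1): full member
        delta = 1.0
    elif 0 < g <= g_plus:
        delta = phi_plus(g)        # eps-SVR fit of Steps 5.6-5.7
    elif g_minus <= g < 0:
        delta = phi_minus(g)       # eps-SVR fit of Steps 5.8-5.9
    elif g < g_minus:
        delta = -1.0
    else:                          # g == 0: on the separating hyperplane,
        delta = 0.0                # not covered by (28); 0 is our convention
    return f, delta
```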
Specifically, φ₊(g(σ)) is the regression function obtained by the ε-SVR constructed with the following steps:

Step 5.6 Construct the training set
{(g(σ1), δ1), ..., (g(σp), δp)}.   (29)
Step 5.7 Select an appropriate ε > 0 and penalty parameter C > 0, and construct the ε-SVR.
Similarly, φ₋(g(σ)) is obtained with the ε-SVR constructed with the following steps:

Step 5.8 Construct the training set
{(g(σ_{p+1}), δ_{p+1}), ..., (g(σl), δl)}.   (30)
Step 5.9 Select an appropriate ε > 0 and penalty parameter C > 0, and construct the ε-SVR.

Here, φ₊⁻¹(1) is the value of the inverse function φ₊⁻¹(g(σ)) of φ₊(g(σ)) at g(σ) = 1, and φ₋⁻¹(−1) is the value of the inverse function φ₋⁻¹(g(σ)) of φ₋(g(σ)) at g(σ) = −1. For any given testing point with input σ̄, by using (27) and (28) we can obtain f(σ̄) and the membership degree δ(g(σ̄)). By formula (16), we can convert δ(g(σ̄)) into a triangular fuzzy number ỹ, which is the output of the given testing point.

Remark 3. In Algorithm 5.1, choosing the confidence level λ and the penalty parameter C can be formulated as a parameter selection problem, which can be solved via parameter selection methods such as k-fold cross validation [31] and the leave-one-out (LOO) error [28].

6. Experiments with water quality data in fuzzy environment

6.1. Water quality data in fuzzy environment

In order to evaluate the water quality of the Fuyang River in Handan, China (see Fig. 1), we set 84 sample points and measured each sample point 10 times to reduce the uncertainty. Each set-valued input Ai is composed of 10 vectors xi1, xi2, ..., xi10, that is, Ai = {xi1, xi2, ..., xi10} (i = 1, 2, ..., 84). Every vector xik (k = 1, 2, ..., 10) contains 10 indicators (or features), such as pH, DO, etc. DO is measured with the iodimetry method; the measurement units of DO, NH3-N, TN, NO3, NO2, PO4, TP and CODmn are all mg/L. The confidence level λ and the penalty parameter C are selected with 10-fold cross-validation [31] based on a grid search over the intervals [0.6, 1] and [2⁻¹⁰, 2⁷], respectively.

The evaluation of surface water (except for the indicators NO3, NO2 and PO4) generally adopts the environmental quality standard for surface water (GB3838-2002), because it has the advantages of timeliness and adaptability. Table 2 lists this standard. For a sample point Ai = {xi1, xi2, ..., xi10} (i = 1, 2, ..., 84), the jth indicator is assigned to grade G0 (G0 ∈ {I, II, ..., V}) if n_{G0} = max_{G=I,II,...,V} n_G, where n_G is the number of measurements xik (k = 1, 2, ..., 10) whose jth indicator belongs to grade G, satisfying Σ_{G=I}^{V} n_G = 10.

In real-life situations, measurements may be imprecise, and observations and empirical data are often formulated in terms of natural language. These formulations are often represented by L−R fuzzy numbers ã [6], whose membership functions are
μ_ã(x) =
  L((a − x)/α_ã),  x ≤ a,
  1,               a ≤ x ≤ b,
  R((x − b)/β_ã),  x ≥ b.   (31)

Shape functions L(x) and R(x) are non-negative, non-increasing functions defined on the positive real line [0, +∞) such that L(0) = R(0) = 1. Coefficients α_ã and β_ã are the left and right spreads, respectively [8]. Particular cases of L−R fuzzy numbers are the trapezoidal and triangular fuzzy numbers.
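A small sketch of evaluating the membership function (31), assuming the common linear shape functions L(t) = R(t) = max(1 − t, 0), which yield the trapezoidal case (triangular when a = b); all names are ours:

```python
def lr_membership(x, a, b, alpha, beta,
                  L=lambda t: max(1.0 - t, 0.0),
                  R=lambda t: max(1.0 - t, 0.0)):
    """Membership of x in the L-R fuzzy number of Eq. (31).

    [a, b] is the core; alpha and beta are the left/right spreads.
    The default linear shapes give a trapezoidal fuzzy number
    (a triangular one when a == b).
    """
    if x < a:
        return L((a - x) / alpha)
    if x > b:
        return R((x - b) / beta)
    return 1.0   # x lies in the core [a, b]

# Triangular fuzzy number (0.1, 0.6, 1.1): core at 0.6, spreads 0.5.
print(lr_membership(0.35, a=0.6, b=0.6, alpha=0.5, beta=0.5))  # 0.5
```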
With regard to water quality evaluation, measurements are imprecise, and the grades and the division of water quality grades are formulated in terms of natural language (see Table 3). With the aim of reflecting the actual situation and of ease of use in practice, domain experts take into account both the division criteria and the prior knowledge of the water quality grades, and determine the membership functions of the sample points Ai with L−R fuzzy numbers, respectively (see Figs. 2–4), where X1 is the number of detected indicators exceeding the standard, X2 is the number of detected indicators within the standard, and X3 is the number of detected indicators exceeding the standard several times. From Fig. 2 we can see that if X1 = 7, then sample point Ai is assigned to grade V or VI with membership degree (MD) δi = 0.8. After that, according to Fig. 4, if X3 = 5, then sample point Ai is assigned to grade V with MD δi = 0.6. Subsequently, we obtain that the sample point Ai is assigned to grade V with MD δi = 0.8, i.e., fuzzy output ỹ = (0.1, 0.6, 1.1). Therefore, we obtain 84 fuzzy set-valued data (Ai, δi) (i = 1, 2, ..., 84); some of the training fuzzy set-valued data are listed in Table 4. As the water quality data are divided into six different grades, we extend the proposed PMFSFM for binary cases to this multiclass task via the well-known one-vs-one decomposition strategy [9].
Fig. 1. Fuyang River in Handan city [29].
6.2. Experiment results

In order to show the importance of introducing membership degrees into water quality evaluation, we compare the proposed PMFSFM with some other methods, including SFM [3], second order cone programming (SOCP) approaches for handling uncertain data [24], and regularized hull based image set collaborative representation and classification (RHISCRC) [39]. Meanwhile, in order to compare with methods that represent the input sets with a statistic (such as SVM), we compute the mean x̄i = (1/10) Σ_{j=1}^{10} xij of the 10 measurements xij and obtain the datum (x̄i, yi) for each sample point, where yi is the grade that x̄i belongs to. Then we can obtain the evaluation results with LPSVMfast [22] and decision tree (DT) [20]. For the testing data, 10-fold cross validation is conducted. The accuracy of every fold, the average accuracies, the variances and the time consumption (seconds) are shown in Table 5. With significance level 1 − α = 0.95, a t-test is used to assess statistical significance, and the h-values ("1" implies that there is a significant difference between PMFSFM and SFM, SOCP, RHISCRC, LPSVMfast, or DT, respectively) are also provided in Table 5.
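The h-values come from paired t-tests on the per-fold accuracies. A minimal sketch of this significance check, assuming SciPy's paired t-test and the per-fold accuracies of Table 5 (the variable names are ours):

```python
from scipy import stats

# Per-fold accuracies from Table 5.
pmfsfm = [0.55, 0.73, 0.60, 0.52, 0.68, 0.77, 0.73, 0.68, 0.58, 0.72]
sfm    = [0.57, 0.60, 0.60, 0.55, 0.57, 0.53, 0.58, 0.60, 0.57, 0.62]

# Paired t-test over the 10 folds; h = 1 when the difference is
# significant at the 0.05 level (significance level 1 - alpha = 0.95).
t_stat, p_value = stats.ttest_rel(pmfsfm, sfm)
h = int(p_value < 0.05)
print(t_stat, p_value, h)
```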
Fig. 2. Membership functions.
Fig. 3. Membership functions of grades I and II.
Fig. 4. Membership functions of grades V and VI.

Table 4
Some of the fuzzy water quality data for training.

Set  Vec.  pH    DO    CODcr  NH3-N  TN     NO3   NO2   PO4   TP    CODmn  Grade  Fuzzy output
A1   x11   6.50  1.41  76.92  16.29  15.41  5.13  0.37  0.64  0.60  5.00   VI     (0.1, 0.6, 1.1)
     x12   6.52  1.44  77.80  16.72  15.96  4.80  0.51  0.62  0.61  4.78
     x13   6.55  1.22  76.17  16.60  15.55  4.93  0.53  0.65  0.55  4.79
     x14   6.53  1.03  75.60  16.61  15.70  4.77  0.38  0.64  0.60  4.63
     x15   6.56  1.50  76.94  16.44  15.53  4.91  0.36  0.63  0.58  4.92
     x16   6.55  1.36  77.15  16.50  15.24  4.97  0.64  0.61  0.58  4.72
     x17   6.56  1.47  75.84  16.68  15.46  4.93  0.24  0.61  0.59  4.69
     x18   6.52  1.27  77.01  16.48  15.59  4.99  0.40  0.59  0.54  4.38
     x19   6.52  1.41  75.91  16.43  15.51  5.02  0.46  0.64  0.55  4.91
     x110  6.53  1.25  74.55  16.27  15.46  5.04  0.28  0.65  0.61  4.80
A2   x21   6.32  2.75  19.34  1.77   5.32   0.24  3.22  0.05  0.62  40.34  IV     (−1.13, 0.2, 1.53)
     x22   6.31  2.67  22.75  1.82   5.19   0.63  3.38  0.05  0.70  40.31
     x23   6.26  2.78  21.89  1.68   5.21   0.41  3.29  0.03  0.61  40.28
     x24   6.29  2.64  22.38  2.19   4.95   0.53  3.39  0.03  0.62  40.39
     x25   6.26  2.71  21.60  1.98   5.28   0.60  3.20  0.02  0.62  40.33
     x26   6.28  2.57  19.47  2.27   5.05   0.40  3.29  0.07  0.63  40.40
     x27   6.31  2.51  19.19  2.15   5.25   0.28  3.32  0.06  0.63  40.24
     x28   6.28  2.60  20.80  1.88   5.07   0.46  3.25  0.04  0.61  40.31
     x29   6.28  2.97  20.79  1.88   5.12   0.44  3.29  0.05  0.62  40.37
     x210  6.28  2.53  19.92  2.29   5.07   0.52  3.33  0.03  0.66  40.21
Table 5
Evaluation results with different methods.

Test number        PMFSFM  SFM      SOCP   RHISCRC  LPSVMfast  DT
First-fold         0.55    0.57     0.57   0.37     0.55       0.52
Second-fold        0.73    0.60     0.60   0.37     0.61       0.54
Third-fold         0.60    0.60     0.57   0.62     0.57       0.48
Fourth-fold        0.52    0.55     0.55   0.50     0.58       0.51
Fifth-fold         0.68    0.57     0.38   0.75     0.54       0.54
Sixth-fold         0.77    0.53     0.57   0.62     0.58       0.56
Seventh-fold       0.73    0.58     0.63   0.50     0.64       0.47
Eighth-fold        0.68    0.60     0.57   0.37     0.49       0.53
Ninth-fold         0.58    0.57     0.50   0.37     0.50       0.55
Tenth-fold         0.72    0.62     0.67   0.62     0.54       0.57
Average accuracy   0.66    0.58     0.56   0.51     0.56       0.53
Variance           0.08    0.03     0.08   0.14     0.05       0.02
H-value            –       1        1      1        1          1
Time consumption   558.85  1018.38  11.85  4.26     3.81       27.31
6.3. Comparative analysis

From Table 5, we find that the proposed PMFSFM achieves the highest average accuracy compared with the other methods, which shows that PMFSFM can significantly improve the evaluation accuracy of water quality after introducing MDs and the possibility measure. The h-values are all 1, which implies that there are significant differences between PMFSFM and the other methods. Additionally, the time consumption of PMFSFM is much shorter than that of SFM, because the source code of PMFSFM is optimized.

In particular, PMFSFM achieves higher average accuracy than SFM. The reason is that water quality evaluation is in essence a set-based fuzzy classification and the MD plays an important role in obtaining the optimal separating hyperplane, whereas SFM does not consider the fuzziness in the evaluation. PMFSFM also achieves higher average accuracy than LPSVMfast and DT, which supports the view that it is better to represent the original data with a set-valued datum rather than a vector-valued datum.

However, the accuracies of the six different methods are all relatively low. This is because the water quality data of the Fuyang River are extremely unbalanced: almost all of them belong to grade IV or V. Additionally, PMFSFM converts the sets in R^d into functions in the infinite-dimensional Banach space C(S), which increases the dimensionality of the problem, and hence the time consumption of PMFSFM remains relatively long.

7. Conclusions and future research directions

Based on SFM, we formulate a new learning framework called PMFSFM for set-based fuzzy classifications. In this work, we introduce the membership degree to formulate the degree to which an input point belongs to a fuzzy class, so that a set-based fuzzy classification is converted into a function-based fuzzy classification. Based on the possibility measure, we finally construct PMFSFM. PMFSFM is an extension of SFM to fuzzy environments, which can provide both the fuzzy class and the membership degree of a given input to the fuzzy class. Experimental results on water quality evaluation show the effectiveness and superiority of PMFSFM; PMFSFM can also be applied to other set-based fuzzy classifications.

As PMFSFM converts the input sets in R^d into the Banach space C(S) via support functions, and the classifications are converted into a function-based fuzzy representation after the mapping, it can also perform function- (or distribution-) based fuzzy classification tasks. In addition, as a vector can be represented by a set with a single point, vector-based fuzzy classifications can also be handled with the proposed PMFSFM. Moreover, the improvement of PMFSFM to handle unbalanced fuzzy set-valued data deserves further study.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (11671109 and 61732011) and a project funded by the China Postdoctoral Science Foundation (2018M640234).

Appendix A. Proof of Theorem 5

Proof. For any j = 1, 2, ..., l, let
h_j(μ, α) = −(∫_S σj dμ + α),

h_j⁺(μ, α) = { 0, if ∫_S σj dμ + α > 0; −(∫_S σj dμ + α), if ∫_S σj dμ + α ≤ 0 },

h_j⁻(μ, α) = { ∫_S σj dμ + α, if ∫_S σj dμ + α > 0; 0, if ∫_S σj dμ + α ≤ 0 }.
From Theorem 4, we have that inequality (19),

Pos{ỹj(∫_S σj dμ + α) ≥ 1} = Pos{1 − ỹj(∫_S σj dμ + α) ≤ 0} ≥ λ,
is equivalent to

(1 − λ)[r_{j1}·h_j⁺(μ, α) − r_{j3}·h_j⁻(μ, α)] + λ·r_{j2}·h_j(μ, α) + 1 ≤ 0,  j = 1, 2, ..., l.   (A.1)

For the given confidence level λ, inequality (A.1) can be represented as

[(1 − λ)rt3 + λ·rt2](∫_S σt dμ + α) ≥ 1,  t = 1, 2, ..., p,
[(1 − λ)ri1 + λ·ri2](∫_S σi dμ + α) ≥ 1,  i = p + 1, ..., l.   (A.2)

The proof is completed. □
Appendix B. Proof of Theorem 7
Proof. Let D be the feasible domain of optimization problem (25). As optimization problem (25) is fuzzy nonlinearly separable, there exists a feasible solution (μ′, α′, ξ′) satisfying the constraints of optimization problem (25). Thus, the feasible domain D is nonempty. By Theorem 4 in literature [3], it is not difficult to prove that the objective function (1/2)‖μ‖ + C Σ_{j=1}^l ξj is convex.
For any (μ1, α1, ξ1), (μ2, α2, ξ2) ∈ D, by the first constraint of optimization problem (25) we have

[(1 − λ)rt3 + λ·rt2](∫_S σt dμ_j + α_j) + ξ_{jt} ≥ 1,  t = 1, 2, ..., p,  j = 1, 2.

Then, for any λ̄ ∈ [0, 1] and t ∈ {1, 2, ..., p}, we have

[(1 − λ)rt3 + λ·rt2](∫_S σt d[λ̄μ1 + (1 − λ̄)μ2] + [λ̄α1 + (1 − λ̄)α2]) + [λ̄ξ_{1t} + (1 − λ̄)ξ_{2t}]
= λ̄{[(1 − λ)rt3 + λ·rt2](∫_S σt dμ1 + α1) + ξ_{1t}} + (1 − λ̄){[(1 − λ)rt3 + λ·rt2](∫_S σt dμ2 + α2) + ξ_{2t}}
≥ λ̄·1 + (1 − λ̄)·1 = 1.

Similarly, by the second constraint of optimization problem (25), for any λ̄ ∈ [0, 1] and i ∈ {p + 1, ..., l}, the inequality
[(1 − λ)ri1 + λ·ri2](∫_S σi d[λ̄μ1 + (1 − λ̄)μ2] + [λ̄α1 + (1 − λ̄)α2]) + [λ̄ξ_{1i} + (1 − λ̄)ξ_{2i}] ≥ 1

holds. Further,

λ̄ξ_{1j} + (1 − λ̄)ξ_{2j} ≥ 0,  j ∈ {1, 2, ..., l}.   (B.1)

Thus,

λ̄(μ1, α1, ξ1) + (1 − λ̄)(μ2, α2, ξ2) ∈ D.   (B.2)

Therefore, for the linearly inseparable training set, optimization problem (25) is convex. □
Appendix C. Proof of Theorem 8
Proof. As the triangular fuzzy numbers ỹt = (rt1, rt2, rt3) and ỹi = (ri1, ri2, ri3) represent a fuzzy positive class and a fuzzy negative class, respectively, from formula (16) we have

kt = (1 − λ)rt3 + λ·rt2 > 0,  t = 1, 2, ..., p,

and

li = (1 − λ)ri1 + λ·ri2 < 0,  i = p + 1, ..., l.

Considering that optimization problem (25) is fuzzy nonlinearly separable, there exists a feasible solution (μ′, α′, ξ′) satisfying the constraints of optimization problem (25); thus, the feasible domain D of optimization problem (25) is nonempty. As (1/2)‖μ‖ + C Σ_{j=1}^l ξj ≥ 0, the infimum
inf_{(μ,α,ξ)∈D} { (1/2)‖μ‖ + C Σ_{j=1}^l ξj }

exists. Further, there exists a convergent sequence (μn, ξ̄n) such that

lim_{n→+∞} { (1/2)‖μn‖ + C Σ_{j=1}^l ξ_{nj} } = β,

where ξ̄n = (ξ_{n1}, ξ_{n2}, ..., ξ_{nl}) and β = inf_{(μ,α,ξ)∈D} { (1/2)‖μ‖ + C Σ_{j=1}^l ξj }.

Let B be a Banach space and B∗ its conjugate space. Then C∗(S) is a B∗ space, and there is a convergent subsequence (μ_{n_k}, ξ̄_{n_k}) of (μn, ξ̄n) and a limit (μ0, ξ̄0) such that

lim_{k→+∞} (μ_{n_k}(σ), ξ̄_{n_k}) = (μ0(σ), ξ̄0),  σ ∈ C(S).
Thus, for any ε > 0 there exists N0 ∈ N such that for all n_k > N0 we have

|μ_{n_k}(σ) − μ0(σ)| < ε  and  |ξ_{n_k j} − ξ_{0j}| < ε,  j = 1, 2, ..., l.

Setting ε = ε0, we have

−ε0 + μ0(σ) < μ_{n_k}(σ) < ε0 + μ0(σ),  σ ∈ C(S),

and

−ε0 + ξ_{0j} < ξ_{n_k j} < ε0 + ξ_{0j},  j = 1, 2, ..., l.
For any (μn, αn, ξ̄n) ∈ D and any positive sample σt⁺ with output ỹt = (rt1, rt2, rt3), from the first constraint of optimization problem (25) we have

kt(μn(σt⁺) + αn) + ξ_{nt} ≥ 1,  t = 1, 2, ..., p.

Then, the subsequence (μ_{n_k}, α_{n_k}, ξ̄_{n_k}) satisfies

kt(μ_{n_k}(σt⁺) + α_{n_k}) + ξ_{n_k t} ≥ 1,  t = 1, 2, ..., p.

Therefore,

kt·α_{n_k} ≥ 1 − (kt + 1)ε0 − kt·μ0(σt⁺) − ξ_{0t}.

Let M1 = min_{1≤t≤p} {1 − (kt + 1)ε0 − kt·μ0(σt⁺) − ξ_{0t}}. Then, for all n_k > N0, we have

α_{n_k} ≥ M1 / kt.   (C.1)
Similar to the proof of inequality (C.1), for a negative sample σi⁻, let

N1 = min_{p+1≤i≤l} {1 + (li − 1)ε0 − li·μ0(σi⁻) + ξ_{0i}}.

Then for all n_k > N0, we have

α_{n_k} ≤ N1 / li.

Therefore, for both the positive and negative samples σ and all n_k > N0, we have

M1 / kt ≤ α_{n_k} ≤ N1 / li.

Let M2 = min_{1≤n_k≤N0} kt·α_{n_k}, N2 = max_{1≤n_k≤N0} li·α_{n_k}, and M = max{|M1/kt|, |M2|, |N1/li|, |N2|}. Then, for any n_k ∈ N, we have |α_{n_k}| ≤ M.
Thus, there exist α0 ∈ R and a convergent subsequence (α_{n_{k_q}}) of (α_{n_k}) such that lim_{q→+∞} α_{n_{k_q}} = α0. Then, for any positive sample σt⁺, we have

kt(μ0(σt⁺) + α0) + ξ_{0t} = lim_{q→+∞} [kt(μ_{n_{k_q}}(σt⁺) + α_{n_{k_q}}) + ξ_{n_{k_q} t}] ≥ 1.   (C.2)
Similar to the proof of (C.2), for any negative sample σi⁻, we have

li(μ0(σi⁻) + α0) + ξ_{0i} = lim_{q→+∞} [li(μ_{n_{k_q}}(σi⁻) + α_{n_{k_q}}) + ξ_{n_{k_q} i}] ≥ 1.

Therefore, (μ0, α0, ξ̄0) satisfies the constraints of optimization problem (25), that is, (μ0, α0, ξ̄0) ∈ D. With the help of Theorem 4 in literature [3], it is not difficult to obtain that

(1/2)‖μ0‖ + C Σ_{j=1}^l ξ_{0j} = inf_{(μ,α,ξ)∈D} { (1/2)‖μ‖ + C Σ_{j=1}^l ξj }.

Therefore, for the fuzzy linearly inseparable training set (18), the optimal solution of optimization problem (25) exists. □
References

[1] Y. Bazi, L. Bruzzone, F. Melgani, Image thresholding based on the EM algorithm and the generalized Gaussian distribution, Pattern Recognit. 40 (2) (2007) 619–634.
[2] J. Camarena, V. Gregori, S. Morillas, A. Sapena, A simple fuzzy method to remove mixed Gaussian-impulsive noise from color images, IEEE Trans. Fuzzy Syst. 21 (5) (2013) 971–978.
[3] J. Chen, Q. Hu, X. Xue, M. Ha, L. Ma, Support function machine for set-based classification with application to water quality evaluation, Inf. Sci. 388–389 (2017) 48–61.
[4] J. Chen, W. Pedrycz, M. Ha, L. Ma, Set-valued samples based support vector regression and its applications, Expert Syst. Appl. 42 (5) (2015) 2502–2509.
[5] J. Chen, X. Xue, M. Ha, L. Ma, Separability of set-valued data sets and existence of support hyperplanes in the support function machine, Inf. Sci. 430–431 (2018) 432–443.
[6] R. Coppi, P. D'Urso, P. Giordani, Fuzzy and possibilistic clustering for fuzzy data, Comput. Stat. Data Anal. 56 (4) (2012) 915–927.
[7] C. Cortes, V. Vapnik, Support vector networks, Mach. Learn. 20 (1995) 273–297.
[8] D. Dubois, H. Prade, Possibility Theory, Plenum Press, New York, 1988.
[9] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes, Pattern Recognit. 44 (8) (2011) 1761–1776.
[10] R. Gardner, Geometric Tomography, second ed., Cambridge University Press, New York, 2006.
[11] C. Ho, C. Lin, Large-scale linear support vector regression, J. Mach. Learn. Res. 13 (2012) 3323–3348.
[12] A. Kalantari, A. Kamsin, S. Shamshirband, A. Gani, H. Alinejad-Rokny, A.T. Chronopoulos, Computational intelligence approaches for classification of medical data: state-of-the-art, future challenges and research directions, Neurocomputing 276 (2018) 2–22.
[13] A. Karambelas, T. Holloway, G. Kiesewetter, C.M. Heyes, Constraining the uncertainty in emissions over India with a regional air quality model evaluation, Atmos. Environ. 174 (2018) 194–203.
[14] Z. Li, J. Liu, J. Tang, H. Lu, Robust structured subspace learning for data representation, IEEE Trans. Pattern Anal. Mach. Intell. 37 (10) (2015) 2085–2098.
[15] D. Liu, T. Li, J. Zhang, A rough set-based incremental approach for learning knowledge in dynamic incomplete information systems, Int. J. Approx. Reason. 55 (8) (2014) 1764–1786.
[16] P. Liu, J. Guo, K. Chamnongthai, H. Prasetyo, Fusion of color histogram and LBP-based features for texture image retrieval and classification, Inf. Sci. 390 (2017) 95–111.
[17] C. Luo, T. Li, H. Chen, H. Fujita, Z. Yi, Incremental rough set approach for hierarchical multicriteria classification, Inf. Sci. 429 (2018) 72–87.
[18] O. Mangasarian, D. Musicant, Lagrangian support vector machines, J. Mach. Learn. Res. 1 (2001) 161–177.
[19] K. Nödler, M. Tsakiri, M. Aloupi, G. Gatidou, A. Stasinakis, T. Licha, Evaluation of polar organic micropollutants as indicators for wastewater-related coastal water quality impairment, Environ. Pollut. 211 (2016) 282–290.
[20] E.F.D. Oliveira, M.E. de Lima Tostes, C.A.O. de Freitas, J.C. Leite, Voltage THD analysis using knowledge discovery in databases with a decision tree classifier, IEEE Access 6 (2018) 1177–1188.
[21] W. Rudin, Real and Complex Analysis, third ed., McGraw-Hill, New York, 1987.
[22] S. Schlag, M. Schmitt, C. Schulz, Faster support vector machines, arXiv:1808.06394, 2018.
[23] S. Shamshirband, A. Patel, N. Anuar, M. Kiah, A. Abraham, Cooperative game theoretic approach using fuzzy Q-learning for detecting and preventing intrusions in wireless sensor networks, Eng. Appl. Artif. Intell. 32 (2014) 228–241.
[24] P. Shivaswamy, C. Bhattacharyya, A. Smola, Second order cone programming approaches for handling missing and uncertain data, J. Mach. Learn. Res. 7 (2006) 1283–1314.
[25] S. Sione, M.G. Wilson, M. Lado, A. González, Evaluation of soil degradation produced by rice crop systems in a vertisol, using a soil quality index, Catena 150 (2017) 79–86.
[26] H. Tan, Z. Ma, S. Zhang, Z. Zhan, B. Zhang, C. Zhang, Grassmann manifold for nearest points image set classification, Pattern Recognit. Lett. 68 (2015) 190–196.
[27] S. Tsang, B. Kao, K. Yip, W. Ho, S. Lee, Decision trees for uncertain data, IEEE Trans. Knowl. Data Eng. 23 (1) (2011) 64–78.
[28] T.F.Y. Vicente, M. Hoai, D. Samaras, Leave-one-out kernel optimization for shadow detection and removal, IEEE Trans. Pattern Anal. Mach. Intell. 40 (3) (2018) 682–695.
[29] L. Wang, Study on the Water Environmental Capacity of Fuyang River in Handan (in Chinese), Master's thesis, Hebei University of Science and Technology, 2014.
[30] R. Wang, S. Shan, X. Chen, Q. Dai, W. Gao, Manifold–manifold distance and its application to face recognition with image sets, IEEE Trans. Image Process. 21 (10) (2012) 4466–4479.
[31] T. Wong, Parametric methods for comparing the performance of two classification algorithms evaluated by k-fold cross validation on multiple data sets, Pattern Recognit. 65 (2017) 97–107.
[32] P. Xu, F. Davoine, H. Zha, T. Denoeux, Evidential calibration of binary SVM classifiers, Int. J. Approx. Reason. 72 (2016) 55–70.
[33] Z. Yang, N. Deng, Fuzzy support vector classification based on possibility theory, Pattern Recognit. Artif. Intell. 20 (1) (2007) 7–14.
[34] J. Yoneyama, Robust sampled-data stabilization of uncertain fuzzy systems via input delay approach, Inf. Sci. 198 (2012) 169–176.
[35] Y. Yu, W. Pedrycz, D. Miao, Neighborhood rough sets based multi-label classification for automatic image annotation, Int. J. Approx. Reason. 54 (9) (2013) 1373–1387.
[36] X. Yue, Y. Chen, D. Miao, J. Qian, Tri-partition neighborhood covering reduction for robust classification, Int. J. Approx. Reason. 83 (2017) 371–384.
[37] L. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets Syst. 1 (1) (1999) 9–34.
[38] P. Zheng, Z. Zhao, J. Gao, X. Wu, A set-level joint sparse representation for image set classification, Inf. Sci. 448 (2018) 75–90.
[39] P. Zhu, W. Zuo, L. Zhang, S. Shiu, D. Zhang, Image set based collaborative representation for face recognition, IEEE Trans. Inf. Forensics Secur. 9 (7) (2014) 1120–1132.