Contents lists available at ScienceDirect
International Journal of Approximate Reasoning
www.elsevier.com/locate/ijar
Neighborhood based decision-theoretic rough set models

Weiwei Li a,b, Zhiqiu Huang a, Xiuyi Jia c,∗, Xinye Cai a

a College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
b College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
c School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

* Corresponding author.
E-mail addresses: [email protected] (W. Li), [email protected] (Z. Huang), [email protected] (X. Jia), [email protected] (X. Cai).

http://dx.doi.org/10.1016/j.ijar.2015.11.005
0888-613X/© 2015 Published by Elsevier Inc.
a r t i c l e   i n f o

Article history:
Received 1 April 2015
Received in revised form 22 October 2015
Accepted 6 November 2015
Available online xxxx

Keywords:
Neighborhood relation
Decision-theoretic rough set model
Attribute reduction

a b s t r a c t
As an extension of the Pawlak rough set model, the decision-theoretic rough set model (DTRS) adopts Bayesian decision theory to compute the required thresholds in probabilistic rough set models. It gives a new semantic interpretation of the positive, boundary and negative regions by using three-way decisions. DTRS has been widely discussed and applied in data mining and decision making. However, one limitation of DTRS is its inability to deal with numerical data directly. In order to overcome this disadvantage and extend the theory of DTRS, this paper proposes a neighborhood based decision-theoretic rough set model (NDTRS) under the framework of DTRS. Basic concepts of NDTRS are introduced. A positive region related attribute reduct and a minimum cost attribute reduct in the proposed model are defined and analyzed. Experimental results show that our methods can obtain short reducts. Furthermore, a new neighborhood classifier based on three-way decisions is constructed and compared with other classifiers. Comparison experiments show that the proposed classifier can achieve a high accuracy and a low misclassification cost.
© 2015 Published by Elsevier Inc.
1. Introduction
Pawlak rough set [32,33] is a very useful mathematical tool for knowledge representation, especially for describing the uncertainty of data. In the Pawlak rough set model, an object x is classified into the category X if the equivalence class [x] of x is a subset of X, which means p(X|[x]) = |X ∩ [x]| / |[x]| = 1. Therefore, the Pawlak rough set model can be seen as a qualitative model. However, two kinds of limitations exist in this model. One is that the probability p(X|[x]) must be equal to 1, which is sensitive to noisy data. The other is that the equivalence class [x] is defined based on the indiscernibility relation, which is not capable of dealing with numerical data directly. For the first limitation, the decision-theoretic rough set model (DTRS) [44,45] introduces a generalized framework to solve it. It is well known that, without tolerating classification errors, it is difficult to obtain an effective result from noisy data in the Pawlak rough set model. To overcome this problem, many researchers have studied different probabilistic rough set (PRS) models [34,37,45,60]. Compared to the Pawlak rough set model, all PRS models take the tolerance of classification error into account: p(X|[x]) ≥ α is used to classify the object x into the category X, where α is a threshold between 0 and 1.
As a generalized rough set model, DTRS provides a unified and comprehensive framework for interpreting and determining the required thresholds. Based on the minimum Bayesian decision cost procedure, DTRS can compute the required thresholds from given cost functions, and the thresholds of different PRS models can be deduced from appropriate cost functions. Another main contribution of DTRS to rough set theory is the introduction of the notion of three-way decisions [48,49]. Compared to classical two-way decisions in classification or decision problems, three-way decisions add a deferment decision to the acceptance and rejection decisions. In rough set theory, a category X is described by three regions, and an object x is classified into one of these regions. The three classification results can be interpreted by three-way decisions: the acceptance decision classifies x into the positive region of X, the deferment decision classifies x into the boundary region of X, and the rejection decision classifies x into the negative region of X.

The advantage of DTRS is that it can deal with noisy data by tolerating classification errors. A three-way decisions framework based on DTRS can move some potentially misclassified objects into the boundary region for further examination, which means DTRS usually achieves a higher classification accuracy. However, the disadvantage of DTRS is that it cannot deal with numerical data directly, which is also a limitation of the Pawlak rough set model. To overcome this problem, two kinds of methods are usually adopted in real applications. One is discretization [7]: the numerical data are discretized before applying rough set models, and each discretized interval is seen as a nominal value of the attribute. The other is defining the equivalence class based on relations other than the indiscernibility relation, such as distance functions [24], fuzzy binary relations [42,56], and dissimilarity and similarity measures [40], which can be used to measure and represent continuous values. Generally, all these relations can be understood as special classes of neighborhood systems [26,46].

In general, many data sets contain numerical values and noisy values simultaneously. In this regard, we propose an extended rough set model, i.e., the neighborhood based decision-theoretic rough set model (NDTRS), to deal with numerical data accompanied by noise. Besides introducing the basic concepts that are usually defined in rough set models, two kinds of attribute reducts are defined. The first attribute reduct keeps the positive region of the decision table unchanged or extended. The second attribute reduct minimizes the induced decision cost. Heuristic approaches to computing the positive region preservation based attribute reduct and the minimum cost attribute reduct are designed. Compared to supervised and unsupervised discretization methods, the proposed heuristic approaches obtain shorter reducts. A three-way decisions based neighborhood classifier is also proposed for the classification problem. Compared to the classical two-way decisions based neighborhood classifier and several classical classifiers, including C4.5, k-NN and SVM, our proposed classifier obtains a higher accuracy and a lower misclassification cost on several data sets.

We organize the paper as follows: Section 2 gives a brief introduction to the related work. Section 3 introduces the framework of NDTRS. Section 4 defines two kinds of attribute reducts in NDTRS. Section 5 gives a three-way decisions based neighborhood classifier for the classification problem. Section 6 presents the experimental results. Section 7 concludes this paper.
2. Related work
In this section, we briefly introduce the related work on DTRS and neighborhood systems. In recent years, DTRS has attracted much attention [31]. On the extension of the model, the relationships between DTRS and other PRS models have been investigated by Yao [47]. Many other extended models have been proposed based on DTRS, including multi-view DTRS [58], multiple-category DTRS [28,29,57], the game-theoretic rough set model [8], Naive Bayesian DTRS [52], multi-granulation DTRS [35] and triangular fuzzy DTRS [25]. Attribute reduction in DTRS has also been thoroughly discussed by many researchers [14,15,23,50]. On the application of the model, DTRS has been successfully applied in many different areas, including software defect prediction [20], text classification [21], clustering [53,22], spam filtering [16,59], and government decision analysis [30]. It is worth mentioning that a fuzzy DTRS approach was proposed to deal with real-valued data with fuzzy relation information [56]. We do not compare with their method in this paper because we do not have any fuzzy relation information as domain knowledge on the data sets used in our experiments.

In the studies of neighborhood systems [9,10,18,39,43,54], using distance functions is a very common and useful method. Hu et al. [11] adopted three kinds of distance functions and proposed a neighborhood based rough set model, which is easy to understand and implement. On the extension of the model, Lin et al. [27] developed a neighborhood based multigranulation rough set in the framework of multigranulation rough sets. Wu and Zhang [43] investigated properties of neighborhood operator systems and rough set approximation operators. For attribute reduction and classification in neighborhood systems, Du et al. [6] discussed attribute reduction and rule learning based on a neighborhood covering space. Chen et al. [2] set up a connection between neighborhood-covering rough sets and evidence theory to establish a basic framework of numerical characterizations of attribute reduction. Hu et al. [10] defined a novel feature evaluation measure for feature selection in the neighborhood rough set model. They also proposed large-margin nearest neighbor classifiers via sample weight learning [12].
Table 1
The cost function matrix for classification based on three-way decisions.

        X        X^c
a_P     λ_PP     λ_PN
a_B     λ_BP     λ_BN
a_N     λ_NP     λ_NN
3. Neighborhood based decision-theoretic rough set model
3.1. Decision-theoretic rough set model
In this section, we present some basic notions of DTRS first [48].
Definition 1. A decision table is a quadruple

S = <U, At = C ∪ D, {V_a | a ∈ At}, {I_a | a ∈ At}>,    (1)
where U is the universe, At is a finite nonempty set of attributes, C is a set of condition attributes, D is a set of decision attributes, V_a is a nonempty set of values of a ∈ At, and I_a: U → V_a is the mapping function. In rough set theory, an object x is usually represented by its equivalence class under the equivalence relation: [x]_A = {y ∈ U | ∀a ∈ A (I_a(x) = I_a(y))}. In DTRS, let Ω = {X, X^c} indicate that an object x is in the category X or not in X. The probability of x being in X can be computed as p(X|[x]) = |X ∩ [x]| / |[x]|, and the probability of x being in X^c is p(X^c|[x]) = 1 − p(X|[x]). Three kinds of actions A = {a_P, a_B, a_N} are defined to classify x into the three regions of X or X^c, and the cost function gives the cost of taking each action in the current state. The cost functions are summarized in Table 1. In this table, λ_{∗P} (∗ ∈ {P, B, N}) denotes the cost of taking action a_P, a_B or a_N when x is actually in X, and λ_{∗N} (∗ ∈ {P, B, N}) denotes the cost of taking action a_P, a_B or a_N when x is actually in X^c. Based on the given cost functions and the conditional probability of x, the Bayesian costs of the different decisions are defined as:
R_P = R(a_P|[x]) = λ_PP · p(X|[x]) + λ_PN · p(X^c|[x]),
R_B = R(a_B|[x]) = λ_BP · p(X|[x]) + λ_BN · p(X^c|[x]),
R_N = R(a_N|[x]) = λ_NP · p(X|[x]) + λ_NN · p(X^c|[x]).    (2)

The following minimum-cost decision rules are obtained:
(P) If R_P ≤ R_B and R_P ≤ R_N, decide x ∈ POS(X);
(B) If R_B ≤ R_P and R_B ≤ R_N, decide x ∈ BND(X);
(N) If R_N ≤ R_P and R_N ≤ R_B, decide x ∈ NEG(X).
POS(X) denotes the positive region of X, BND(X) denotes the boundary region of X, and NEG(X) denotes the negative region of X. Considering the reasonable assumption that the costs of taking correct actions are less than the costs of taking wrong actions, we have λ_PP ≤ λ_BP < λ_NP and λ_NN ≤ λ_BN < λ_PN. Since p(X|[x]) + p(X^c|[x]) = 1, we have the following simplified decision rules:
(P) If p(X|[x]) ≥ α and p(X|[x]) ≥ γ, decide x ∈ POS(X);
(B) If p(X|[x]) ≤ α and p(X|[x]) ≥ β, decide x ∈ BND(X);
(N) If p(X|[x]) ≤ β and p(X|[x]) ≤ γ, decide x ∈ NEG(X),
where α, β, and γ can be computed from the cost functions:

α = (λ_PN − λ_BN) / ((λ_PN − λ_BN) + (λ_BP − λ_PP)),
β = (λ_BN − λ_NN) / ((λ_BN − λ_NN) + (λ_NP − λ_BP)),
γ = (λ_PN − λ_NN) / ((λ_PN − λ_NN) + (λ_NP − λ_PP)).    (3)
By considering the following condition [48]:

(λ_PN − λ_BN) / (λ_BP − λ_PP) > (λ_BN − λ_NN) / (λ_NP − λ_BP),    (4)

we have 0 ≤ β < γ < α ≤ 1. The final decision rules, associated with the probability of x and the pair of thresholds (α, β), are:

(P1) If p(X|[x]) > α, decide x ∈ POS(X);
(B1) If β ≤ p(X|[x]) ≤ α, decide x ∈ BND(X);
(N1) If p(X|[x]) < β, decide x ∈ NEG(X).

Let p = p(X|[x]); then we have the following Bayesian costs of the different rules:

cost of positive rule: p · λ_PP + (1 − p) · λ_PN,
cost of boundary rule: p · λ_BP + (1 − p) · λ_BN,
cost of negative rule: p · λ_NP + (1 − p) · λ_NN.    (5)

Moreover, if we assume that correct classification does not bring any cost, that is, λ_PP = λ_NN = 0, we have the following simplified decision costs:

cost of positive rule: (1 − p) · λ_PN,
cost of boundary rule: p · λ_BP + (1 − p) · λ_BN,
cost of negative rule: p · λ_NP.    (6)

For all objects in a decision table, the total decision cost can be expressed as:

COST = Σ_{p_i > α} (1 − p_i) · λ_PN + Σ_{β ≤ p_j ≤ α} (p_j · λ_BP + (1 − p_j) · λ_BN) + Σ_{p_k < β} p_k · λ_NP,    (7)

where p_i = p(D_max([x_i]_A) | [x_i]_A) and D_max([x_i]_A) = arg max_{D_j ∈ π_D} {|[x_i]_A ∩ D_j| / |[x_i]_A|}.
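The following short Python sketch (not part of the original paper) illustrates how the thresholds of Equation (3) are obtained from a cost matrix and how the rules (P1), (B1), (N1) act on a probability; all identifiers and the toy cost values are our own assumptions.

```python
# Illustrative sketch: derive (alpha, beta, gamma) from a cost matrix and apply the
# three-way decision rules (P1), (B1), (N1). Variable names are ours, not the paper's.

def thresholds(lam_pp, lam_bp, lam_np, lam_pn, lam_bn, lam_nn):
    """Compute (alpha, beta, gamma) as in Equation (3)."""
    alpha = (lam_pn - lam_bn) / ((lam_pn - lam_bn) + (lam_bp - lam_pp))
    beta = (lam_bn - lam_nn) / ((lam_bn - lam_nn) + (lam_np - lam_bp))
    gamma = (lam_pn - lam_nn) / ((lam_pn - lam_nn) + (lam_np - lam_pp))
    return alpha, beta, gamma

def three_way_decision(p, alpha, beta):
    """Classify an object with conditional probability p = p(X | [x])."""
    if p > alpha:
        return "POS"   # accept
    if p < beta:
        return "NEG"   # reject
    return "BND"       # defer

if __name__ == "__main__":
    # Toy costs chosen so that condition (4) holds, hence beta < gamma < alpha.
    alpha, beta, gamma = thresholds(0, 2, 8, 10, 4, 0)
    print(alpha, beta, gamma)                     # 0.75, 0.4, 0.555...
    print(three_way_decision(0.8, alpha, beta))   # POS
    print(three_way_decision(0.5, alpha, beta))   # BND
```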
3.2. Neighborhood relation and neighborhood rough set model
Several neighborhood systems [26,46,11] have been proposed to deal with numerical data. In this paper, we adopt the distance-based neighborhood relation defined by Hu et al. [11].
Definition 2. In a decision table S, for an object x_i ∈ U and A ⊆ At, the neighborhood δ_A(x_i) of x_i in the subspace A is defined as

δ_A(x_i) = {x_j | x_j ∈ U, Δ_A(x_i, x_j) ≤ δ},    (8)

where Δ is a metric function; the Minkowski distance is widely applied:

Δ_M(x_i, x_j) = (Σ_{k=1}^{N} |I_{a_k}(x_i) − I_{a_k}(x_j)|^M)^{1/M}.    (9)

Here x_i and x_j are two objects in the N-dimensional space At = {a_1, a_2, ..., a_N}. The Minkowski distance is also called: (1) the Manhattan distance if M = 1; (2) the Euclidean distance if M = 2; (3) the Chebyshev distance if M = ∞. Given a metric space <U, Δ>, the granule system is composed of the neighborhood granules {δ(x_i) | x_i ∈ U}, and it covers the universal space rather than partitions it. It is also noted that the partition of the space generated by rough sets can be obtained from neighborhood rough sets with the covering principle when δ = 0. The pair <U, N_A> is called a neighborhood approximation space defined by the attribute set A, and C is the cover of U with respect to the neighborhood relation N. The family of neighborhood granules induced by the covering C are the basic blocks used to construct the neighborhood rough set approximations. For a subset X ⊆ U, the lower and upper approximations of X with respect to C are defined as [26]:

\underline{N}_C(X) = {x_i | δ(x_i) ⊆ X, x_i ∈ U},
\overline{N}_C(X) = {x_i | δ(x_i) ∩ X ≠ ∅, x_i ∈ U}.    (10)
Based on the rough set approximations of X, the positive region, boundary region and negative region of X with respect to C are defined as:

POS_C(X) = \underline{N}_C(X),
BND_C(X) = \overline{N}_C(X) − \underline{N}_C(X),
NEG_C(X) = U − POS_C(X) ∪ BND_C(X) = U − \overline{N}_C(X).    (11)
Theorem 1. (See [11].) Given <U, N> and two nonnegative δ_1 and δ_2, if δ_1 ≤ δ_2, we have

(1) ∀x_i ∈ U: δ_1(x_i) ⊆ δ_2(x_i);
(2) ∀X ⊆ U: \underline{N}_2(X) ⊆ \underline{N}_1(X), \overline{N}_1(X) ⊆ \overline{N}_2(X).
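The following Python sketch (ours, not from the paper) illustrates Definition 2 and Equation (10): how a δ-neighborhood is built from the Minkowski distance and how the neighborhood lower and upper approximations are obtained; the helper names and toy data are assumptions.

```python
# Illustrative sketch of delta-neighborhoods (Eq. (8)-(9)) and the neighborhood
# lower/upper approximations (Eq. (10)). Identifiers are ours, not the paper's.

def minkowski(x, y, m=2):
    """Minkowski distance between two numerical objects (Equation (9))."""
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)

def neighborhood(i, data, delta, m=2):
    """delta_A(x_i): indices of objects within distance delta of x_i (Equation (8))."""
    return {j for j, y in enumerate(data) if minkowski(data[i], y, m) <= delta}

def lower_upper(data, target, delta, m=2):
    """Neighborhood lower and upper approximations of a set of object indices (Eq. (10))."""
    lower = {i for i in range(len(data)) if neighborhood(i, data, delta, m) <= target}
    upper = {i for i in range(len(data)) if neighborhood(i, data, delta, m) & target}
    return lower, upper

if __name__ == "__main__":
    data = [(0.10, 0.20), (0.15, 0.22), (0.90, 0.80), (0.88, 0.79)]
    X = {0, 1}                                   # a decision class as a set of indices
    print(lower_upper(data, X, delta=0.1))       # ({0, 1}, {0, 1})
```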
3.3. Basic notions of neighborhood based decision-theoretic rough set model
In DTRS and other PRS models, an object x is usually described by its equivalence class [x]. In NDTRS, δ(x) is applied to represent x, so we have p(X|δ(x)) = |δ(x) ∩ X| / |δ(x)|. The lower and upper approximations of X are introduced in the following. Given <U, N>, for a subset X ⊆ U, the probabilistic lower and upper approximations of X with respect to the subspace B ⊆ At are defined as:

\underline{N}_B^{(α,β)}(X) = {x_i | p(X|δ_B(x_i)) > α, x_i ∈ U},
\overline{N}_B^{(α,β)}(X) = {x_i | p(X|δ_B(x_i)) ≥ β, x_i ∈ U}.    (12)

For a subset X ⊆ U, the positive, boundary and negative regions of X with respect to B ⊆ At are defined as:

POS_B^{(α,β)}(X) = \underline{N}_B^{(α,β)}(X),
BND_B^{(α,β)}(X) = \overline{N}_B^{(α,β)}(X) − \underline{N}_B^{(α,β)}(X),
NEG_B^{(α,β)}(X) = U − (POS_B^{(α,β)}(X) ∪ BND_B^{(α,β)}(X)) = U − \overline{N}_B^{(α,β)}(X).    (13)
In a decision table, π_D = {D_1, D_2, ..., D_m} is a partition of U, which represents m decision classes. The three regions of the decision table based on the partition π_D in NDTRS can be defined as:

POS_B^{(α,β)}(π_D) = ∪_{1≤i≤m} POS_B^{(α,β)}(D_i),
BND_B^{(α,β)}(π_D) = ∪_{1≤i≤m} BND_B^{(α,β)}(D_i),
NEG_B^{(α,β)}(π_D) = U − POS_B^{(α,β)}(π_D) ∪ BND_B^{(α,β)}(π_D).    (14)

From the semantic viewpoint, objects in the positive region can be "probably" classified into a "certain" decision class. A larger positive region usually comes with a smaller boundary region, which means fewer ambiguous or uncertain objects. Moreover, in a classification problem, less ambiguity implies higher accuracy. To measure the classification ability quantitatively, the quality of classification in NDTRS is defined as follows:

γ_B^{(α,β)}(D) = |POS_B^{(α,β)}(π_D)| / |U|.    (15)
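As a concrete illustration of Equations (12)–(15), the following Python sketch (ours, not the authors' implementation) computes the three regions of each decision class from precomputed neighborhoods and the quality of classification γ; all names and the toy data are assumptions.

```python
# Illustrative sketch of the NDTRS regions (Eq. (12)-(14)) and gamma (Eq. (15)).
# `neighborhoods[i]` is delta_B(x_i) as a set of object indices.

def conditional_probability(nbhd, target):
    """p(X | delta_B(x_i)) = |delta_B(x_i) & X| / |delta_B(x_i)|."""
    return len(nbhd & target) / len(nbhd)

def three_regions(neighborhoods, target, alpha, beta):
    """POS/BND/NEG of the class `target` under the thresholds (alpha, beta)."""
    pos, bnd, neg = set(), set(), set()
    for i, nbhd in enumerate(neighborhoods):
        p = conditional_probability(nbhd, target)
        if p > alpha:
            pos.add(i)
        elif p >= beta:
            bnd.add(i)
        else:
            neg.add(i)
    return pos, bnd, neg

def quality_of_classification(neighborhoods, classes, alpha, beta):
    """gamma_B^{(alpha,beta)} = |POS(pi_D)| / |U|."""
    positive = set()
    for target in classes:                 # `classes` is a list of sets partitioning U
        pos, _, _ = three_regions(neighborhoods, target, alpha, beta)
        positive |= pos
    return len(positive) / len(neighborhoods)

if __name__ == "__main__":
    neighborhoods = [{0, 1}, {0, 1, 2}, {1, 2}, {3}]
    classes = [{0, 1}, {2, 3}]
    print(quality_of_classification(neighborhoods, classes, alpha=0.7, beta=0.3))  # 0.5
```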
DTRS, as a special case, can be derived from NDTRS by setting δ = 0, since δ(x) degenerates to an equivalence class when δ = 0. In addition, the neighborhood rough set model can also be derived from NDTRS when α = 1 and β = 0.

Theorem 2. Given <U, N> and two nonnegative δ_1 and δ_2, if δ_1 ≤ δ_2 with the same α and β, we have ∀x_i ∈ U: δ_1(x_i) ⊆ δ_2(x_i).

Proof. For any y ∈ δ_1(x_i), we have Δ_B(x_i, y) ≤ δ_1. Since δ_1 ≤ δ_2, Δ_B(x_i, y) ≤ δ_2, so we can conclude that y ∈ δ_2(x_i); therefore, δ_1(x_i) ⊆ δ_2(x_i). □
In the classical neighborhood rough set model, Theorem 2 has also been proved without considering the thresholds α and β [11], and the positive region of each set is determined by δ only. However, in NDTRS, the positive region and the boundary region of each set are determined by both δ and (α, β). If δ is fixed, we can also induce the following theorems.
Theorem 3. Given <U, N> and two nonnegative α_1 and α_2, if α_1 ≤ α_2 with the same δ, we have ∀X ⊆ U: POS_B^{(α_2,β)}(X) ⊆ POS_B^{(α_1,β)}(X).

Proof. For any y ∈ POS_B^{(α_2,β)}(X), we have p(X|δ_B(y)) > α_2. Since α_1 ≤ α_2, p(X|δ_B(y)) > α_1, so we can conclude that y ∈ POS_B^{(α_1,β)}(X); therefore, POS_B^{(α_2,β)}(X) ⊆ POS_B^{(α_1,β)}(X). □

Theorem 4. Given <U, N> and two nonnegative β_1 and β_2, if β_1 ≤ β_2 with the same δ, we have ∀X ⊆ U: BND_B^{(α,β_2)}(X) ⊆ BND_B^{(α,β_1)}(X).

Proof. For any y ∈ BND_B^{(α,β_2)}(X), we have p(X|δ_B(y)) ≥ β_2. Since β_1 ≤ β_2, p(X|δ_B(y)) ≥ β_1, so we can conclude that y ∈ BND_B^{(α,β_1)}(X); therefore, BND_B^{(α,β_2)}(X) ⊆ BND_B^{(α,β_1)}(X). □

Theorem 3 shows that the positive region monotonously decreases with larger α, and Theorem 4 shows that the boundary region monotonously decreases with larger β. These two theorems suggest that we can adjust the size of the positive region or the boundary region of X by modifying the values of (α, β).
4. Attribute reduction in neighborhood based decision-theoretic rough set model

Attribute reduction plays an important role in rough set theory and the machine learning field. Two related issues should be considered in a new definition of attribute reduct. The first issue is the jointly sufficient condition, which means choosing appropriate criteria: based on the attribute reduct, the values of the given criteria will be preserved or improved. The second issue is the individually necessary condition, which guarantees that the reduct is a minimal subset of attributes, i.e., none of its proper subsets is a reduct. In this section, we define two kinds of attribute reducts in NDTRS based on different criteria.

4.1. Positive region related attribute reduct

A classical attribute reduct in the Pawlak rough set model and other extended models is the positive region preservation based attribute reduct. We give the same definition in NDTRS as follows.

Definition 3. In a decision table S = <U, C ∪ D, {V_a}, {I_a}>, given <U, N>, an attribute set R ⊆ C is a positive region preservation based attribute reduct with respect to D if it satisfies the following two conditions:

(1) POS_R^{(α,β)}(π_D) = POS_C^{(α,β)}(π_D);
(2) for any attribute a ∈ R, POS_{R−{a}}^{(α,β)}(π_D) ≠ POS_C^{(α,β)}(π_D).

In the Pawlak rough set model and the classical neighborhood rough set model, there exists a monotonicity of the positive region of a decision table with respect to the set of condition attributes (also called decision monotonicity), that is:

B_1 ⊆ B_2 ⇒ POS_{B_1}(π_D) ⊆ POS_{B_2}(π_D) ⇒ γ_{B_1}(π_D) ≤ γ_{B_2}(π_D).    (16)

If the monotonicity holds in the decision table S, the reduct obtained from Definition 3 will be a minimal attribute set that keeps the positive region unchanged. However, the monotonicity does not always hold in PRS models. As NDTRS is also a kind of PRS model, given B_1 ⊆ B_2, we may have POS_{B_2}^{(α,β)}(X) ⊆ POS_{B_1}^{(α,β)}(X). For condition (1) in Definition 3, there may exist a subset B ⊆ C with POS_C^{(α,β)}(X) ⊂ POS_B^{(α,β)}(X). For condition (2), checking only R − {a} cannot guarantee that the reduct is a minimal result.

Based on the above analysis, a positive region extension based attribute reduct in NDTRS can be defined by using γ_B^{(α,β)}(π_D).

Definition 4. In a decision table S = <U, C ∪ D, {V_a}, {I_a}>, given <U, N>, an attribute set R ⊆ C is a positive region extension based attribute reduct with respect to D if it satisfies the following two conditions:

(1) γ_R^{(α,β)}(π_D) ≥ γ_C^{(α,β)}(π_D);
(2) for any subset R′ ⊂ R, γ_{R′}^{(α,β)}(π_D) < γ_R^{(α,β)}(π_D).
In this definition, the quantitative criterion γ_R^{(α,β)}(π_D) is applied. Condition (2) is also changed to examine all subsets of the reduct. Compared to Definition 3, Definition 4 can obtain an attribute reduct that induces a larger positive region.
4.2. Minimum cost attribute reduct
In PRS models, qualitative criteria are not suitable for defining an attribute reduct, because the decision monotonicity does not always hold. Therefore, Jia et al. [15] suggested that minimum cost is a better criterion for defining the attribute reduct in DTRS, because the Bayesian decision procedure helps to make decisions with minimum cost based on the observed evidence. Similarly, we can also define the minimum cost attribute reduct in NDTRS. First, based on an attribute set B ⊆ C, the cost formulation is rewritten as follows:

COST_B = Σ_{x_i ∈ POS_B^{(α,β)}(π_D)} (1 − p_i) · λ_PN + Σ_{x_j ∈ BND_B^{(α,β)}(π_D)} (p_j · λ_BP + (1 − p_j) · λ_BN) + Σ_{x_k ∈ NEG_B^{(α,β)}(π_D)} p_k · λ_NP,    (17)
where x_i is classified into the positive region when its conditional probability satisfies p_i > α; x_j is classified into the boundary region when β ≤ p_j ≤ α; and x_k is classified into the negative region when p_k < β, respectively. As the decision monotonicity does not always hold in NDTRS, the decision cost may decrease if we remove some attributes. To obtain a lower decision cost, we can define a reducing cost attribute reduct as follows.

Definition 5. In a decision table S = <U, C ∪ {D}, {V_a}, {I_a}>, given <U, N>, R ⊆ C is a reducing cost attribute reduct if the following conditions are satisfied:
(1) COST_R ≤ COST_C;
(2) ∀R′ ⊂ R, COST_{R′} > COST_R.
In this definition, a reducing cost attribute reduct is a subset of C whose induced decision cost is reduced or unchanged. Furthermore, in most decision procedures it is preferable to obtain the smallest possible cost. Therefore, we can define a minimum cost attribute reduct as follows.
Definition 6. In a decision table S =< U , C ∪ { D }, { V a }, { I a } >, given < U , N >, R ⊆ C is called a minimum cost attribute reduct if it satisfies:
(1) R = arg min_{R′ ⊆ C} {COST_{R′}};
(2) ∀R′ ⊂ R, COST_{R′} > COST_R.
Based on conditions (1) and (2), we can conclude that the minimum cost attribute reduct is a minimal set of condition attributes with minimal decision cost.
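The decision cost COST_B of Equation (17) is the criterion behind Definitions 5 and 6. The following Python sketch (ours, not the authors' code) shows how it can be evaluated once the conditional probability p_i of each object has been computed; the names and toy numbers are assumptions.

```python
# Illustrative sketch of the decision cost COST_B in Equation (17).
# `p` holds p_i = p(D_max(delta_B(x_i)) | delta_B(x_i)) for every object.

def decision_cost(p, alpha, beta, lam_pn, lam_bp, lam_bn, lam_np):
    cost = 0.0
    for p_i in p:
        if p_i > alpha:                    # object falls in the positive region
            cost += (1 - p_i) * lam_pn
        elif p_i >= beta:                  # boundary region
            cost += p_i * lam_bp + (1 - p_i) * lam_bn
        else:                              # negative region
            cost += p_i * lam_np
    return cost

if __name__ == "__main__":
    probabilities = [0.95, 0.80, 0.55, 0.20]
    print(decision_cost(probabilities, alpha=0.75, beta=0.4,
                        lam_pn=10, lam_bp=2, lam_bn=4, lam_np=8))   # 7.0
```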
4.3. Attribute reduction algorithms
As computing all reducts is an NP-hard problem [36], many heuristic algorithms for finding one reduct have been investigated in [5,13,19,55]. A heuristic algorithm usually contains two parts: heuristics and a search strategy. For the heuristics in an attribute reduction algorithm, a fitness function which considers the criteria to be optimized is usually adopted. For the positive region preservation based attribute reduct in Definition 3, POS_R^{(α,β)}(π_D) is the heuristic. For the positive region extension based attribute reduct in Definition 4, γ_R^{(α,β)}(π_D) is the heuristic. Similarly, for Definition 5 and Definition 6, COST_R is the heuristic. With regard to the search strategies for designing a heuristic algorithm, two kinds of strategies are considered: one is the directional search strategy and the other is the nondirectional search strategy. The directional search strategy can be further categorized into the deletion method, the addition method and the addition–deletion method [51]. The nondirectional search strategy is usually applied in evolutionary algorithms, swarm algorithms and other population-based meta-heuristic algorithms for optimization problems. In this paper, we apply the addition–deletion method for the sake of simplicity. By applying the addition–deletion method, we need to further define the significance of each attribute to determine the order of condition attributes. In this paper, we use an inner significance, which is defined as follows.
Definition 7. In a decision table S =< U , C ∪ { D }, { V a }, { I a } >, given < U , N >, B ⊆ C and a ∈ B. The inner significance of a in B based on the fitness function f is defined as:
Sig_inner(a, B, D) = |f_B(π_D) − f_{B−{a}}(π_D)| / |f_B(π_D)|.    (18)
In the heuristic approach for the positive region preservation based attribute reduct, the fitness function f is taken to be the cardinality of POS_B^{(α,β)}(π_D), and the significance of a condition attribute a is defined as:

Sig_inner(a, C, D) = (|POS_C^{(α,β)}(π_D)| − |POS_{C−{a}}^{(α,β)}(π_D)|) / |POS_C^{(α,β)}(π_D)|.    (19)

Now, a heuristic approach to the positive region preservation based attribute reduct is described in Algorithm 1.
Algorithm 1 A heuristic approach to positive region preservation based attribute reduct.
Input: A decision table.
Output: A reduct R.
BEGIN
1. R = ∅, G = C;
2. rank all the attributes in G according to their significance values;
3. WHILE POS_R^{(α,β)}(π_D) < POS_C^{(α,β)}(π_D) and G ≠ ∅
4.   select the first attribute a ∈ G with the maximum significance value;
5.   G = G − {a};
6.   R = R ∪ {a};
7. END WHILE
8. FOR each r ∈ R
9.   IF POS_{R−{r}}^{(α,β)}(π_D) == POS_R^{(α,β)}(π_D)
10.    R = R − {r};
11.  END IF
12. END FOR
13. output R;
END BEGIN
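The addition–deletion search underlying Algorithm 1 (and Algorithm 2 below) can be sketched generically in Python as follows; this is only an illustrative sketch under our own assumptions (the fitness function and the stopping test are placeholders, not the authors' implementation).

```python
# Illustrative sketch of the addition-deletion search used by the reduction algorithms.
# `fitness(subset)` is any criterion from Section 4 (e.g. |POS| or a negated COST).

def addition_deletion_reduct(attributes, fitness, better_or_equal):
    """Grow R until it matches the full attribute set, then prune redundant attributes."""
    full = fitness(set(attributes))
    reduct, remaining = set(), list(attributes)
    # addition phase: greedily add the attribute that most improves the fitness
    while remaining and not better_or_equal(fitness(reduct), full):
        best = max(remaining, key=lambda a: fitness(reduct | {a}))
        remaining.remove(best)
        reduct.add(best)
    # deletion phase: drop attributes whose removal does not hurt the fitness
    for a in sorted(reduct):
        if better_or_equal(fitness(reduct - {a}), fitness(reduct)):
            reduct.remove(a)
    return reduct

if __name__ == "__main__":
    # toy fitness: the positive region only depends on attributes 0 and 2
    toy_fitness = lambda s: len({0, 2} & s)
    print(addition_deletion_reduct([0, 1, 2, 3], toy_fitness,
                                   better_or_equal=lambda x, y: x >= y))   # {0, 2}
```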
Similar to the positive region preservation based attribute reduct, we can design a heuristic approach to the minimum cost attribute reduct as follows. First, the fitness function f is denoted as the decision cost: COST B , and the significance of condition attribute a is defined as:
Sig_inner(a, C, D) = (COST_{C−{a}} − COST_C) / COST_C.    (20)
Second, the heuristic approach to the minimum cost attribute reduct is described in Algorithm 2.
Algorithm 2 A heuristic approach to minimum cost attribute reduct.
Input: A decision table.
Output: A reduct R.
BEGIN
1. R = ∅, G = C;
2. rank all the attributes in G according to their significance values;
3. WHILE COST_R > COST_C and G ≠ ∅
4.   select the first attribute a ∈ G with the maximum significance value;
5.   G = G − {a};
6.   R = R ∪ {a};
7. END WHILE
8. FOR each r ∈ R
9.   IF COST_{R−{r}} <= COST_R
10.    R = R − {r};
11.  END IF
12. END FOR
13. output R;
END BEGIN
It should be noted that the reducts produced by the above two approaches are approximate, because they do not necessarily satisfy the individually necessary condition. As mentioned above, the monotonicity of the criteria with respect to set inclusion of attributes may not hold in NDTRS, so checking only the subsets R − {r} is not enough to guarantee that the result is a reduct with a minimal set of condition attributes. Theoretically, checking all possible subsets of a candidate result is still an NP-hard problem. Fortunately, in most situations (as shown in our experiments), the result is a reduct.
5. Three-way decisions based neighborhood classifier for NDTRS
In artificial intelligence, the purpose of machine learning is to generalize "knowledge" from training data and predict test objects. Considering the phase of generalization, machine learning algorithms are usually classified into two groups: eager learning and lazy learning. In eager learning methods, such as SVM, Naive Bayes and ID3, the generalization is carried out over the training data before predicting new objects. In contrast, lazy learning methods, which are also called instance-based learning methods [1], simply store the training data (or perform only minor processing) and defer processing until new objects arrive. Compared to the single global hypothesis in eager learning, a richer hypothesis space is exploited in lazy learning methods since they use local information. k-NN is a typical lazy learning algorithm [4]. In the classical neighborhood rough set model, a neighborhood classifier (NEC) was also introduced for the classification task [11]. As the k-NN and NEC methods do not consider the decision cost, they are not suitable for NDTRS directly. Therefore, based on three-way decisions and NEC, we propose a three-way decisions based neighborhood classifier for NDTRS in this paper.
5.1. Two-way decisions and NEC

In decision problems, one usually makes a binary decision, which contains two choices: acceptance or rejection. We call this kind of decision two-way decisions. Reviewing current research on classification problems, most classifiers make two-way decisions. In the neighborhood rough set model, Hu et al. [11] also proposed a two-way decisions neighborhood classifier (NEC). As our proposed three-way decisions neighborhood classifier is an extension of NEC, we recall NEC in Algorithm 3.

Algorithm 3 Neighborhood classifier (NEC) [11].
Input: Training set: <U, C, D>; Test object: s; Threshold δ; the norm used.
Output: Class of s.
BEGIN
1. compute the distance between s and each x_i ∈ U with the used norm;
2. find the objects in the neighborhood δ(s) of s;
3. find the class D_j with the majority of training objects in δ(s);
4. assign D_j to the test object s;
END BEGIN

The basis of NEC is the general idea of estimating the class of an object from its neighbors. For the size of the neighborhood, the threshold δ is dynamically assigned based on the local and global information around s:

δ = min(Δ(x_i, s)) + ω · (max(Δ(x_i, s)) − min(Δ(x_i, s))), ω ≤ 1.    (21)

In this equation, x_i (i = 1, ..., n) ranges over the training objects, and min(Δ(x_i, s)) and max(Δ(x_i, s)) denote the minimal and maximal distances between the training objects and the test object s, respectively. For the value range of ω, the authors of [11] suggested that ω should take values in the range [0, 0.1].

5.2. Three-way decisions based neighborhood classifier (TDNEC)

NEC is a two-way decisions neighborhood classifier, as the class D_j with the majority of training objects in δ(s) is assigned to s. The advantage of two-way decisions classifiers is that they can predict the test object simply and quickly. However, this is often accompanied by a higher prediction error rate and a larger decision cost. An object is ambiguous when it has equal or approximately equal probabilities of belonging to two or more classes, such as 51% for D_i and 49% for D_j. Applying the majority principle to predict an ambiguous object may lead to a wrong result with high probability. In many real applications, one cannot make a decision immediately due to the lack of sufficient information. In this regard, deferment, a third choice, is usually applied. The three types of choices, acceptance, rejection and deferment, are called three-way decisions. Three-way decisions are based on Bayesian decision theory, whose principle is the minimization of decision cost. In a decision table with given cost functions, we can compute the thresholds α and β according to Equation (3). For a binary classification problem, assume D+ is the positive class and D− is the negative class; then the three-way decisions can be described as follows.

(P) If p(D+|δ_B(x)) > α, x is classified into D+.
(B) If β ≤ p(D+|δ_B(x)) ≤ α, x needs further examination.
(N) If p(D+|δ_B(x)) < β, x is classified into D−.

The acceptance decision classifies x into D+, the rejection decision classifies x into D−, and the deferment decision requires further examination to classify x, respectively.

For a multi-classification problem, we can apply the one-vs.-all strategy to decompose the multi-classification problem into several groups of binary classification problems first, and then compose the obtained three-way decision results. Alternatively, we can deal with the multi-classification problem directly: the dominant decision class is regarded as the positive class, and the three-way decisions are defined as follows:

(P) If p(D_max(δ_B(x))|δ_B(x)) > α, x is classified into D_max(δ_B(x)).
(B) If β ≤ p(D_max(δ_B(x))|δ_B(x)) ≤ α, x needs further examination.
(N) If p(D_max(δ_B(x))|δ_B(x)) < β, x is not classified into any class,

where D_max(δ_B(x)) ∈ π_D is the dominant decision class of the objects in δ_B(x), i.e., D_max(δ_B(x)) = arg max_{D_i ∈ π_D} {|δ_B(x) ∩ D_i| / |δ_B(x)|}.
Table 2
Brief description of the data sets.

Data sets (abbreviation)                      Objects   Condition attributes   Classes
glass                                         214       9                      6
hepatitis                                     155       19                     2
ionosphere                                    351       34                     2
iris                                          150       4                      3
wisconsin diagnostic breast cancer (wdbc)     569       30                     2
wisconsin prognostic breast cancer (wpbc)     198       33                     2
vertebral column (vertebral)                  310       6                      3
image segmentation (image)                    210       19                     7
sonar, mines vs. rocks (sonar)                208       60                     2
wine recognition (wine)                       178       13                     2
48 49 50
Based on three-way decisions and NEC, we will introduce a three-way decisions based neighborhood classifier in the following Algorithm 4. The purpose of the classifier is to reduce misclassification rate and decision cost. Algorithm 4 Three-way decisions based neighborhood classifier (TDNEC). Input: Training set: < U , C , D >; Test object: s; Parameter ω; Cost matrix {λi j }; Specify the norm used. Output: Class of s. BEGIN 1. compute the decision threshold α , β based on {λi j }; 2. FOR each x in U 3. compute the distance (x, s) between x and s with the used norm; 4. MIN = min( (x, s)); 5. MAX = max( (x, s)); 6. END FOR 7. δ(s) = MIN + ω · (MAX − MIN); 8. p = p ( D max (δ(s))|δ(s)); /* compute the probability of the dominant decision class of objects in δ(s).*/ 9. IF p > α 10. assign D max (δ(s))|δ(s) to test object s; 11. ELSE IF β ≤ p ≤ α 12. s is in the boundary region of D max (δ(s))|δ(s); 13. ELSE 14. s is rejected to classify into any class; 15. END IF END BEGIN
In the three-way decisions based neighborhood classifier, s could be rejected to classify into any class if its probability is less than β . This kind of procedure is also applied in learning with rejection methods [3] similarly. However, if the rejection procedure is not permitted in a real application, we can merge the boundary region and the negative region simply, which means β does not need to appear in the algorithm.
55 56 57 58 59 60 61
8 9 10 11 12 14 15 16 17 19 20 21 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 47 48 49 50 51
6. Experiments
53 54
7
46
51 52
6
22
|δ (x)∩ D | where D max (δ B (x)) ∈ π D is a dominant decision class of the objects in δ B (x), i.e., D max (δ B (x)) = arg max D i ∈π D { B|δ (x)| i }. B
46 47
5
18
(P) If p ( D max (δ B (x))|δ B (x)) > α , x is classified into D max (δ B (x)). (B) If β ≤ p ( D max (δ B (x))|δ B (x)) ≤ α , x needs further examination. (N) If p ( D max (δ B (x))|δ B (x)) < β , x is not classified into any class,
22 23
4
13
For a multi-classification problem, we can apply the one-vs.-all strategy to decompose the multi-classification problem into several groups of binary classification problems first, and then compose the obtained three-way decisions results. Besides, we can also deal with multi-classification problem directly. The dominant decision class could be regarded as the positive class, then the three-way decisions are defined as following:
18 19
3
52 53
In this section, we conduct experiments to show the efficiency of our attribute reduction algorithms and three-way decisions based neighborhood classifier. In our experiments, 10 UCI data sets [38] are used, with information shown in Table 2. For data sets glass, wdbc and wpbc, the corresponding id attribute is removed first. Experimental setting can be found in Table 3. Since 10-fold cross validation are employed, we only present average results in the experiments. 10 different groups of cost functions are also generated randomly for each data set, which means we will run 10 times 10-fold cross validation for each classification task. The values of δ and ω are set as suggested in Ref. [11], 0.25, 0.3 and 0.35 are tested for δ , several ω values between 0 and 0.1 are tested.
54 55 56 57 58 59 60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.11 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1 2 3 4
2 3
Parameter
Value
platform
Eclipse with WEKA (version 3.5) 10 groups, generated randomly Euclidean distance 2 {0.25, 0.3, 0.35} (0, 0.1] default values in WEKA 10-folds
{λi j }
6
distance
7
δ
9
1
Table 3 Experimental setting.
5
8
11
ω parameters for C4.5, k-NN, SVM cross validation
10
13 14 15
18 19 20 21 22 23 24 25 26 27 28
M1
M2
M3
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
1.6 ± 1.28 4.2 ± 2.75 2.1 ± 0.30 3.7 ± 0.46 5.0 ± 0.00 2.1 ± 0.70 6.0 ± 0.00 6.3 ± 1.00 6.1 ± 1.81 5.5 ± 0.50
7.0 ± 0.00 9.0 ± 0.00 9.2 ± 0.60 4.0 ± 0.00 8.6 ± 0.49 1.0 ± 0.00 5.5 ± 0.50 8.4 ± 0.70 12.8 ± 0.98 5.3 ± 0.78
8.0 ± 0.00 9.9 ± 0.83 5.0 ± 0.00 3.0 ± 0.00 7.9 ± 0.54 4.8 ± 0.40 6.0 ± 0.00 4.9 ± 0.30 6.7 ± 0.46 4.0 ± 0.00
average
4.3 ± 0.88
7.1 ± 0.41
6.0 ± 0.25
win/tie/loss
–
6/2/2
5/2/3
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
61
13 14 15 17 18 19 20 21 22 23 24 25 26 27 28
31 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
6.2. Experiments on classification
59 60
12
32
As the classical rough set model mainly aims to handle categorical data, most attribute reduction approaches have to discretize the numerical data first. This preprocessing procedure increases the complexity of the reduction approach. However, attribute reduction based on the neighborhood rough set can deal with numerical data directly. To show the efficiency of our proposed definitions of reducts and reduction approaches, two kinds of discretization methods are compared in our experiments: one is a supervised discretization method and the other is an unsupervised discretization method. For the same numerical data, the comparison methods discretize it first, and then the same heuristic approach is applied to obtain the reduct. For simplicity's sake, our method, the supervised discretization method [41] and the unsupervised method [41] are denoted as M1, M2 and M3, respectively. Both the positive region preservation based attribute reduct and the minimum cost attribute reduct are implemented. Since a reduct can be explained as a minimal subset of attributes which satisfies a specific condition [17], we use the length of the reduct to measure its minimality. As different definitions of reduct satisfy different conditions, there does not exist a unified measure to evaluate the completeness of reducts. In this paper, we assume the reduction procedure is the preprocessing step of a classification task, and we compute the classification performance of several classifiers based on the derived reduct to measure its completeness.

Table 4 and Table 5 show the lengths of the derived reducts based on the different methods. For both the positive region preservation based attribute reduct and the minimum cost attribute reduct, our method produces the shortest average reduct among the three methods, and the supervised discretization method obtains the longest one. We also test the classification accuracies of C4.5 based on the reducts derived from the different methods. The results, in the form of "mean ± standard deviation", are recorded in Table 6 and Table 7. The supervised discretization method obtains the highest average accuracy, and one reason is that it adopts the most attributes. Our method is better than the unsupervised discretization method. From the above four tables, we can find that the results are very similar on most data sets when the two different reduct definitions are used, which means the two kinds of definitions behave comparably under specific δ and cost functions.
57 58
9
30
6.1. Experiments on attribute reduction
32 34
8
29
30
33
7
16
Data sets
29 31
6
11
Table 4 Comparison of lengths of positive region preservation based attribute reducts based on different methods. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
16 17
5
10
11 12
4
58 59
In this section, we compare the three-way decisions based neighborhood classifier (TDNEC) with NEC, C4.5, k-NN and SVM on several criteria, including accuracy, F-measure and misclassification cost.
60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.12 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
12
1
1
Table 5 Comparison of lengths of minimum cost attribute reducts based on different methods. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
2 3 4
2 3 4
5
Data sets
M1
M2
M3
5
6
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
3.9 ± 1.14 4.2 ± 2.75 2.1 ± 0.30 3.7 ± 0.46 5.0 ± 0.00 2.1 ± 0.70 6.0 ± 0.00 6.3 ± 1.00 6.1 ± 1.81 5.5 ± 0.50
6.7 ± 0.64 9.0 ± 0.00 9.2 ± 0.60 2.5 ± 0.67 8.6 ± 0.49 1.0 ± 0.00 5.3 ± 0.46 7.5 ± 1.30 12.8 ± 0.98 5.3 ± 0.78
8.0 ± 0.00 9.9 ± 0.83 5.0 ± 0.00 3.0 ± 0.00 7.9 ± 0.54 4.8 ± 0.40 6.0 ± 0.00 4.9 ± 0.30 6.7 ± 0.46 4.0 ± 0.00
6
15
average
4.5 ± 0.87
6.8 ± 0.59
6.0 ± 0.25
15
16
win/tie/loss
–
6/1/3
5/2/3
16
7 8 9 10 11 12 13 14
7 8 9 10 11 12 13 14
17
17
18
18
19
19
Table 6 Comparison of classification accuracies of C4.5 based on different positive region preservation based attribute reducts. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
20 21 22
20 21 22
23
23
24 25 26 27 28 29 30 31 32 33 34 35
Data sets
M1
M2
M3
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
0.3463 ± 0.1246 0.7117 ± 0.1075 0.8345 ± 0.0786 0.8280 ± 0.0855 0.8907 ± 0.0283 0.6860 ± 0.0695 0.7442 ± 0.0789 0.5119 ± 0.1246 0.5463 ± 0.1223 0.7343 ± 0.1125
0.5727 ± 0.1442 0.7233 ± 0.0956 0.8483 ± 0.0785 0.9493 ± 0.0619 0.8784 ± 0.0540 0.7129 ± 0.0905 0.7755 ± 0.0262 0.5062 ± 0.1502 0.6774 ± 0.1241 0.8108 ± 0.1087
0.3208 ± 0.1113 0.7360 ± 0.0734 0.7989 ± 0.0850 0.7800 ± 0.1301 0.8201 ± 0.0660 0.7629 ± 0.0199 0.5768 ± 0.1017 0.2586 ± 0.1546 0.5370 ± 0.1770 0.4696 ± 0.1809
average
0.6834 ± 0.0932
0.7455 ± 0.0934
0.6061 ± 0.1100
win/tie/loss
–
0/7/3
3/6/1
24 25 26 27 28 29 30 31 32 33 34 35
36
36
37
37
Table 7 Comparison of classification accuracies of C4.5 based on different minimum cost attribute reducts. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
39 40 41 42
Data sets
M1
M2
M3
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
0.4145 ± 0.1169 0.6960 ± 0.1051 0.8371 ± 0.0802 0.8313 ± 0.0721 0.8874 ± 0.0321 0.6861 ± 0.0783 0.7439 ± 0.0759 0.5110 ± 0.1197 0.5510 ± 0.1267 0.7182 ± 0.1294
0.5703 ± 0.1419 0.7288 ± 0.0900 0.8481 ± 0.0759 0.9560 ± 0.0518 0.8758 ± 0.0546 0.7134 ± 0.0891 0.7297 ± 0.0649 0.5229 ± 0.1267 0.6837 ± 0.1325 0.8074 ± 0.1093
0.3172 ± 0.1125 0.7277 ± 0.0771 0.7963 ± 0.0835 0.7640 ± 0.1287 0.8192 ± 0.0655 0.7629 ± 0.0199 0.5726 ± 0.1064 0.2629 ± 0.1510 0.5321 ± 0.1833 0.4764 ± 0.1857
average
0.6876 ± 0.0936
0.7436 ± 0.0937
0.6031 ± 0.1114
52
win/tie/loss
–
0/7/3
3/6/1
54
53 54
38
43 44 45 46 47 48 49 50 51 53
55 56 57 58 59 60 61
55
6.2.1. Comparison experiments on 10 data sets

Assume n_PP denotes the number of objects classified correctly, n_BP denotes the number of objects given deferment decisions, and n_NP denotes the number of objects classified incorrectly. The accuracy is defined as:
accuracy =
nP P n P P + nN P
56 57 58 59 60
.
(22)
61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.13 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1 2 3
13
Table 8 Comparison of accuracies of different classifiers. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
1 2 3
4
Data sets
TDNEC
NEC
C4.5
k-NN
SVM
4
5
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
0.7038 ± 0.0043 0.8062 ± 0.0259 0.8956 ± 0.0197 0.9675 ± 0.0088 0.9512 ± 0.0087 0.7995 ± 0.0493 0.8219 ± 0.0078 0.8961 ± 0.0462 0.8382 ± 0.0000 0.8422 ± 0.0003
0.6893 ± 0.0127 0.7155 ± 0.0103 0.8709 ± 0.0050 0.9660 ± 0.0080 0.9329 ± 0.0025 0.7641 ± 0.0024 0.7771 ± 0.0064 0.6967 ± 0.0098 0.8226 ± 0.0092 0.8360 ± 0.0121
0.6720 ± 0.0124 0.7742 ± 0.0243 0.9026 ± 0.0127 0.9487 ± 0.0055 0.9313 ± 0.0061 0.7470 ± 0.0201 0.8097 ± 0.0135 0.8795 ± 0.0123 0.7337 ± 0.0340 0.9371 ± 0.0102
0.6949 ± 0.0106 0.7974 ± 0.0097 0.8684 ± 0.0040 0.9527 ± 0.0049 0.9548 ± 0.0040 0.7172 ± 0.0101 0.7677 ± 0.0099 0.8662 ± 0.0085 0.8692 ± 0.0101 0.9511 ± 0.0038
0.5687 ± 0.0143 0.8639 ± 0.0083 0.8803 ± 0.0078 0.9620 ± 0.0055 0.9764 ± 0.0025 0.7657 ± 0.0112 0.7587 ± 0.0064 0.8824 ± 0.0060 0.7721 ± 0.0223 0.9899 ± 0.0024
5
average
0.8522 ± 0.0171
0.8071 ± 0.0078
0.8336 ± 0.0151
0.8440 ± 0.0076
0.8420 ± 0.0087
14
win/tie/loss
–
8/2/0
7/2/1
5/3/2
5/2/3
6 7 8 9 10 11 12 13 14 15
20
8 9 10 11 12 13
16
17 19
7
15
16 18
6
17
Table 9 The F-measures of different classifiers. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
18 19 20
21
Data sets
TDNEC
NEC
C4.5
k-NN
SVM
21
22
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
0.8169 ± 0.0019 0.7385 ± 0.0831 0.9038 ± 0.0264 0.9719 ± 0.0153 0.9384 ± 0.0235 0.8194 ± 0.0773 0.8648 ± 0.0014 0.6633 ± 0.0578 0.9039 ± 0.0000 0.8978 ± 0.0009
0.8160 ± 0.0089 0.8341 ± 0.0070 0.9310 ± 0.0029 0.9827 ± 0.0041 0.9653 ± 0.0013 0.8663 ± 0.0016 0.8746 ± 0.0040 0.8212 ± 0.0068 0.9026 ± 0.0055 0.9106 ± 0.0072
0.8037 ± 0.0089 0.8725 ± 0.0156 0.9487 ± 0.0071 0.9736 ± 0.0029 0.9644 ± 0.0032 0.8550 ± 0.0132 0.8948 ± 0.0083 0.9359 ± 0.0070 0.8460 ± 0.0229 0.9675 ± 0.0054
0.8199 ± 0.0074 0.8873 ± 0.0060 0.9295 ± 0.0023 0.9758 ± 0.0026 0.9769 ± 0.0021 0.8353 ± 0.0069 0.8686 ± 0.0063 0.9283 ± 0.0049 0.9300 ± 0.0058 0.9749 ± 0.0020
0.7250 ± 0.0116 0.9269 ± 0.0048 0.9363 ± 0.0044 0.9806 ± 0.0029 0.9881 ± 0.0013 0.8672 ± 0.0072 0.8628 ± 0.0041 0.9375 ± 0.0034 0.8712 ± 0.0142 0.9949 ± 0.0012
22
31
average
0.8519 ± 0.0288
0.8904 ± 0.0049
0.9062 ± 0.0095
0.9127 ± 0.0046
0.9091 ± 0.0055
32
win/tie/loss
–
0/3/7
2/3/5
0/4/6
2/3/5
23 24 25 26 27 28 29 30
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
24 25 26 27 28 29 30 31 32
33 34
23
33 34
The coverage is defined as:
coverage =
35
n P P + nN P n P P + nN P + n B P
.
(23)
As TDNEC is based on three-way decisions, some objects would be classified into the boundary region, therefore, the coverage value of TDNEC is usually less than 1. In most situations, there exists a kind of trade-off relation between the accuracy value and the coverage value, then F-measure is adopted to measure the performance of classifiers. Based on accuracy and coverage, the classical F-measure is defined as:
F =2·
accuracy · coverage accuracy + coverage
.
(24)
37 38 39 40 41 42 43 44
As λ N P denotes the cost for classifying an object into the negative region when it belongs to the positive region, and λ B P denotes the cost for classifying an object into the boundary region when it belongs to the positive region, we can define the misclassification cost as following:
cost = n N P · λ N P + n B P · λ B P .
36
45 46 47 48
(25)
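As a concrete illustration of the evaluation measures in Equations (22)–(25), the following Python sketch (ours, with made-up counts and costs) computes accuracy, coverage, F-measure and misclassification cost from the three counts defined above.

```python
# Illustrative sketch of Equations (22)-(25); the counts and costs are toy values.

def evaluate(n_pp, n_bp, n_np, lam_np, lam_bp):
    accuracy = n_pp / (n_pp + n_np)                               # Eq. (22)
    coverage = (n_pp + n_np) / (n_pp + n_np + n_bp)               # Eq. (23)
    f_measure = 2 * accuracy * coverage / (accuracy + coverage)   # Eq. (24)
    cost = n_np * lam_np + n_bp * lam_bp                          # Eq. (25)
    return accuracy, coverage, f_measure, cost

if __name__ == "__main__":
    print(evaluate(n_pp=85, n_bp=10, n_np=5, lam_np=8, lam_bp=2))
```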
Table 8 gives the comparison results of classification accuracy based on different classifiers. Table 9 gives the comparison results of F-measure based on different classifiers. Table 10 shows the comparison results of misclassification cost. It is worth noting that we do not implement the pairwise two-tailed t-test in Table 10 because the cost functions are generated randomly in each group and their distribution does not satisfy the normal assumption. From these tables, we can have the following observations:
55
49 50 51 52 53 54 55
56
• For classification accuracy, TDNEC is superior to other algorithms on most data sets. SVM gets the second-best result. As
56
57
TDNEC is a kind of three-way decisions method, all ambiguous objects will be deferred for further-examination, which leads to the result of a high accuracy. • TDNEC gets the lowest average F-measure. From the result, we can see that TDNEC can get an approximate result with other algorithms on most data sets except hepatitis and image. The reason of the low F-measure of TDNEC is that its coverage is usually less than 1. The values of coverage in other algorithms are always equal to 1.
57
58 59 60 61
58 59 60 61
JID:IJA AID:7845 /FLA
1 2 3
[m3G; v1.168; Prn:13/11/2015; 9:41] P.14 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
14
1
Table 10 The misclassification cost of different classifiers. The best results are highlighted by boldface.
2
Data sets
TDNEC
NEC
C4.5
k-NN
SVM
3
25.2325 ± 11.1476 10.5863 ± 5.9879 14.6892 ± 6.1744 2.4633 ± 0.9457 11.7323 ± 4.7330 20.5434 ± 7.7416 22.7879 ± 12.2406 11.4490 ± 7.9794 15.6047 ± 7.2257 12.4796 ± 7.8355
26.5346 ± 11.7793 15.8433 ± 7.2284 15.6843 ± 6.2866 2.6633 ± 1.0299 13.7977 ± 5.7555 21.5429 ± 7.8297 27.5139 ± 14.7655 29.5834 ± 12.0772 16.2253 ± 7.6764 13.8086 ± 9.0180
28.1160 ± 12.2956 12.9123 ± 6.2377 12.0792 ± 5.5903 3.5004 ± 1.6359 14.4259 ± 5.7277 22.6148 ± 6.9049 24.9459 ± 13.7711 11.6732 ± 4.8068 25.6541 ± 13.0811 5.1059 ± 3.6624
26.3724 ± 12.0564 11.5662 ± 5.3847 16.1878 ± 6.7594 3.0897 ± 1.0928 9.5654 ± 3.9699 25.5864 ± 8.9236 30.6939 ± 16.8902 12.9565 ± 5.3101 12.5437 ± 5.9366 3.8658 ± 2.5711
37.2347 ± 16.6630 7.7342 ± 3.4683 14.7266 ± 6.1755 2.5257 ± 1.0305 5.0550 ± 2.3537 21.4640 ± 8.7511 31.6467 ± 16.8751 11.4339 ± 4.7057 21.6677 ± 9.2152 0.7656 ± 0.5277
4
12
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
13
average
14.7568 ± 7.2011
18.3197 ± 8.3447
16.1028 ± 7.3714
15.2428 ± 6.8895
15.4254 ± 6.9766
13
4 5 6 7 8 9 10 11
14
• For misclassification cost, TDNEC can get the best average result. From the theoretical analysis, we can say that this is
16
because TDNEC is based on Bayesian decision principle with the objective of minimizing the decision cost, even though the Bayesian decision cost is not exactly same as our defined misclassification cost in this paper.
17 18 19 21 22
25 26 27 28 29 30 31
34 35 36 37 38 39 40 41 42 43 44
49 50 51 52 53 54 55 56 57 58
11 12
15 16 17 18 20 21 22 24 25 26 27 28 29 30 31 33 34 35 36 37 38 39 40 41 42 43 44 46 47
We propose a neighborhood based decision-theoretic rough set model to deal with noisy numerical data in this paper. Based on the basic notions of NDTRS, two kinds of attribute reducts are defined: one is the positive region related attribute reduct and the other is the minimum cost attribute reduct. Heuristic approaches to computing the attribute reducts are also introduced. Experimental results show that the proposed reductions can achieve the shortest reducts with a competitive classification ability. The classification schema of NDTRS is also discussed in this paper. The proposed three-way decisions based neighborhood classifier can obtain a better classification accuracy and a lower decision cost than other classifiers, especially the two-way decisions based neighborhood classifier. The main contribution of this paper is that the combined model NDTRS is a generalization of both neighborhood rough set models and decision-theoretic rough set models. Designing faster reduction and classification algorithms for large scale data will be investigated in future work.
59
48 49 50 51 52 53 54 55 56 57 58 59
60 61
10
45
6.2.2. Comparison experiments on influence of parameter ω
We also conduct a series of experiments to test the influence of the parameter ω on the classification performance of TDNEC. We try ω from 0.001 to 0.009 with step 0.001 in the first group and from 0.01 to 0.1 with step 0.01 in the second group. The classification accuracy and F-measure based on 10-fold cross validation are recorded. Fig. 2(a) presents the classification accuracy curves and Fig. 2(b) presents the F-measure curves varying with ω for all data sets. For classification accuracy, ω = 0.1 is the optimal value for most data sets. For F-measure, the curves show similar trends: the F-measure decreases as ω increases, and TDNEC obtains a better F-measure when ω ∈ (0, 0.01].
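A rough sketch of this sweep protocol is given below; `sweep` and `evaluate` are hypothetical names, where `evaluate` is assumed to train the classifier with a given ω on one cross-validation fold and return its accuracy or F-measure, and the dummy evaluator is only there so the sketch runs.

```python
import numpy as np

# The two omega grids described above: 0.001..0.009 (step 0.001) and 0.01..0.1 (step 0.01).
omega_grid = np.round(np.concatenate([np.arange(0.001, 0.010, 0.001),
                                      np.arange(0.01, 0.11, 0.01)]), 3)

def sweep(evaluate, omegas, n_folds=10):
    """Return the mean 10-fold score for every omega value.
    evaluate(omega, fold) is assumed to train on the training part of the fold
    and return a score (accuracy or F-measure) on its test part."""
    return {float(w): float(np.mean([evaluate(w, k) for k in range(n_folds)])) for w in omegas}

# Dummy evaluator so the sketch runs; replace it with the real classifier under test.
scores = sweep(lambda w, k: 1.0 - w, omega_grid)
best_omega = max(scores, key=scores.get)
print(best_omega, scores[best_omega])
```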
6.2.3. Comparison experiments on large data sets
To examine the classification ability of TDNEC on large-scale data sets, we also compare TDNEC with the other four algorithms on two large data sets, both of which also come from UCI [38]. One is Letter Recognition (letter for short), with 20 000 instances, 16 condition attributes and 26 classes; the other is isolet1+2+3+4 (isolet for short), with 6238 instances, 617 condition attributes and 26 classes. Since the previous experiment suggests that ω ∈ (0, 0.01] is a good choice for classification, we try ω from 0.001 to 0.01 with step 0.001 in this experiment. The accuracy and F-measure for these two data sets are shown in Fig. 3. For accuracy, Fig. 3(a) shows that TDNEC achieves the highest accuracy on letter, and Fig. 3(c) shows that SVM achieves the highest accuracy on isolet, with TDNEC second best. For F-measure, Fig. 3(b) and Fig. 3(d) show that TDNEC usually does not obtain the best value. Compared with the results in Tables 8 and 9, TDNEC shows a similar classification ability on the large data sets. Although TDNEC performs well on both small and large data sets, designing efficient algorithms for the neighborhood based decision-theoretic rough set model on large data sets is important and will be investigated in future work.
7. Conclusion and further work

We propose a neighborhood based decision-theoretic rough set model (NDTRS) to deal with noisy numerical data in this paper. Based on the basic notions of NDTRS, two kinds of attribute reducts are defined: the positive region related attribute reduct and the minimum cost attribute reduct. A heuristic approach to attribute reduction is also introduced. Experimental results show that the two proposed reductions can achieve the shortest reduct with a competitive classification ability. The classification schema of NDTRS is also discussed, and the proposed three-way decisions based neighborhood classifier obtains a better classification accuracy and a lower decision cost than other classifiers, especially the two-way decisions based neighborhood classifier. The main contribution of this paper is that the combined model NDTRS generalizes both neighborhood rough set models and decision-theoretic rough set models. Designing faster reduction and classification algorithms for large-scale data will be investigated in further work.
Fig. 1. TDNEC vs. NEC on accuracy, F-measure and misclassification cost for 10 data sets.

Fig. 2. Classification accuracy and F-measure varying with ω for 10 data sets.
Fig. 3. Classification accuracy and F-measure varying with ω for 2 large data sets.
Acknowledgements
We would like to acknowledge the support for this work from the National Natural Science Foundation of China (Grant Nos. 61272083, 61403200) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20140800).

References
[1] D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, Mach. Learn. 6 (1991) 37–66.
[2] D.G. Chen, W.L. Li, X. Zhang, S. Kwong, Evidence-theory-based numerical algorithms of attribute reduction with neighborhood-covering rough sets, Int. J. Approx. Reason. 55 (3) (2014) 908–923.
[3] C.K. Chow, On optimum recognition error and reject tradeoff, IEEE Trans. Inf. Theory 16 (1) (1970) 41–46.
[4] T.M. Cover, P.E. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory 13 (1) (1967) 21–27.
[5] J.H. Dai, Y.X. Li, Heuristic genetic algorithm for minimal reduction decision system based on rough set theory, in: Proceedings of ICMLC, 2002, pp. 4–6.
[6] Y. Du, Q.H. Hu, P.F. Zhu, P.J. Ma, Rule learning for classification based on neighborhood covering reduction, Inf. Sci. 181 (24) (2011) 5457–5467.
[7] U.M. Fayyad, K.B. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, in: Proceedings of the International Joint Conference on Uncertainty in AI, 1993, pp. 1022–1027.
[8] J.P. Herbert, J.T. Yao, Game-theoretic rough sets, Fundam. Inform. 108 (3–4) (2011) 267–286.
[9] Q.H. Hu, D.R. Yu, J.F. Liu, C.X. Wu, Neighborhood rough set based heterogeneous feature selection, Inf. Sci. 178 (2008) 3577–3594.
[10] Q.H. Hu, W. Pedrycz, D.R. Yu, J. Lang, Selecting discrete and continuous features based on neighborhood decision error minimization, IEEE Trans. Syst. Man Cybern., Part B, Cybern. 40 (2010) 137–150.
[11] Q.H. Hu, D.R. Yu, Z.X. Xie, Neighborhood classifiers, Expert Syst. Appl. 34 (2008) 866–876.
[12] Q.H. Hu, P.F. Zhu, Y.B. Yang, D. Yu, Large-margin nearest neighbor classifiers via sample weight learning, Neurocomputing 74 (4) (2011) 656–660.
[13] R. Jensen, Q. Shen, Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches, IEEE Trans. Knowl. Data Eng. 16 (12) (2004) 1457–1471.
[14] X.Y. Jia, W.W. Li, L. Shang, J.J. Chen, An optimization viewpoint of decision-theoretic rough set model, in: Proceedings of RSKT, 2011, in: LNCS, vol. 6954, 2011, pp. 457–465.
[15] X.Y. Jia, W.H. Liao, Z.M. Tang, L. Shang, Minimum cost attribute reduction in decision-theoretic rough set models, Inf. Sci. 219 (2013) 151–167.
[16] X.Y. Jia, L. Shang, Three-way decisions versus two-way decisions on filtering spam email, in: Transactions on Rough Sets XVIII, in: LNCS, vol. 8449, 2014, pp. 69–91.
[17] X.Y. Jia, L. Shang, B. Zhou, Y.Y. Yao, Generalized attribute reduct in rough set theory, Knowl.-Based Syst. (2015), http://dx.doi.org/10.1016/j.knosys.2015.05.017.
[18] W. Jin, A.K.H. Tung, J.W. Han, J. Wang, Ranking outliers using symmetric neighborhood relationship, in: Proceedings of PAKDD, 2006, pp. 577–593.
[19] L.J. Ke, Z.R. Feng, Z.G. Ren, An efficient ant colony optimization approach to attribute reduction in rough set theory, Pattern Recognit. Lett. 29 (9) (2008) 1351–1357.
[20] W.W. Li, Z.Q. Huang, Q. Li, Three-way decisions based software defect prediction, Knowl.-Based Syst. (2015), http://dx.doi.org/10.1016/j.knosys.2015.09.035.
[21] W. Li, D.Q. Miao, W.L. Wang, N. Zhang, Hierarchical rough decision theoretic framework for text classification, in: Proceedings of ICCI, 2010, pp. 484–489.
[22] F. Li, M. Ye, X.D. Chen, An extension to rough c-means clustering based on decision-theoretic rough sets model, Int. J. Approx. Reason. 55 (1) (2014) 116–129.
[23] H.X. Li, X.Z. Zhou, J.B. Zhao, D. Liu, Attribute reduction in decision-theoretic rough set model: a further investigation, in: Proceedings of RSKT, 2011, in: LNCS, vol. 6954, 2011, pp. 466–475.
[24] J.Y. Liang, R. Li, Y.H. Qian, Distance: a more comprehensible perspective for measures in rough set theory, Knowl.-Based Syst. 27 (2012) 126–136.
[25] D.C. Liang, D. Liu, W. Pedrycz, P. Hu, Triangular fuzzy decision-theoretic rough sets, Int. J. Approx. Reason. 54 (8) (2013) 1087–1106.
[26] T.Y. Lin, Neighborhood systems and approximation in database and knowledge base systems, in: Proceedings of the Fourth International Symposium on Methodologies of Intelligent Systems, Poster Session, 1989, pp. 75–86.
[27] G.P. Lin, Y.H. Qian, J.J. Li, NMGRS: neighborhood-based multigranulation rough sets, Int. J. Approx. Reason. 53 (7) (2012) 1080–1093.
[28] P. Lingras, M. Chen, D.Q. Miao, Rough multi-category decision theoretic framework, in: Proceedings of RSKT, 2008, pp. 676–683.
[29] D. Liu, T.R. Li, H.X. Li, A multiple-category classification approach with decision-theoretic rough sets, Fundam. Inform. 115 (2–3) (2012) 173–188.
[30] D. Liu, T.R. Li, D.C. Liang, Three-way government decision analysis with decision-theoretic rough sets, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 20 (1) (2012) 119–132.
[31] D. Liu, H.X. Li, X.Z. Zhou, Two decades' research on decision-theoretic rough sets, in: Proceedings of ICCI, 2010, pp. 968–973.
[32] Z. Pawlak, Rough sets, Int. J. Comput. Inf. Sci. 11 (1982) 341–356.
[33] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Dordrecht, 1991.
[34] Z. Pawlak, S.K.M. Wong, W. Ziarko, Rough sets: probabilistic versus deterministic approach, Int. J. Man-Mach. Stud. 29 (1988) 81–95.
[35] Y.H. Qian, H. Zhang, Y.L. Sang, J.Y. Liang, Multigranulation decision-theoretic rough sets, Int. J. Approx. Reason. 55 (1) (2014) 225–237.
[36] A. Skowron, C. Rauszer, The discernibility matrices and functions in information systems, in: Intelligent Decision Support, in: Theory and Decision Library, vol. 11, 1992, pp. 331–362.
[37] D. Slezak, W. Ziarko, The investigation of the Bayesian rough set model, Int. J. Approx. Reason. 40 (2005) 81–91.
[38] UC Irvine machine learning repository, http://archive.ics.uci.edu/ml/.
[39] H. Wang, Nearest neighbors by neighborhood counting, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 942–953.
[40] J. Wang, A. Woznica, A. Kalousis, Learning neighborhoods for metric learning, in: Proceedings of ECML/PKDD, 2012, pp. 223–236.
[41] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data mining software: an update, ACM SIGKDD Explor. Newsl. 11 (1) (2009).
[42] W.Z. Wu, J.S. Mi, W.X. Zhang, Generalized fuzzy rough sets, Inf. Sci. 151 (2003) 263–282.
[43] W.Z. Wu, W.X. Zhang, Neighborhood operator systems and approximations, Inf. Sci. 144 (2002) 201–217.
[44] Y.Y. Yao, S.K.M. Wong, A decision theoretic framework for approximating concepts, Int. J. Man-Mach. Stud. 37 (6) (1992) 793–809.
[45] Y.Y. Yao, S.K.M. Wong, P. Lingras, A decision-theoretic rough set model, in: Proceedings of the 5th International Symposium on Methodologies for Intelligent Systems, 1990, pp. 17–25.
[46] Y.Y. Yao, Relational interpretations of neighborhood operators and rough set approximation operators, Inf. Sci. 111 (1) (1998) 239–259.
[47] Y.Y. Yao, Probabilistic rough set approximations, Int. J. Approx. Reason. 49 (2008) 255–271.
[48] Y.Y. Yao, Three-way decisions with probabilistic rough sets, Inf. Sci. 180 (2010) 341–353.
[49] Y.Y. Yao, The superiority of three-way decisions in probabilistic rough set models, Inf. Sci. 181 (6) (2011) 1080–1096.
[50] Y.Y. Yao, Y. Zhao, Attribute reductions in decision-theoretic rough set models, Inf. Sci. 178 (2008) 3356–3373.
[51] Y.Y. Yao, Y. Zhao, J. Wang, On reduct construction algorithms, in: Proceedings of RSKT, 2006, pp. 297–304.
[52] Y.Y. Yao, B. Zhou, Naive Bayesian rough sets, in: Proceedings of RSKT, 2010, in: LNAI, vol. 6401, 2010, pp. 719–726.
[53] H. Yu, Q.F. Zhou, A cluster ensemble framework based on three-way decisions, in: Proceedings of RSKT, 2013, pp. 302–312.
[54] J.B. Zhang, T.R. Li, D. Ruan, D. Liu, Neighborhood rough sets for dynamic data mining, Int. J. Intell. Syst. 27 (2012) 317–342.
[55] W.X. Zhang, J.S. Mi, W.Z. Wu, Approaches to knowledge reductions in inconsistent systems, Int. J. Intell. Syst. 18 (2003) 989–1000.
[56] X.R. Zhao, B.Q. Hu, Fuzzy and interval-valued fuzzy decision-theoretic rough set approaches based on fuzzy probability measure, Inf. Sci. 298 (2015) 534–554.
[57] B. Zhou, Multi-class decision-theoretic rough sets, Int. J. Approx. Reason. 55 (1) (2014) 211–224.
[58] X.Z. Zhou, H.X. Li, A multi-view decision model based on decision-theoretic rough set, in: Proceedings of RSKT, 2009, in: LNCS, vol. 5589, 2009, pp. 650–657.
[59] B. Zhou, Y.Y. Yao, J.G. Luo, Cost-sensitive three-way email spam filtering, J. Intell. Inf. Syst. 42 (1) (2014) 19–45.
[60] W. Ziarko, Variable precision rough set model, J. Comput. Syst. Sci. 46 (1993) 39–59.