JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.1 (1-17)
International Journal of Approximate Reasoning ••• (••••) •••–•••
1
Contents lists available at ScienceDirect
2
1 2
3
International Journal of Approximate Reasoning
4 5
3 4 5
6
6
www.elsevier.com/locate/ijar
7
7
8
8
9
9
10
10
11 12 13 14 15
Neighborhood based decision-theoretic rough set models Weiwei Li a,b , Zhiqiu Huang a , Xiuyi Jia c,∗ , Xinye Cai a a
16
b
17
c
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
18
23 24 25
i n f o
28 29
14 15 16 17
a b s t r a c t
20 21
Article history: Received 1 April 2015 Received in revised form 22 October 2015 Accepted 6 November 2015 Available online xxxx
26 27
13
19
a r t i c l e
21 22
12
18
19 20
11
Keywords: Neighborhood relation Decision-theoretic rough set model Attribute reduction
30 31 32 33 34
As an extension of Pawlak rough set model, decision-theoretic rough set model (DTRS) adopts the Bayesian decision theory to compute the required thresholds in probabilistic rough set models. It gives a new semantic interpretation of the positive, boundary and negative regions by using three-way decisions. DTRS has been widely discussed and applied in data mining and decision making. However, one limitation of DTRS is its lack of ability to deal with numerical data directly. In order to overcome this disadvantage and extend the theory of DTRS, this paper proposes a neighborhood based decision-theoretic rough set model (NDTRS) under the framework of DTRS. Basic concepts of NDTRS are introduced. A positive region related attribute reduct and a minimum cost attribute reduct in the proposed model are defined and analyzed. Experimental results show that our methods can get a short reduct. Furthermore, a new neighborhood classifier based on three-way decisions is constructed and compared with other classifiers. Comparison experiments show that the proposed classifier can get a high accuracy and a low misclassification cost. © 2015 Published by Elsevier Inc.
22 23 24 25 26 27 28 29 30 31 32 33 34
35
35
36
36
37
37
38
38
39 40
39
1. Introduction
41 42 43 44 45 46 47 48 49 50 51 52 53 54
41
Pawlak rough set [32,33] is a very useful mathematical tool for knowledge representation, especially in describing the uncertainty of the data. In Pawlak rough set model, an object x will be classified into the category X if the equivalence class | X ∩[x]| of x: [x] is a subset of X , which means p ( X |[x]) = |[x]| = 1. Therefore, Pawlak rough set model can be seen as a qualitative model. However, two kinds of limitations exist in this model as follows. One is the probability p ( X |[x]) must be equal to 1, which is sensitive to noisy data. The other one is that the equivalence class [x] is defined based on the indiscernibility relation, which is not capable of dealing with numerical data directly. For the first limitation, decision-theoretic rough set model (DTRS) [44,45] introduces a generalized framework to solve it. It is well known that, without considering the tolerance of classification error, in Pawlak rough set model it is difficult to obtain an effective result from the data with noise. To overcome this problem, many researchers studied different probabilistic rough set (PRS) models [34,37,45,60]. Compared to Pawlak rough set model, all PRS models take the tolerance of classification error into account. p ( X |[x]) ≥ α is used to classify the object x into the category X , while α is a threshold between 0 and 1.
55
58 59 60 61
42 43 44 45 46 47 48 49 50 51 52 53 54 55
56 57
40
56
*
Corresponding author. E-mail addresses:
[email protected] (W. Li),
[email protected] (Z. Huang),
[email protected] (X. Jia),
[email protected] (X. Cai).
http://dx.doi.org/10.1016/j.ijar.2015.11.005 0888-613X/© 2015 Published by Elsevier Inc.
57 58 59 60 61
JID:IJA AID:7845 /FLA
2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
[m3G; v1.168; Prn:13/11/2015; 9:41] P.2 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
As a generalized rough set model, DTRS provides a unified and comprehensive framework for interpreting and determining the required thresholds. Based on minimum Bayesian decision cost procedure, DTRS can compute the required thresholds from given cost functions. Different thresholds for different PRS models can be deduced from appropriate cost functions. Another main contribution of DTRS to rough set theory is the introducing of the notion of three-way decisions [48,49]. Compared to classical two-way decisions in classification or decision problems, three-way decisions bring in a deferment decision based on acceptance decision and rejection decision. In rough set theory, category X is described by three regions. An object x will be classified into one of these regions. Three different classification results can be interpreted by three-way decisions. Acceptance decision classifies x into the positive region of X , deferment decision classifies x into the boundary region of X , and rejection decision classifies x into the negative region of X . The advantage of DTRS is that it can deal with noisy data by considering the tolerance of classification error. Three-way decisions framework based on DTRS can convert some potential being misclassified objects into the boundary region for further examination, which means DTRS usually has a higher classification accuracy. However, the disadvantage of DTRS is that it cannot deal with numerical data directly, which is also another limitation of Pawlak rough set model. To overcome this problem, many studies usually adopt two kinds of methods in real applications. One is discretization [7], the numerical data is discretized before applying rough set models. Each discretized interval is seen as a nominal value of the attribute. The other one is defining equivalence class based on other relations instead of indiscernibility relation, such as distance functions [24], fuzzy binary relations [42,56], dissimilarity and similarity measures [40], which can be used to measure and represent the continuous values. Generally, all these relations can be understood as special classes of neighborhood systems [26,46]. In general, there exist many data which contain numerical values and noisy values simultaneously. In this regard, we propose an extended rough set model, i.e. neighborhood based decision-theoretic rough set model (NDTRS) to deal with this type of numerical data which are accompanied by noisy values. Besides introducing some basic concepts which are usually defined in rough set models, two kinds of attribute reducts are also defined. The first attribute reduct keeps the positive region of the decision table unchanged or extended. The second attribute reduct can minimize its induced decision cost. Heuristic approaches to computing positive region preservation based attribute reduct and minimum cost attribute reduct are designed. Compared to supervised or unsupervised discretization methods, the proposed heuristic approaches can get the shorter reducts. A three-way decisions based neighborhood classifier is also proposed for classification problem. Compared to the classical two-way decisions based neighborhood classifier and several classical classifiers including C4.5, k-NN and SVM, our proposed classifier can obtain a higher accuracy and a less misclassification cost on several data sets. We organize the paper as follows: Section 2 gives a briefly introduction about the related work. Section 3 introduces the framework of NDTRS. Section 4 defines two kinds of attribute reducts in NDTRS. Section 5 gives a three-way decisions based neighborhood classifier for classification problem. Section 6 presents the experimental results. Section 7 concludes this paper.
36 37 38
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
2. Related work
39 40
1
37 38 39
In this section, we will briefly introduce the related work on DTRS and neighborhood systems. In recent years, DTRS has attracted much attention [31]. On the extension of the model, the relationships between DTRS and other PRS models have been investigated by Yao [47]. Many other extended models have been proposed based on DTRS, including multi-view DTRS [58], multiple-category DTRS [28,29,57], game-theoretic rough set model [8], Naive Bayesian DTRS [52], multi-granulation DTRS [35] and triangular fuzzy DTRS [25]. Attribute reduction in DTRS is also thoroughly discussed by many researchers [14,15,23,50]. On the application of the model, DTRS has been successfully applied in many different areas, including software defect prediction [20], text classification problem [21], clustering problem [53,22], spam filtering problem [16,59], and government decision analysis [30]. It is worth mentioning that a fuzzy DTRS approach was proposed to deal with real-valued data with fuzzy relation information [56]. We do not compare their method in this paper because we do not have any fuzzy relation information as the domain knowledge on the data sets compared in our experiments. In the studies of neighborhood systems [9,10,18,39,43,54], using distance functions is a very common and useful method. Hu et al. [11] adopted three kinds of distance functions and proposed a neighborhood based rough set model, which is easy to understand and implement. On the extension of the model, Lin et al. [27] developed a neighborhood based multigranulation rough set in the framework of multigranulation rough sets. Wu and Zhang [43] investigated properties of neighborhood operator systems and rough set approximation operators. For attribute reduction and classification in neighborhood systems, Du et al. [6] discussed the attribute reduction and the rule learning based on a neighborhood covering space. Chen et al. [2] set up a connection between neighborhood-covering rough sets and evidence theory to establish a basic framework of numerical characterizations of attribute reduction. Hu et al. [10] defined a novel feature evaluation measure for feature selection in neighborhood rough set model. They also proposed large-margin nearest neighbor classifiers via sample weight learning [12]. (See Table 1.)
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.3 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1
3
1
Table 1 The cost function matrix for classification based on three-way decisions.
2 3
aP aB aN
6 7
3
X
Xc
4
λP P λB P λN P
λP N λB N λN N
5
4 5
2
6 7
8 9 10 11 12 13 14
8 9
3. Neighborhood based decision-theoretic rough set model
10 11
3.1. Decision-theoretic rough set model
12 13
In this section, we present some basic notions of DTRS first [48].
14
15 16
15
Definition 1. A decision table is a quad tuple:
17
S =< U , At = C ∪ D , { V a |a ∈ At}, { I a |a ∈ At} >,
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
16
where U is the universe, At is a finite nonempty set of attributes, C is a set of condition attributes, and D is a set of decision attributes. V a is a nonempty set of values of a ∈ At, and I a : U → V a is the mapping function. In rough set theory, an object x is usually represented by its equivalence class based on the equivalence relation: [x] A = { y ∈ U |∀a ∈ A ( I a (x) = I a ( y ))}. In DTRS, let = { X , X c } indicate an object x is in category X and not in X . The probability of x in X can be computed | X ∩[x]| as p ( X |[x]) = |[x]| , and the probability of x in X c is p ( X c |[x] = 1 − p ( X |[x])). Three kinds of action A = {a P , a B , a N } are defined to classify x into three regions of X or X c . Then the cost function denotes the cost for taking the action with the current state. The cost functions are summarized in the following matrix. In this matrix, λ∗ P (∗ ∈ { P , B , N }) denote the costs for taking action a P , a B or a N when the x is actual in X , and λ∗ N (∗ ∈ { P , B , N }) denote the costs for taking action a P , a B or a N when the x is actual in X c . Based on given cost functions and the conditional probability of x, the Bayesian costs for different decisions are defined as:
33
R P = R(a P |[x]) = λ P P · p ( X |[x]) + λ P N · p ( X |[x]),
35
R B = R(a B |[x]) = λ B P · p ( X |[x]) + λ B N · p ( X c |[x]),
36
R N = R(a N |[x]) = λ N P · p ( X |[x]) + λ N N · p ( X c |[x]).
37 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
20 21 22 23 24 25 26 27 28 29 30 31 32
35 36
(2)
37 38 39 40
(P) If R P ≤ R B and R P ≤ R N , decide x ∈ POS( X ); (B) If R B ≤ R P and R B ≤ R N , decide x ∈ BND( X ); (N) If R N ≤ R P and R N ≤ R B , decide x ∈ NEG( X ).
41 42 43
POS( X ) denotes the positive region of X , BND( X ) denotes the boundary region of X and NEG( X ) denotes the negative region of X . Consider a reasonable assumption that the costs of taking right actions are less than the costs of taking wrong actions, we have λ P P ≤ λ B P < λ N P and λ N N ≤ λ B N < λ P N . Since P ( X |[x]) + P ( X c |[x]) = 1, we have the following simplified decision rules:
44 45 46 47 48 49
(P) If p ( X |[x]) ≥ α and p ( X |[x]) ≥ γ , decide x ∈ POS( X ); (B) If p ( X |[x]) ≤ α and p ( X |[x]) ≥ β , decide x ∈ BND( X ); (N) If p ( X |[x]) ≤ β and p ( X |[x]) ≤ γ , decide x ∈ NEG( X ),
50 51 52 53
α , β , and γ can be computed from all cost functions: (λ P N − λ B N ) , (λ P N − λ B N ) + (λ B P − λ P P ) (λ B N − λ N N ) β= , (λ B N − λ N N ) + (λ N P − λ B P ) (λ P N − λ N N ) γ= . (λ P N − λ N N ) + (λ N P − λ P P )
19
34
Following minimum-cost decision rules are obtained:
where
18
33
c
34
38
17
(1)
54 55
α=
56 57 58 59 60
(3)
61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.4 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
By considering the following condition [48]
1 2
(λ B P − λ P P ) (λ N P − λ B P ) > , (λ B N − λ N N ) (λ P N − λ B N )
(4)
4
we have 0 ≤ β < γ < α ≤ 1. The final decision rules associated with the probability of x and the pair of thresholds (α , β) are obtained:
23 24 25
28 29 30 31 32
Let p = p ( X |[x]), we have the following Bayesian cost of different rules.
12
10 11 13 14
cost of positive rule: p · λ P P + (1 − p ) · λ P N ,
15
cost of boundary rule: p · λ B P + (1 − p ) · λ B N ,
16
(5)
17
Moreover, if we assume correct classification does not bring any cost, that is λ P P = λ N N = 0, we have the following simplified decision costs:
19
cost of negative rule: p · λ N P + (1 − p ) · λ N N .
37
40
23
cost of boundary rule: p · λ B P + (1 − p ) · λ B N ,
24
cost of negative rule: p · λ N P .
(6)
43 44
COST =
(1 − p i ) · λ P N +
( p j · λ B P + (1 − p j ) · λ B N ) +
β≤ p j ≤α
p i >α
27
pk · λ N P ,
(7)
pk <β
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
28 29 30 31
|[x ] ∩ D |
where p i = p ( D max ([xi ] A )|[xi ] A ) and D max ([xi ] A ) = arg max D j ∈π D { |[i xA ] | j }. i A
32 33
3.2. Neighborhood relation and neighborhood rough set model
34 35
Several neighborhood systems [26,46,11] were proposed to deal with numerical data. In this paper, we adopt the distance based neighborhood relation defined by Hu [11].
36 37 38
Definition 2. In a decision table S, for an object xi ∈ U and A ⊆ At, the neighborhood δ A (xi ) of xi in the subspace A is defined as
39 40 41
δ A (xi ) = {x j | x j ∈ U , A (xi , x j ) ≤ δ},
(8)
M ( xi , x j ) =
N
44
1 / M | I ak (xi ) − I ak (x j )|
M
45
.
(9)
k =1
46 47 48
xi and x j are two objects in N-dimensional space At = {a1 , a2 , . . . , a N }. The Minkowski distance is also called: (1) Manhattan distance 1 if M = 1; (2) Euclidean distance 2 if M = 2; (3) Chebyshev distance ∞ if M = ∞. Given a metric space < U , >, the granule system is composed of neighborhood granules {δ(xi )|xi ∈ U } and covers the universal space rather than partitions it. It is also noted that the partition of space generated by rough sets can be obtained from neighborhood rough sets with covering principle, which means δ = 0. The pair < U , N A > is called a neighborhood approximation space defined by the attribute set A. C is the cover of U with respect to neighborhood relation N. The family of neighborhood granules induced by the covering C are the basic blocks to construct the neighborhood rough set approximations. For a subset X ⊆ U , the lower and upper approximations of X with respect to C are defined as [26]:
49 50 51 52 53 54 55 56 57 58 59
N C ( X ) = {xi | δ(xi ) ⊆ X , xi ∈ U }, N C ( X ) = {xi | δ(xi ) ∩ X = ∅, xi ∈ U }.
42 43
where is a metric function and the Minkowski distance is widely applied:
45 46
25 26
For all objects in a decision table, the total decision cost can be expressed as:
41 42
20 22
38 39
18
21
cost of positive rule: (1 − p ) · λ P N ,
35 36
7 9
33 34
6 8
26 27
5
(P1) If p ( X |[x]) > α , decide x ∈ POS( X ); (B1) If β ≤ p ( X |[x]) ≤ α , decide x ∈ BND( X ); (N1) If p ( X |[x]) < β , decide x ∈ NEG( X ).
21 22
3
60
(10)
61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.5 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1 2
5
Based on the rough set approximations of X , the positive region, boundary region and negative region of X with respect to C are defined as:
3 4 5 6 7 8 9 10 11 12 13
4 5
BNDC ( X ) = N C ( X ) − N C ( X ),
6
NEGC ( X ) = U − POSC ( X ) ∪ BNDC ( X ) = U − N C ( X ).
(11)
18 19 20
Theorem 1. (See [11].) Given < U , N > and two nonnegative δ1 and δ2 , if δ1 ≤ δ2 , we have
9
(1) ∀xi ∈ U : δ1 (xi ) ⊆ δ2 (xi ); (2) ∀ X ⊆ U : N 2 ( X ) ⊆ N 1 ( X ), N 1 ( X ) ⊆ N 2 ( X ).
11
10 12 13 14
3.3. Basic notions of neighborhood based decision-theoretic rough set model
15 16
In DTRS and other PRS models, an object x is usually described by its equivalence class: [x]. In NDTRS, δ(x) is applied to |δ(x)∩ X | represent x, then we have p ( X |δ(x)) = |δ(x)| . The lower and upper approximations of X are introduced in the followings. Given < U , N >, for a subset X ⊆ U , we have the probabilistic lower and upper approximations of X with respect to the subspace B ⊆ At as following:
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
39 40 41 42 43 44 45 46
( X ) = {xi | p ( X |δ B (xi )) > α , xi ∈ U },
(α ,β)
( X ) = {xi | p ( X |δ B (xi ) ≥ β, xi ∈ U }.
NB
49 50 51 52 53 54 55 56 57
60 61
19 20
23
(12)
(α ,β)
(X) = N B
(α ,β)
(X) = N B
POS B BND B
(α ,β)
NEG B
(α ,β)
( X ),
(α ,β)
(X) − N B
26 27 28
(α ,β)
(α ,β)
( X ) = U − (POS B
( X ),
29
(α ,β)
( X ) ∪ BND B
(α ,β)
( X )) = U − N B
30
( X ).
(13)
In a decision table, π D = { D 1 , D 2 , . . . , D m } is a partition of U , which represents m decision classes. The three regions of the decision table based on partition π D in NDTRS can be defined as: (α ,β)
POS B
(π D ) =
(α ,β)
POS B
(α ,β)
(π D ) =
(α ,β)
(α ,β)
|POS B
|U |
34
38
( D i ),
39
(α ,β)
(π D ) ∪ BND B
40
(π D ).
(14)
From the semantic viewpoint, objects in the positive region can be “probably” classified into a “certain” decision class. Larger positive region usually be with smaller boundary region, which means less ambiguous or uncertain objects. Moreover, in a classification problem, less ambiguity implies high accuracy. To measure the classification ability quantitatively, the quality of classification in NDTRS is defined as follows
γ B(α ,β) ( D ) =
33
37
(α ,β)
BND B
(π D ) = U − POS B
(α ,β)
32
36
1≤i ≤m
NEG B
31
35
( D i ),
1≤i ≤m
BND B
24 25
(π D )|
41 42 43 44 45 46 47
.
(15)
DTRS, as a special case, can be derived from NDTRS by having δ = 0. δ(x) will be an equivalent relation when δ = 0. In addition, the neighborhood rough set model can also be derived from NDTRS when we have α = 1 and β = 0. Theorem 2. Given < U , N > and two nonnegative δ1 and δ2 , if δ1 ≤ δ2 with the same α and β , we have ∀xi ∈ U : δ1 (xi ) ⊆ δ2 (xi ). Proof. For any y ∈ δ1 (xi ), we have B (xi , y ) ≤ δ1 . Since δ1 ≤ δ2 , B (xi , y ) ≤ δ2 , we can conclude that y ∈ δ2 (xi ), therefore, δ1 (xi ) ⊆ δ2 (xi ). 2
58 59
18
22
For a subset X ⊆ U , the positive, boundary and negative regions of X with respect to B ⊆ At are defined as:
47 48
17
21
(α ,β)
NB
37 38
7 8
16 17
2 3
POSC ( X ) = N C ( X ),
14 15
1
48 49 50 51 52 53 54 55 56 57 58
In classical neighborhood rough set model, Theorem 2 has also been proved without considering the thresholds α and β [11], and the positive region of each set are determined by δ only. However, in NDTRS, the positive region and the boundary region of each set is determined by both δ and (α , β). If δ is fixed, we can also induce following theorems.
59 60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.6 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
6
1 2 3 4 5 6
Theorem 3. Given < U , N > and two nonnegative (α1 ,β)
POS B
α1 and α2 , if α1 ≤ α2 with the same δ , we have ∀ X ⊆ U : POS(Bα2 ,β) ( X ) ⊆
2
( X ).
3
(α2 ,β)
Proof. For any y ∈ POS B
( X ), we have p ( X |δ B ( y )) > α2 . Since α1 ≤ α2 , p ( X |δ B ( y )) > α1 , we can conclude that y ∈ (α ,β) (α ,β) (α ,β) POS B 1 ( X ), therefore, POS B 2 ( X ) ⊆ POS B 1 ( X ). 2
7 8 9 10
13 14
Theorem 4. Given < U , N > and two nonnegative β1 and β2 , if β1 ≤ β2 with the same δ , we have ∀ X ⊆ U : BND B (α ,β1 )
BND B
(X) ⊆
( X ).
17 18
( X ), we have p ( X |δ B ( y )) ≥ β2 . Since β1 ≤ β2 , p ( X |δ B ( y )) ≥ β1 , we can conclude that y ∈ (α ,β1 ) (α ,β2 ) (α ,β ) POS B ( X ), therefore, BND B ( X ) ⊆ BND B 1 ( X ). 2
23 24 25 26
29 30 31 32 33 34 35
38 39 40 41 42
47 48 49 50 51
54
57 58
A classical attribute reduct in Pawlak rough set model and other extension models is the positive region preservation based attribute reduct. We give the same definition in NDTRS as follows. Definition 3. In a decision table S =< U , C ∪ D , { V a }, { I a } >, given < U , N >, an attribute set R ⊆ C is a positive region preservation based attribute reduct with respect to D if it satisfies the following two conditions: (α ,β)
61
18
22 23 24 25 26
30 31 32 33 34 35 36
(α ,β)
(π D ) = POSC (π D ); (α ,β) (α ,β) (π D ). (2) for any attribute a ∈ R, POS R −{a} (π D ) = POSC (1) POS R
37 38 39
In Pawlak rough set model and classical neighborhood rough set model, there exists a monotonicity of the positive region of a decision table with respect to the set of condition attributes (also called decision monotonicity), that is:
B 1 ⊆ B 2 ⇒ POS B 1 (π D ) ⊆ POS B 2 (π D ) ⇒ γ B 1 (π D ) ≤ γ B 2 (π D ).
(16)
If the monotonicity holds in the decisions table S, the reduct obtained from Definition 3 will be a minimal attribute set that keeps the positive region unchanged. However, the monotonicity does not always hold in PRS models. As NDTRS is also (α ,β) (α ,β) a kind of PRS model, given B 1 ⊆ B 2 , we may have POS B 2 ( X ) ⊆ POS B 1 ( X ). For the condition (1) in Definition 3, there (α ,β)
(α ,β)
may exists a subset B ⊆ C with POSC ( X ) ⊂ POS B ( X ). For the condition (2), R − {a} cannot guarantee the reduct is a minimal result. (α ,β) Based on above analysis, a positive region extension based attribute reduct in NDTRS can be defined by using γ B (π D ).
40 41 42 43 44 45 46 47 48 49 50 51 52
Definition 4. In a decision table S =< U , C ∪ D , { V a }, { I a } >, given < U , N >, an attribute set R ⊆ C is a positive region extension based attribute reduct with respect to D if it satisfies the following two conditions:
53 54 55
γ R(α ,β) (π D ) ≥ γC(α ,β) (π D ); (α ,β) (α ,β) (π D ) < γ R (π D ). (2) for any subset R ⊆ R, γ R (1)
59 60
17
29
55 56
16
28
4.1. Positive region related attribute reduct
52 53
14
27
44 46
13
21
43 45
12
20
Attribute reduction plays an important role in rough set theory and machine learning field. There are two related issues should be considered in a new definition of attribute reduct. The first issue is the jointly sufficient condition, which means choosing appropriate criteria. Based on the attribute reduct, the values of given criteria will be preserved or improved. The second issue is the individually necessary condition, which guarantees the reduct is a minimal subset of attributes, and none of its subsets is a reduct. In this section, we define two kinds of attribute reducts in NDTRS based on different criteria.
36 37
9
19
4. Attribute reduction in neighborhood based decision-theoretic rough set model
27 28
8
15
Theorem 3 shows that the positive region monotonously decreases with larger α , and Theorem 4 shows that the boundary region monotonously decreases with larger β . These two theorems suggest that we can adjust the size of the positive region or the boundary region of X by modifying the values of (α , β).
21 22
6
11
(α ,β2 )
Proof. For any y ∈ BND B
19 20
5
10
15 16
4
7
(α ,β2 )
11 12
1
56 57 58 59
(α ,β)
In this definition, the quantitative criterion γ R (π D ) is applied. The condition (2) is also changed to examine all subsets of the reduct. Compared to Definition 3, Definition 4 can obtain the attribute reduct that induces a larger positive region.
60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.7 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1
7
4.2. Minimum cost attribute reduct
1
2 3 4 5 6 7
2
In PRS models, qualitative criteria are not suitable for defining attribute reduct, because the decision monotonicity does not always hold. Therefore, Jia et al. [15] suggested that minimum cost is a better criterion to define the attribute reduct in DTRS, because the Bayesian decision procedure helps to make decisions with minimum cost based on observed evidence. Similarly, we can also define the minimum cost attribute reduct in NDTRS. First, based on attribute set B ⊆ C , the cost formulation is rewritten as follows.
8 9 10 11 12
COST B =
+
13 15
x j ∈BND B
16 17 18 19 20 21 22
+
25
28
(α ,β)
xk ∈NEG B
12 13
(π D )
14
pk · λ N P ,
(17)
31 32
17
where xi is classified into the positive region while its conditional probability p i > α ; x j is classified into the boundary region while its conditional probability β ≤ p j ≤ α ; and xk is classified into the negative region while its conditional probability pk < β , respectively. As the decision monotonicity does not always hold in NDTRS, the decision cost may be decreased if we remove some attributes. To get less decision cost, we can define a reducing cost attribute reduct as follows.
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
61
19 20 21 22 24 25 26
(1) COST R ≤ COSTC , (2) ∀ R ⊂ R, COST R > COST R .
27 28 29
In this definition, reducing cost attribute reduct is a subset of C and its induced decision cost will be reduced or unchanged. Furthermore, for most users in their decision procedures, it will be better to obtain the smallest cost if they could. Therefore, we can define a minimum cost attribute reduct as follows.
30 31 32 33
Definition 6. In a decision table S =< U , C ∪ { D }, { V a }, { I a } >, given < U , N >, R ⊆ C is called a minimum cost attribute reduct if it satisfies:
34
(1) R = arg min R ⊆C {COST R }, (2) ∀ R ⊂ R, COST R > COST R .
37
35 36 38 39
Based on conditions (1) and (2), we can conclude that the minimum cost attribute reduct is a minimal set of condition attributes with minimal decision cost.
40 41 42 43
4.3. Attribute reduction algorithms
44
As computing all reducts is an NP-hard problem [36], many heuristic algorithms for finding one reduct have been investigated in [5,13,19,55]. A heuristic algorithm usually contains two parts: heuristics and search strategy. For the heuristics in an attribute reduction heuristic algorithm, a fitness function which considers the criteria to be (α ,β) optimized is usually adopted. For the positive region preservation based attribute reduct in Definition 3, POS R (π D ) is the (α ,β)
heuristics. For the positive region extension based attribute reduct in Definition 4, γ R (π D ) is the heuristics. Similarly, for Definition 5 and Definition 6, COST R is the heuristics. With regard to the search strategies for designing a heuristic algorithm, two kinds of strategies are considered. One is directional and the other one is nondirectional search strategy. The directional search strategy can be further categorized into deletion method, addition method and addition–deletion method [51]. Nondirectional search strategy is usually applied in evolutionary algorithms, swarm algorithms and other population-based meta-heuristic algorithms for optimization problems. In this paper, we apply the addition–deletion method for the sake of simplicity. By applying the addition–deletion method, we need to further define the significance of each attribute to determine the order of condition attributes. In this paper, we use an inner significance which is defined as follows.
59 60
18
23
Definition 5. In a decision table S =< U , C ∪ { D }, { V a }, { I a } >, given < U , N >, R ⊆ C is a reducing cost attribute reduct if the following conditions are satisfied:
33 34
15 16
(π D )
29 30
7
11
( p j · λ B P + (1 − p j ) · λ B N )
26 27
6
10
23 24
5
9
(π D )
(α ,β)
14
4
8
(1 − p i ) · λ P N
(α ,β)
xi ∈POS B
3
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
Definition 7. In a decision table S =< U , C ∪ { D }, { V a }, { I a } >, given < U , N >, B ⊆ C and a ∈ B. The inner significance of a in B based on the fitness function f is defined as:
60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.8 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
8
Siginner (a, B , D ) =
1 2
| f B (π D ) − f B −{a} (π D )| . | f B (π D )|
(18)
3 4 5 6
(α ,β)
Siginner (a, C , D ) =
8 9
|POSC
(α ,β)
(π D )| − |POSC −{a} (π D )| (α ,β)
|POSC
(π D )|
(19)
17 18 19 20 21 22 23 24 25 26 27 28
31 32 33 34 35 36 37 38
12 13
Algorithm 1 A heuristic approach to positive region preservation based attribute reduct.
14
Input: A decision table. Output: A reduct R. BEGIN 1. R = ∅, G = C ; 2. rank all the attributes in G according to their significance values; 3. 3. 4. 5. 6. 7.
(α ,β)
15 16 17 18
(α ,β)
(π D ) < POSC (π D ) and G = ∅ WHILE POS R select the first attribute a ∈ G with the maximum significance value; G = G − {a}; R = R ∪ {a}; END WHILE FOR each r ∈ R (α ,β)
(α ,β)
8. IF POS R −{r } (π D ) == POS R 9. R = R − {r }; 10. END IF 11. END FOR 12. output R; END BEGIN
19 20 21 22 23
(π D )
24 25 26 27 28 29
Similar to the positive region preservation based attribute reduct, we can design a heuristic approach to the minimum cost attribute reduct as follows. First, the fitness function f is denoted as the decision cost: COST B , and the significance of condition attribute a is defined as:
Siginner (a, C , D ) =
COSTC −{a} − COSTC COSTC
.
(20)
Second, the heuristic approach to the minimum cost attribute reduct is described in Algorithm 2.
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
58 59 60 61
30 31 32 33 34 35 36 37 38 39
Algorithm 2 A heuristic approach to minimum cost attribute reduct. Input: A decision table. Output: A reduct R. BEGIN 1. R = ∅, G = C ; 2. rank all the attributes in G according to their significance values; 3. WHILE COST R > COSTC and G = ∅ 4. select the first attribute a ∈ G with the maximum significance value; 5. G = G − {a}; 6. R = R ∪ {a}; 7. END WHILE 8. FOR each r ∈ R 9. IF COST R −{r } <= COST R 10. R = R − {r }; 11. END IF 12. END FOR 13. output R; END BEGIN
56 57
8
11
29 30
6
10
12
16
5
9
Now, a heuristic approach to the positive region preservation based attribute reduct is described in Algorithm 1.
11
15
4
7
.
10
14
2 3
In the heuristic approach for positive region preservation based attribute reducts, the fitness function f is denoted as (α ,β) the cardinality of POS B (π D ), and the significance of condition attribute a is defined as:
7
13
1
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
It should be noted that both reducts from above two approaches are approximate because they do not certainly satisfy the individually necessary condition. As mentioned above, the monotonicity of the criteria with respect to the set of inclusion of attributes may not hold in NDTRS, then it is not enough to guarantee the result is a reduct with a minimal set of condition attributes by checking the subset R − {r } only. Theoretically, it is still an NP-hard problem when checking all possible subsets of a candidate result. Fortunately, in most situations (shown in our experiments), the result is a reduct.
57 58 59 60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.9 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1
9
5. Three-way decisions based neighborhood classifier for NDTRS
1
2 3 4 5 6 7 8 9 10 11 12 13
2
In artificial intelligence, the purpose of machine learning is to generalize the “knowledge” from training data and predict the test objects. By considering the phase of generalization, machine learning algorithms are usually classified into two groups: one is eager learning and the other one is lazy learning. In eager learning methods, such as SVM, Naive Bayesian, ID3 and so on, the generalization is carried out beyond the training data before predicting the new objects. In contrast, the lazy learning methods which are also called instance based learning methods [1], simply store training data (or only minor processing) and defer processing until new objects coming. Compared to the single global hypothesis in eager learning, a richer hypothesis space is applied in lazy learning methods since they use many local information. k-NN is a typical lazy learning algorithm [4]. In the classical neighborhood rough set model, a neighborhood classifier (NEC) was also introduced to target the classification task [11]. As k-NN and NEC methods do not consider the decision cost, they are not suitable to apply to NDTRS directly. Therefore, based on the three-way decisions and NEC, we propose a three-way decisions based neighborhood classifier for NDTRS in this paper.
14 15
18 19 20 21
24 25 26 27 28 29 30
33 34 35 36 37 38 39
44 45 46 47 48 49 50 51 52 53 54
57 58
61
10 11 12 13
17 18 19 20 21
25 26 27 28 29 30 31
The basis of NEC is the general idea of estimating the class of an object from its neighbors. For the size of neighbors, the threshold δ is dynamically assigned based on the local and global information around s.
32 33 34
(21)
35
In this equation, xi (i = 1, . . . , n) is the set of training objects, min( (xi , s)) and max( (xi , s)) denote the minimal and the maximal values of distance between xi and the test object s, respectively. For the value range of ω , the authors [11] suggested that ω should take values in the range [0, 0.1].
37
δ = min( (xi , s)) + ω · (max( (xi , s)) − min( (xi , s))), ω ≤ 1.
36 38 39 40
5.2. Three-way decisions based neighborhood classifier (TDNEC)
41 42
NEC is a two-way decisions neighborhood classifier, as the class D j with the majority training objects in δ(s) is assigned to s. The advantage of two-way decisions classifiers is that it can predict the test object simply and quickly. However, this is often accompanied by higher prediction error rate and more decision cost. Assume an object is ambiguous when it has same or approximate probabilities of being in two or more classes, such as 51% for D i and 49% for D j . Applying the majority principle to predict an ambiguous object may lead to a wrong result with a high probability. In many real applications, one cannot make a decision immediately due to the lack of sufficient information. In this regard, deferment, a third choice, is usually applied. The three types of choices of acceptance, rejection and deferment, is called as three-way decisions. Three-way decisions is based on Bayesian decision theory which is the principle of minimization of decision cost. In a decision table with given cost functions, we can compute the thresholds α and β according to Equation (3). For a binary classification problem, assume D + is the positive class and D − is the negative class, then the three-way decisions can be described as follows.
43 44 45 46 47 48 49 50 51 52 53 54 55
(P) If p ( D + |δ B (x)) > α , x is classified into D + . (B) If β ≤ p ( D + |δ B (x)) ≤ α , x needs further examination. (N) If p ( D + |δ B (x)) < β , x is classified into D − .
59 60
9
24
Input: Training set: < U , C , D >; Test object: s; Threshold δ ; Specify the norm used. Output: Class of s. BEGIN 1. compute the distance between s and xi ∈ U with the used norm; 2. find the objects in the neighborhood δ(s) of s; 3. find the class D j with the majority training objects in δ(s); 4. assign D j to the test s; END BEGIN
55 56
8
23
Algorithm 3 Neighborhood classifiers (NEC) [11].
42 43
7
22
40 41
6
16
In decision problems, one usually makes a binary decision, which contains two choices: acceptance or rejection. We call this kind of decision as two-way decisions. By reviewing current researches on classification problems, most classifiers make two-way decisions. In the neighborhood rough set model, Hu et al. [11] also proposed a two-way decisions neighborhood classifier (NEC). As our proposed three-way decisions neighborhood classifier is an extension of NEC, we refer NEC and list it in Algorithm 3.
31 32
5
15
5.1. Two-way decisions and NEC
22 23
4
14
16 17
3
56 57 58 59
The acceptance decision classifies x into D + , the rejection decision classifies x into D − , and the deferment decision needs further examination to classify x, respectively.
60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.10 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
10
1
1
Table 2 Brief description of the data sets.
2 3 4 5 6 7 8 9 10 11 12
2
Data sets (abbreviation)
Objects
Condition attributes
Classes
glass hepatitis ionosphere iris wisconsin diagnostic breast cancer (wdbc) wisconsin prognostic breast cancer (wpbc) vertebral column (vertebral) image segmentation (image) sonar, mines vs. rocks (sonar) wine recognition (wine)
214 155 351 150 569 198 310 210 208 178
9 19 34 4 30 33 6 19 60 13
6 2 2 3 2 2 3 7 2 2
13 14 15 16 17
20 21
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
48 49 50
Based on three-way decisions and NEC, we will introduce a three-way decisions based neighborhood classifier in the following Algorithm 4. The purpose of the classifier is to reduce misclassification rate and decision cost. Algorithm 4 Three-way decisions based neighborhood classifier (TDNEC). Input: Training set: < U , C , D >; Test object: s; Parameter ω; Cost matrix {λi j }; Specify the norm used. Output: Class of s. BEGIN 1. compute the decision threshold α , β based on {λi j }; 2. FOR each x in U 3. compute the distance (x, s) between x and s with the used norm; 4. MIN = min( (x, s)); 5. MAX = max( (x, s)); 6. END FOR 7. δ(s) = MIN + ω · (MAX − MIN); 8. p = p ( D max (δ(s))|δ(s)); /* compute the probability of the dominant decision class of objects in δ(s).*/ 9. IF p > α 10. assign D max (δ(s))|δ(s) to test object s; 11. ELSE IF β ≤ p ≤ α 12. s is in the boundary region of D max (δ(s))|δ(s); 13. ELSE 14. s is rejected to classify into any class; 15. END IF END BEGIN
In the three-way decisions based neighborhood classifier, s could be rejected to classify into any class if its probability is less than β . This kind of procedure is also applied in learning with rejection methods [3] similarly. However, if the rejection procedure is not permitted in a real application, we can merge the boundary region and the negative region simply, which means β does not need to appear in the algorithm.
55 56 57 58 59 60 61
8 9 10 11 12 14 15 16 17 19 20 21 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 47 48 49 50 51
6. Experiments
53 54
7
46
51 52
6
22
|δ (x)∩ D | where D max (δ B (x)) ∈ π D is a dominant decision class of the objects in δ B (x), i.e., D max (δ B (x)) = arg max D i ∈π D { B|δ (x)| i }. B
46 47
5
18
(P) If p ( D max (δ B (x))|δ B (x)) > α , x is classified into D max (δ B (x)). (B) If β ≤ p ( D max (δ B (x))|δ B (x)) ≤ α , x needs further examination. (N) If p ( D max (δ B (x))|δ B (x)) < β , x is not classified into any class,
22 23
4
13
For a multi-classification problem, we can apply the one-vs.-all strategy to decompose the multi-classification problem into several groups of binary classification problems first, and then compose the obtained three-way decisions results. Besides, we can also deal with multi-classification problem directly. The dominant decision class could be regarded as the positive class, then the three-way decisions are defined as following:
18 19
3
52 53
In this section, we conduct experiments to show the efficiency of our attribute reduction algorithms and three-way decisions based neighborhood classifier. In our experiments, 10 UCI data sets [38] are used, with information shown in Table 2. For data sets glass, wdbc and wpbc, the corresponding id attribute is removed first. Experimental setting can be found in Table 3. Since 10-fold cross validation are employed, we only present average results in the experiments. 10 different groups of cost functions are also generated randomly for each data set, which means we will run 10 times 10-fold cross validation for each classification task. The values of δ and ω are set as suggested in Ref. [11], 0.25, 0.3 and 0.35 are tested for δ , several ω values between 0 and 0.1 are tested.
54 55 56 57 58 59 60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.11 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1 2 3 4
2 3
Parameter
Value
platform
Eclipse with WEKA (version 3.5) 10 groups, generated randomly Euclidean distance 2 {0.25, 0.3, 0.35} (0, 0.1] default values in WEKA 10-folds
{λi j }
6
distance
7
δ
9
1
Table 3 Experimental setting.
5
8
11
ω parameters for C4.5, k-NN, SVM cross validation
10
13 14 15
18 19 20 21 22 23 24 25 26 27 28
M1
M2
M3
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
1.6 ± 1.28 4.2 ± 2.75 2.1 ± 0.30 3.7 ± 0.46 5.0 ± 0.00 2.1 ± 0.70 6.0 ± 0.00 6.3 ± 1.00 6.1 ± 1.81 5.5 ± 0.50
7.0 ± 0.00 9.0 ± 0.00 9.2 ± 0.60 4.0 ± 0.00 8.6 ± 0.49 1.0 ± 0.00 5.5 ± 0.50 8.4 ± 0.70 12.8 ± 0.98 5.3 ± 0.78
8.0 ± 0.00 9.9 ± 0.83 5.0 ± 0.00 3.0 ± 0.00 7.9 ± 0.54 4.8 ± 0.40 6.0 ± 0.00 4.9 ± 0.30 6.7 ± 0.46 4.0 ± 0.00
average
4.3 ± 0.88
7.1 ± 0.41
6.0 ± 0.25
win/tie/loss
–
6/2/2
5/2/3
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
61
13 14 15 17 18 19 20 21 22 23 24 25 26 27 28
31 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
6.2. Experiments on classification
59 60
12
32
As the classical rough set mainly aims to handle categorical data, most attribute reduction approaches have to discretize the numerical data first. The preprocessing procedure increases the complexity of the reduction approach. However, the attribute reduction based on neighborhood rough set can deal with numerical data directly. To show the efficiency of our proposed definition of reducts and reduction approaches, two kinds of discretization methods are compared in our experiments: one is a kind of supervised discretization method and the other one is a kind of unsupervised discretization method. For the same numerical data, the comparing methods discretize it first, and then the same heuristic approach is applied to obtain the reduct. For simplicity’s sake, our method, the supervised discretization method [41] and the unsupervised method [41] are denoted as M1, M2 and M3, respectively. Both the positive region preservation based attribute reduct and the minimum cost attribute reduct are implemented. Since a reduct can be explained as a minimal subset of attributes which satisfies a specific condition [17], we usually obtain the length of the reduct to measure its minimality. As different definitions of reduct satisfy different conditions, there does not exist a unified measure to evaluate the completeness of reducts. In this paper, we assume the reduction procedure is the preprocessing of a classification task, then we compute the classification performances of several classifiers based on derived reduct to measure its completeness. Table 4 and Table 5 show the lengths of the derived reducts based on different methods. For both positive region preservation based attribute reduct and minimum cost attribute reduct, our method produces the shortest average result than other two methods, and the supervised discretization method obtains the longest result. We also test the classification accuracies of C4.5 based on the reducts derived from different methods. The results in the form of “mean ± standard deviation” are recorded in Table 6 and Table 7. The supervised discretization method can obtain the highest average accuracy and one reason is that it adopts most attributes. Our method is better than the unsupervised discretization method. From the above mentioned 4 tables, we can find that the results are very similar in most data sets when two different reduct definitions are used, which means the two kinds of definitions can be equally transferred based on specific δ and cost functions.
57 58
9
30
6.1. Experiments on attribute reduction
32 34
8
29
30
33
7
16
Data sets
29 31
6
11
Table 4 Comparison of lengths of positive region preservation based attribute reducts based on different methods. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
16 17
5
10
11 12
4
58 59
In this section, we compare three-way decisions based neighborhood classifier (TDNEC) with NEC, C4.5, k-NN, and SVM on several criteria including accuracy, F-measure and misclassification cost.
60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.12 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
12
1
1
Table 5 Comparison of lengths of minimum cost attribute reducts based on different methods. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
2 3 4
2 3 4
5
Data sets
M1
M2
M3
5
6
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
3.9 ± 1.14 4.2 ± 2.75 2.1 ± 0.30 3.7 ± 0.46 5.0 ± 0.00 2.1 ± 0.70 6.0 ± 0.00 6.3 ± 1.00 6.1 ± 1.81 5.5 ± 0.50
6.7 ± 0.64 9.0 ± 0.00 9.2 ± 0.60 2.5 ± 0.67 8.6 ± 0.49 1.0 ± 0.00 5.3 ± 0.46 7.5 ± 1.30 12.8 ± 0.98 5.3 ± 0.78
8.0 ± 0.00 9.9 ± 0.83 5.0 ± 0.00 3.0 ± 0.00 7.9 ± 0.54 4.8 ± 0.40 6.0 ± 0.00 4.9 ± 0.30 6.7 ± 0.46 4.0 ± 0.00
6
15
average
4.5 ± 0.87
6.8 ± 0.59
6.0 ± 0.25
15
16
win/tie/loss
–
6/1/3
5/2/3
16
7 8 9 10 11 12 13 14
7 8 9 10 11 12 13 14
17
17
18
18
19
19
Table 6 Comparison of classification accuracies of C4.5 based on different positive region preservation based attribute reducts. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
20 21 22
20 21 22
23
23
24 25 26 27 28 29 30 31 32 33 34 35
Data sets
M1
M2
M3
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
0.3463 ± 0.1246 0.7117 ± 0.1075 0.8345 ± 0.0786 0.8280 ± 0.0855 0.8907 ± 0.0283 0.6860 ± 0.0695 0.7442 ± 0.0789 0.5119 ± 0.1246 0.5463 ± 0.1223 0.7343 ± 0.1125
0.5727 ± 0.1442 0.7233 ± 0.0956 0.8483 ± 0.0785 0.9493 ± 0.0619 0.8784 ± 0.0540 0.7129 ± 0.0905 0.7755 ± 0.0262 0.5062 ± 0.1502 0.6774 ± 0.1241 0.8108 ± 0.1087
0.3208 ± 0.1113 0.7360 ± 0.0734 0.7989 ± 0.0850 0.7800 ± 0.1301 0.8201 ± 0.0660 0.7629 ± 0.0199 0.5768 ± 0.1017 0.2586 ± 0.1546 0.5370 ± 0.1770 0.4696 ± 0.1809
average
0.6834 ± 0.0932
0.7455 ± 0.0934
0.6061 ± 0.1100
win/tie/loss
–
0/7/3
3/6/1
24 25 26 27 28 29 30 31 32 33 34 35
36
36
37
37
Table 7 Comparison of classification accuracies of C4.5 based on different minimum cost attribute reducts. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
39 40 41 42
Data sets
M1
M2
M3
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
0.4145 ± 0.1169 0.6960 ± 0.1051 0.8371 ± 0.0802 0.8313 ± 0.0721 0.8874 ± 0.0321 0.6861 ± 0.0783 0.7439 ± 0.0759 0.5110 ± 0.1197 0.5510 ± 0.1267 0.7182 ± 0.1294
0.5703 ± 0.1419 0.7288 ± 0.0900 0.8481 ± 0.0759 0.9560 ± 0.0518 0.8758 ± 0.0546 0.7134 ± 0.0891 0.7297 ± 0.0649 0.5229 ± 0.1267 0.6837 ± 0.1325 0.8074 ± 0.1093
0.3172 ± 0.1125 0.7277 ± 0.0771 0.7963 ± 0.0835 0.7640 ± 0.1287 0.8192 ± 0.0655 0.7629 ± 0.0199 0.5726 ± 0.1064 0.2629 ± 0.1510 0.5321 ± 0.1833 0.4764 ± 0.1857
average
0.6876 ± 0.0936
0.7436 ± 0.0937
0.6031 ± 0.1114
52
win/tie/loss
–
0/7/3
3/6/1
54
53 54
38
43 44 45 46 47 48 49 50 51 53
55 56 57 58 59 60 61
55
6.2.1. Comparison experiments on 10 data sets Assume n P P means the number of objects being classified correctly, n B P means the number of objects with deferment decisions, and n N P means the number of objects being classified incorrectly. The accuracy is defined as:
accuracy =
nP P n P P + nN P
56 57 58 59 60
.
(22)
61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.13 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1 2 3
13
Table 8 Comparison of accuracies of different classifiers. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
1 2 3
4
Data sets
TDNEC
NEC
C4.5
k-NN
SVM
4
5
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
0.7038 ± 0.0043 0.8062 ± 0.0259 0.8956 ± 0.0197 0.9675 ± 0.0088 0.9512 ± 0.0087 0.7995 ± 0.0493 0.8219 ± 0.0078 0.8961 ± 0.0462 0.8382 ± 0.0000 0.8422 ± 0.0003
0.6893 ± 0.0127 0.7155 ± 0.0103 0.8709 ± 0.0050 0.9660 ± 0.0080 0.9329 ± 0.0025 0.7641 ± 0.0024 0.7771 ± 0.0064 0.6967 ± 0.0098 0.8226 ± 0.0092 0.8360 ± 0.0121
0.6720 ± 0.0124 0.7742 ± 0.0243 0.9026 ± 0.0127 0.9487 ± 0.0055 0.9313 ± 0.0061 0.7470 ± 0.0201 0.8097 ± 0.0135 0.8795 ± 0.0123 0.7337 ± 0.0340 0.9371 ± 0.0102
0.6949 ± 0.0106 0.7974 ± 0.0097 0.8684 ± 0.0040 0.9527 ± 0.0049 0.9548 ± 0.0040 0.7172 ± 0.0101 0.7677 ± 0.0099 0.8662 ± 0.0085 0.8692 ± 0.0101 0.9511 ± 0.0038
0.5687 ± 0.0143 0.8639 ± 0.0083 0.8803 ± 0.0078 0.9620 ± 0.0055 0.9764 ± 0.0025 0.7657 ± 0.0112 0.7587 ± 0.0064 0.8824 ± 0.0060 0.7721 ± 0.0223 0.9899 ± 0.0024
5
average
0.8522 ± 0.0171
0.8071 ± 0.0078
0.8336 ± 0.0151
0.8440 ± 0.0076
0.8420 ± 0.0087
14
win/tie/loss
–
8/2/0
7/2/1
5/3/2
5/2/3
6 7 8 9 10 11 12 13 14 15
20
8 9 10 11 12 13
16
17 19
7
15
16 18
6
17
Table 9 The F-measures of different classifiers. The best results are highlighted by boldface. Comparison results (win/tie/loss) under pairwise two-tailed t-test with 0.05 significance level are also recorded.
18 19 20
21
Data sets
TDNEC
NEC
C4.5
k-NN
SVM
21
22
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
0.8169 ± 0.0019 0.7385 ± 0.0831 0.9038 ± 0.0264 0.9719 ± 0.0153 0.9384 ± 0.0235 0.8194 ± 0.0773 0.8648 ± 0.0014 0.6633 ± 0.0578 0.9039 ± 0.0000 0.8978 ± 0.0009
0.8160 ± 0.0089 0.8341 ± 0.0070 0.9310 ± 0.0029 0.9827 ± 0.0041 0.9653 ± 0.0013 0.8663 ± 0.0016 0.8746 ± 0.0040 0.8212 ± 0.0068 0.9026 ± 0.0055 0.9106 ± 0.0072
0.8037 ± 0.0089 0.8725 ± 0.0156 0.9487 ± 0.0071 0.9736 ± 0.0029 0.9644 ± 0.0032 0.8550 ± 0.0132 0.8948 ± 0.0083 0.9359 ± 0.0070 0.8460 ± 0.0229 0.9675 ± 0.0054
0.8199 ± 0.0074 0.8873 ± 0.0060 0.9295 ± 0.0023 0.9758 ± 0.0026 0.9769 ± 0.0021 0.8353 ± 0.0069 0.8686 ± 0.0063 0.9283 ± 0.0049 0.9300 ± 0.0058 0.9749 ± 0.0020
0.7250 ± 0.0116 0.9269 ± 0.0048 0.9363 ± 0.0044 0.9806 ± 0.0029 0.9881 ± 0.0013 0.8672 ± 0.0072 0.8628 ± 0.0041 0.9375 ± 0.0034 0.8712 ± 0.0142 0.9949 ± 0.0012
22
31
average
0.8519 ± 0.0288
0.8904 ± 0.0049
0.9062 ± 0.0095
0.9127 ± 0.0046
0.9091 ± 0.0055
32
win/tie/loss
–
0/3/7
2/3/5
0/4/6
2/3/5
23 24 25 26 27 28 29 30
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
24 25 26 27 28 29 30 31 32
33 34
23
33 34
The coverage is defined as:
coverage =
35
n P P + nN P n P P + nN P + n B P
.
(23)
As TDNEC is based on three-way decisions, some objects would be classified into the boundary region, therefore, the coverage value of TDNEC is usually less than 1. In most situations, there exists a kind of trade-off relation between the accuracy value and the coverage value, then F-measure is adopted to measure the performance of classifiers. Based on accuracy and coverage, the classical F-measure is defined as:
F =2·
accuracy · coverage accuracy + coverage
.
(24)
37 38 39 40 41 42 43 44
As λ N P denotes the cost for classifying an object into the negative region when it belongs to the positive region, and λ B P denotes the cost for classifying an object into the boundary region when it belongs to the positive region, we can define the misclassification cost as following:
cost = n N P · λ N P + n B P · λ B P .
36
45 46 47 48
(25)
Table 8 gives the comparison results of classification accuracy based on different classifiers. Table 9 gives the comparison results of F-measure based on different classifiers. Table 10 shows the comparison results of misclassification cost. It is worth noting that we do not implement the pairwise two-tailed t-test in Table 10 because the cost functions are generated randomly in each group and their distribution does not satisfy the normal assumption. From these tables, we can have the following observations:
55
49 50 51 52 53 54 55
56
• For classification accuracy, TDNEC is superior to other algorithms on most data sets. SVM gets the second-best result. As
56
57
TDNEC is a kind of three-way decisions method, all ambiguous objects will be deferred for further-examination, which leads to the result of a high accuracy. • TDNEC gets the lowest average F-measure. From the result, we can see that TDNEC can get an approximate result with other algorithms on most data sets except hepatitis and image. The reason of the low F-measure of TDNEC is that its coverage is usually less than 1. The values of coverage in other algorithms are always equal to 1.
57
58 59 60 61
58 59 60 61
JID:IJA AID:7845 /FLA
1 2 3
[m3G; v1.168; Prn:13/11/2015; 9:41] P.14 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
14
1
Table 10 The misclassification cost of different classifiers. The best results are highlighted by boldface.
2
Data sets
TDNEC
NEC
C4.5
k-NN
SVM
3
25.2325 ± 11.1476 10.5863 ± 5.9879 14.6892 ± 6.1744 2.4633 ± 0.9457 11.7323 ± 4.7330 20.5434 ± 7.7416 22.7879 ± 12.2406 11.4490 ± 7.9794 15.6047 ± 7.2257 12.4796 ± 7.8355
26.5346 ± 11.7793 15.8433 ± 7.2284 15.6843 ± 6.2866 2.6633 ± 1.0299 13.7977 ± 5.7555 21.5429 ± 7.8297 27.5139 ± 14.7655 29.5834 ± 12.0772 16.2253 ± 7.6764 13.8086 ± 9.0180
28.1160 ± 12.2956 12.9123 ± 6.2377 12.0792 ± 5.5903 3.5004 ± 1.6359 14.4259 ± 5.7277 22.6148 ± 6.9049 24.9459 ± 13.7711 11.6732 ± 4.8068 25.6541 ± 13.0811 5.1059 ± 3.6624
26.3724 ± 12.0564 11.5662 ± 5.3847 16.1878 ± 6.7594 3.0897 ± 1.0928 9.5654 ± 3.9699 25.5864 ± 8.9236 30.6939 ± 16.8902 12.9565 ± 5.3101 12.5437 ± 5.9366 3.8658 ± 2.5711
37.2347 ± 16.6630 7.7342 ± 3.4683 14.7266 ± 6.1755 2.5257 ± 1.0305 5.0550 ± 2.3537 21.4640 ± 8.7511 31.6467 ± 16.8751 11.4339 ± 4.7057 21.6677 ± 9.2152 0.7656 ± 0.5277
4
12
glass hepatitis ionosphere iris wdbc wpbc vertebral image sonar wine
13
average
14.7568 ± 7.2011
18.3197 ± 8.3447
16.1028 ± 7.3714
15.2428 ± 6.8895
15.4254 ± 6.9766
13
4 5 6 7 8 9 10 11
14
• For misclassification cost, TDNEC can get the best average result. From the theoretical analysis, we can say that this is
16
because TDNEC is based on Bayesian decision principle with the objective of minimizing the decision cost, even though the Bayesian decision cost is not exactly same as our defined misclassification cost in this paper.
17 18 19 21 22
25 26 27 28 29 30 31
34 35 36 37 38 39 40 41 42 43 44
49 50 51 52 53 54 55 56 57 58
11 12
15 16 17 18 20 21 22 24 25 26 27 28 29 30 31 33 34 35 36 37 38 39 40 41 42 43 44 46 47
We propose a neighborhood based decision-theoretic rough set model to deal with numerical data with noise in this paper. Based on the basic notions of NDTRS, two kinds of attribute reducts are defined: one is the positive region related attribute reduct and the other one is the minimum cost attribute reduct. The heuristic approach to attribute reductions is also introduced. Experimental results show that the proposed two reductions can achieve the shortest reduct with a competitive classification ability. The classification schema of NDTRS is also discussed in this paper. The proposed three-way decisions based neighborhood classifier can obtain a better classification accuracy and a less decision cost rather than other classifiers, especially the two-way decisions based neighborhood classifier. The main contribution of this paper is that the combined model NDTRS is a generalization of both neighborhood rough set models and decision-theoretic rough set models. To design faster reduction and classification algorithms for large scale data will be further investigated in further work.
59
48 49 50 51 52 53 54 55 56 57 58 59
60 61
10
45
7. Conclusion and further work
47 48
9
32
6.2.3. Comparison experiments on large data sets To examine the classification ability of TDNEC on large scale data sets, we also compare TDNEC with other 4 algorithms on two large data sets.1 One is Letter Recognition (letter for abbreviation) with 20 000 instances, 16 condition attributes and 26 classes, the other is isolet1+2+3+4 (isolet for abbreviation) with 6238 instances, 617 condition attributes and 26 classes. Since last experiment suggests that ω ∈ (0, 0.01] is a good choice for classification, we try ω from 0.001 to 0.01 with step 0.001 in this experiment. The accuracy and F-measure for these two data sets are shown in Fig. 3. For accuracy, Fig. 3(a) shows that TDNEC can get the highest accuracy on data set letter and Fig. 3(c) shows that SVM can get the highest accuracy and TDNEC gets the second best on data set isolet. For F-measure, Fig. 3(b) and Fig. 3(d) tell us that TDNEC usually cannot obtain the best value. Compared to the above results shown in Tables 8 and 9, TDNEC has a similar classification ability on large data sets. Although TDNEC plays well on both small and large data sets, designing efficient algorithm for neighborhood decision-theoretic rough set model on large data sets is very important and it will be investigated further in future work.
45 46
8
23
6.2.2. Comparison experiments on influence of parameter ω We also conduct a series of experiments to test the influence of parameter ω on classification efficiency of TDNEC. We try ω from 0.001 to 0.009 with step 0.001 in the first group and try ω from 0.01 to 0.1 with step 0.01 in the second group. The classification accuracy and F-measure based on 10-fold cross validation are recorded. Fig. 2(a) presents the classification accuracy curves varying with ω and Fig. 2(b) presents the F-measure curves varying with ω for all data sets. For classification accuracy, ω = 0.1 are optimal value for most data sets. For F-measure, we can find that there are similar trends in these curves. The main tread is that the values of F-measure decrease when the values of ω increase. TDNEC can obtain a better F-measure when ω ∈ (0, 0.01].
32 33
7
19
Fig. 1 shows the comparison results of TDNEC and NEC on accuracy, F-measure and misclassification cost. From these figures, we can see that TDNEC can obtain a higher accuracy and a lower misclassification cost on all data sets. NEC can obtain a higher F-measure on all data sets.
23 24
6
14
15
20
5
60 1
Both data sets are also coming from UCI [38].
61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.15 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
15
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
18
18
19
19
20
20
21
21
22
22
23
23
24
24
25
25
26
26
27
27
28
28
29
29
30
30
31
31
32
32
33
33
34
34
35
35
36
36
37
37
38
38
39
39
40 41
40
Fig. 1. TDNEC vs. NEC on accuracy, F-measure and misclassification cost for 10 data sets.
41
42
42
43
43
44
44
45
45
46
46
47
47
48
48
49
49
50
50
51
51
52
52
53
53
54
54
55
55
56
56
57
57
58
58
59
59
60 61
60
Fig. 2. Classification accuracy and F-measure varying with
ω for 10 data sets.
61
JID:IJA AID:7845 /FLA
16
[m3G; v1.168; Prn:13/11/2015; 9:41] P.16 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
14
14
15
15
16
16
17
17
18
18
19
19
20
20
21
21
22
22
23
23
24
24
25
25
26
26
27
27
28
28
29
29
30
30
31
31
32
32
33
33
34
34
35
35
36
36
37
37
38
38
Fig. 3. Classification accuracy and F-measure varying with
39
ω for 2 large data sets.
40 41
40
Acknowledgements
42 43 44
49 50 51 52 53 54 55 56 57 58 59 60 61
43 44 45
References
47 48
41 42
We would like to acknowledge the support for this work from the National Natural Science Foundation of China (Grant Nos. 61272083, 61403200), Natural Science Foundation of Jiangsu Province (Grant No. BK20140800).
45 46
39
46 47
[1] D.W. Aha, D. Kibler, M.K. Albert, Instance-based learning algorithms, Mach. Learn. 6 (1991) 37–66. [2] D.G. Chen, W.L. Li, X. Zhang, S. Kwong, Evidence-theory-based numerical algorithms of attribute reduction with neighborhood-covering rough sets, Int. J. Approx. Reason. 55 (3) (2014) 908–923. [3] C.K. Chow, On optimum recognition error and reject tradeoff, IEEE Trans. Inf. Theory 16 (1) (1970) 41–46. [4] T.M. Cover, P.E. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory 13 (1) (1967) 21–27. [5] J.H. Dai, Y.X. Li, Heuristic genetic algorithm for minimal reduction decision system based on rough set theory, in: Proceedings of ICMLC, 2002, pp. 4–6. [6] Y. Du, Q.H. Hu, P.F. Zhu, P.J. Ma, Rule learning for classification based on neighborhood covering reduction, Inf. Sci. 181 (24) (2011) 5457–5467. [7] U.M. Fayyad, K.B. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, in: Proceedings of the International Joint Conference on Uncertainty in AI, 1993, pp. 1022–1027. [8] J.P. Herbert, J.T. Yao, Game-theoretic rough sets, Fundam. Inform. 108 (3–4) (2011) 267–286. [9] Q.H. Hu, D.R. Yu, J.F. Liu, C.X. Wu, Neighborhood rough set based heterogeneous feature selection, Inf. Sci. 178 (2008) 3577–3594. [10] Q.H. Hu, W. Pedrycz, D.R. Yu, J. Lang, Selecting discrete and continuous features based on neighborhood decision error minimization, IEEE Trans. Syst. Man Cybern., Part B, Cybern. 40 (2010) 137–150. [11] Q.H. Hu, D.R. Yu, Z.X. Xie, Neighborhood classifiers, Expert Syst. Appl. 34 (2008) 866–876. [12] Q.H. Hu, P.F. Zhu, Y.B. Yang, D. Yu, Large-margin nearest neighbor classifiers via sample weight learning, Neurocomputing 74 (4) (2011) 656–660. [13] R. Jensen, Q. Shen, Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches, IEEE Trans. Knowl. Data Eng. 16 (12) (2004) 1457–1471.
48 49 50 51 52 53 54 55 56 57 58 59 60 61
JID:IJA AID:7845 /FLA
[m3G; v1.168; Prn:13/11/2015; 9:41] P.17 (1-17)
W. Li et al. / International Journal of Approximate Reasoning ••• (••••) •••–•••
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
17
[14] X.Y. Jia, W.W. Li, L. Shang, J.J. Chen, An optimization viewpoint of decision-theoretic rough set model, in: Proceedings of RSKT, 2011, in: LNCS, vol. 6954, 2011, pp. 457–465. [15] X.Y. Jia, W.H. Liao, Z.M. Tang, L. Shang, Minimum cost attribute reduction in decision-theoretic rough set models, Inf. Sci. 219 (2013) 151–167. [16] X.Y. Jia, L. Shang, Three-way decisions versus two-way decisions on filtering spam email, in: Transactions on Rough Sets XVIII, in: LNCS, vol. 8449, 2014, pp. 69–91. [17] X.Y. Jia, L. Shang, B. Zhou, Y.Y. Yao, Generalized attribute reduct in rough set theory, Knowl.-Based Syst. (2015), http://dx.doi.org/10.1016/j.knosys. 2015.05.017. [18] W. Jin, A.K.H. Tung, J.W. Han, J. Wang, Ranking outliers using symmetric neighborhood relationship, in: Proceedings of PAKDD, 2006, pp. 577–593. [19] L.J. Ke, Z.R. Feng, Z.G. Ren, An efficient ant colony optimization approach to attribute reduction in rough set theory, Pattern Recognit. Lett. 29 (9) (2008) 1351–1357. [20] W.W. Li, Z.Q. Huang, Q. Li, Three-way decisions based software defect prediction, Knowl.-Based Syst. (2015), http://dx.doi.org/10.1016/j.knosys.2015. 09.035. [21] W. Li, D.Q. Miao, W.L. Wang, N. Zhang, Hierarchical rough decision theoretic framework for text classification, in: Proceedings of ICCI, 2010, pp. 484–489. [22] F. Li, M. Ye, X.D. Chen, An extension to rough c-means clustering based on decision-theoretic rough sets model, Int. J. Approx. Reason. 55 (1) (2014) 116–129. [23] H.X. Li, X.Z. Zhou, J.B. Zhao, D. Liu, Attribute reduction in decision-theoretic rough set model: a further investigation, in: Proceedings of RSKT, 2011, in: LNCS, vol. 6954, 2011, pp. 466–475. [24] J.Y. Liang, R. Li, Y.H. Qian, Distance: a more comprehensible perspective for measures in rough set theory, Knowl.-Based Syst. 27 (2012) 126–136. [25] D.C. Liang, D. Liu, W. Pedrycz, P. Hu, Triangular fuzzy decision-theoretic rough sets, Int. J. Approx. Reason. 54 (8) (2013) 1087–1106. [26] T.Y. Lin, Neighborhood systems and approximation in database and knowledge base systems, in: Proceedings of the Fourth International Symposium on Methodologies of Intelligent Systems, Poster Session, 1989, pp. 75–86. [27] G.P. Lin, Y.H. Qian, J.J. Li, NMGRS: neighborhood-based multigranulation rough sets, Int. J. Approx. Reason. 53 (7) (2012) 1080–1093. [28] P. Lingras, M. Chen, D.Q. Miao, Rough multi-category decision theoretic framework, in: Proceedings of RSKT, 2008, pp. 676–683. [29] D. Liu, T.R. Li, H.X. Li, A multiple-category classification approach with decision-theoretic rough sets, Fundam. Inform. 115 (2–3) (2012) 173–188. [30] D. Liu, T.R. Li, D.C. Liang, Three-way government decision analysis with decision-theoretic rough sets, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 20 (1) (2012) 119–132. [31] D. Liu, H.X. Li, X.Z. Zhou, Two decades’ research on decision-theoretic rough sets, in: Proceedings of ICCI, 2010, pp. 968–973. [32] Z. Pawlak, Rough sets, Int. J. Comput. Inf. Sci. 11 (1982) 341–356. [33] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Dordrecht, MA, 1991. [34] Z. Pawlak, S.K.M. Wong, W. Ziarko, Rough sets: probabilistic versus deterministic approach, Int. J. Man-Mach. Stud. 29 (1988) 81–95. [35] Y.H. Qian, H. Zhang, Y.L. Sang, J.Y. Liang, Multigranulation decision-theoretic rough sets, Int. J. Approx. Reason. 55 (1) (2014) 225–237. [36] A. Skowron, C. Rauszer, The discernibility matrices and functions in information systems, in: Intelligent Decision Support, in: Theory and Decision Library, vol. 11, 1992, pp. 331–362. [37] D. Slezak, W. Ziarko, The investigation of the Bayesian rough set model, Int. J. Approx. Reason. 40 (2005) 81–91. [38] UC Irvine machine learning repository, http://archive.ics.uci.edu/ml/. [39] H. Wang, Nearest neighbors by neighborhood counting, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 942–953. [40] J. Wang, A. Woznica, A. Kalousis, Learning neighborhoods for metric learning, in: Proceedings of ECML/PKDD, 2012, pp. 223–236. [41] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data mining software: an update, ACM SIGKDD Explor. Newsl. 11 (1) (2009). [42] W.Z. Wu, J.S. Mi, W.X. Zhang, Generalized fuzzy rough sets, Inf. Sci. 151 (2003) 263–282. [43] W.Z. Wu, W.X. Zhang, Neighborhood operator systems and approximations, Inf. Sci. 144 (2002) 201–217. [44] Y.Y. Yao, S.KM. Wong, A decision theoretic framework for approximating concepts, Int. J. Man-Mach. Stud. 37 (6) (1992) 793–809. [45] Y.Y. Yao, S.K.M. Wong, P. Lingras, A decision-theoretic rough set model, in: Proceedings of the 5th International Symposium on Methodologies for Intelligent Systems, 1990, pp. 17–25. [46] Y.Y. Yao, Relational interpretations of neighborhood operators and rough set approximation operators, Inf. Sci. 111 (1) (1998) 239–259. [47] Y.Y. Yao, Probabilistic rough set approximations, Int. J. Approx. Reason. 49 (2008) 255–271. [48] Y.Y. Yao, Three-way decisions with probabilistic rough sets, Inf. Sci. 180 (2010) 341–353. [49] Y.Y. Yao, The superiority of three-way decisions in probabilistic rough set models, Inf. Sci. 181 (6) (2011) 1080–1096. [50] Y.Y. Yao, Y. Zhao, Attribute reductions in decision-theoretic rough set models, Inf. Sci. 178 (2008) 3356–3373. [51] Y.Y. Yao, Y. Zhao, J. Wang, On reduct construction algorithms, in: Proceedings of RSKT, 2006, pp. 297–304. [52] Y.Y. Yao, B. Zhou, Naive Bayesian rough sets, in: Proceedings of RSKT, 2010, in: LNAI, vol. 6401, 2010, pp. 719–726. [53] H. Yu, Q.F. Zhou, A cluster ensemble framework based on three-way decisions, in: Proceedings of RSKT, 2013, pp. 302–312. [54] J.B. Zhang, T.R. Li, D. Ruan, D. Liu, Neighborhood rough sets for dynamic data mining, Int. J. Intell. Syst. 27 (2012) 317–342. [55] W.X. Zhang, J.S. Mi, W.Z. Wu, Approaches to knowledge reductions in inconsistent systems, Int. J. Intell. Syst. 18 (2003) 989–1000. [56] X.R. Zhao, B.Q. Hu, Fuzzy and interval-valued fuzzy decision-theoretic rough set approaches based on fuzzy probability measure, Inf. Sci. 298 (2015) 534–554. [57] B. Zhou, Multi-class decision-theoretic rough sets, Int. J. Approx. Reason. 55 (1) (2014) 211–224. [58] X.Z. Zhou, H.X. Li, A multi-view decision model based on decision-theoretic rough set, in: Proceedings of RSKT2009, in: LNCS, vol. 5589, 2009, pp. 650–657. [59] B. Zhou, Y.Y. Yao, J.G. Luo, Cost-sensitive three-way email spam filtering, J. Intell. Inf. Syst. 42 (1) (2014) 19–45. [60] W. Ziarko, Variable precision rough set model, J. Comput. Syst. Sci. 46 (1993) 39–59.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
52
52
53
53
54
54
55
55
56
56
57
57
58
58
59
59
60
60
61
61