Incremental updating approximations in dominance-based rough sets approach under the variation of the attribute set


Shaoyong Li (a), Tianrui Li (a, *), Dun Liu (b)

(a) School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
(b) School of Economics and Management, Southwest Jiaotong University, Chengdu 610031, China
(*) Corresponding author

Article history: Received 14 February 2012; Received in revised form 3 November 2012; Accepted 7 November 2012.

Keywords: Rough sets; Knowledge discovery; Dominance relation; Incremental updating; Approximations

Abstract: The Dominance-based Rough Sets Approach (DRSA) is a generalized model of classical Rough Sets Theory (RST) that can handle information with preference-ordered attribute domains. The attribute set of an information system may evolve over time, and the approximations of DRSA that are used to induce decision rules then need updating for knowledge discovery and other related tasks. We first introduce a kind of dominance matrix to calculate P-dominating sets and P-dominated sets in DRSA. Then we discuss the principles of updating P-dominating sets and P-dominated sets when some attributes are added into or deleted from the attribute set P. Furthermore, we propose incremental approaches and algorithms for updating approximations in DRSA. Experimental evaluations on different data sets from UCI validate that the proposed incremental approaches effectively reduce the computational time in comparison with the non-incremental approach. © 2012 Elsevier B.V. All rights reserved.

1. Introduction

Rough Sets Theory (RST), introduced by Pawlak in the early 1980s [1], is an effective mathematical tool to deal with uncertain and inconsistent information. In RST, lower and upper approximations are defined to characterize a concept, namely, a subset of the universe. Through lower and upper approximations, the knowledge concealed in information systems can be expressed in the form of rules. The main advantage of RST in data analysis is that it does not need any preliminary or additional information about the data [2]. RST has been widely applied in many fields, e.g., pattern recognition [3], knowledge discovery [4,5] and decision making [6,7]. However, RST is not able to process information with preference-ordered attribute domains and decision classes [8,9]. For example, consider two cars, A and B, and suppose that the fuel consumption of A is less than that of B while the maximum speed of A is higher than that of B. It is reasonable to believe that A is better than B. Within RST, however, the two cars will only be divided into two different classes and no preference relation between them will be considered. To handle such information, Greco et al. proposed the Dominance-based Rough Sets Approach (DRSA), which substitutes a dominance relation for the equivalence relation in RST [10–12]. DRSA has attracted much attention in recent years. For example, Kotlowski et al. found that the notions of rough approximations

of DRSA are excessively restrictive in real-life problems, and they proposed a probabilistic model for ordinal classification problems with monotonicity constraints [13]. Dembczynski et al. reformulated the dominance principle and proposed second-order rough approximations of DRSA to handle imprecise evaluations and assignments of objects [14]. Luo et al. proposed a limited dominance-based rough set model based on the assumption that an unknown value can only be compared with the maximal or minimal value in the domain of the corresponding attribute [15]. Inuiguchi et al. presented a Variable-Precision Dominance-based Rough Set Approach (VP-DRSA) for attribute reduction [16]. Yang et al. proposed the concept of a similarity dominance relation to conduct classification analysis in incomplete information systems [17]. They then investigated DRSA in an incomplete interval-valued information system and employed a data complement method to transform the incomplete interval-valued information system into a traditional one [18]. They also proposed upward and downward ''optimal credible rules'' based on DRSA in the incomplete information system [19].

Typically, the collection of information is a dynamic process. The set of objects, the set of attributes and the attribute values may vary with time in a dynamic information system. The update of approximations is vital to knowledge representation and reduction based on rough sets in dynamic information systems. The traditional rough sets methodologies update approximations by recomputing from scratch. However, this is inefficient or even intractable in many real-world applications because the computational time is too long. Alternatively, an incremental updating


scheme, as a feasible and effective method to maintain knowledge dynamically, is often employed in handling dynamic information. It can avoid unnecessary computations by utilizing previous data structures and results. There is a large body of literature on incremental approaches based on rough sets methodologies [20–35]. Since an information system is composed of objects, attributes and attribute values, these incremental approaches can be divided into three classes: incremental approaches under the variation of the object set [20–28], incremental approaches under the variation of the attribute set [30–33] and incremental approaches under the variation of attribute values [34,35]. Here, we briefly overview some achievements under the variation of the attribute set. Chan et al. proposed an incremental learning method for updating approximations in RST by considering adding or deleting one attribute [30]. RST assumes that information systems are complete; however, missing data in information systems is common in many real-life applications. An information system with missing data is called an Incomplete Information System (IIS). Under a characteristic relation, Li et al. proposed an incremental approach for updating approximations of a concept by considering adding or removing some attributes simultaneously in the IIS [31]. For practical databases whose attribute values include both symbolic and real-valued data, Cheng proposed two incremental approaches for updating approximations in rough fuzzy sets [32]. Furthermore, the incremental updating scheme has also been applied to dynamic attribute reduction: Xu et al. proposed a dynamic attribute reduction algorithm in RST based on 0–1 integer programming when multiple objects are added into an information system [36].

In real-life applications, quantitative (e.g., from sensors) and qualitative (e.g., from the manufacturing environment) data from diverse sources may be linked, thus significantly increasing the number of attributes. The integration and fusion of diverse source data have been investigated in many research fields [37–39]. However, some obtained attributes may become outdated over time. Thus, information systems need to be updated dynamically by deleting outdated attributes or adding newly available attributes. In knowledge discovery and data mining based on rough set methodologies, the computation of approximations is a necessary step to induce rules. In order to maintain knowledge dynamically and effectively reduce the computational time, many incremental approaches have been proposed for updating approximations based on different binary relations when the attribute set evolves over time [30–33]. As an important tool for handling information with preference-ordered attributes, DRSA cannot handle dynamic data directly. To the best of our knowledge, there have been no studies so far on incrementally updating approximations of DRSA when the attribute set varies. The aim of this work is to propose an incremental approach for updating approximations of DRSA under the variation of the attribute set, so as to satisfy the requirements of data mining based on DRSA and other related tasks.

In this paper, we first introduce a dominance matrix and rewrite the definitions of P-dominating sets and P-dominated sets. Then, we discuss the principles of incrementally updating P-dominating sets and P-dominated sets, and investigate strategies for updating approximations of DRSA incrementally when the set of attributes varies. With some numerical examples and experimental evaluations, the feasibility and the efficiency of our approach are validated.

The remainder of this paper is organized as follows. Some basic notions of DRSA are introduced in Section 2. A dominance matrix is introduced and the definitions of P-dominating sets and P-dominated sets are rewritten in Section 3. We discuss the principles for incrementally updating approximations of DRSA in Section 4. In Section 5, we show experimental evaluations of our incremental approach on UCI data sets [40]. The paper ends with conclusions and further research topics in Section 6.

2. Preliminaries

In this section, we briefly review some concepts, notations and properties of DRSA [10–13].

An information system is a 4-tuple $S = (U, A, V, f)$, where $U$ is a non-empty finite set of objects, called the universe; $A = C \cup \{d\}$, where $C$ is a non-empty finite set of condition attributes and $d$ is a decision attribute; $V$ is the domain of all attributes; and $f: U \times A \to V$ is an information function such that $f(x, a) \in V_a$, $\forall a \in A$ and $x \in U$, where $V_a$ is the domain of attribute $a$. $\forall a \in C$, there is a preference relation on the set of objects with respect to the attribute $a$, denoted by $\succeq_a = \{(x, y) \in U \times U : f(x, a) \ge f(y, a)\}$. $\forall x, y \in U$, $x \succeq_a y$ means ''$x$ is at least as good as $y$ with respect to the attribute $a$''. For a non-empty finite attribute set $P \subseteq C$, if $x \succeq_a y$, $\forall a \in P$, we say that $x$ dominates $y$ with respect to the set of attributes $P$, denoted by $x D_P y$. $D_P$ is a dominance relation on the universe $U$ with respect to the set of attributes $P$:

$$D_P = \{(x, y) \in U \times U : f(x, a) \ge f(y, a), \ \forall a \in P\}.$$

Therefore, there are the following two sets:

- A set of objects dominating $x$, called the P-dominating set, $D_P^+(x) = \{y \in U : y D_P x\}$;
- A set of objects dominated by $x$, called the P-dominated set, $D_P^-(x) = \{y \in U : x D_P y\}$.

The universe $U$ is divided by the decision attribute $d$ into a family of preference-ordered equivalence classes, called decision classes. Let $Cl = \{Cl_n, n \in T\}$, $T = \{1, \ldots, t\}$, be the set of decision classes. $\forall r, s \in T$ such that $r > s$, the objects from $Cl_r$ are preferred to the objects from $Cl_s$. In DRSA, the concepts to be approximated are an upward union and a downward union of classes such that

$$Cl_n^{\ge} = \bigcup_{n' \ge n} Cl_{n'}, \qquad Cl_n^{\le} = \bigcup_{n' \le n} Cl_{n'}, \qquad \forall n, n' \in T.$$

$x \in Cl_n^{\ge}$ means ''$x$ belongs to at least class $Cl_n$'', and $x \in Cl_n^{\le}$ means ''$x$ belongs to at most class $Cl_n$''. The lower and upper approximations of $Cl_n^{\ge}$ are defined respectively as follows:

$$\underline{P}(Cl_n^{\ge}) = \{x \in U : D_P^+(x) \subseteq Cl_n^{\ge}\},$$
$$\overline{P}(Cl_n^{\ge}) = \{x \in U : D_P^-(x) \cap Cl_n^{\ge} \ne \emptyset\}.$$

The lower and upper approximations of $Cl_n^{\le}$ are defined respectively as follows:

$$\underline{P}(Cl_n^{\le}) = \{x \in U : D_P^-(x) \subseteq Cl_n^{\le}\},$$
$$\overline{P}(Cl_n^{\le}) = \{x \in U : D_P^+(x) \cap Cl_n^{\le} \ne \emptyset\}.$$

The upper and lower approximations satisfy the inclusion property:

$$\underline{P}(Cl_n^{\ge}) \subseteq Cl_n^{\ge} \subseteq \overline{P}(Cl_n^{\ge}), \quad n = 2, \ldots, t;$$
$$\underline{P}(Cl_n^{\le}) \subseteq Cl_n^{\le} \subseteq \overline{P}(Cl_n^{\le}), \quad n = 1, \ldots, t-1.$$

The upper and lower approximations of upward and downward unions of decision classes also have an important complementarity property:

$$\underline{P}(Cl_n^{\ge}) = U - \overline{P}(Cl_{n-1}^{\le}) \ \text{ and } \ \overline{P}(Cl_n^{\ge}) = U - \underline{P}(Cl_{n-1}^{\le}), \quad n = 2, \ldots, t;$$
$$\underline{P}(Cl_n^{\le}) = U - \overline{P}(Cl_{n+1}^{\ge}) \ \text{ and } \ \overline{P}(Cl_n^{\le}) = U - \underline{P}(Cl_{n+1}^{\ge}), \quad n = 1, \ldots, t-1.$$

The objects belonging to $Cl_n^{\ge}$ and $Cl_n^{\le}$ with some ambiguity constitute the P-boundaries of $Cl_n^{\ge}$ and $Cl_n^{\le}$, denoted by $Bn_P(Cl_n^{\ge})$ and $Bn_P(Cl_n^{\le})$, respectively. They are represented in terms of the upper and lower approximations as follows:


$$Bn_P(Cl_n^{\ge}) = \overline{P}(Cl_n^{\ge}) - \underline{P}(Cl_n^{\ge}),$$
$$Bn_P(Cl_n^{\le}) = \overline{P}(Cl_n^{\le}) - \underline{P}(Cl_n^{\le}).$$
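Putting these definitions together, the following minimal Python sketch (ours, not the paper's; the toy decision table is illustrative) computes the P-dominating/P-dominated sets, the upward unions and their DRSA approximations and boundaries for a small universe:

```python
# Toy decision table (illustrative; not the paper's Table 1):
# object -> (condition attribute values, decision class index).
objects = {
    'x1': ((2, 1), 1),
    'x2': ((3, 2), 2),
    'x3': ((1, 1), 1),
    'x4': ((3, 3), 3),
}
U = list(objects)
T = sorted({cl for _, cl in objects.values()})

def dominates(u, v):
    """u D_P v iff u is at least as good as v on every condition attribute."""
    return all(a >= b for a, b in zip(objects[u][0], objects[v][0]))

# P-dominating and P-dominated sets of every object
D_plus  = {x: {y for y in U if dominates(y, x)} for x in U}
D_minus = {x: {y for y in U if dominates(x, y)} for x in U}

# Upward unions Cl_n>= and their DRSA approximations and boundaries
for n in T[1:]:
    up_n  = {x for x in U if objects[x][1] >= n}
    lower = {x for x in U if D_plus[x] <= up_n}    # lower approximation
    upper = {x for x in U if D_minus[x] & up_n}    # upper approximation
    print(f"Cl_{n}>=: lower={sorted(lower)}, upper={sorted(upper)}, "
          f"boundary={sorted(upper - lower)}")
```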

In [13], the definition of the P-generalized decision was proposed to reflect the interval of decision classes to which an object may belong. However, the calculation of the P-generalized decision there is based on the approximations of DRSA. To calculate approximations of DRSA, we revise the definition of the P-generalized decision as follows.

Definition 1. $\forall x \in U$, $d_P(x) = \langle l_P(x), u_P(x) \rangle$ is called the P-generalized decision of the object $x$, where $l_P(x) = \min\{n \in T : D_P^+(x) \cap Cl_n \ne \emptyset\}$ and $u_P(x) = \max\{n \in T : D_P^-(x) \cap Cl_n \ne \emptyset\}$.

Proposition 1. $\forall x \in U$, $n \in T$, the following items hold:

(1) If $l_P(x) \ge n$, then $x \in \underline{P}(Cl_n^{\ge})$;
(2) If $l_P(x) \le n$, then $x \in \overline{P}(Cl_n^{\le})$;
(3) If $u_P(x) \ge n$, then $x \in \overline{P}(Cl_n^{\ge})$;
(4) If $u_P(x) \le n$, then $x \in \underline{P}(Cl_n^{\le})$.

Proof 1. Let $V_d = \{d_1, d_2, \ldots, d_t\}$ be the domain of the decision attribute $d$. Suppose that $d_1 < d_2 < \cdots < d_t$ and $d_n \in V_d$. Since $l_P(x) = \min\{n \in T : D_P^+(x) \cap Cl_n \ne \emptyset\}$, if $l_P(x) \ge n$, we have $f(y, d) \ge d_n$, $\forall y \in D_P^+(x)$. It follows that $D_P^+(x) \subseteq Cl_n^{\ge} \Rightarrow x \in \underline{P}(Cl_n^{\ge})$. Similarly, (2)–(4) hold. □

According to Proposition 1, the lower and upper approximations of upward and downward unions of decision classes can be redefined as follows:

$$\underline{P}(Cl_n^{\ge}) = \{x \in U : l_P(x) \ge n\}, \quad (1)$$
$$\overline{P}(Cl_n^{\ge}) = \{x \in U : u_P(x) \ge n\}, \quad (2)$$
$$\underline{P}(Cl_n^{\le}) = \{x \in U : u_P(x) \le n\}, \quad (3)$$
$$\overline{P}(Cl_n^{\le}) = \{x \in U : l_P(x) \le n\}. \quad (4)$$

3. A dominance matrix

In DRSA, $\forall a \in C$, there is a preference relation $\succeq_a$ on the universe $U$. Let $x_i, x_j \in U$; if $(x_i, x_j) \in \succeq_a$, then $f(x_i, a) \ge f(x_j, a)$; if $(x_i, x_j) \notin \succeq_a$, then $f(x_i, a) < f(x_j, a)$. So we can use a characteristic variable $r^a_{i,j}$ to indicate whether $(x_i, x_j)$ belongs to $\succeq_a$, where

$$r^a_{i,j} = \begin{cases} 1 & : f(x_i, a) \ge f(x_j, a) \\ 0 & : f(x_i, a) < f(x_j, a) \end{cases} \quad (5)$$

$r^a_{i,j} = 1$ means $(x_i, x_j) \in \succeq_a$, and $r^a_{i,j} = 0$ means $(x_i, x_j) \notin \succeq_a$. It follows that a matrix $R^a$ can be utilized to represent the preference relation $\succeq_a$ on the universe $U$ with respect to the attribute $a$:

$$R^a = \left[ r^a_{i,j} \right]_{|U| \times |U|} = \begin{bmatrix} r^a_{1,1} & \cdots & r^a_{1,|U|} \\ \vdots & \ddots & \vdots \\ r^a_{|U|,1} & \cdots & r^a_{|U|,|U|} \end{bmatrix} \quad (6)$$

where $|\cdot|$ is the cardinality of a set. For $P \subseteq C$, if $x_i D_P x_j$, then $(x_i, x_j) \in \succeq_a$, $\forall a \in P$. That is to say, if $r^a_{i,j} = 1$, $\forall a \in P$, then $x_i D_P x_j$. Obviously, the variable $\phi^P_{i,j} = \sum_{a \in P} r^a_{i,j}$ is able to indicate whether $(x_i, x_j)$ belongs to $D_P$: if $\phi^P_{i,j} = |P|$, then $(x_i, x_j) \in D_P$; otherwise, $(x_i, x_j) \notin D_P$. Then, a matrix $R^P$ is utilized to represent the dominance relation on the universe $U$ with respect to $P$:

$$R^P = \left[ \sum_{a \in P} r^a_{i,j} \right]_{|U| \times |U|} = \begin{bmatrix} \phi^P_{1,1} & \cdots & \phi^P_{1,|U|} \\ \vdots & \ddots & \vdots \\ \phi^P_{|U|,1} & \cdots & \phi^P_{|U|,|U|} \end{bmatrix} \quad (7)$$

The definitions of P-dominating sets and P-dominated sets are rewritten as follows:

$$D_P^+(x_i) = \{x_j \in U : \phi^P_{j,i} = |P|\}, \quad (8)$$
$$D_P^-(x_i) = \{x_j \in U : \phi^P_{i,j} = |P|\}. \quad (9)$$

Example 1. Given an information system $S = (U, C \cup \{d\}, V, f)$ in Table 1, where $U = \{x_1, x_2, \ldots, x_{10}\}$, $C = \{a_1, a_2, a_3, a_4\}$, $V_{a_1} = V_{a_2} = V_{a_3} = V_{a_4} = V_d = \{1, 2, 3\}$.

Table 1. An information system.

Object   a1   a2   a3   a4   d
x1       2    2    1    3    1
x2       3    2    1    2    2
x3       2    3    1    1    2
x4       1    2    3    1    1
x5       1    1    2    3    1
x6       1    2    2    1    2
x7       3    3    1    2    3
x8       3    2    2    2    2
x9       2    2    3    1    3
x10      3    2    3    3    3

From Table 1, the preference matrices of the attributes a1, a2, a3 and a4 are as follows:

$$R^{a_1} = \begin{bmatrix}
1&0&1&1&1&1&0&0&1&0\\
1&1&1&1&1&1&1&1&1&1\\
1&0&1&1&1&1&0&0&1&0\\
0&0&0&1&1&1&0&0&0&0\\
0&0&0&1&1&1&0&0&0&0\\
0&0&0&1&1&1&0&0&0&0\\
1&1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1\\
1&0&1&1&1&1&0&0&1&0\\
1&1&1&1&1&1&1&1&1&1
\end{bmatrix}, \quad
R^{a_2} = \begin{bmatrix}
1&1&0&1&1&1&0&1&1&1\\
1&1&0&1&1&1&0&1&1&1\\
1&1&1&1&1&1&1&1&1&1\\
1&1&0&1&1&1&0&1&1&1\\
0&0&0&0&1&0&0&0&0&0\\
1&1&0&1&1&1&0&1&1&1\\
1&1&1&1&1&1&1&1&1&1\\
1&1&0&1&1&1&0&1&1&1\\
1&1&0&1&1&1&0&1&1&1\\
1&1&0&1&1&1&0&1&1&1
\end{bmatrix},$$

$$R^{a_3} = \begin{bmatrix}
1&1&1&0&0&0&1&0&0&0\\
1&1&1&0&0&0&1&0&0&0\\
1&1&1&0&0&0&1&0&0&0\\
1&1&1&1&1&1&1&1&1&1\\
1&1&1&0&1&1&1&1&0&0\\
1&1&1&0&1&1&1&1&0&0\\
1&1&1&0&0&0&1&0&0&0\\
1&1&1&0&1&1&1&1&0&0\\
1&1&1&1&1&1&1&1&1&1\\
1&1&1&1&1&1&1&1&1&1
\end{bmatrix}, \quad
R^{a_4} = \begin{bmatrix}
1&1&1&1&1&1&1&1&1&1\\
0&1&1&1&0&1&1&1&1&0\\
0&0&1&1&0&1&0&0&1&0\\
0&0&1&1&0&1&0&0&1&0\\
1&1&1&1&1&1&1&1&1&1\\
0&0&1&1&0&1&0&0&1&0\\
0&1&1&1&0&1&1&1&1&0\\
0&1&1&1&0&1&1&1&1&0\\
0&0&1&1&0&1&0&0&1&0\\
1&1&1&1&1&1&1&1&1&1
\end{bmatrix}.$$

Let P = {a1, a4}. Then

$$R^P = R^{a_1} + R^{a_4} = \begin{bmatrix}
2&1&2&2&2&2&1&1&2&1\\
1&2&2&2&1&2&2&2&2&1\\
1&0&2&2&1&2&0&0&2&0\\
0&0&1&2&1&2&0&0&1&0\\
1&1&1&2&2&2&1&1&1&1\\
0&0&1&2&1&2&0&0&1&0\\
1&2&2&2&1&2&2&2&2&1\\
1&2&2&2&1&2&2&2&2&1\\
1&0&2&2&1&2&0&0&2&0\\
2&2&2&2&2&2&2&2&2&2
\end{bmatrix}.$$

According to the dominance matrix $R^P$, the P-dominating sets and P-dominated sets are calculated as follows:

$D_P^+(x_1) = \{x_1, x_{10}\}$, $D_P^-(x_1) = \{x_1, x_3, x_4, x_5, x_6, x_9\}$;
$D_P^+(x_2) = \{x_2, x_7, x_8, x_{10}\}$, $D_P^-(x_2) = \{x_2, x_3, x_4, x_6, x_7, x_8, x_9\}$;
$D_P^+(x_3) = \{x_1, x_2, x_3, x_7, x_8, x_9, x_{10}\}$, $D_P^-(x_3) = \{x_3, x_4, x_6, x_9\}$;
$D_P^+(x_4) = U$, $D_P^-(x_4) = \{x_4, x_6\}$;
$D_P^+(x_5) = \{x_1, x_5, x_{10}\}$, $D_P^-(x_5) = \{x_4, x_5, x_6\}$;
$D_P^+(x_6) = U$, $D_P^-(x_6) = \{x_4, x_6\}$;
$D_P^+(x_7) = D_P^+(x_8) = \{x_2, x_7, x_8, x_{10}\}$, $D_P^-(x_7) = D_P^-(x_8) = \{x_2, x_3, x_4, x_6, x_7, x_8, x_9\}$;
$D_P^+(x_9) = \{x_1, x_2, x_3, x_7, x_8, x_9, x_{10}\}$, $D_P^-(x_9) = \{x_3, x_4, x_6, x_9\}$;
$D_P^+(x_{10}) = \{x_{10}\}$, $D_P^-(x_{10}) = U$.
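This construction is entirely mechanical, so it is easy to script. The following Python sketch (identifiers ours, not the paper's) builds the preference matrices of Eqs. (5)–(6) and the dominance matrix of Eq. (7) from Table 1, and extracts the sets of Eqs. (8)–(9); for P = {a1, a4} it reproduces, e.g., $D_P^+(x_1) = \{x_1, x_{10}\}$:

```python
table = {  # Table 1: object index -> (a1, a2, a3, a4)
    1: (2, 2, 1, 3), 2: (3, 2, 1, 2), 3: (2, 3, 1, 1), 4: (1, 2, 3, 1),
    5: (1, 1, 2, 3), 6: (1, 2, 2, 1), 7: (3, 3, 1, 2), 8: (3, 2, 2, 2),
    9: (2, 2, 3, 1), 10: (3, 2, 3, 3),
}
U = sorted(table)

def preference_matrix(a):
    """R^a with entries r^a_{i,j} = 1 iff f(x_i, a) >= f(x_j, a); Eq. (5)-(6)."""
    return {(i, j): int(table[i][a] >= table[j][a]) for i in U for j in U}

def dominance_matrix(P):
    """R^P with entries phi^P_{i,j} = sum of r^a_{i,j} over a in P; Eq. (7)."""
    Ras = [preference_matrix(a) for a in P]
    return {(i, j): sum(Ra[i, j] for Ra in Ras) for i in U for j in U}

def dominating_and_dominated(P):
    RP, k = dominance_matrix(P), len(P)
    D_plus  = {i: {j for j in U if RP[j, i] == k} for i in U}  # Eq. (8)
    D_minus = {i: {j for j in U if RP[i, j] == k} for i in U}  # Eq. (9)
    return D_plus, D_minus

D_plus, D_minus = dominating_and_dominated([0, 3])  # P = {a1, a4}
print(sorted(D_plus[1]))   # [1, 10], i.e. D_P^+(x1) = {x1, x10}
print(sorted(D_minus[1]))  # [1, 3, 4, 5, 6, 9]
```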

4. An incremental approach for updating approximations of DRSA

The variation of the attribute set in a dynamic information system consists of two cases: some new attributes are added, or some existing attributes are deleted. In the following, we discuss the principles of updating approximations of DRSA in these two cases, respectively.

4.1. Addition of some new attributes

Suppose Q is a collection of attributes added to the attribute set P, P ∩ Q = ∅. For $x, y \in U$, if $x D_P y$, then $f(x, a) \ge f(y, a)$, $\forall a \in P$; if additionally $f(x, b) \ge f(y, b)$, $\forall b \in Q$, then $x D_{P\cup Q} y$, where $D_{P\cup Q}$ is the dominance relation with respect to the attribute set $P \cup Q$. Then, Lemma 1 is proposed to calculate $D_{P\cup Q}^+(x)$ and $D_{P\cup Q}^-(x)$, $\forall x \in U$.

Lemma 1. Let P ⊂ C, Q ⊂ C and P ∩ Q = ∅. We have

(1) $D_{P\cup Q}^+(x) = D_P^+(x) \cap D_Q^+(x)$;
(2) $D_{P\cup Q}^-(x) = D_P^-(x) \cap D_Q^-(x)$.

Proof 2. Let $x_i, x_j \in U$. $D_{P\cup Q}^+(x_i) = \{x_j \in U : \phi^{P\cup Q}_{j,i} = |P \cup Q|\}$. Since $P \cap Q = \emptyset$, $|P \cup Q| = |P| + |Q|$. And since $\phi^{P\cup Q}_{j,i} = \sum_{a \in P\cup Q} r^a_{j,i} = \sum_{a \in P} r^a_{j,i} + \sum_{a \in Q} r^a_{j,i} = \phi^P_{j,i} + \phi^Q_{j,i}$, if $\phi^{P\cup Q}_{j,i} = |P| + |Q|$, then $\phi^P_{j,i} = |P|$ and $\phi^Q_{j,i} = |Q|$. It follows that $D_{P\cup Q}^+(x_i) = \{x_j \in U : \phi^{P\cup Q}_{j,i} = |P| + |Q|\} = \{x_j \in U : \phi^P_{j,i} = |P|\} \cap \{x_j \in U : \phi^Q_{j,i} = |Q|\} = D_P^+(x_i) \cap D_Q^+(x_i)$. Similarly, (2) holds. □

Example 2 (Continuation of Example 1). Let P = {a1, a4} and Q = {a2, a3}. According to $R^{a_2}$ and $R^{a_3}$, the dominance matrix

$$R^Q = R^{a_2} + R^{a_3} = \begin{bmatrix}
2&2&1&1&1&1&1&1&1&1\\
2&2&1&1&1&1&1&1&1&1\\
2&2&2&1&1&1&2&1&1&1\\
2&2&1&2&2&2&1&2&2&2\\
1&1&1&0&2&1&1&1&0&0\\
2&2&1&1&2&2&1&2&1&1\\
2&2&2&1&1&1&2&1&1&1\\
2&2&1&1&2&2&1&2&1&1\\
2&2&1&2&2&2&1&2&2&2\\
2&2&1&2&2&2&1&2&2&2
\end{bmatrix}.$$

The Q-dominating sets and Q-dominated sets are calculated as follows:

$D_Q^+(x_1) = \{x_1, x_2, x_3, x_4, x_6, x_7, x_8, x_9, x_{10}\}$, $D_Q^-(x_1) = \{x_1, x_2\}$;
$D_Q^+(x_2) = \{x_1, x_2, x_3, x_4, x_6, x_7, x_8, x_9, x_{10}\}$, $D_Q^-(x_2) = \{x_1, x_2\}$;
$D_Q^+(x_3) = \{x_3, x_7\}$, $D_Q^-(x_3) = \{x_1, x_2, x_3, x_7\}$;
$D_Q^+(x_4) = \{x_4, x_9, x_{10}\}$, $D_Q^-(x_4) = \{x_1, x_2, x_4, x_5, x_6, x_8, x_9, x_{10}\}$;
$D_Q^+(x_5) = \{x_4, x_5, x_6, x_8, x_9, x_{10}\}$, $D_Q^-(x_5) = \{x_5\}$;
$D_Q^+(x_6) = \{x_4, x_6, x_8, x_9, x_{10}\}$, $D_Q^-(x_6) = \{x_1, x_2, x_5, x_6, x_8\}$;
$D_Q^+(x_7) = \{x_3, x_7\}$, $D_Q^-(x_7) = \{x_1, x_2, x_3, x_7\}$;
$D_Q^+(x_8) = \{x_4, x_6, x_8, x_9, x_{10}\}$, $D_Q^-(x_8) = \{x_1, x_2, x_5, x_6, x_8\}$;
$D_Q^+(x_9) = \{x_4, x_9, x_{10}\}$, $D_Q^-(x_9) = \{x_1, x_2, x_4, x_5, x_6, x_8, x_9, x_{10}\}$;
$D_Q^+(x_{10}) = \{x_4, x_9, x_{10}\}$, $D_Q^-(x_{10}) = \{x_1, x_2, x_4, x_5, x_6, x_8, x_9, x_{10}\}$.

Assume P' = P ∪ Q. By Lemma 1, the P'-dominating sets and P'-dominated sets are calculable as follows:

$D_{P'}^+(x_1) = D_P^+(x_1) \cap D_Q^+(x_1) = \{x_1, x_{10}\}$, $D_{P'}^-(x_1) = D_P^-(x_1) \cap D_Q^-(x_1) = \{x_1\}$;
$D_{P'}^+(x_2) = D_P^+(x_2) \cap D_Q^+(x_2) = \{x_2, x_7, x_8, x_{10}\}$, $D_{P'}^-(x_2) = D_P^-(x_2) \cap D_Q^-(x_2) = \{x_2\}$;
$D_{P'}^+(x_3) = D_P^+(x_3) \cap D_Q^+(x_3) = \{x_3, x_7\}$, $D_{P'}^-(x_3) = D_P^-(x_3) \cap D_Q^-(x_3) = \{x_3\}$;
$D_{P'}^+(x_4) = D_P^+(x_4) \cap D_Q^+(x_4) = \{x_4, x_9, x_{10}\}$, $D_{P'}^-(x_4) = D_P^-(x_4) \cap D_Q^-(x_4) = \{x_4, x_6\}$;
$D_{P'}^+(x_5) = D_P^+(x_5) \cap D_Q^+(x_5) = \{x_5, x_{10}\}$, $D_{P'}^-(x_5) = D_P^-(x_5) \cap D_Q^-(x_5) = \{x_5\}$;


$D_{P'}^+(x_6) = D_P^+(x_6) \cap D_Q^+(x_6) = \{x_4, x_6, x_8, x_9, x_{10}\}$, $D_{P'}^-(x_6) = D_P^-(x_6) \cap D_Q^-(x_6) = \{x_6\}$;
$D_{P'}^+(x_7) = D_P^+(x_7) \cap D_Q^+(x_7) = \{x_7\}$, $D_{P'}^-(x_7) = D_P^-(x_7) \cap D_Q^-(x_7) = \{x_2, x_3, x_7\}$;
$D_{P'}^+(x_8) = D_P^+(x_8) \cap D_Q^+(x_8) = \{x_8, x_{10}\}$, $D_{P'}^-(x_8) = D_P^-(x_8) \cap D_Q^-(x_8) = \{x_2, x_6, x_8\}$;
$D_{P'}^+(x_9) = D_P^+(x_9) \cap D_Q^+(x_9) = \{x_9, x_{10}\}$, $D_{P'}^-(x_9) = D_P^-(x_9) \cap D_Q^-(x_9) = \{x_4, x_6, x_9\}$;
$D_{P'}^+(x_{10}) = D_P^+(x_{10}) \cap D_Q^+(x_{10}) = \{x_{10}\}$, $D_{P'}^-(x_{10}) = D_P^-(x_{10}) \cap D_Q^-(x_{10}) = \{x_1, x_2, x_4, x_5, x_6, x_8, x_9, x_{10}\}$.

From Lemma 1, $D_{P\cup Q}^+(x) \subseteq D_P^+(x)$ and $D_{P\cup Q}^-(x) \subseteq D_P^-(x)$ hold. If the downward and upward unions of decision classes remain unchanged, the lower approximations of decision class unions may increase when some new attributes are added; correspondingly, the upper approximations of decision class unions may decrease. Then, the approximations of decision class unions can be updated by the following Propositions 2 and 3 when some new attributes are added to an information system.

Proposition 2. Let P ⊂ C, Q ⊂ C, P ∩ Q = ∅. For each $Cl_n^{\ge}$, n = 2, ..., t, we have

(1) $\overline{P \cup Q}(Cl_n^{\ge}) = \overline{P}(Cl_n^{\ge}) - Y_n$;
(2) $\underline{P \cup Q}(Cl_n^{\ge}) = \underline{P}(Cl_n^{\ge}) \cup Z_n$,

where $Y_n = \{x \in \overline{P}(Cl_n^{\ge}) : u_{P\cup Q}(x) < n\}$ and $Z_n = \{x \in Bn_P(Cl_n^{\ge}) : l_{P\cup Q}(x) \ge n\}$.

Proof 3.
(1) Since $\overline{P\cup Q}(Cl_n^{\ge}) = \{x \in U : D^-_{P\cup Q}(x) \cap Cl_n^{\ge} \ne \emptyset\} = \overline{P}(Cl_n^{\ge}) - \{x \in \overline{P}(Cl_n^{\ge}) : D_P^-(x) \cap D_Q^-(x) \cap Cl_n^{\ge} = \emptyset\} = \overline{P}(Cl_n^{\ge}) - \{x \in \overline{P}(Cl_n^{\ge}) : u_{P\cup Q}(x) < n\}$, let $Y_n = \{x \in \overline{P}(Cl_n^{\ge}) : u_{P\cup Q}(x) < n\}$; we have $\overline{P\cup Q}(Cl_n^{\ge}) = \overline{P}(Cl_n^{\ge}) - Y_n$.
(2) We have $\underline{P\cup Q}(Cl_n^{\ge}) = \{x \in U : D^+_{P\cup Q}(x) \subseteq Cl_n^{\ge}\} = \{x \in U : D^+_P(x) \subseteq Cl_n^{\ge}\} \cup \{x \in U - \underline{P}(Cl_n^{\ge}) : D^+_{P\cup Q}(x) \subseteq Cl_n^{\ge}\} = \underline{P}(Cl_n^{\ge}) \cup \{x \in U - \underline{P}(Cl_n^{\ge}) : D^+_{P\cup Q}(x) \subseteq Cl_n^{\ge}\}$. Since $U - \underline{P}(Cl_n^{\ge}) = \overline{P}(Cl_{n-1}^{\le}) = Bn_P(Cl_{n-1}^{\le}) \cup \underline{P}(Cl_{n-1}^{\le})$ and $Bn_P(Cl_{n-1}^{\le}) = \overline{P}(Cl_{n-1}^{\le}) - \underline{P}(Cl_{n-1}^{\le}) = (U - \underline{P}(Cl_n^{\ge})) - (U - \overline{P}(Cl_n^{\ge})) = Bn_P(Cl_n^{\ge})$, we have $\{x \in U - \underline{P}(Cl_n^{\ge}) : D^+_{P\cup Q}(x) \subseteq Cl_n^{\ge}\} = \{x \in Bn_P(Cl_n^{\ge}) \cup \underline{P}(Cl_{n-1}^{\le}) : D^+_{P\cup Q}(x) \subseteq Cl_n^{\ge}\}$. Since $D^-_{P\cup Q}(x) \subseteq D^-_P(x)$, we have $x \in \underline{P}(Cl_{n-1}^{\le}) \Rightarrow x \in \underline{P\cup Q}(Cl_{n-1}^{\le}) \Rightarrow x \notin \underline{P\cup Q}(Cl_n^{\ge})$. It follows that $\{x \in Bn_P(Cl_n^{\ge}) \cup \underline{P}(Cl_{n-1}^{\le}) : D^+_{P\cup Q}(x) \subseteq Cl_n^{\ge}\} = \{x \in Bn_P(Cl_n^{\ge}) : l_{P\cup Q}(x) \ge n\}$. Let $Z_n = \{x \in Bn_P(Cl_n^{\ge}) : l_{P\cup Q}(x) \ge n\}$; so we have $\underline{P\cup Q}(Cl_n^{\ge}) = \underline{P}(Cl_n^{\ge}) \cup Z_n$. □

Proposition 3. Let P ⊂ C, Q ⊂ C, P ∩ Q = ∅. For each $Cl_n^{\le}$, n = 1, ..., t−1, we have

(1) $\overline{P \cup Q}(Cl_n^{\le}) = \overline{P}(Cl_n^{\le}) - X_n$;
(2) $\underline{P \cup Q}(Cl_n^{\le}) = \underline{P}(Cl_n^{\le}) \cup W_n$,

where $X_n = \{x \in \overline{P}(Cl_n^{\le}) : l_{P\cup Q}(x) > n\}$ and $W_n = \{x \in Bn_P(Cl_n^{\le}) : u_{P\cup Q}(x) \le n\}$.

Proof 4. It is similar to the proof of Proposition 2. □

Corollary 1. For each $Y_n$, $Z_n$ in Proposition 2 and $X_n$, $W_n$ in Proposition 3, n = 2, ..., t, we have

$$Y_n = W_{n-1}, \quad (10)$$
$$Z_n = X_{n-1}. \quad (11)$$

Proof 5. $\overline{P\cup Q}(Cl_n^{\ge}) = U - \underline{P\cup Q}(Cl_{n-1}^{\le}) \Leftrightarrow \overline{P}(Cl_n^{\ge}) - Y_n = U - (\underline{P}(Cl_{n-1}^{\le}) \cup W_{n-1}) \Leftrightarrow \overline{P}(Cl_n^{\ge}) - Y_n = U - \underline{P}(Cl_{n-1}^{\le}) - W_{n-1} \Leftrightarrow \overline{P}(Cl_n^{\ge}) - Y_n = \overline{P}(Cl_n^{\ge}) - W_{n-1} \Leftrightarrow Y_n = W_{n-1}$. Similarly, $Z_n = X_{n-1}$ holds. □

Example 3 (Continuation of Example 2). P = {a1, a4} and Q = {a2, a3}. The lower and upper approximations of the upward and downward unions of decision classes with respect to P are calculated as follows:

$\underline{P}(Cl_2^{\ge}) = \{x_2, x_7, x_8, x_{10}\}$, $\overline{P}(Cl_2^{\ge}) = U$;
$\underline{P}(Cl_3^{\ge}) = \{x_{10}\}$, $\overline{P}(Cl_3^{\ge}) = \{x_1, x_2, x_3, x_7, x_8, x_9, x_{10}\}$;
$\underline{P}(Cl_1^{\le}) = \emptyset$, $\overline{P}(Cl_1^{\le}) = \{x_1, x_3, x_4, x_5, x_6, x_9\}$;
$\underline{P}(Cl_2^{\le}) = \{x_4, x_5, x_6\}$, $\overline{P}(Cl_2^{\le}) = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9\}$;
$\underline{P}(Cl_1^{\ge}) = \overline{P}(Cl_1^{\ge}) = \underline{P}(Cl_3^{\le}) = \overline{P}(Cl_3^{\le}) = U$.

Let P' = P ∪ Q. The lower and upper approximations of the unions of decision classes with respect to P' are calculated as follows. It is obvious that $\underline{P'}(Cl_1^{\ge}) = \overline{P'}(Cl_1^{\ge}) = \underline{P'}(Cl_3^{\le}) = \overline{P'}(Cl_3^{\le}) = U$. By Propositions 2 and 3 and Corollary 1, the lower and upper approximations of $Cl_2^{\ge}$, $Cl_3^{\ge}$, $Cl_1^{\le}$ and $Cl_2^{\le}$ are updated as follows:

$Y_2 = \{x_1, x_5\}$, $\overline{P'}(Cl_2^{\ge}) = \overline{P}(Cl_2^{\ge}) - Y_2 = \{x_2, x_3, x_4, x_6, x_7, x_8, x_9, x_{10}\}$;
$Y_3 = \{x_1, x_2, x_3, x_8\}$, $\overline{P'}(Cl_3^{\ge}) = \overline{P}(Cl_3^{\ge}) - Y_3 = \{x_7, x_9, x_{10}\}$;
$X_1 = \{x_3, x_9\}$, $\overline{P'}(Cl_1^{\le}) = \overline{P}(Cl_1^{\le}) - X_1 = \{x_1, x_4, x_5, x_6\}$;
$X_2 = \{x_7, x_9\}$, $\overline{P'}(Cl_2^{\le}) = \overline{P}(Cl_2^{\le}) - X_2 = \{x_1, x_2, x_3, x_4, x_5, x_6, x_8\}$;
$Z_2 = X_1 = \{x_3, x_9\}$, $\underline{P'}(Cl_2^{\ge}) = \underline{P}(Cl_2^{\ge}) \cup Z_2 = \{x_2, x_3, x_7, x_8, x_9, x_{10}\}$;
$Z_3 = X_2 = \{x_7, x_9\}$, $\underline{P'}(Cl_3^{\ge}) = \underline{P}(Cl_3^{\ge}) \cup Z_3 = \{x_7, x_9, x_{10}\}$;
$W_1 = Y_2 = \{x_1, x_5\}$, $\underline{P'}(Cl_1^{\le}) = \underline{P}(Cl_1^{\le}) \cup W_1 = \{x_1, x_5\}$;
$W_2 = Y_3 = \{x_1, x_2, x_3, x_8\}$, $\underline{P'}(Cl_2^{\le}) = \underline{P}(Cl_2^{\le}) \cup W_2 = \{x_1, x_2, x_3, x_4, x_5, x_6, x_8\}$.

Based on the above-mentioned Lemma 1, Propositions 2 and 3, and Corollary 1, an algorithm for updating approximations of DRSA when some new attributes are added into the information system is presented (see Algorithm 1).


Algorithm 1: An incremental algorithm for updating approximations when some new attributes are added into the information system
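A minimal Python sketch of this addition case, assuming set-based representations of the dominating/dominated sets and of the approximations of the upward unions (identifiers are ours; the paper's pseudocode listing is more detailed):

```python
# Sketch of Algorithm 1: update the dominating/dominated sets by Lemma 1,
# recompute the generalized decisions l, u, and update the approximations
# of the upward unions by Propositions 2-3.

def generalized_decision(D_plus, D_minus, cls, U):
    """l_P(x), u_P(x): extreme classes reachable in D_P^+(x) / D_P^-(x)."""
    l = {x: min(cls[y] for y in D_plus[x]) for x in U}
    u = {x: max(cls[y] for y in D_minus[x]) for x in U}
    return l, u

def add_attributes(D_plus_P, D_minus_P, D_plus_Q, D_minus_Q,
                   lower_up, upper_up, cls, U, T):
    """lower_up[n], upper_up[n]: approximations of Cl_n>= under P,
    updated in place to the approximations under P union Q."""
    # Lemma 1: intersect the old sets with those of the new attributes Q.
    D_plus  = {x: D_plus_P[x]  & D_plus_Q[x]  for x in U}
    D_minus = {x: D_minus_P[x] & D_minus_Q[x] for x in U}
    l, u = generalized_decision(D_plus, D_minus, cls, U)
    for n in T[1:]:
        # Proposition 2(1): drop Y_n from the old upper approximation.
        Y = {x for x in upper_up[n] if u[x] < n}
        # Proposition 2(2): add Z_n (boundary objects now surely >= n).
        Z = {x for x in upper_up[n] - lower_up[n] if l[x] >= n}
        upper_up[n] -= Y
        lower_up[n] |= Z
    return D_plus, D_minus
```

By Corollary 1, the downward unions need no separate scan: $W_{n-1} = Y_n$ and $X_{n-1} = Z_n$, so the sets computed in the loop above can be reused for them.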

4.2. Deletion of some attributes

Let Q be a collection of attributes deleted from the set of attributes P, P − Q ≠ ∅. For $x, y \in U$, if $x D_P y$, then $x D_{P-Q} y$, where $D_{P-Q}$ is the dominance relation with respect to the set of attributes P − Q. However, if $(x, y) \notin D_P$, there may exist the case that $(x, y) \in D_{P-Q}$. Thus, Lemma 2 is introduced to calculate $D_{P-Q}^+(x)$ and $D_{P-Q}^-(x)$ by modifying $D_P^+(x)$ and $D_P^-(x)$.

Lemma 2. Let Q ⊂ P, P − Q ≠ ∅ and $x_i, x_j \in U$. We have

(1) $D_{P-Q}^+(x_i) = D_P^+(x_i) \cup A_i^+$;
(2) $D_{P-Q}^-(x_i) = D_P^-(x_i) \cup A_i^-$,

where $A_i^+ = \{x_j \in U - D_P^+(x_i) : \phi^{P-Q}_{j,i} = |P| - |Q|\}$ and $A_i^- = \{x_j \in U - D_P^-(x_i) : \phi^{P-Q}_{i,j} = |P| - |Q|\}$.

Proof 6. Since $D_{P-Q}^+(x_i) = \{x_j \in U : \phi^{P-Q}_{j,i} = |P - Q|\} = \{x_j \in U : \phi^{P-Q}_{j,i} = |P| - |Q|\} = D_P^+(x_i) \cup \{x_j \in U - D_P^+(x_i) : \phi^{P-Q}_{j,i} = |P| - |Q|\}$, let $A_i^+ = \{x_j \in U - D_P^+(x_i) : \phi^{P-Q}_{j,i} = |P| - |Q|\}$; we have $D_{P-Q}^+(x_i) = D_P^+(x_i) \cup A_i^+$. Similarly, (2) holds. □

Example 4 (Continuation of Example 1). Let P = {a1, a2, a3}, Q = {a2, a3}, and P' = P − Q. The P'-dominating sets and P'-dominated sets are obtained by updating the P-dominating sets and P-dominated sets on the universe U as follows. The dominance matrix $R^P$ with respect to the set of attributes P is

$$R^P = R^{a_1} + R^{a_2} + R^{a_3} = \begin{bmatrix}
3&2&2&2&2&2&1&1&2&1\\
3&3&2&2&2&2&2&2&2&2\\
3&2&3&2&2&2&2&1&2&1\\
2&2&1&3&3&3&1&2&2&2\\
1&1&1&1&3&2&1&1&0&0\\
2&2&1&2&3&3&1&2&1&1\\
3&3&3&2&2&2&3&2&2&2\\
3&3&2&2&3&3&2&3&2&2\\
3&2&2&3&3&3&1&2&3&2\\
3&3&2&3&3&3&2&3&3&3
\end{bmatrix}.$$

The P-dominating sets and P-dominated sets of all objects on the universe U are calculated according to the dominance matrix $R^P$ as follows:

$D_P^+(x_1) = \{x_1, x_2, x_3, x_7, x_8, x_9, x_{10}\}$, $D_P^-(x_1) = \{x_1\}$;
$D_P^+(x_2) = \{x_2, x_7, x_8, x_{10}\}$, $D_P^-(x_2) = \{x_1, x_2\}$;
$D_P^+(x_3) = \{x_3, x_7\}$, $D_P^-(x_3) = \{x_1, x_3\}$;
$D_P^+(x_4) = \{x_4, x_9, x_{10}\}$, $D_P^-(x_4) = \{x_4, x_5, x_6\}$;
$D_P^+(x_5) = \{x_4, x_5, x_6, x_8, x_9, x_{10}\}$, $D_P^-(x_5) = \{x_5\}$;
$D_P^+(x_6) = \{x_4, x_6, x_8, x_9, x_{10}\}$, $D_P^-(x_6) = \{x_5, x_6\}$;
$D_P^+(x_7) = \{x_7\}$, $D_P^-(x_7) = \{x_1, x_2, x_3, x_7\}$;
$D_P^+(x_8) = \{x_8, x_{10}\}$, $D_P^-(x_8) = \{x_1, x_2, x_5, x_6, x_8\}$;
$D_P^+(x_9) = \{x_9, x_{10}\}$, $D_P^-(x_9) = \{x_1, x_4, x_5, x_6, x_9\}$;
$D_P^+(x_{10}) = \{x_{10}\}$, $D_P^-(x_{10}) = \{x_1, x_2, x_4, x_5, x_6, x_8, x_9, x_{10}\}$.

$A_i^+$ and $A_i^-$, $\forall i \in \{1, \ldots, 10\}$, can be obtained by Lemma 2 as follows:

$A_1^+ = \emptyset$, $A_1^- = \{x_3, x_4, x_5, x_6, x_9\}$;
$A_2^+ = \emptyset$, $A_2^- = \{x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\}$;
$A_3^+ = \{x_1, x_2, x_8, x_9, x_{10}\}$, $A_3^- = \{x_4, x_5, x_6, x_9\}$;
$A_4^+ = \{x_1, x_2, x_3, x_5, x_6, x_7, x_8\}$, $A_4^- = \emptyset$;
$A_5^+ = \{x_1, x_2, x_3, x_7\}$, $A_5^- = \{x_4, x_6\}$;
$A_6^+ = \{x_1, x_2, x_3, x_5, x_7\}$, $A_6^- = \{x_4\}$;
$A_7^+ = \{x_2, x_8, x_{10}\}$, $A_7^- = \{x_4, x_5, x_6, x_8, x_9, x_{10}\}$;
$A_8^+ = \{x_2, x_7\}$, $A_8^- = \{x_3, x_4, x_7, x_9, x_{10}\}$;
$A_9^+ = \{x_1, x_2, x_3, x_7, x_8\}$, $A_9^- = \{x_3\}$;
$A_{10}^+ = \{x_2, x_7, x_8\}$, $A_{10}^- = \{x_3, x_7\}$.


The P'-dominating sets and P'-dominated sets of all objects on the universe U are then as follows:

$D_{P'}^+(x_1) = D_{P'}^+(x_3) = D_{P'}^+(x_9) = \{x_1, x_2, x_3, x_7, x_8, x_9, x_{10}\}$;
$D_{P'}^-(x_1) = D_{P'}^-(x_3) = D_{P'}^-(x_9) = \{x_1, x_3, x_4, x_5, x_6, x_9\}$;
$D_{P'}^+(x_2) = D_{P'}^+(x_7) = D_{P'}^+(x_8) = D_{P'}^+(x_{10}) = \{x_2, x_7, x_8, x_{10}\}$;
$D_{P'}^-(x_2) = D_{P'}^-(x_7) = D_{P'}^-(x_8) = D_{P'}^-(x_{10}) = U$;
$D_{P'}^+(x_4) = D_{P'}^+(x_5) = D_{P'}^+(x_6) = U$;
$D_{P'}^-(x_4) = D_{P'}^-(x_5) = D_{P'}^-(x_6) = \{x_4, x_5, x_6\}$.

From Lemma 2, it is obvious that $D_P^+(x) \subseteq D_{P-Q}^+(x)$ and $D_P^-(x) \subseteq D_{P-Q}^-(x)$. If the downward and upward unions of decision classes remain unchanged, the lower approximations of decision class unions may decrease when some attributes are deleted from the information system; correspondingly, the upper approximations of decision class unions may increase. Propositions 4 and 5 are introduced in the following to calculate the approximations of decision class unions after some attributes are removed.

Proposition 4. Let Q ⊂ P. For each $Cl_n^{\ge}$, n = 2, ..., t, we have

(1) $\underline{P-Q}(Cl_n^{\ge}) = \underline{P}(Cl_n^{\ge}) - F_n$;
(2) $\overline{P-Q}(Cl_n^{\ge}) = \overline{P}(Cl_n^{\ge}) \cup G_n$,

where $F_n = \{x \in \underline{P}(Cl_n^{\ge}) : l_{P-Q}(x) < n\}$ and $G_n = \{x \in \underline{P}(Cl_{n-1}^{\le}) : u_{P-Q}(x) \ge n\}$.

Proof 7.
(1) Since $\underline{P-Q}(Cl_n^{\ge}) = \{x \in U : D^+_{P-Q}(x) \subseteq Cl_n^{\ge}\} = \underline{P}(Cl_n^{\ge}) - \{x \in \underline{P}(Cl_n^{\ge}) : D^+_{P-Q}(x) \nsubseteq Cl_n^{\ge}\} = \underline{P}(Cl_n^{\ge}) - \{x \in \underline{P}(Cl_n^{\ge}) : l_{P-Q}(x) < n\}$, let $F_n = \{x \in \underline{P}(Cl_n^{\ge}) : l_{P-Q}(x) < n\}$; then we have $\underline{P-Q}(Cl_n^{\ge}) = \underline{P}(Cl_n^{\ge}) - F_n$.
(2) Since $\overline{P-Q}(Cl_n^{\ge}) = \{x \in U : D^-_{P-Q}(x) \cap Cl_n^{\ge} \ne \emptyset\} = \overline{P}(Cl_n^{\ge}) \cup \{x \in U - \overline{P}(Cl_n^{\ge}) : D^-_{P-Q}(x) \cap Cl_n^{\ge} \ne \emptyset\} = \overline{P}(Cl_n^{\ge}) \cup \{x \in \underline{P}(Cl_{n-1}^{\le}) : u_{P-Q}(x) \ge n\}$, let $G_n = \{x \in \underline{P}(Cl_{n-1}^{\le}) : u_{P-Q}(x) \ge n\}$; then we have $\overline{P-Q}(Cl_n^{\ge}) = \overline{P}(Cl_n^{\ge}) \cup G_n$. □

Proposition 5. Let Q ⊂ P. For each $Cl_n^{\le}$, n = 1, ..., t−1, we have

(1) $\underline{P-Q}(Cl_n^{\le}) = \underline{P}(Cl_n^{\le}) - J_n$;
(2) $\overline{P-Q}(Cl_n^{\le}) = \overline{P}(Cl_n^{\le}) \cup K_n$,

where $J_n = \{x \in \underline{P}(Cl_n^{\le}) : u_{P-Q}(x) > n\}$ and $K_n = \{x \in \underline{P}(Cl_{n+1}^{\ge}) : l_{P-Q}(x) \le n\}$.

Proof 8. It is similar to the proof of Proposition 4. □

Corollary 2. For each $F_n$, $G_n$ in Proposition 4 and $J_n$, $K_n$ in Proposition 5, n = 2, ..., t, there are

$$F_n = K_{n-1}, \quad (12)$$
$$G_n = J_{n-1}. \quad (13)$$

Proof 9. It is similar to the proof of Corollary 1. □

Example 5 (Continuation of Example 1). Let P = {a1, a2, a3} and Q = {a2, a3}. The lower and upper approximations of the unions of decision classes with respect to the set of attributes P are as follows:

$\underline{P}(Cl_2^{\ge}) = \{x_2, x_3, x_7, x_8, x_9, x_{10}\}$, $\overline{P}(Cl_2^{\ge}) = \{x_2, x_3, x_4, x_6, x_7, x_8, x_9, x_{10}\}$;
$\underline{P}(Cl_3^{\ge}) = \overline{P}(Cl_3^{\ge}) = \{x_7, x_9, x_{10}\}$;
$\underline{P}(Cl_1^{\le}) = \{x_1, x_5\}$, $\overline{P}(Cl_1^{\le}) = \{x_1, x_4, x_5, x_6\}$;
$\underline{P}(Cl_2^{\le}) = \overline{P}(Cl_2^{\le}) = \{x_1, x_2, x_3, x_4, x_5, x_6, x_8\}$;
$\underline{P}(Cl_1^{\ge}) = \overline{P}(Cl_1^{\ge}) = \underline{P}(Cl_3^{\le}) = \overline{P}(Cl_3^{\le}) = U$.

Let P' = P − Q. The lower and upper approximations of the unions of decision classes with respect to P' are calculated as follows. It is obvious that $\underline{P'}(Cl_1^{\ge}) = \overline{P'}(Cl_1^{\ge}) = \underline{P'}(Cl_3^{\le}) = \overline{P'}(Cl_3^{\le}) = U$. The lower and upper approximations of $Cl_2^{\ge}$, $Cl_3^{\ge}$, $Cl_1^{\le}$ and $Cl_2^{\le}$ are calculable according to Propositions 4 and 5 and Corollary 2 as follows:

$F_2 = \{x_3, x_9\}$, $\underline{P'}(Cl_2^{\ge}) = \underline{P}(Cl_2^{\ge}) - F_2 = \{x_2, x_7, x_8, x_{10}\}$;
$F_3 = \{x_7, x_9, x_{10}\}$, $\underline{P'}(Cl_3^{\ge}) = \underline{P}(Cl_3^{\ge}) - F_3 = \emptyset$;
$J_1 = \{x_1, x_5\}$, $\underline{P'}(Cl_1^{\le}) = \underline{P}(Cl_1^{\le}) - J_1 = \emptyset$;
$J_2 = \{x_1, x_2, x_3, x_8\}$, $\underline{P'}(Cl_2^{\le}) = \underline{P}(Cl_2^{\le}) - J_2 = \{x_4, x_5, x_6\}$;
$G_2 = J_1 = \{x_1, x_5\}$, $\overline{P'}(Cl_2^{\ge}) = \overline{P}(Cl_2^{\ge}) \cup G_2 = U$;
$G_3 = J_2 = \{x_1, x_2, x_3, x_8\}$, $\overline{P'}(Cl_3^{\ge}) = \overline{P}(Cl_3^{\ge}) \cup G_3 = \{x_1, x_2, x_3, x_7, x_8, x_9, x_{10}\}$;
$K_1 = F_2 = \{x_3, x_9\}$, $\overline{P'}(Cl_1^{\le}) = \overline{P}(Cl_1^{\le}) \cup K_1 = \{x_1, x_3, x_4, x_5, x_6, x_9\}$;
$K_2 = F_3 = \{x_7, x_9, x_{10}\}$, $\overline{P'}(Cl_2^{\le}) = \overline{P}(Cl_2^{\le}) \cup K_2 = U$.

According to the above discussion, Algorithm 2 is developed for updating approximations of DRSA when some attributes are deleted from the information system.

Algorithm 2: An incremental algorithm for updating approximations when some attributes are deleted from the information system
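A minimal Python sketch of this deletion case, assuming the same set-based representations as the sketch given for Algorithm 1 (identifiers are ours; the paper's pseudocode listing is more detailed):

```python
# Sketch of Algorithm 2: grow the dominating/dominated sets by Lemma 2
# and update the approximations of the upward unions by Propositions 4-5.

def delete_attributes(D_plus_P, D_minus_P, phi, P_size, Q_size,
                      lower_up, upper_up, cls, U, T):
    """phi[(i, j)]: dominance matrix R^{P-Q} of the remaining attributes.
    lower_up[n], upper_up[n]: approximations of Cl_n>= under P,
    updated in place to the approximations under P - Q."""
    k = P_size - Q_size
    # Lemma 2: add the objects newly related under the smaller attribute set.
    D_plus  = {i: D_plus_P[i]  | {j for j in U if phi[j, i] == k} for i in U}
    D_minus = {i: D_minus_P[i] | {j for j in U if phi[i, j] == k} for i in U}
    l = {x: min(cls[y] for y in D_plus[x])  for x in U}
    u = {x: max(cls[y] for y in D_minus[x]) for x in U}
    for n in T[1:]:
        # Proposition 4(1): drop F_n from the old lower approximation.
        F = {x for x in lower_up[n] if l[x] < n}
        # Proposition 4(2): add G_n, drawn from the complement of the old
        # upper approximation (i.e. the old lower approximation of Cl_{n-1}<=).
        G = {x for x in set(U) - upper_up[n] if u[x] >= n}
        lower_up[n] -= F
        upper_up[n] |= G
    return D_plus, D_minus
```

Here Corollary 2 plays the same role as Corollary 1 above: $K_{n-1} = F_n$ and $J_{n-1} = G_n$, so the downward unions can be updated with the sets already computed in the loop.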


Table 2. The basic information of the data sets.

Data set          Number of objects   Number of attributes   Classifications
Sonar             208                 60                     2
SPECTF            267                 45                     2
Ionosphere        351                 34                     2
Movement-libras   360                 90                     15
Robot             463                 90                     15

Table 3. A comparison of the computational time between Algorithm 1 and the non-incremental algorithm when a new attribute is added.

Data set          Number of attributes selected   Non-incremental algorithm (s)   Algorithm 1 (s)
Sonar             15                              0.1259                          0.0101
SPECTF            14                              0.2001                          0.0159
Ionosphere        7                               0.2429                          0.0269
Movement-libras   20                              0.9242                          0.0380
Robot             20                              0.9669                          0.0505

5. Experimental evaluations

Some experiments are conducted to evaluate the performance of the proposed incremental approaches. We select five data sets from the machine learning repository of the University of California at Irvine [40] as benchmark data sets for performance tests (see Table 2). The incremental algorithms and the non-incremental algorithm are coded in C++ and run on a personal computer with Windows 7, an Intel(R) E5800 @ 3.2 GHz CPU and 2.0 GB memory. The experiments are divided into two classes: experiments on updating approximations of DRSA when some attributes are added, and experiments on updating approximations of DRSA when some attributes are deleted.

5.1. Addition of some new attributes

For this case, we run three groups of experiments to evaluate Algorithm 1, comparing the computational time of Algorithm 1 and the non-incremental algorithm and exploring whether the cardinality of the attribute set influences the performance of Algorithm 1. In the first group of experiments, some attributes are selected randomly from the condition attributes to constitute an experimental attribute set on each of the data sets in Table 2; the cardinalities of the experimental attribute sets are listed in Table 3. According to the experimental attribute set, we compute approximations of DRSA with the non-incremental algorithm and save the results.

Table 4. A comparison of the computational time between Algorithm 1 and the non-incremental algorithm when four new attributes are added.

Data set          Number of attributes selected   Non-incremental algorithm (s)   Algorithm 1 (s)
Sonar             30                              0.2200                          0.0328
SPECTF            21                              0.2782                          0.0541
Ionosphere        14                              0.3696                          0.0920
Movement-libras   45                              1.3505                          0.1070
Robot             45                              1.6699                          0.1646

Table 5. A comparison of the computational time between Algorithm 1 and the non-incremental algorithm when eight new attributes are added.

Data set          Number of attributes selected   Non-incremental algorithm (s)   Algorithm 1 (s)
Sonar             43                              0.3087                          0.0637
SPECTF            30                              0.3891                          0.1041
Ionosphere        21                              0.5255                          0.1797
Movement-libras   72                              1.8227                          0.1975
Robot             72                              2.4735                          0.3156

Then we randomly select an attribute from the remaining condition attributes and add it into the experimental attribute set. Algorithm 1 and the non-incremental algorithm are employed to update approximations of DRSA, respectively. The computational time of the two algorithms is shown in Table 3.

The remaining two groups of experiments follow the same procedure as the first group. In the second group of experiments, the cardinalities of the experimental attribute sets are larger than in the first group (see Table 4). Four attributes are selected randomly from the remaining condition attributes and added into the experimental attribute set. The computational time of Algorithm 1 and the non-incremental algorithm is outlined in Table 4. In the third group of experiments, the cardinalities of the experimental attribute sets are larger than in the second group (see Table 5). Eight attributes are selected randomly from the remaining condition attributes and added into the experimental attribute set. The computational time of Algorithm 1 and the non-incremental algorithm is listed in Table 5.

From Tables 3–5, the computational time of Algorithm 1 and of the non-incremental algorithm rises monotonically with the increasing number of added attributes and the increasing cardinality of the attribute set. Algorithm 1 is always faster than the non-incremental algorithm in these three groups of experiments.

In addition, to determine whether and how the cardinality of the experimental attribute set influences the performance of Algorithm 1 on different data sets when the numbers of attributes added are the same, the speed-up ratios between Algorithm 1 and the non-incremental algorithm are calculated from the computational time of the second group of experiments and are shown in Fig. 1. In Fig. 1, the line indicates the trend of the speed-up ratio; the x-coordinate ranges over the five data sets, ordered starting from the one with the smallest cardinality of the experimental attribute set, and the y-coordinate is the speed-up ratio. From Fig. 1, the trend of the speed-up ratios increases as the cardinality of the experimental attribute set grows on the different data sets. This means that the performance advantage of Algorithm 1 rises with the increasing cardinality of the attribute set. For Robot and Movement-libras, the cardinalities of the experimental attribute sets are the same, but Robot has more objects than Movement-libras. Fig. 1 shows that the speed-up ratio of Movement-libras is higher than that of Robot, which implies that the size of a data set is another factor influencing the performance of Algorithm 1.

5.2. Deletion of some attributes

We also run three groups of experiments on the five data sets in Table 2 to evaluate the performance of Algorithm 2. In the first group of experiments, we randomly select some attributes from the condition attributes to constitute the experimental attribute set on each of the data sets in Table 2. The cardinalities of the experimental attribute sets are shown in Table 6.



Fig. 1. Trend of speed-up ratios between Algorithm 1 and the non-incremental algorithm when four attributes are added.

Table 6. A comparison of the computational time between Algorithm 2 and the non-incremental algorithm when one attribute is deleted.

Data set          Number of attributes selected   Non-incremental algorithm (s)   Algorithm 2 (s)
Sonar             20                              0.1413                          0.0121
SPECTF            20                              0.2295                          0.0186
Ionosphere        14                              0.3075                          0.0310
Movement-libras   45                              1.2674                          0.0424
Robot             45                              1.5590                          0.0571

Table 7. A comparison of the computational time between Algorithm 2 and the non-incremental algorithm when four attributes are deleted.

Data set          Number of attributes selected   Non-incremental algorithm (s)   Algorithm 2 (s)
Sonar             32                              0.1883                          0.0348
SPECTF            30                              0.2866                          0.0562
Ionosphere        21                              0.3619                          0.0996
Movement-libras   70                              1.6150                          0.1116
Robot             70                              2.1216                          0.1711

Table 8. A comparison of the computational time between Algorithm 2 and the non-incremental algorithm when eight attributes are deleted.

Data set          Number of attributes selected   Non-incremental algorithm (s)   Algorithm 2 (s)
Sonar             50                              0.2614                          0.0656
SPECTF            40                              0.3387                          0.1079
Ionosphere        30                              0.4261                          0.1834
Movement-libras   86                              1.7984                          0.2027
Robot             86                              2.4381                          0.3243

Based on the experimental attribute set, we compute approximations of DRSA with the non-incremental algorithm and save the results. Then, we randomly select an attribute from the experimental attribute set and remove it. Algorithm 2 and the non-incremental algorithm are utilized to update approximations of DRSA. The computational time of the two algorithms on each of the five data sets in Table 2 is shown in Table 6.

The remaining experiments follow the same procedure as the first group. In the second group of experiments, the cardinalities of the experimental attribute sets are larger than in the first group (see Table 7). Four attributes are deleted from the


Fig. 2. Trend of speed-up ratios between Algorithm 2 and the non-incremental algorithm when four attributes are deleted.

experimental attribute set on each of the five data sets. The computational time of Algorithm 2 and the non-incremental algorithm is shown in Table 7. In the third group of experiments, the cardinalities of the experimental attribute sets are larger than in the second group (see Table 8). We randomly select eight attributes from the experimental attribute set and remove them. The computational time of Algorithm 2 and the non-incremental algorithm on the five data sets is shown in Table 8.

From Tables 6–8, the computational time of Algorithm 2 and of the non-incremental algorithm rises monotonically with the increase of the cardinality of the experimental attribute set and of the number of attributes removed. Algorithm 2 always performs faster than the non-incremental algorithm in these three groups of experiments. Moreover, the speed-up ratios between Algorithm 2 and the non-incremental algorithm are calculated from the computational time of the second group of experiments and shown in Fig. 2. From Fig. 2, the trend of the speed-up ratios goes up with the increasing cardinality of the experimental attribute set on the different data sets. This means that the performance of Algorithm 2 is closely related to the cardinality of the attribute set: the larger the cardinality of the attribute set, the better Algorithm 2 performs. For Robot and Movement-libras, the cardinalities of the experimental attribute sets are the same, but Robot has more objects than Movement-libras. Fig. 2 shows that the speed-up ratio of Movement-libras is higher than that of Robot, which implies that the size of a data set is another factor influencing the performance of Algorithm 2.
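For reference, a rough, self-contained Python sketch of this comparison protocol (illustrative only; the paper's experiments use a C++ implementation on the UCI data sets, and all names below are ours) for the addition case:

```python
import random, time

# Compare recomputing the dominating sets from scratch against the
# incremental update of Lemma 1, on synthetic ordinal data.
random.seed(0)
n_obj, n_attr = 300, 20
vals = [[random.randint(1, 3) for _ in range(n_attr)] for _ in range(n_obj)]

def dom_sets(attrs):
    """D^+ of every object under the given attribute indices (from scratch)."""
    return [{j for j in range(n_obj)
             if all(vals[j][a] >= vals[i][a] for a in attrs)}
            for i in range(n_obj)]

P = list(range(15))        # previously processed attributes
Q = list(range(15, 20))    # newly added attributes
D_P = dom_sets(P)          # assumed available from the previous step

t0 = time.perf_counter()
D_full = dom_sets(P + Q)   # non-incremental: recompute from scratch
t1 = time.perf_counter()
D_inc = [D_P[i] & d for i, d in enumerate(dom_sets(Q))]  # Lemma 1
t2 = time.perf_counter()

assert D_inc == D_full
print(f"non-incremental: {t1 - t0:.4f}s, incremental: {t2 - t1:.4f}s")
```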

6. Conclusions and future work

Incrementally updating the approximations of rough sets is important to decision making and data mining in dynamic data environments. In this paper, keeping the decision classes unchanged, we analyzed the reasons for the variations of approximations in DRSA when some attributes are added into or deleted from an information system, and we showed how to update approximations of DRSA incrementally, without recomputing from scratch, when the set of attributes varies. From the numerical examples and experimental results, we draw two conclusions: (1) incrementally updating approximations of DRSA is feasible and can effectively reduce the computational time; (2) the cardinality of the attribute set and the size of the data set influence the performance of the proposed incremental approaches. To handle massive data, we need to further improve our approach in order to reduce the computational time. Our future research work will focus on the development of algorithms for incrementally updating approximations on massive data sets.


Acknowledgments

This work is supported by the National Science Foundation of China (Nos. 60873108, 61175047, 61100117), NSAF (No. U1230117), the Youth Social Science Foundation of the Chinese Education Commission (Nos. 10YJCZH117, 11YJC630127), the Fundamental Research Funds for the Central Universities (No. SWJTU11ZT08) and the Research Fund of Traction Power State Key Laboratory, Southwest Jiaotong University (No. 2012TPL_T15).

References

[1] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (1982) 341–356.
[2] Z. Pawlak, A. Skowron, Rough sets: some extensions, Information Sciences 177 (2007) 28–40.
[3] Y. Qian, J. Liang, W. Pedrycz, An efficient accelerator for attribute reduction from incomplete data in rough set framework, Pattern Recognition 44 (2011) 1658–1670.
[4] X. Chen, W. Ziarko, Experiments with rough set approach to face recognition, International Journal of Intelligent Systems 26 (2011) 3–21.
[5] J. Blaszczynski, S. Greco, R. Slowinski, Inductive discovery of laws using monotonic rules, Engineering Applications of Artificial Intelligence 25 (2012) 284–294.
[6] A. Abbas, J. Liu, Designing an intelligent recommender system using partial credit model and Bayesian rough set, International Arab Journal of Information Technology 9 (2012) 179–187.
[7] B. Chang, H. Hung, A study of using RST to create the supplier selection model and decision-making rules, Expert Systems with Applications 37 (2010) 8284–8295.
[8] Z. Pawlak, Rough set approach to knowledge-based decision support, European Journal of Operational Research 99 (1997) 48–57.
[9] Z. Pawlak, R. Slowinski, Rough set approach to multi-attribute decision analysis, European Journal of Operational Research 72 (1994) 443–459.
[10] S. Greco, B. Matarazzo, R. Slowinski, Rough approximation by dominance relations, International Journal of Intelligent Systems 17 (2002) 153–171.
[11] S. Greco, B. Matarazzo, R. Slowinski, Rough sets theory for multicriteria decision analysis, European Journal of Operational Research 129 (2001) 1–47.
[12] R. Slowinski, S. Greco, B. Matarazzo, Axiomatization of utility, outranking and decision rule preference models for multiple-criteria classification problems under partial inconsistency with the dominance principle, Control and Cybernetics 31 (2002) 1005–1035.
[13] W. Kotlowski, J. Blaszczynski, S. Greco, R. Slowinski, Stochastic dominance-based rough set model for ordinal classification, Information Sciences 178 (2008) 4019–4037.
[14] K. Dembczynski, S. Greco, R. Slowinski, Rough set approach to multiple criteria classification with imprecise evaluations and assignments, European Journal of Operational Research 198 (2009) 626–636.
[15] G. Luo, X. Yang, Limited dominance-based rough set model and knowledge reductions in incomplete decision system, Journal of Information Science and Engineering 26 (2010) 2199–2211.
[16] M. Inuiguchi, Y. Yoshioka, Y. Kusunoki, Variable-precision dominance-based rough set approach and attribute reduction, International Journal of Approximate Reasoning 50 (2009) 1199–1214.
[17] X. Yang, J. Yang, C. Wu, D. Yu, Dominance-based rough set approach and knowledge reductions in incomplete ordered information system, Information Sciences 178 (2008) 1219–1234.
[18] X. Yang, D. Yu, J. Yang, L. Wei, Dominance-based rough set approach to incomplete interval-valued information system, Data & Knowledge Engineering 68 (2009) 1331–1347.
[19] X. Yang, J. Xie, X. Song, J. Yang, Credible rules in incomplete decision system based on descriptors, Knowledge-Based Systems 22 (2009) 8–17.
[20] L. Shan, W. Ziarko, Data-based acquisition and incremental modification of classification rules, Computational Intelligence 11 (1995) 357–370.
[21] W. Bang, Z. Bien, New incremental learning algorithm in the framework of rough set theory, International Journal of Fuzzy Systems 1 (1999) 25–36.
[22] L. Tong, L. An, Incremental learning of decision rules based on rough set theory, in: Proceedings of the World Congress on Intelligent Control and Automation (WCICA 2002), 2002, pp. 420–425.
[23] J. Blaszczynski, R. Slowinski, Incremental induction of decision rules from dominance-based rough approximations, Electronic Notes in Theoretical Computer Science 82 (2003) 40–45.
[24] S. Greco, R. Slowinski, J. Stefanowski, M. Zurawski, Incremental versus non-incremental rule induction for multicriteria classification, Transactions on Rough Sets II (2004) 33–53.
[25] F. Hu, G. Wang, H. Huang, Y. Wu, Incremental attribute reduction based on elementary sets, in: RSFDGrC 2005, LNAI, vol. 3641, 2005, pp. 185–193.
[26] D. Liu, T. Li, D. Ruan, W. Zou, An incremental approach for inducing knowledge from dynamic information systems, Fundamenta Informaticae 94 (2009) 245–260.
[27] D. Liu, T. Li, D. Ruan, J. Zhang, Incremental learning optimization on knowledge discovery in dynamic business intelligent systems, Journal of Global Optimization 51 (2011) 325–344.
[28] X. Jia, L. Shang, J. Chen, Incremental versus non-incremental: data and algorithms based on ordering relations, International Journal of Computational Intelligence Systems 4 (2011) 112–122.
[29] J. Zhang, T. Li, D. Ruan, Neighborhood rough sets for dynamic data mining, International Journal of Intelligent Systems 27 (2012) 317–342.
[30] C. Chan, A rough set approach to attribute generalization in data mining, Information Sciences 107 (1998) 177–194.
[31] T. Li, D. Ruan, W. Geert, J. Song, Y. Xu, A rough sets based characteristic relation approach for dynamic attribute generalization in data mining, Knowledge-Based Systems 20 (2007) 485–494.
[32] Y. Cheng, The incremental method for fast computing the rough fuzzy approximations, Data & Knowledge Engineering 70 (2011) 84–100.
[33] J. Zhang, T. Li, D. Ruan, Rough sets based matrix approaches with dynamic attribute variation in set-valued information systems, International Journal of Approximate Reasoning 53 (2012) 620–635.
[34] H. Chen, T. Li, S. Qiao, D. Ruan, A rough set based dynamic maintenance approach for approximations in coarsening and refining attribute values, International Journal of Intelligent Systems 25 (2010) 1005–1026.
[35] H. Chen, T. Li, D. Ruan, Maintenance of approximations in incomplete ordered decision systems while attribute values coarsening or refining, Knowledge-Based Systems 31 (2012) 140–161.
[36] Y. Xu, L. Wang, R. Zhang, A dynamic attribute reduction algorithm based on 0–1 integer programming, Knowledge-Based Systems 24 (2011) 1341–1347.
[37] M. Hecker, S. Lambeck, S. Toepfer, E. van Someren, R. Guthke, Gene regulatory network inference: data integration in dynamic models – a review, BioSystems 96 (2009) 86–103.
[38] N. Ravindran, Y. Liang, X. Liang, A labeled-tree approach to semantic and structural data interoperability applied in hydrology domain, Information Sciences 180 (2010) 5008–5028.
[39] B. Khaleghi, A. Khamis, F.O. Karray, S.N. Razavi, Multisensor data fusion: a review of the state-of-the-art, Information Fusion 14 (2013) 28–44.
[40] D. Newman, S. Hettich, C. Blake, C. Merz, UCI Repository of Machine Learning Databases, University of California, Department of Information and Computer Science, Irvine, CA, 1998.
