Hierarchical attribute reduction algorithms for big data using MapReduce




Accepted Manuscript

Hierarchical attribute reduction algorithms for big data using MapReduce
Jin Qian, Ping Lv, Xiaodong Yue, Caihui Liu, Zhengjun Jing

PII: S0950-7051(14)00331-1
DOI: http://dx.doi.org/10.1016/j.knosys.2014.09.001
Reference: KNOSYS 2948

To appear in: Knowledge-Based Systems

Received Date: 10 November 2013
Revised Date: 21 August 2014
Accepted Date: 7 September 2014

Please cite this article as: J. Qian, P. Lv, X. Yue, C. Liu, Z. Jing, Hierarchical attribute reduction algorithms for big data using MapReduce, Knowledge-Based Systems (2014), doi: http://dx.doi.org/10.1016/j.knosys.2014.09.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Hierarchical attribute reduction algorithms for big data using MapReduce

Jin Qian a,b,∗, Ping Lv a,b, Xiaodong Yue c, Caihui Liu d, Zhengjun Jing a,b

a Key Laboratory of Cloud Computing and Intelligent Information Processing of Changzhou City, Jiangsu University of Technology, Changzhou, 213001, China
b School of Computer Engineering, Jiangsu University of Technology, Changzhou, 213001, China
c School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
d Department of Mathematics and Computer Science, Gannan Normal University, Ganzhou, 341000, China

Abstract

Attribute reduction is one of the important research issues in rough set theory. Most existing attribute reduction algorithms now face two challenging problems. On the one hand, they have seldom taken granular computing into consideration. On the other hand, they still cannot deal with big data. To address these issues, the hierarchical encoded decision table is first defined. The relationships between hierarchical decision tables at different levels of granularity are then discussed. The parallel computations of the equivalence classes and the attribute significance are further designed for attribute reduction. Finally, hierarchical attribute reduction algorithms are proposed in data and task parallel using MapReduce. Experimental results demonstrate that the proposed algorithms scale well and efficiently process big data.

Key words: Hierarchical attribute reduction; Granular computing; Data and task parallelism; MapReduce; Big data

∗ Corresponding author. Tel.:+86-519-86953252 Email addresses: [email protected] (Jin Qian ), [email protected] (Ping Lv), [email protected] (Xiaodong Yue), liu [email protected] (Caihui Liu), [email protected] (Zhengjun Jing).

Preprint submitted to Knowledge-Based Systems    11 September 2014

1 Introduction

With the increasing amount of scientific and industrial data, mining useful information from big data has become an important task for business intelligence. Classical data mining algorithms are challenged from both the data-intensive and the computation-intensive perspectives [14]. As indicated in [30], dataset sizes are too massive to fit in main memory and the search spaces are too large to explore on a single machine. Moreover, not all attributes are necessary or sufficient for decision making. Irrelevant or redundant attributes not only increase the size of the search space, but also make generalization more difficult [11]. Hence attribute reduction, also called feature selection, is often carried out as a preprocessing step in knowledge acquisition to find a minimum subset of attributes that provides the same descriptive or classification ability as the whole attribute set. Using attribute reduction methods in rough set theory [24], we can acquire different reducts, which can be used to induce different concise sets of rules. Attribute reduction has been successfully applied in many fields such as machine learning, data analysis and data mining. A major challenge is how to speed up the attribute reduction process. Many algorithms [4,17,23,25,26,27,29,32,38,40,42] have been developed, cutting the time complexity to max(O(|C||U|), O(|C|²|U/C|)) for small datasets [26,27,38]. Unfortunately, these existing attribute reduction algorithms have seldom taken granular computing into consideration, and they have difficulty discovering general, valuable decision rules with higher support because they work only at low or primitive levels. Moreover, these algorithms cannot deal with big data.

Granular computing [45,2,22,41] offers a unified framework for problem solving under different levels of granularity, which can flexibly provide different hierarchical views of a complex structured problem through granules. A structured problem may comprise a web of interacting and interrelated granules. For effective problem solving, we must provide a new insight at a certain level of granulation. Since deriving rules from a higher concept level may yield more general and important knowledge, some scholars [15,16,49,8,36] began to mine hierarchical decision rules from different levels of abstraction. Hong et al. [15] represented hierarchical attribute values by hierarchical trees to construct a new learning method for deriving cross-level certain and possible rules from training data. Ziarko [49] presented an approach to forming a linear hierarchy of decision tables using the decomposition of the boundary region and the variable precision rough set model. Feng and Miao [8] provided an approach to mining hierarchical decision rules from different levels by combining the hierarchical structure of a multidimensional data model with the techniques of rough set theory. Wu and Leung [36] introduced multi-scale information tables from the perspective of granular computing and mined hierarchical decision rules from multi-scale decision tables under different levels of granularity. Ye et al. [43] extended conditional entropy under single-level granulation to hierarchical conditional entropy under multi-level granulation and studied attribute-generalization reducts obtained by coarsening and refining attribute values. Yuan et al. [44] proposed a hierarchical reduction algorithm for concept hierarchies. Zhang et al. [48] presented an attribute reduction method to acquire multi-confidence rules from covering decision systems. However, as the size of the dataset can be quite large, it is difficult or even impractical for such big data to be stored and processed on a single machine, which renders the existing serial methods unusable. Furthermore, these algorithms do not have the ability to switch among different levels of abstraction flexibly. Therefore, it is necessary to develop an effective and efficient approach to hierarchical attribute reduction for big data that works on different levels to accommodate different user requirements.

For big data, it would appear that sampling techniques can be applied. Wang et al. [34] selected the most informative instances from large datasets, constructed the corresponding discernibility matrix and acquired all reducts. However, sampling guarantees often only hold if the samples represent all of the data or satisfy the hypothesis space. Parallel computing may be a good solution for attribute reduction [6,31,33,18]. Deng et al. [6], Susmaga et al. [31] and Wang et al. [33] computed a reduct or all reducts in parallel for small datasets. Liang et al. [18] considered the sub-tables within a large-scale dataset as small granularities; the reducts of the small granularities can be computed separately and finally fused together to generate the reduct of the whole data. However, these reducts are not guaranteed to be the same as those obtained from the whole dataset, since the subsystems (sub-tables) do not exchange information with each other. Thus, these techniques cannot acquire an exact reduct for big data in most cases.

Recently, an extremely simple parallel computation approach, MapReduce [5], has been applied to data-intensive and computation-intensive tasks. It is quite novel, since it interleaves parallel and sequential computation, automatically parallelizes the computation across large-scale clusters of machines, and hides many system-level details. Furthermore, MapReduce implementations offer their own distributed file systems that provide a scalable mechanism for storing massive datasets [10]. It has been applied in data mining [14,30,50] and machine learning [1,3,47]. In rough set theory, Zhang et al. [46] proposed a parallel method for computing equivalence classes to acquire lower and upper approximations using MapReduce. Yang et al. [39] computed the reduct redi for each sub-decision table Si, combined ∪redi, and generated the reduct Red by deleting redundant attributes using MapReduce. Qian et al. [28] proposed a parallel attribute reduction algorithm using MapReduce in cloud computing. To the best of our knowledge, relatively little work has been done on hierarchical attribute reduction for big data using MapReduce.

In this paper, we investigate the following issue: how can the MapReduce programming model be used to design parallel hierarchical attribute reduction algorithms so that we can mine decision rules under different levels of granularity? We first discuss the hierarchical encoded decision table and some of its properties, and analyze the parallel and serial operations in the classical attribute reduction algorithms; we then design proper <key, value> pairs and implement map/reduce functions for computing the hierarchical encoded decision table; finally, we propose hierarchical attribute reduction algorithms in data and task parallel using MapReduce on Hadoop which are applicable to big data. Experimental results demonstrate that our proposed algorithms can efficiently deal with big data.


The rest of this paper is organized as follows. Section 2 reviews the necessary concepts of rough set theory and the MapReduce framework. Section 3 proposes an approach to computing the hierarchical encoded decision table and presents the hierarchical attribute reduction algorithms. In Section 4, we report an empirical evaluation of the proposed parallel algorithms using MapReduce. Finally, the paper is concluded in Section 5.

2 Basic notions

In this section, we review some notions of the Pawlak rough set model [24,25,27] and the MapReduce programming model [5].

2.1 Rough set theory

In Pawlak's rough set model, the indiscernibility relation and the equivalence class are important concepts. The indiscernibility relation expresses the fact that, due to a lack of information (or knowledge), we are unable to discern some objects by employing the available information. It determines a partition of U and is used to construct the equivalence classes.

Let S = (U, At = C ∪ D, {Va | a ∈ At}, {Ia | a ∈ At}) be a decision table, where U is a finite non-empty set of objects, At is a finite non-empty set of attributes, C = {c1, c2, ..., cm} is a set of conditional attributes describing the objects, and D is a set of decision attributes that indicates the classes of objects. Va is a non-empty set of values of a ∈ At, and Ia is an information function that maps an object in U to exactly one value in Va; Ia(x) = v means that the object x has the value v on attribute a. For simplicity, we assume D = {d} in this paper, where d is a decision attribute which describes the decision for each object. A table with multiple decision attributes can easily be transformed into a table with a single decision attribute by coding the corresponding Cartesian product of attribute values. An indiscernibility relation with respect to A ⊆ C is defined as:

IND(A) = {(x, y) ∈ U × U | ∀a ∈ A, Ia(x) = Ia(y)}.  (1)

The partition generated by IND(A) is denoted as πA. [x]A denotes the block of the partition πA containing x; moreover, [x]A = ∩_{a∈A} [x]a.
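For illustration only (this sketch is ours and not part of the paper; the attribute names and values are merely indicative), the partition πA can be built by grouping objects on their value vectors over A:

import java.util.*;

// Minimal illustration: each object is a map from attribute name to value.
// Objects with identical values on every attribute in A fall into one block of U/A.
public class PartitionDemo {
    static Map<String, List<Map<String, String>>> partition(
            List<Map<String, String>> universe, List<String> attrs) {
        Map<String, List<Map<String, String>>> blocks = new LinkedHashMap<>();
        for (Map<String, String> x : universe) {
            StringBuilder key = new StringBuilder();
            for (String a : attrs) key.append(x.get(a)).append('|'); // value vector on A
            blocks.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(x);
        }
        return blocks; // each value list is one equivalence class [x]_A
    }

    public static void main(String[] args) {
        List<Map<String, String>> u = List.of(
            Map.of("Age", "31-35", "EL", "Doctoral Student", "d", "Middle"),
            Map.of("Age", "31-35", "EL", "Doctoral Student", "d", "High"),
            Map.of("Age", "36-40", "EL", "Undergraduate", "d", "High"));
        System.out.println(partition(u, List.of("Age", "EL")).keySet());
    }
}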




Consider a partition πD = {D1, D2, ..., Dk} of the universe U with respect to the decision attribute D and another partition πA = {A1, A2, ..., Ar} defined by a set of conditional attributes A. The equivalence classes induced by these partitions are the basic blocks used to construct the Pawlak rough set approximations.

Definition 1 For a decision class Di ∈ πD, the lower and upper approximations of Di with respect to a partition πA are defined as:

\underline{apr}_A(D_i) = \{x \in U \mid [x]_A \subseteq D_i\}; \qquad \overline{apr}_A(D_i) = \{x \in U \mid [x]_A \cap D_i \neq \emptyset\}.  (2)

Definition 2 For a decision table S, the positive region and boundary region of a partition πD with respect to a partition πA are defined as:

POS(D|A) = \bigcup_{1 \le i \le k} \underline{apr}_A(D_i); \qquad BND(D|A) = \bigcup_{1 \le i \le k} \left(\overline{apr}_A(D_i) - \underline{apr}_A(D_i)\right).  (3)
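As a toy illustration of Definition 2 (our own sketch with hypothetical object ids, not taken from the paper), the positive region collects the A-blocks contained in a single decision class, and the boundary region is what remains:

import java.util.*;

// Sketch: given the blocks of U/A and the decision classes of U/D (as sets of object
// ids), POS(D|A) collects the A-blocks contained in some decision class; since the
// decision classes partition U, BND(D|A) is simply U minus the positive region.
public class RegionsDemo {
    static Set<Integer> positiveRegion(Collection<Set<Integer>> blocksA,
                                       Collection<Set<Integer>> classesD) {
        Set<Integer> pos = new HashSet<>();
        for (Set<Integer> block : blocksA)
            for (Set<Integer> dClass : classesD)
                if (dClass.containsAll(block)) { pos.addAll(block); break; }
        return pos;
    }

    public static void main(String[] args) {
        List<Set<Integer>> blocksA = List.of(Set.of(1, 2), Set.of(3, 4), Set.of(5));
        List<Set<Integer>> classesD = List.of(Set.of(1, 2, 3), Set.of(4, 5));
        Set<Integer> pos = positiveRegion(blocksA, classesD);   // {1, 2, 5}
        Set<Integer> bnd = new HashSet<>(Set.of(1, 2, 3, 4, 5));
        bnd.removeAll(pos);                                     // BND = U \ POS = {3, 4}
        System.out.println(pos + " " + bnd);
    }
}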

Definition 3 For a decision table S, let A ⊆ C and πA = {A1, A2, ..., Ar}. The indiscernibility object pair set with respect to A, denoted DOP_A^D, and the number of the corresponding pairs of objects that A cannot discern, denoted \widetilde{DIS}(D|A), are defined as

DOP_A^D = \{\langle x, y \rangle \mid x \in D_i,\ y \in D_j,\ \forall a \in A,\ I_a(x) = I_a(y)\},  (4)

where Di ∈ πD, Dj ∈ πD, 1 ≤ i < j ≤ k, and

\widetilde{DIS}(D|A) = \sum_{1 \le p \le r} \sum_{1 \le i < j \le k} n_p^i\, n_p^j,  (5)

where n_p^j (n_p^i) denotes the number of objects equal to j (i) on d in an equivalence class Ap.

Definition 4 For a decision table S, the information entropy of D is given by:

Info(D) = -\sum_{j=1}^{k} \frac{n_j}{n} \log_2 \frac{n_j}{n}.  (6)

The entropy Info(D|A), the conditional entropy of D conditioned on A, is given by

Info(D|A) = -\sum_{p=1}^{r} \frac{n_p}{n} \sum_{j=1}^{k} \frac{n_p^j}{n_p} \log_2 \frac{n_p^j}{n_p},  (7)

where np, njp and nj denote the number of the objects, the number of the objects equal to j on d in Ap, and the number of the objects equal to j on d, respectively.
where np , njp and nj denote the number of the objects, the number of the objects equal to j on d in Ap , and the number of the objects equal to j on d, respectively. Definition 5 For a decision table S, an attribute set A ⊆ C is a reduct of C with respect to D if it satisfies the following two conditions: (1) (D|A) = (D|C); (2) for any attribute c ∈ A, (D|[A − {c}]) = (D|A). An attribute c ∈ C is a core attribute of C with respect to D if it satisfies the following condition: (3) (D|[C − {c}]) = (D|C);  dewhere (.|.) denotes the classification ability, ={POS, BND, Info, DIS} notes the attribute reduction method based on positive region, boundary region, information entropy and discernibility matrix. Definition 6 [7] For a decision table S, an attribute set A ⊆ C, if A → D ⊆ U/A × U/D implies {Ai , Dj } ∈ A → D ⇐⇒ Ai ⊆ apr A (Dj ), then {Ai , Dj } ∈ A → D is called the rule of A as to D. The approximation quality γ(A → D) or a rule A → D is defined as: 6



γ(A → D) =

184

185 186 187 188

MapReudce [5], introduced by Google, is a programming model. The user expresses the computation by means of designing < key, value > pairs and implementing map and reduce function. Specially, the map and reduce functions are illustrated as follows: map :< K1 , V1 >−→ [< K2 , V2 >] reduce :< K2 , [V2 ] >−→ [< K3 , V3 >]

190

192

193 194 195 196 197 198 199 200

(8)

2.2 MapReduce programming model

189

191

|aprA (Dj ) : Dj ∈ πD | |U|

where all Ki and Vi (i=1,. . .,3) are user-defined data types, and the convention [. . .] is used throughout this paper to denote a list. The MapReduce programming model consists of three phases: map, shuffle and reduce. The map phase takes as a < K1 , V1 > pair and produces a set of intermediate < K2 , V2 > pairs which are stored into the local machine. In the shuffle phase, the MapReduce library groups together all intermediate values V2 associated with the same K2 and passes them to the same machine. The reduce phase accepts an K2 and a set of values for that key, merges together these values to form a possibly smaller set of values, and finally takes the < K3 , V3 > pairs as output.

205

Since MapReduce provides a robust and scalable framework for executing parallel computation, we mainly focus on designing data structures for < key, value > pairs as well as implementing the map and reduce functions for hierarchical encoded decision table and parallel hierarchical attribute reduction algorithms.

206

3

207

In this section, we first discuss the hierarchical encoded decision table, then design the parallel computation algorithms of transforming the original decision table into the hierarchical encoded decision table and hierarchical attribute reduction algorithms for big data, finally give the time complexity of these parallel algorithms.

201 202 203 204

208 209 210 211

Hierarchical attribute reduction algorithms using MapReduce

7

212

213 214 215 216 217 218 219 220 221 222 223 224 225 226 227

3.1 Hierarchical encoded decision table

Most of previous studies on rough sets focused on mining certain rules and possible rules on the single concept level. However, the value set of some attributes can form a generalization hierarchy in real-world applications such as P roduct < Brand < Category for product attribute and can be represented by a concept hierarchy tree. The formal use of concept hierarchies as the most important background knowledge in data mining is introduced by Han, Cai and Cercone [12,13]. A concept hierarchy is often related to a specific attribute and is partially order relation according to general-to-specific ordering. The most general node is the null (any), and the most specific node corresponds to the specific values of an attribute in the original decision table. Thus, one can reduce the data table by replacing low-level concepts with higher-level concepts using the concept hierarchy, and may mine more general and important knowledge from a decision table under different levels of granularity [8]. To this end, it is necessary to develop hierarchical attribute reduction algorithms for mining hierarchical decision rules. Any(*)

Any(*)

Any(*)

(1,1,2,2) Youth (2,2,2,2) Below 30 11

1

Middle_aged

31-35

36-40

41-50

12

21

22

2

The old

High

3

51-60

61-70

31

32

Doctoral Student 11

1

Low

Postgraduate student

2

Undergraduate

Others

21

22

12

Enterprise State-owned Enterprise

1

Institution

Private Enterpise

Education

12

21

11

2

Civil Servant 22

(a) Concept tree for contional attributes {Age, Education level, Occupation} Any(*) (1,1,2,1)

(1,1,2,2)

High

1

Middle

Above 100000

50001100000

11

12

2500150000

Low

2 1000125000

21 22 (b) Concept tree for decision attribute Salary

3

500110000

Below 5000

31

32

Fig. 1. Concept hierarchy tree for each attribute 228 229 230 231 232 233 234 235 236 237 238 239

In general, the hierarchical attribute values can be represented by a concept hierarchy tree. Terminal leaf nodes on the trees represent actual attribute values appearing in the original decision table; internal nodes represent value clusters formed from their lower-level nodes. In a concept hierarchy, each level can be denoted by a digit. Usually the level number for the root node is assigned with the digit 0, while the level number for the other concept nodes is one plus its father’s level number. In this case, the concepts at each level can be called a concept at level w (see Fig. 1). We employ an integer encoding mode to encode the concept hierarchy: for any concept v at level w, the corresponding encoding string is lv1 /. . ./lvw , where “ / ” is list separator, and lv1 /. . ./lvw−1 is  the corresponding encoding of v s father concept. As indicated in [21], we can conclude the following two propositions as well. 8

240 241 242

243 244 245

246 247 248 249 250 251 252 253 254

255 256 257 258 259 260 261 262

263 264 265 266

267 268

269 270 271 272 273 274

275 276 277 278

Proposition 1 For any two concept nodes A and B with the encoding string lA1 /. . ./lAi and lB1 /. . ./lBj , respectively, A is a brother node of B if and only if i=j and lAw =lBw for w=1, . . ., i-1. Proposition 2 For any two concept nodes A and B with the encoding string lA1 /. . ./lAi and lB1 /. . ./lBj , respectively, A is a child node of B if and only if i=j+1 and lAw =lBw for w=1, . . ., j. For simplicity, we can also denote lv1 /. . ./lvw by lv1 . . .lvw without the ambiguity. By Proposition 1 and 2, we can discern the relationship from any two concepts in a concept hierarchy tree. Attribute ci is formed along l(ci )+1 hierarchy levels: 0, 1, . . ., l(ci ) with level 0 being the special value ANY(*). Given a decision table S and the concept hierarchies of all attributes, we will denote the depth of concept hierarchy of conditional attribute ci and decision attribute d as l(ci )+1(i =1, . . ., m) and l(d)+1, respectively. Thus, our decision table spawns the (m+1)-attributes in this paper. Without loss of generality, we define a generalized hierarchical decision table as follows. Definition 7 Let S = (U, At = C ∪D, {Va |a ∈ At}, {Ia |a ∈ At}) be a decision table, Sk1 k2 ...km kd =(Uk1 k2 ...km kd , At = C ∪ D, HAt , V k1 k2 ...km kd , Ik1 k2 ...km kd ) is denoted as a (k1 k2 . . .km kd )-th hierarchical decision table induced by S, where Uk1 k2 ...km kd is a finite non-empty set of objects, HAt = {Ha |a ∈ At} denotes the  kt concept hierarchy trees of the set of attributes, V k1 k2 ...km kd = m ∪ V kd t=1 V is the domain of ct (t=1, 2, . . .,m) at kt -th level and d at kd -th level of their concept hierarchies, and Ik1 k2 ...km kd is the information function from Uk1 k2 ...km kd to V k1 k2 ...km kd . Definition 8 For a decision table S, denote the domain of attribute ct at the i-th level in its concept hierarchy as V it , we will say that V it is coarser than V jt if and only if for any b ∈ V jt , there always exists a ∈ V it such that b is subconcept of a, and denote it as V it  V jt . In general, given two concept levels i and j for attribute ct , if i ≤ j, the values of ct in concept level i are more generalized. Definition 9 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )-th decision table, denote the domain of conditional attribute set C in two hierarchical decision tables as V i1 i2 ...im id and V j1 j2 ...jm jd , we will say that V i1 i2 ...im id is coarser than V j1 j2 ...jm jd if and only if for any t ∈ {1, 2, . . . , m} and decision attribute d, there always have V it  V jt and V id  V jd , denoted by V i1 i2 ...im id  V j1 j2 ...jm jd . Definition 10 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )th decision table, if V it  V jt for any t ∈ {1, 2, . . . , m}, then U/cit is coarser than U/cjt , or U/cjt is finer than U/cit , denoted by U/cit  U/cjt . Correspondingly, if V i1 i2 ...im  V j1 j2 ...jm , then U/C i1 i2 ...im is coarser than U/C j1 j2 ...jm , or 9

279

280 281 282 283 284 285 286

U/C j1 j2 ...jm is finer than U/C i1 i2 ...im , denoted by U/C i1 i2 ...im  U/C j1 j2 ...jm . Example 1: Table 1 summarizes the characteristics of a decision table. With concept hierarchies of all attributes {Age(A), Education Level(EL), Occupation(O), Salary(Sa)} in Fig.1, Table 1 can be transformed into different hierarchical encoded decision tables as shown in Fig. 2. Table 2, Table 3 and Table 4 only illustrate the details of S2222 , S1122 , S1112 , S1121 and S1111 hierarchical decision tables in which A2 , A1 , EL2 , EL1 , O 1, O 2 , Sa1 and Sa2 represent the concept levels for the conditional attributes and decision attribute respectively. Table 1 Description of the datasets U

Age

Education Level

Occupation

Salary

1

31-35

Doctoral Student

State-owned Enterprise

10001-25000

2

Below 30

Others

Civil Servant

5001-10000

3

31-35

Postgraduate Student

State-owned Enterprise

10001-25000

4

36-40

Postgraduate Student

Private Enterprise

25001-50000

5

41-50

Undergraduate

Education

25001-50000

6

51-60

Others

Private Enterprise

Below 5000

7

41-50

Doctoral Student

State-owned Enterprise

25001-50000

8

36-40

Postgraduate Student

Private Enterprise

10001-25000

9

61-70

Postgraduate Student

State-owned Enterprise

Above 100000

10

41-50

Undergraduate

Education

50001-100000

287

293

In Fig. 2, all hierarchical decision tables can be represented as a lattice, in which the arrows illustrate the possible generalization paths. The top node S1111 represents the most generalized hierarchical decision table, while the bottom node S2222 denotes the raw decision table. A path from S2222 to S1222 , to S1122 , S1112 and S1111 , is a generalization strategy for different level ascensions of the conditional attributes and decision attribute.

294

3.1.1 Conditional attribute level ascension in hierarchical decision table

288 289 290 291 292

295 296 297

298 299

In what follows, we first discuss some properties of these hierarchical decision tables under different levels of granularity for conditional attributes from the perspectives of uncertainty, core attribute, reduct and decision rule. Theorem 1 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )-th decision table, if it ≤ jt (t=1,2,. . .,m) and id =jd , then 10

S1,1,1,1

S1,2,1,1

S1,1,1,2

S1,1,2,2

S2,1,1,2 S1,2,2, 2

S2,1,1,1

S1,1,2,1

S1,2,1,2

S1,2,2,1

S 2,1,2, 2

S2,1,2,1

S 2,2,1, 2

S2,2,1,1 S 2,2,2,1

S2,2,2,2

Fig. 2. Different hierarchical decision tables under different levels of granularity Table 2 S2222 original decision table

300 301 302

303 304 305 306 307

308 309 310

U

Age(A2 )

Education level(EL2 )

Occupation(O2 )

Salary(Sa2 )

1

12

11

11

22

2

11

22

22

31

3

12

12

11

22

4

21

12

12

21

5

22

21

21

21

6

31

22

12

32

7

22

11

11

21

8

21

12

12

22

9

32

12

11

11

10

22

21

21

12

(1) |P OSi1i2 ...im id (D|C)| ≤ |P OSj1j2 ...jm jd (D|C)|;  (2) DISi1 i 2 ...im id (D|C) ≥ DISj1 j2 ...jm jd (D|C); (3) Inf oi1 i2 ...im id (D|C) ≥ Inf oj1 j2 ...jm jd (D|C). Proof: Since it ≤ jt (t=1,2,. . .,m) and id =jd , we can acquire U/C i1 i2 ...im  U/C j1 j2 ...jm and πDid = πDjd . Assume that πC j1 j2 ...jm ={P1 , . . ., Pl } and πC i1 i2 ...im ={Q1 , . . ., Qw }, it is obvious that l > w. There exists a subset Ei of a set {1, 2, . . ., l} such that Ei ∩ Eh = φ where i = h, i, h = 1, 2, . . . w. Thus, Qi = ∪j∈Ei Pj , where i=1, 2, . . . , w. (1) For any equivalence class Dq ∈ πDid , since apr Q (Dq ) ⊆ ∪j∈Ei apr P (Dq ), we i j can have P OSC j1 j2 ...jm (D|C) ⊇ P OSC i1 i2 ...im (D|C), thus |P OSC j1j2 ...jm (D|C)| ≥ |P OSC i1i2 ...im (D|C)|. 11

Table 3 S1122 and S1112 hierarchical decision table S1122 decision table

S1112 decision table

U

A1

EL1

O2

Sa2

U

A1

EL1

O1

Sa2

1

1

1

11

22

1

1

1

1

22

2

1

2

22

31

2

1

2

2

31

3

1

1

11

22

3

1

1

1

22

4

2

1

12

21

4

2

1

1

21

5

2

2

21

21

5

2

2

2

21

6

3

2

12

32

6

3

2

1

32

7

2

1

11

21

7

2

1

1

21

8

2

1

12

22

8

2

1

1

22

9

3

1

11

11

9

3

1

1

11

10

2

2

21

12

10

2

2

2

12

Table 4 S1121 and S1111 hierarchical decision table S1121 decision table

S1111 decision table

U

A1

EL1

O2

Sa1

U

A1

EL1

O1

Sa1

1

1

1

11

2

1

1

1

1

2

2

1

2

22

3

2

1

2

2

3

3

1

1

11

2

3

1

1

1

2

4

2

1

12

2

4

2

1

1

2

5

2

2

21

2

5

2

2

2

2

6

3

2

12

3

6

3

2

1

3

7

2

1

11

2

7

2

1

1

2

8

2

1

12

2

8

2

1

1

2

9

3

1

11

1

9

3

1

1

1

10

2

2

21

1

10

2

2

2

1

311

 (2) DISj1 j2 ...jm (D|C) =

312

=

313





1≤j≤l1≤k1




nkj 1 nkj 2

nkj 1 nkj 2

1≤k1




(



1≤k1
nkj 1 )(



j∈Ei

nkj 2 )

12





nki 1 nki 2

314

=

315

= DISi1 i2 ...im (D|C).

316

(3) p(Qi ) =

1≤k1
 j∈Ei

p(Pj ), where i=1, 2, . . ., w. As

317

we have

318

Inf oj1 j2 ...jm (D|C)=-

319

320

≤≤-

w   i=1 j∈Ei w  i=1

p(Pj )

p(Qi )

k  q=1

k  q=1

l  j=1

p(Pj )

k  q=1

l  j=1

p(Pj ) =

w   i=1

j∈Ei

p(Pj ),

p(Dq |Pj )log2 p(Dq |Pj )

p(Dq |Pj )log2 p(Dq |Qi )

p(Dq |Qi )log2 p(Dq |Qi )

321

=Inf oi1 i2 ...im (D|C). 

322

Corollary 1 If {11 . . . 1 id , . . ., i1 i2 . . .im id , . . ., l(c1 )l(c2 ). . .l(cm )id } is a total  

323

ordering relation, we have

324

(1) |P OS11 . . . 1 id (D|C)| ≤ . . . ≤ |P OSi1i2 ...im id (D|C)| ≤ . . . ≤ |P OSl(c1)l(c2 )...l(cm )id (D|C)|

m

325 326

 

≤ |U|;

m

  (2) DIS11  . . . 1 id (D|C) ≥ . . . ≥ DISi1 i2 ...im id (D|C) ≥ . . . ≥ DISl(c1 )l(c2 )...l(cm )id (D|C)   m

327 328

329

330 331 332

333 334 335 336

337 338 339 340

≥ 0; (3) Inf o11 . . . 1 id (D|C) ≥ . . . ≥ Inf oi1 i2 ...im id (D|C) ≥ . . . ≥ Inf ol(c1 )l(c2 )...l(cm )id (D|C)  

≥ 0.

m

Theorem 1 and Corollary 1 indicates that the positive region decreases monotonously as the concept levels of conditional attributes ascend, while the value of indiscernibility object pairs and information entropy increases. Theorem 2 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )th decision table, let P OSj1j2 ...jm jd (D|C) = P OSi1 i2 ...im id (D|C), where it ≤ jt (t=1, 2, . . . m) and id = jd . If ck is a core attribute of (j1 j2 . . .jm jd )-th decision table, then ck is a core attribute of the (i1 i2 . . .im id )-th decision table. Proof: Since ck is a core attribute of the (j1 j2 . . .jm jd )-th decision table, P OSj1j2 ...jm jd (D|A) ⊂ P OSj1j2 ...jm jd (D|C) where A = C - {ck }. Since it ≤ jt (t=1, 2, . . ., m) and id = jd , we have πAi1 i2 ...im  πAj1 j2 ...jm and πDid = πDjd , thus P OSi1i2 ...im id (D|A) ⊆ P OSj1j2 ...jm jd (D|A) ⊂ P OSj1j2 ...jm jd (D|C) = P OSi1i2 ...im id (D|C). Therefore, 13

341

342 343 344 345 346

347 348 349 350

351 352 353

ck is a core attribute of the (i1 i2 . . .im id )-th decision table.  Theorem 3 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )th decision table, let P OSj1j2 ...jm jd (D|C) = P OSi1i2 ...im id (D|C), where it ≤ jt (t=1,2,. . .,m) and id =jd . If Redj ⊆ C is a reduct of the (j1 j2 . . .jm jd )-th decision table, then there exists a reduct of (i1 i2 . . .im id )-th decision table Redi such that Redi ⊇ Redj . Proof: Redj is a reduct of the (j1 j2 . . .jm jd )-th decision table, then |P OSj1j2 ...jm jd (D|Redj )| = |P OSj1j2 ...jm jd (D|C)|. Since it ≤ jt (t=1, 2, . . ., m) and id = jd , we can acquire |P OSi1i2 ...im id (D|Redj )| ≤ |P OSj1j2 ...jm jd (D|Redj )|. Thus there exist the following two cases. Case 1: If |P OSi1i2 ...im id (D|Redj )| = |P OSi1i2 ...im id (D|C)|, Redj is a reduct of (i1 i2 . . .im id )-th decision table, so there exists a reduct Redi such that Redi = Redj .

357

Case 2: If |P OSi1i2 ...im id (D|Redj )| < |P OSi1i2 ...im id (D|C)|, it must add some attributes A such that |P OSi1i2 ...im id (D|[Redj ∪ A])| = |P OSi1i2 ...im id (D|C)|. Thus, Redj ∪ A is a reduct of (i1 i2 . . .im id )-th decision table. Therefore, there exists a reduct Redi such that Redi = Redj ∪ A.

358

From the above two cases, we have Redi ⊇ Redj . 

354 355 356

359 360 361

362 363 364 365 366

Theorem 4 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )-th decision table, if it ≤ jt (t=1,2,. . .,m) and id =jd , then γ(C i1 i2 ...im → D id ) ≤ γ(C j1 j2 ...jm → D jd ) . Proof: Since it ≤ jt (t=1,2,. . .,m) and id = jd , we can acquire U/C i1 i2 ...im  U/C j1 j2 ...jm and πDid = πDjd . Assume that πC j1 j2 ...jm = {P1 , . . ., Pl } and πC i1 i2 ...im = {Q1 , . . ., Qw }, it is obvious that l > w. There exists a subset Ei of a set {1, 2, . . ., l} such that Qi = ∪j∈Ei Pj (i=1, 2, . . ., w). For any  equivalence class Dl ∈ πDid , apr Q (Dl ) ≤ apr P (Dl ). Therefore, we have i

i1 i2 ...im

→ D ) ≤ γ(C id

j1 j2 ...jm

j∈Ei

j

→ D ).  jd

367

γ(C

368

3.1.2 Decision attribute level ascension in hierarchical decision table

369 370 371 372 373

374 375

In what follows, we consider the case that keeps the concept levels of conditional attributes unchanged and ascends the level of decision attribute. In such case, we further discuss the relationships among hierarchical decision tables from the perspectives of the uncertainty, attribute core, reduct and decision rule. Theorem 5 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )-th decision table, if it =jt (t=1,2,. . .,m) and id ≤ jd , then 14

376 377 378

(1) |P OSi1i2 ...im id (D|C)| ≥ |P OSj1j2 ...jm jd (D|C)|;  (2) DISi1 i 2 ...im id (D|C) ≤ DISj1 j2 ...jm jd (D|C); (3) Inf oi1 i2 ...im id (D|C) ≤ Inf oj1 j2 ...jm jd (D|C).

382

Proof: Since it = jt (t=1,2,. . .,m), πC i1 i2 ...im = πC j1 j2 ...jm . As id ≤ jd , πDid  πDjd . Let πDid ={D1 , . . ., Dl } and πDjd ={D1 , . . ., Dw }. Obviously, l < w. There exists a subset Ei of a set {1, 2, . . ., w} such that Ei ∩ Eh = φ where  i = h, i, h= 1, 2, . . . l. For any equivalence class, Di = Dj , where i=1, 2,

383

. . ., l.

379 380 381

384 385

386

387

388

389

j∈Ei

(1) For any equivalence class Cq ∈ πC i1 i2 ...im , apr Cq (Di ) ≥ ∪j∈Ei aprCq (Dj ). Thus, |P OSi1i2 ...im id (D|C)| ≥ |P OSj1j2 ...jm jd (D|C)|. (2) For any equivalence class Cq ∈ πC i1 i2 ...im , we have  1≤j1
 1≤i1
2 njq1 njq2 = [|Cq |2 - (n1q )2 - . . . - (njq )2 - . . . - (nw q ) ]/2.

niq1 niq2 = [|Cq |2 - (n1q )2 - . . . - (niq )2 - . . . - (nlq )2 ]/2.

Since Di =

 j∈Ei



Dj , niq =

njq1 njq2 ≥

 j∈Ei

njq , (niq )2 = (



j∈Ei



njq )2 ≥

 j∈Ei

(njq )2 .

niq1 niq2 .

390

Thus,

391

 Therefore, we can conclude that DISi1 i 2 ...im id (D|C) ≤ DISj1 j2 ...jm jd (D|C).

392 393

394

395

396

397

399 400

1≤i1
(3)For any equivalence class Cq ∈ πC i1 i2 ...im , we have p(Di |Cq ) = where i=1, 2, . . ., l. Thus, Inf oi1 i2 ...im id (D|C)=≤≤=-

r  q=1 r  q=1 r 

q=1 398

1≤j1
p(Cq ) p(Cq ) p(Cq )

l   i=1 j∈Ei l   i=1 j∈Ei n  j=1

r  q=1

p(Cq )

l  i=1

p(Dj |Cq )log2

 j∈Ei

p(Dj |Cq ),

p(Di |Cq )log2 p(Di |Cq )

 j∈Ei

p(Dj |Cq )

p(Dj |Cq )log2 p(Dj |Cq )

p(Dj |Cq )log2 p(Dj |Cq )

= Inf oj1 j2 ...jm jd (D|C).  Corollary 2 If {i1 i2 . . .im l(d), . . ., i1 i2 . . .im id , . . ., i1 i2 . . .im 1} is a total ordering relation, we have 15

401 402 403 404 405 406

407 408 409

410 411 412 413

414 415 416 417 418

419 420 421 422 423

424 425 426 427 428 429 430

431 432 433

434 435 436 437

(1) |P OSi1i2 ...im l(d) (D|C)| ≤ . . . ≤ |P OSi1i2 ...im id (D|C)| ≤ . . . ≤ |P OSi1i2 ...im 1 (D|C)| ≤ |U|;   (2) DISi1 i2 ...im l(d) (D|C) ≥ . . . ≥ DISi1 i2 ...im id (D|C) ≥ . . . ≥ DISi1 i2 ...im 1 (D|C) ≥ 0; (3) Inf oi1 i2 ...im l(d) (D|C) ≥ . . . ≥ Inf oi1 i2 ...im id (D|C) ≥ . . . ≥ Inf oi1 i2 ...im 1 (D|C) ≥ 0. Theorem 5 and Corollary 2 indicates that the size of the positive region increases monotonously as the concept level of decision attribute ascends, while the values of indiscernibility object pairs and information entropy decrease. Theorem 6 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )-th decision table, let P OSj1j2 ...jm jd (D|C) = P OSi1 i2 ...im id (D|C), where it =jt (t=1, 2, . . ., m) and id ≤ jd . If ck is a core attribute of the (i1 i2 . . .im id )-th decision table, then ck is also a core attribute of the (j1 j2 . . .jm jd )-th decision table. Proof: Since ck is a core attribute of the (i1 i2 . . .im id )-th decision table, A = C - {ck }, P OSi1i2 ...im id (D|A) ⊂ P OSi1i2 ...im id (D|C). As it = jt (t=1,2,. . .,m) and id ≤ jd , πAi1 i2 ...im = πAj1 j2 ...jm , πDid  πDjd , thus P OSj1j2 ...jm jd (D|A) ⊆ P OSi1i2 ...im id (D|A) ⊂ P OSi1i2 ...im id (D|C) = P OSj1j2 ...jm jd (D|C). Therefore, ck is a core attribute of the (j1 j2 . . .jm jd )-th decision table.  Theorem 7 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )-th decision table, let P OSj1j2 ...jm jd (D|C) = P OSi1i2 ...im id (D|C) and Redj ⊆ C is a reduct of the (j1 j2 . . .jm jd )-th decision table. If ip =jp (p=1,2,. . .,m) and id ≤ jd , then there exists a reduct of (i1 i2 . . .im id )-th decision table, Redi , such that Redi ⊆ Redj . Proof: Redj is a reduct of the (j1 j2 . . .jm jd )-th decision table, then |P OSj1j2 ...jm jd (D|Redj )| = |P OSj1j2 ...jm jd (D|C)|. Since it = jt (t=1,2,. . .,m) and id ≤ jd , we can acquire |P OSi1i2 ...im id (D|Redj )| = |P OSj1j2 ...jm jd (D|Redj )|. Thus, Redj is a super reduct of the (i1 i2 . . .im id )-th decision table. We can delete some attributes A such that |P OSi1i2 ...im id (D|[Redj − A])| = |P OSi1i2 ...im id (D|C)|. If Redj − A is a reduct of (i1 i2 . . .im id )-th decision table, then there exists a reduct Redi such that Redi = Redj − A. Thus, Redi ⊆ Redj .  Theorem 8 Given the (i1 i2 . . .im id )-th decision table and the (j1 j2 . . .jm jd )-th decision table, if it = jt (t=1,2,. . .,m) and id ≤ jd , then γ(C i1 i2 ...im → D id ) ≥ γ(C j1 j2 ...jm → D jd ) . Proof: Since it = jt (t=1, 2, . . ., m), πC i1 i2 ...im = πC j1 j2 ...jm . As id ≤ jd , πDid  πDjd . Let πDid ={D1 , . . ., Dl } and πDjd ={D1 , . . ., Dw }. Obviously, l ¡ w.  There exists a subset Ei of a set {1, 2, . . ., w} such that Di = Dj (i=1, 2, . . ., l). For any equivalence class Cq ∈ πC i1 i2 ...im , apr Cq (Di ) ≥

16

j∈Ei



j∈Ei

aprCq (Dj ).

438

439 440 441

Therefore, we have γ(C i1 i2 ...im → D id ) ≥ γ(C j1 j2 ...jm → D jd ).  Example 2: (Continued from Example 1) Different hierarchical decision tables under different levels of granularity are shown in Fig. 2 and Tables 2—4. We here illustrate some cases at different levels of granularity in details.

453

(1) For S1112 and S2222 hierarchical decision tables, as |P OS1112 (D|C)| =5 and |P OS2222(D|C)|=6, |P OS1112 (D|C)| ≤ |P OS2222 (D|C)| holds. For S1122 and S1121 hierarchical decision tables, since |P OS1122 ( D|C)|=6 and |P OS1121 (D|C)|=8, |P OS1121 (D|C)| ≥ |P OS1122(D|C)| holds. (2) For S1122 and S2222 hierarchical decision tables, {c1 } is a core attribute of S2222 , then it is also a core attribute of S1122 . Meanwhile, {c1 , c2 } is a reduct of S2222 , while {c1 , c2 , c3 } is a reduct of S1122 , thus {c1 , c2 } ⊆ {c1 , c2 , c3 } holds. (3) For S1112 and S2222 hierarchical decision tables, γ(C 111 → D 2 ) = 0.5 and γ(C 222 → D 2 ) = 0.6, thus γ(C 111 → D 2 ) ≤ γ(C 222 → D 2 ). For S1111 and S1112 hierarchical decision tables, γ(C 111 → D 1 ) = 0.8 and γ(C 111 → D 2 ) = 0.5, thus γ(C 111 → D 1 ) ≥ γ(C 111 → D 2 ) holds.

454

3.2 The parallelism of hierarchical attribute reduction algorithms

442 443 444 445 446 447 448 449 450 451 452

455 456

457 458

In what follows, we discuss the parallelism in hierarchical attribute reduction algorithms using MapReduce. Definition 11 For a hierarchical decision table Sk1 k2 ...km kd , let Ski 1 k2 ...km kd (i=1,  2, . . ., N) denote a sub-decision table, if it satisfies (1)Sk1 k2 ...km kd = Ski 1 k2 ...km kd ; 1≤i≤N

459 460

461 462 463

464 465 466

(2)Ski 1 k2 ...km kd ∩Skj1 k2 ...km kd = ∅, where i, j = 1,2,. . .,N and i = j, then Ski 1 k2 ...km kd is called a data split. By Definition 11, Ski 1 k2 ...km kd corresponds to a data split in MapReduce framework. Thus, we can split the whole dataset into many data splits using MapReduce. Theorem 9 For a decision table Sk1 k2 ...km kd , let πA ={A1 , A2 , . . ., Ar }, Ski 1 k2 ...km kd (i=1, 2, . . ., N) be a data split of Sk1 k2 ...km kd , Ski 1 k2 ...km kd /A = {Ai1 , Ai2 , . . .,  Air }, then Ap = Aip (p=1,2,. . .,r). 1≤i≤N

467 468 469 470 471 472

Proof. Assume that x ∈ Aip in Ski 1 k2 ...km kd and y ∈ Ajp in Skj1 k2 ...km kd , since the equivalence classes Aip and Ajp with respect to A are the same as Ap , x∪y ∈ Ap . It means that the same objects among different data splits can be combined as one equivalence class. Thus, for πA ={A1 , A2 , . . ., Ar } and Ski 1 k2 ...km kd /A =  Aip (p=1,2,. . .,r). {Ai1 , Ai2 , . . ., Air }, we can get Ap = 1≤i≤N

17

473 474 475 476 477 478 479 480 481 482 483 484

According to Theorem 9, one can compute the equivalence classes independently for each data split, thus all equivalence classes are the same as those in serial computing. Therefore, we can implement the data parallelism for big data using MapReduce. For hierarchical attribute reduction, we first compute a hierarchical encoded decision table, then compute the equivalence classes and the attribute significance, and finally acquire a reduct. Fig. 3 illustrates the parallelism of a hierarchical attribute reduction algorithm, where S denotes the whole dataset, S i (i=1,2,. . .,N) is a data split, Aj (j=1,2,. . .,M) is a candidate attribute subset which can be regarded as a task for computing the corresponding equivalence classes, ECi (i=1,2,. . .,r) or ECAj (j=1,2,. . .,M) denotes the different equivalence classes. The regions dotted with dash lines denote the parallel computing parts. In what follows, we discuss the parallel computations of the hierarchical encoded decision table in details.

Begin Compute hierarchical encoded decision table under different levels using concept hierarchy tree S

S2

S1

EC1 ...ECr

Ă

EC1 ...ECr

c1



SN

cm

EC1 ...ECr

d

Ă

( EC1 , EC2 ,..., ECr ) Ÿ S k1k2 ...km kd Generate a list of candidate attribute subset { A1 ,

A2 , Ă , AM}

Decompose the hierarchical encoded decision table and compute equivalence classes

Sk1k2 ...kmkd

S1 A1 A AM 2 EC A1 ...EC AM

EC A1

S2 A1 A AM 2 EC A1 ...EC AM ECA2

Ă

SN A1 A AM 2 EC A1 ...EC AM

Ă

EC AM

Compute attribute significance for each equivalence class

Sum the attribute significance for each candidate attribute subset Choose the best candidate attribute N Reduct Y End

Fig. 3. The parallelism of a hierarchical attribute reduction algorithm

485

18

486 487

488 489 490 491 492 493 494 495 496 497 498 499

500 501 502 503 504

505 506 507 508 509 510 511 512 513 514 515 516

517 518 519 520 521 522 523 524

3.3 Parallel computation algorithms for the hierarchical encoded decision table using MapReduce

Given a decision table S and the corresponding Concept Hierarchy Tree(CHT), we can acquire a hierarchical encoded decision table. However, when we view the data under different levels of granularity, the hierarchical decision table may be regenerated from the original decision table S using the concept hierarchy tree. To address this problem, we employ the general encoding technique to acquire a hierarchical encoded decision table. Encoding can be performed during the collection of the relevant data, and thus there is no extra “encoding pass” required. In addition, an encoded string which represents a position in a concept hierarchy tree requires fewer bits than the corresponding attribute value. Therefore, it is often beneficial to use an encoded decision table, although our approach does not rely on the derivation of such an encoded decision table because the encoding can always be performed on the fly. To simplify our discussion, we assume that the given concept hierarchy trees are balanced. We transform a large, low-level, detailed data into a compressed, high-level, generalized data. As we all know, it is obvious that the encoding for one object is independent of that of another object. Thus, different encoding computations for all objects can be executed in parallel. The large-scale data is stored on Hadoop Distributed File System as a sequence file of < key, value > pairs, in which the key is each object, and the value is null. The MapReduce framework partitioned the whole dataset into many data splits, and globally broadcast them to the corresponding mapper. For each map task, map function receives the encoded string array of a concept hierarchy tree as a global parameter, encodes each object in a data split, and writes them into local files on each node. These files consist of two parts: the encoded string of the conditional attributes(for brevity, ConStr) and the counterpart of the decision attribute (DecStr). Here we design a Parallel Computation Algorithm for Hierarchical Encoded Decision Table(PCAHEDT) using MapReduce. The pseudocode of the map function of Algorithm PCAHEDT is shown in Algorithm 1. Algorithm 1: PCAHEDT-Map(key, value) //Map phase of transforming the objects into the encoded string using concept hierarchy tree Input: The encoded string array of concept hierarchy tree, ESA; a data split, Si Output:< ConStr, DecStr >, where ConStr is an encoded conditional string and DecStr is an encoded decision string Begin 19

533

ConStr =“”; for each x ∈ S i do {for any attribute a ∈ C do { Acquire the encoded string of object x on attribute a from ESA, es(x, a) ; Step 5. ConStr = ConStr + es(x,a) + “ ”;} Step 6. Acquire the encoded string of object x on attribute d from ESA, es(x, d), and assign es(x, d) to DecStr; Step 7. Emit < ConStr, DecStr > }.

534

End.

535

By Algorithm 1, we can acquire the encoded objects for each data split and store them into the local files. The MapReduce framework copies all the encoded objects from the Mapper nodes to the corresponding Reducer nodes and sort these objects. We can store the objects with the same encoded string of the conditional attributes by the following reduce function. The pseudocode of the reduce function of Algorithm PCAHEDT is outlined in Algorithm 2.

525 526 527 528 529 530 531 532

536 537 538 539 540

541 542 543 544 545 546

Step Step Step Step

1. 2. 3. 4.

Algorithm 2: PCAHEDT-Reduce(key,V) //Reduce phase of computing hierarchical encoded decision table Input: an encoded conditional string, key; the list of the decision values, V Output:< key, DecStr >, where key is an encoded conditional string ConStr and DecStr is an encoded decision string Begin

548

Step 1. for each DecStr ∈ V do Step 2. { Emit < key, DecStr >. }

549

End.

550

By Algorithm 2, we can compute the encoded decision table. Note that if some objects with the same ConStr have different DecStrs, we still emit each object with the different DecStr, instead of only one object with the DecStr ‘∗ ∗  . . . ∗ ’,

547

551 552

l(d) 553 554

555 556 557 558

559 560 561

since such objects may be used for the different hierarchical decision tables. All the encoded objects form a hierarchical encoded decision table. Parallel Computation Algorithm for the Hierarchical Encoded Decision Table (PCAHEDT) requires one kind of MapReduce job for executing the map and reduce function so that we can transform an original decision table into a hierarchical encoded decision table as outlined in Algorithm 3. Algorithm 3: Parallel computation algorithm for the hierarchical encoded decision table (PCAHEDT) Input: A decision table, S; the encoded string array of the concept hierarchy 20

562 563 564

tree, ESA Output: A hierarchical encoded decision table, HEDT Begin

568

Step 1. Compute the encoded objects by executing PCAHEDT-Map function of Algorithm 1; Step 2. Acquire the hierarchical decision table by executing PCAHEDTReduce function of Algorithm 2.

569

End

570

572

By Algorithm 3, we acquire a hierarchical encoded decision table. Thus, we can construct the hierarchical attribute reduction algorithms for a hierarchical encoded decision table under different levels of granularity.

573

3.4 Hierarchical attribute reduction algorithms using MapReduce

565 566 567

571

574 575 576 577 578 579 580 581

582 583

584 585 586 587 588 589 590 591

592 593 594 595 596

In this section, we mainly discuss the parallel computations of the hierarchical attribute reduction algorithms for big data using MapReduce, which use a hierarchical encoded decision table instead of the original decision table. As we all know, it is obvious that the computations of the equivalence classes are independent of each other. Thus, different computations of equivalence classes from a subset of attributes can be executed in parallel. In what follows, we first illustrate how to compute the equivalence classes and attribute significance of different candidate attributes in parallel.

3.4.1 Parallel computations of the equivalence classes for candidate attribute subset As discussed in [27], three classical algorithms often iteratively compute the equivalence classes from different candidate attribute set, so we use a candidate subset of the attributes as a global parameter in Map function. The pseudocode of Map function is illustrated in Algorithm 4, where es(x, a) denotes the encoded string on attribute a for object x, substr(1, lna ) function acquires the substring of the length lna from the beginning, c EquivalenceClass is an equivalence class with a flag of attribute c, and < DecStr, 1 > is a pair where the DecStr is the encoded decision string. Algorithm 4:Map(key, value) //Map phase of computing equivalence classes Input: Selected attributes, A; any candidate attribute, c ∈ C − A; a data split, S i ; different level number of all the attributes, lna (a ∈ C ∪ {d}) Output:< c EquivalenceClass, < DecStr, 1 >>, 21

597

Begin

598

Step Step Step Step Step Step

606

EquivalenceClass=“”, DecStr=“”; for each x ∈ S i do {for any attribute a ∈ A do {EquivalenceClass=EquivalenceClass + es(x, a).substr(1,lna ) + “ ”;} for any attribute c ∈ C − A do { c EquivalenceClass =c + “ ” + es(x, c).substr(1,lnc )+ “ ” + EquivalenceClass; Step 7. DecStr =es(x,d).substr(1,lnd ); Step 8. Emit < c EquivalenceClass, < DecStr, 1 >>.}}

607

End

608 609

By Algorithm 4, we can compute the equivalence classes from different candidate attribute subset in task parallel for each data split.

610

3.4.2 Parallel computations of attribute significance for the equivalence classes

599 600 601 602 603 604 605

611 612 613 614 615 616 617

1. 2. 3. 4. 5. 6.

The computation starts with a map phase in which the map function is applied in parallel on different partitions of the input dataset. The MapReduce framework will shuffle and group the same equivalence classes into a node, that is, all the pair values that share a certain key are passed to a single reduce function. Thus the reduce function can compute the attribute significance among the same equivalence classes in parallel and store < c, c AttrSig > into a file in the HDFS. We can define the different attribute significance as follows. Definition 12 [27] For a hierarchical encoded decision table S, let A ⊆ C and c ∈ C − A, then the significance of attribute c is defined by: sigP OS (c, A, D) =

|P OS(D|[A ∪ {c}])| − |P OS(D|A)| |U|

|BND(D|A)| − |BND(D|[A ∪ {c}])| |U| sigInf o (c, A, D) = Inf o(D|A) − Inf o(D|[A ∪ {c}])

sigBN D (c, A, D) =

sigDIS  (c, A, D) =

 − DIS(D|[A ∪ {c}]) DIS(D|A)  ni nj

(9) (10) (11) (12)

1≤i
621 622

 where sig (c, A, D)( ={POS,BND,DIS,Info}) denotes the attribute significance in positive region, boundary region, discernibility matrix and information entropy algorithms, respectively. Note that although sigP OS (c, A, D) and sigBN D (c, A, D) here are identical, but they have different performances for big data with high dimensions in cloud 22

623 624 625

626 627 628 629 630 631 632 633

computing (See Fig. 5). Therefore, we only give the pseudocode of reduce function for three different algorithms in view of boundary region as follows (Algorithm 5). Algorithm 5:Reduce(key,V) //Reduce phase of computing the attribute significance of single attribute among the same equivalence classes Input: An equivalence class, c EquivalenceClass; the list of pairs < DecStr, F requencies > ,V Output: < c, c AttrSig >, where c is an attribute name and c AttrSig is the value of attribute significance Begin

640

Step 1. c AttrSig = 0; Step 2. for each v ∈ V do { Step 3. Compute the frequencies (n1 ,n2 , . . .,nk ) of the different encoded decision string;} Step 4. Compute c AttrSig according to Definition 12. Step 5. if all the decision values are different, then Step 6. {Emit < c, c AttrSig >. }

641

End

634 635 636 637 638 639

650

For boundary region algorithm, when computing sigBN D (c, A, D) for c ∈ C −A, we only calculate the number of boundary region objects, |BND(D|[A∪ {c}])|, in reduce phase. For information entropy algorithm, when computing sigInf o (c, A, D) for c ∈ C − A, one can check that the value of Inf o(D|A) is the same in this iteration, thus we only calculate the information entropy Inf o(D|[A ∪ {c}]). For discernibility matrix algorithm, when computing  sigDIS  (c, A, D) for c ∈ C −A, one can check that DIS(D|A) is the same in this iteration, thus we only calculate the number of indiscernibility object pairs, ∪ {c}]), in reduce phase. DIS(D|[A
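To make the reduce-side computation concrete, the following sketch (ours, not the paper's code) shows the per-class contribution to |BND(D|[A ∪ {c}])| given the decision-value frequencies gathered for one equivalence class:

import java.util.Map;

// Sketch: per-equivalence-class contribution used when accumulating the boundary
// region |BND(D|[A ∪ {c}])| for a candidate attribute c. The reducer sums these
// contributions over all classes keyed by c.
public class BoundaryContribution {
    static long contribution(Map<String, Long> decisionFrequencies) {
        long size = 0;
        int distinct = 0;
        for (long f : decisionFrequencies.values()) {
            if (f > 0) distinct++;
            size += f;
        }
        return distinct > 1 ? size : 0;   // consistent classes add nothing to BND
    }

    public static void main(String[] args) {
        System.out.println(contribution(Map.of("22", 2L, "21", 1L)));  // 3
        System.out.println(contribution(Map.of("11", 4L)));            // 0
    }
}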

651

3.4.3 Parallel hierarchical attribute reduction algorithms using MapReduce

642 643 644 645 646 647 648 649

652 653 654 655 656 657 658 659 660

Parallel hierarchical attribute reduction algorithms based on boundary region(positive region), discernibility matrix and information entropy requires one kind of MapReduce job. The MapReduce driver program reads the attribute significance values of different candidate attribute subset from the output files in reduce phase, sums up the value of each candidate attribute, and determines some attribute to be added into a reduct. Thus, we can get the newly candidate attributes which are used for the next iteration. This procedure must be executed serially in each iteration. A general parallel hierarchical attribute reduction algorithm can be summarized in Algorithm 6. 23

661 662 663 664 665

666 667 668 669

Algorithm 6: Parallel hierarchical attribute reduction algorithms using MapReduce Input: a hierarchical encoded decision table, HEDT Output: a reduct, Red Begin Step Step Step Step

1. 2. 3. 4.

670 671 672

Step 5. Step 6.

673 674 675 676

Step 7. Step 8.

677 678

Step 9.

679 680

Step 10.

Let Red = ∅; If (D|Red) = (D|C) , then turn to Step 8; For each attribute c ∈ C − Red do {Initate a MapReduce job and execute map function of Algorithm 4 and reduce function of Algorithm 5; Read the output files and acquire sig (c, Red, D)};   Choose an attribute c with sig (c , Red, D) = best(sig (c, Red, D)) from C − Red (if the attribute like that is not only one, select one attribute arbitrarily );  Red = Red ∪{c }, turn to Step 2; If (D|[Red − {c}]) = (D|Red) for every c ∈ Red, then go to Step 10.   If there exists an attribute c ∈ Red such that (D|[Red − {c }]) =  (D|Red), then Red = Red - {c } and go to Step 8. Output Red.

681

End

682

By Algorithm 6, we can acquire each reduct for different hierarchical attribute algorithms based on boundary region, discernibility matrix and information entropy. It can be known from Steps 2–7 that (D|Red) = (D|C). On the other hand, according to Steps 8 and 9 of Algorithm 6, we have (D|[Red − {c}]) = (D|Red) for every c ∈ Red. By Definition 5, Algorithm 6 is complete, that is, the output set Red is a reduct with certainty [19].

683 684 685 686 687

688 689 690 691

692 693 694 695 696 697 698 699

Example 3: (Continued from Example 1) Different hierarchical decision tables under different levels of granularity are shown in Fig. 2. We illustrate some reducts under different levels of granularity for conditional attributes and decision attribute in Fig. 4. (1) For S2222 , we first get a reduct {A2 , EL2 }. In order to mine the hierarchical decision rules, we begin to ascend the levels of conditional attributes A2 or/and EL2 . In the end, we can acquire an attribute-generalization reduct {A2 , EL1 } as shown in Fig. 4(a). Thus, S2222 can be generalized into S21∗2 . (2) For S2221 , we first get a reduct {A2 , EL2 } as well. Then, we ascend the levels of conditional attributes A2 or/and EL2 . Finally, we can acquire an attribute-generalization reduct {A1 , EL1 } as shown in Fig. 4(b). Thus, S2221 can be generalized into S11∗1 . 24

S1,1,*,1 { A1 , EL1}

S1,2,*,2

S 2,1,*,2

S1,2,*,1

S 2,1,*,1

{ A1 , EL2 }

{ A2 , EL1}

{ A1 , EL2 }

{ A2 , EL1}

u S2,2,2,2

S 2,2,2,1

{ A2 , EL2 } (a) attribute-generalization reduction for

{ A2 , EL2 } (b) attribute-generalization reduction for S 2,2,2,1

S2,2,2,2

Fig. 4. An illustration of attribute-generalization reduction process 700

701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717

3.5 Time complexity analysis of parallel algorithms

To demonstrate the efficiency of the proposed parallel attribute reduction algorithms, i.e., Algorithms 1–6, we compare these parallel algorithms, which are implemented on multiple nodes using MapReduce, with their corresponding serial algorithms implemented on a single node. Given the same decision table, Table 5 lists the computational complexities of Algorithms 1–6 and their serial-version counterparts. In Table 5, m denotes the number of conditional attributes, n is the total number of objects and N is the number of slave nodes. To simplify the discussion, we assume the number of objects in each data split is n/N. k is the number of decision values and n′ is the number of objects in the maximal equivalence class. l(c_t) and l(d) denote the concept hierarchy depths of conditional attribute c_t (t = 1, 2, ..., m) and decision attribute d, and ln_{c_t} and ln_d denote the lengths of their hierarchical encoded strings, respectively. From Table 5, we can see that the time complexities of the parallel attribute reduction algorithms are far lower than those of the traditional serial algorithms. Under the parallel strategy, the computation task of attribute reduction is divided and distributed over N nodes, and thus the computational complexity is reduced by roughly a factor of N. Generally speaking, the more slave nodes we have, the faster the attribute reduction process becomes.

Table 5
Time complexity analysis of parallel algorithms

Algorithm Nos. | Serial | Parallel
Algorithm 1 | O(n(Σ_{t=1}^{m} l(c_t) + l(d))) | O(n(Σ_{t=1}^{m} l(c_t) + l(d))/N)
Algorithm 2 | O(n) | max(O(n′), O(n/N))
Algorithm 3 | O(n(Σ_{t=1}^{m} l(c_t) + l(d))) | max(O(n′), O(n(Σ_{t=1}^{m} l(c_t) + l(d))/N))
Algorithm 4 | O(n(Σ_{t=1}^{m} ln_{c_t} + ln_d)) | O(n(Σ_{t=1}^{m} ln_{c_t} + ln_d)/N)
Algorithm 5 | O(kn) | max(O(kn′), O(kn/N))
Algorithm 6 | max(O(m²n(Σ_{t=1}^{m} ln_{c_t} + ln_d)), O(m²kn)) | max(O(m²n(Σ_{t=1}^{m} ln_{c_t} + ln_d)/N), O(m²kn′), O(m²kn/N))


4 Experimental evaluation

This section presents experimental results of running four hierarchical attribute reduction algorithms, based on positive region, boundary region, discernibility matrix and information entropy, in data and task parallel (HARAPOS, HARABND, HARADIS and HARAInfo). We are interested in performance measures such as the speedup and scaleup [37] of the parallel attribute reduction algorithms. We do not consider the relative performance of these algorithms, since the reducts they produce may be different.


4.1 Experiment setup


737 738 739 740 741


We run four parallel algorithms on a cluster of 17 nodes. For distributed experiments, one is set as a master node and the rest are configured as slave nodes. Each node has 2GB of main memory and use Intel Pentium Processer with Dual-Core(2 cores in all, each has a clock frequency of 2.6GHz), and connects via an Ethernet (100Mbit/sec). For each node, we install Cygwin 2.697(a Linux-like environment in Windows), Hadoop 0.20.2 and Java 1.6.20. We make the following changes to the default Hadoop configuration: we run two map and two reduce tasks in parallel on each node, and set the replication factor to 3. We conduct an extensive series of experiments on one commonly used machine learning data sets Mushroom from the UCI Machine Learning repository[9] and three synthetic big data(DS1, DS3, DS4). Each dataset has only one decision attribute. We duplicate the dataset Mushroom to generate a big data(DS2). Table 6 summarizes the characteristics of each dataset. Table 6 Description of the datasets

Table 6
Description of the datasets

Datasets | Objects | Attributes | Classes | Memo | Size (GB)
DS1 | 10,000 | 5,000 | 2 | single level | 1.3
DS2 | 40,620,000 | 22 | 2 | single level | 2.2
DS3 | 40,000,000 | 30 | 5*5*5=125 | multiple levels | 2.6
DS4 | 40,000,000 | 50 | 9*9*9=729 | multiple levels | 4.3

4.2 The running time on different datasets

For big data, we partition the dataset into many splits for data parallelism using MapReduce and process each split in task parallel. Fig. 5 shows the running time, and the ratio between the two algorithms, of selecting the most important attributes in each iteration for HARAPOS and HARABND. From Fig. 5, one can see that the running time of HARAPOS is longer than that of HARABND; moreover, the ratio HARAPOS/HARABND grows as the number of selected attributes increases for big data with high dimensions. Note that the first value in each subgraph denotes the running time of computing the classification capability for the original dataset. Hence, in what follows, we focus on the performance of HARABND, HARADIS and HARAInfo on DS2, DS3 and DS4.

Fig. 5. The comparisons of HARAPOS and HARABND on two datasets: (a) running time on DS1 and DS2; (b) the ratio of HARAPOS to HARABND.

Fig. 6 shows the running time of selecting the most important attributes in each iteration and the total running time for the three datasets. In Fig. 6, we can see that the three parallel algorithms exhibit a similar pattern of increase in running time, since they all compute the attribute significance from the boundary regions. Note that the running time of HARADIS is longer because it uses the BigInteger type of the Java language, as shown in Fig. 6(d).


For the two datasets with multiple levels, we compute the running time of different hierarchical decision tables, denoted cidj (where ci denotes the i-th level of all conditional attributes and dj the j-th level of the decision attribute), as shown in Fig. 7. From Fig. 7, the running time of the decision table at higher levels is much longer than that at lower levels, because a parallel hierarchical attribute reduction algorithm must select more attributes at a higher level. However, the running times are almost the same under decision-attribute ascension, because the sizes of the reducts obtained by the different parallel algorithms are the same.


4.3 The performance evaluations on different datasets


In what follows, we examine the speedup and scaleup of our proposed algorithms in data and task parallel.

Fig. 6. The running times of three parallel algorithms: (a) DS2; (b) DS3-2; (c) DS4-2; (d) total running time for hierarchical attribute reduction.

Fig. 7. The running times on two datasets under different levels of granularity: (a) condition level ascension (c3d3, c2d3, c1d3); (b) decision level ascension (c3d3, c3d2, c3d1); (c) condition/decision level ascension (c3d3, c2d2, c1d1).


4.3.1 Speedup

In order to measure the speedup, we keep the dataset constant and increase the number of computers in the system. A perfect parallel algorithm demonstrates linear speedup: a system with m times the number of computers yields a speedup of m. However, linear speedup is difficult to achieve because of the serial portion of the computation, the communication costs, faults, and the overheads of job scheduling, monitoring and control. We evaluate the speedup on the datasets with different numbers of nodes, varying the number of nodes from 1 to 16. Fig. 8 shows the speedup of the three parallel hierarchical attribute reduction algorithms for DS2, DS3-2 and DS4-2.
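For reference, the speedup plotted in Fig. 8 follows the standard definition implied by the description above; writing T_1 for the running time on a single node and T_m for the running time on m nodes over the same dataset,

    speedup(m) = T_1 / T_m,

so that linear (ideal) speedup corresponds to speedup(m) = m.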

Fig. 8. The speedup of different parallel algorithms: (a) DS2; (b) DS3-2; (c) DS4-2.


4.3.2 Scaleup

Scaleup is defined as the ability of an m-times larger cluster to perform an m-times larger job in the same running time as the original system. To demonstrate how well the three parallel algorithms handle larger datasets when more slave nodes are available, we conduct scaleup experiments in which the size of the datasets grows in proportion to the number of slave nodes in the cluster. Fig. 9 shows the scaleup of the three parallel hierarchical attribute reduction algorithms.
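The scaleup values in Fig. 9 can likewise be read against the standard definition implied above; writing T_1(D) for the running time of processing dataset D on one node and T_m(mD) for the running time of processing an m-times larger dataset on m nodes,

    scaleup(m) = T_1(D) / T_m(mD),

where values close to 1 indicate that the algorithms absorb proportionally larger workloads without degradation.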

Fig. 9. The scaleup of different parallel algorithms: (a) DS2; (b) DS3-2; (c) DS4-2.


5 Conclusions

In this paper, we first introduce granular computing for concept ascension, then define the hierarchical encoded decision table and discuss its corresponding properties, and finally propose hierarchical attribute reduction algorithms in data and task parallel for big data using MapReduce, based on boundary region, discernibility matrix and information entropy. The experimental results demonstrate that the hierarchical attribute reduction algorithms using MapReduce scale well and can efficiently process big data on commodity computers in cloud computing.


Furthermore, the parallelization of other attribute reduction algorithms and of extended rough set models should be considered in future work.


Acknowledgments


The research is supported by the National Natural Science Foundation of China under Grant Nos. 61103067 and 61305052, the Natural Science Foundation of Jiangsu Province under Grant No. BK20141152, the Key Laboratory of Cloud Computing and Intelligent Information Processing of Changzhou City under Grant No. CM20123004, the Qing Lan Project of Jiangsu Province of China, the Natural Science Foundation of Universities of Jiangsu Province under Grant No. 13KJB520005, the Key Laboratory of Embedded System and Service Computing, Ministry of Education, under Grant No. ESSCKF201303, and the Natural Science Foundation and Doctoral Research Foundation of Jiangsu University of Technology under Grant Nos. kyy12018 and kyy13003.


References


[1] V. Abhishek, L. Xavier, E. David, H. Roy, Scaling genetic algorithms using MapReduce, In: Proceedings of the 2009 Ninth International Conference on Intelligent Systems Design and Applications, IEEE, 2009, pp. 13-18.


[2] A. Bargiela, W. Pedrycz, Toward a theory of granular computing for human centered information processing, IEEE Transactions on Fuzzy Systems 16(2) (2008) 320-330.
[3] C.T. Chu, S. Kim, Y.A. Lin, Y.Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun, MapReduce for machine learning on multicore, In: Proceedings of the 20th Conference on Advances in Neural Information Processing Systems (NIPS 2006), Vol. 6, 2006, pp. 281-288.
[4] J.H. Dai, W.T. Wang, H.W. Tian, L. Liu, Attribute selection based on a new conditional entropy for incomplete decision systems, Knowledge-Based Systems 39 (2013) 207-213.
[5] J. Dean, S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communications of the ACM 51(1) (2008) 107-114.


[6] D.Y. Deng, D.X. Yan, J.Y. Wang, Parallel reducts based on attribute significance, In: J. Yu, S. Greco, P. Lingras, et al. (Eds.), Rough Set and Knowledge Technology, Lecture Notes in Computer Science, vol. 6401, Springer, Berlin/Heidelberg, 2010, pp. 336-343.
[7] I. Duntsch, G. Gediga, Simple data filtering in rough set systems, International Journal of Approximate Reasoning 18(1) (1998) 93-106.
[8] Q.R. Feng, D.Q. Miao, Y. Cheng, Hierarchical decision rules mining, Expert Systems with Applications 37(3) (2010) 2081-2091.
[9] A. Frank, A. Asuncion, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], University of California, School of Information and Computer Science, Irvine, CA, 2010.
[10] S. Ghemawat, H. Gobioff, S.T. Leung, The Google file system, SIGOPS Operating Systems Review 37(5) (2003) 29-43.
[11] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (2003) 1157-1182.
[12] J. Han, Y. Cai, N. Cercone, Data-driven discovery of quantitative rules in relational databases, IEEE Transactions on Knowledge and Data Engineering 5(1) (1993) 29-40.
[13] J. Han, Y. Fu, Mining multiple-level association rules in large databases, IEEE Transactions on Knowledge and Data Engineering 11(5) (1999) 798-805.
[14] L.X. Han, C.S. Liew, J.V. Hemert, M. Atkinson, A generic parallel processing model for facilitating data mining and integration, Parallel Computing 37 (2011) 157-171.
[15] T.P. Hong, C.E. Lin, J.H. Lin, S.L. Wang, Learning cross-level certain and possible rules by rough sets, Expert Systems with Applications 34(3) (2008) 1698-1706.
[16] X.H. Hu, N. Cercone, Discovering maximal generalized decision rules through horizontal and vertical data reduction, Computational Intelligence 17(4) (2001) 685-702.
[17] Q.H. Hu, W. Pedrycz, D.R. Yu, J. Lang, Selecting discrete and continuous features based on neighborhood decision error minimization, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 40(1) (2010) 137-150.
[18] J.Y. Liang, F. Wang, C.Y. Dang, Y.H. Qian, An efficient rough feature selection algorithm with a multi-granulation view, International Journal of Approximate Reasoning 53(6) (2012) 912-926.
[19] J.H. Li, C.L. Mei, Y.J. Lv, A heuristic knowledge-reduction method for decision formal contexts, Computers and Mathematics with Applications 61 (2011) 1096-1106.


[20] C.H. Liu, D.Q. Miao, N. Zhang, Graded rough set model based on two universes and its properties, Knowledge-Based Systems 33 (2012) 65-72.
[21] Y.J. Lu, Concept hierarchy in data mining: specification, generation and implementation, Master Degree Dissertation, Simon Fraser University, Canada, 1997.
[22] D.Q. Miao, G.Y. Wang, Q. Liu, et al., Granular Computing: Past, Nowday and Future, Science Publisher, Beijing, 2007.
[23] D.Q. Miao, Y. Zhao, Y.Y. Yao, F.F. Xu, H.X. Li, Relative reducts in consistent and inconsistent decision tables of the Pawlak rough set model, Information Sciences 179(24) (2009) 4140-4150.
[24] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11(5) (1982) 341-356.
[25] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Boston, 1991.
[26] Y.H. Qian, J.Y. Liang, W. Pedrycz, C.Y. Dang, Positive approximation: An accelerator for attribute reduction in rough set theory, Artificial Intelligence 174(9) (2010) 597-618.
[27] J. Qian, D.Q. Miao, Z.H. Zhang, W. Li, Hybrid approaches to attribute reduction based on indiscernibility and discernibility relation, International Journal of Approximate Reasoning 52(2) (2011) 212-230.
[28] J. Qian, D.Q. Miao, Z.H. Zhang, Knowledge reduction algorithms in cloud computing, Chinese Journal of Computers 34(12) (2011) 2332-2343 (in Chinese).
[29] A. Skowron, C. Rauszer, The discernibility matrices and functions in information systems, In: R. Slowinski (Ed.), Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, Kluwer, Dordrecht, 1992.
[30] A. Srinivasan, T.A. Faruquie, S. Joshi, Data and task parallelism in ILP using MapReduce, Machine Learning 86(1) (2012) 141-168.
[31] R. Susmaga, Tree-like parallelization of reduct and construct computation, In: S. Tsumoto et al. (Eds.), RSCTC 2004, LNAI 3066, Springer, Berlin/Heidelberg, 2004, pp. 455-464.
[32] G.Y. Wang, H. Yu, D.C. Yang, Decision table reduction based on conditional information entropy, Chinese Journal of Computers 25(7) (2002) 760-766 (in Chinese).
[33] L.H. Wang, G.F. Wu, Attribute reduction based on parallel symbiotic evolution, Chinese Journal of Computers 26(5) (2003) 630-635 (in Chinese).
[34] X.Z. Wang, T.T. Wang, J.H. Zhai, An attribute reduction algorithm based on instance selection, Journal of Computer Research and Development 49(11) (2012) 2305-2310 (in Chinese).


[35] F. Wang, J.Y. Liang, Y.H. Qian, Attribute reduction: A dimension incremental strategy, Knowledge-Based Systems 39 (2013) 95-108.
[36] W.Z. Wu, Y. Leung, Theory and applications of granular labelled partitions in multi-scale decision tables, Information Sciences 181(18) (2011) 3878-3897.
[37] X. Xu, J. Jager, H.P. Kriegel, A fast parallel clustering algorithm for large spatial databases, Data Mining and Knowledge Discovery 3 (1999) 263-290.
[38] Z.Y. Xu, Z.P. Liu, B.R. Yang, et al., A quick attribute reduction algorithm with complexity of max(O(|C||U|), O(|C|²|U/C|)), Chinese Journal of Computers 29(3) (2006) 611-615 (in Chinese).
[39] Y. Yang, Z. Chen, Z. Liang, G. Wang, Attribute reduction for massive data based on rough set theory and MapReduce, In: J. Yu, S. Greco, P. Lingras, et al. (Eds.), Rough Set and Knowledge Technology, Lecture Notes in Computer Science, vol. 6401, Springer, Berlin/Heidelberg, 2010, pp. 672-678.
[40] X.B. Yang, M. Zhang, H.L. Dou, J.Y. Yang, Neighborhood systems-based rough sets in incomplete information system, Knowledge-Based Systems 24(6) (2011) 858-867.
[41] Y.Y. Yao, Stratified rough sets and granular computing, In: R.N. Dave, T. Sudkamp (Eds.), Proceedings of the 18th International Conference of the North American Fuzzy Information Processing Society, New York, USA, IEEE Press, 1999, pp. 800-804.
[42] Y.Y. Yao, Y. Zhao, Discernibility matrix simplification for constructing attribute reducts, Information Sciences 179(7) (2009) 867-882.
[43] M.Q. Ye, X.D. Wu, X.G. Hu, D.H. Hu, Knowledge reduction for decision tables with attribute value taxonomies, Knowledge-Based Systems 56 (2014) 68-78.
[44] J.P. Yuan, D.H. Zhu, A hierarchical reduction algorithm for concept hierarchy, In: Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications (ISDA'06), Vol. 1, IEEE Computer Society, 2006, pp. 724-729.
[45] L.A. Zadeh, Fuzzy sets and information granularity, In: M. Gupta, R. Ragade, R. Yager (Eds.), Advances in Fuzzy Set Theory and Applications, North-Holland Publishing, Amsterdam, 1979, pp. 3-18.
[46] J.B. Zhang, T.R. Li, D. Ruan, et al., A parallel method for computing rough set approximations, Information Sciences 194 (2012) 209-223.
[47] W.Z. Zhao, H.F. Ma, Q. He, Parallel K-Means clustering based on MapReduce, In: M.G. Jaatun, G. Zhao, C. Rong (Eds.), Cloud Computing (CloudCom 2009), Springer, Berlin/Heidelberg, 2009, pp. 674-679.
[48] X. Zhang, C.L. Mei, D.G. Chen, J.H. Li, Multi-confidence rule acquisition oriented attribute reduction of covering decision systems via combinatorial optimization, Knowledge-Based Systems 50 (2013) 187-197.


[49] W. Ziarko, Acquisition of hierarchy-structured probabilistic decision tables and rules from data, Expert Systems 20(5) (2003) 305-310.
[50] D. Zinn, S. Bowers, S. Köhler, B. Ludäscher, Parallelizing XML data-streaming workflows via MapReduce, Journal of Computer and System Sciences 76(6) (2010) 447-463.
