Hierarchical attribute reduction algorithms for big data using MapReduce

Jin Qian a,b,∗, Ping Lv a,b, Xiaodong Yue c, Caihui Liu d, Zhengjun Jing a,b

a Key Laboratory of Cloud Computing and Intelligent Information Processing of Changzhou City, Jiangsu University of Technology, Changzhou, 213001, China
b School of Computer Engineering, Jiangsu University of Technology, Changzhou, 213001, China
c School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
d Department of Mathematics and Computer Science, Gannan Normal University, Ganzhou, 341000, China

Abstract

Attribute reduction is one of the important research issues in rough set theory. Most existing attribute reduction algorithms now face two challenging problems. On the one hand, they have seldom taken granular computing into consideration. On the other hand, they still cannot deal with big data. To address these issues, the hierarchical encoded decision table is first defined. The relationships among hierarchical decision tables are then discussed under different levels of granularity. The parallel computations of the equivalence classes and the attribute significance are further designed for attribute reduction. Finally, hierarchical attribute reduction algorithms are proposed in data and task parallel using MapReduce. Experimental results demonstrate that the proposed algorithms can scale well and efficiently process big data.

Key words: Hierarchical attribute reduction; Granular computing; Data and task parallelism; MapReduce; Big data
∗ Corresponding author. Tel.: +86-519-86953252.
Email addresses: [email protected] (Jin Qian), [email protected] (Ping Lv), [email protected] (Xiaodong Yue), [email protected] (Caihui Liu), [email protected] (Zhengjun Jing).
1 Introduction
With an increasing amount of scientific and industrial datasets, mining useful information from big data has become essential for business intelligence. Classical data mining algorithms are challenged from both the data-intensive and the computation-intensive perspective [14]. As indicated in [30], dataset sizes are too massive to fit in main memory and the search spaces are too large to explore on a single machine. Moreover, not all attributes are necessary or sufficient for decision making. Irrelevant or redundant attributes not only increase the size of the search space, but also make generalization more difficult [11]. Hence attribute reduction, also called feature selection, is often carried out as a preprocessing step in knowledge acquisition to find a minimum subset of attributes that provides the same descriptive or classification ability as the whole attribute set. Using attribute reduction methods in rough set theory [24], we can acquire different reducts, which can be used to induce different concise sets of rules. Attribute reduction has been successfully applied in many fields such as machine learning, data analysis and data mining. A major challenge is how to speed up the attribute reduction process. Many algorithms [4,17,23,25,26,27,29,32,38,40,42] have been developed, reducing the time complexity to max(O(|C||U|), O(|C|^2|U/C|)) for small datasets [26,27,38]. Unfortunately, these existing attribute reduction algorithms have seldom taken granular computing into consideration and have difficulty in discovering general, valuable decision rules with higher support when working at low or primitive levels. Moreover, these algorithms cannot deal with big data.
Granular computing [45,2,22,41] offers a unified framework for problem solving under different levels of granularity, which can flexibly provide different hierarchical views of a complex structured problem through granules. A structured problem may comprise a web of interacting and interrelated granules. For effective problem solving, we must provide a new insight at a certain level of granulation. Since deriving rules from a higher concept level may yield more general and important knowledge, some scholars [15,16,49,8,36] began to mine hierarchical decision rules from different levels of abstraction. Hong et al. [15] represented hierarchical attribute values by hierarchical trees to construct a new learning method for deriving cross-level certain and possible rules from the training data. Ziarko [49] presented an approach to forming a linear hierarchy of decision tables using the decomposition of the boundary region and the variable precision rough set model. Feng and Miao [8] provided an approach to mining hierarchical decision rules from different levels by combining the hierarchical structure of the multidimensional data model with the techniques of rough set theory. Wu and Leung [36] introduced multi-scale information tables from the perspective of granular computing and mined hierarchical decision rules from multi-scale decision tables under different levels of granularity. Ye et al. [43] extended conditional entropy under single-level granulation
to hierarchical conditional entropy under multi-level granulation and studied attribute-generalization reducts obtained by coarsening and refining attribute values. Yuan et al. [44] proposed a hierarchical reduction algorithm for concept hierarchies. Zhang et al. [48] presented an attribute reduction method to acquire multi-confidence rules from covering decision systems. However, as the size of a dataset can be quite large, it is difficult or even impractical for such big data to be stored and processed on a single machine, which renders the existing serial methods unusable. Furthermore, these algorithms cannot switch among different levels of abstraction flexibly. Therefore, it is necessary to develop an effective and efficient approach to hierarchical attribute reduction for big data that works on different levels and accommodates different user requirements.

For big data, it would appear that sampling techniques can be applied. Wang et al. [34] selected the most informative instances from large datasets, constructed the corresponding discernibility matrix and acquired all reducts. However, sampling guarantees often hold only if the samples represent all of the data or satisfy the hypothesis space. Parallel computing may be a good solution to attribute reduction [6,31,33,18]. Deng et al. [6], Susmaga et al. [31] and Wang et al. [33] computed a reduct or all reducts in parallel for small datasets. Liang et al. [18] considered the sub-tables of a large-scale dataset as small granularities; the reducts of the small granularities can be computed separately and finally fused together to generate the reduct of the whole data. However, these reducts are not guaranteed to be the same as those obtained from the whole dataset, since the subsystems (sub-tables) do not exchange information with each other. Thus, these techniques cannot acquire an exact reduct for big data in most cases.

Recently, an extremely simple parallel computation approach, MapReduce [5], has been applied to data-intensive and computation-intensive tasks. It is quite novel, since it interleaves parallel and sequential computation, automatically parallelizes the computation across large-scale clusters of machines, and hides many system-level details. Furthermore, MapReduce implementations offer their own distributed file systems that provide a scalable mechanism for storing massive datasets [10]. It has been applied in data mining [14,30,50] and machine learning [1,3,47]. In rough set theory, Zhang et al. [46] proposed a parallel method for computing equivalence classes to acquire lower and upper approximations using MapReduce. Yang et al. [39] computed the reduct red_i for each sub-decision table S_i, combined them into ∪red_i, and generated the reduct Red by deleting redundant attributes using MapReduce. Qian et al. [28] proposed a parallel attribute reduction algorithm using MapReduce in cloud computing.

To the best of our knowledge, relatively little work has been done on hierarchical attribute reduction for big data using MapReduce. In this paper, we investigate the following issue: how to use the MapReduce programming model
to design a parallel hierarchical attribute reduction algorithm so that decision rules can be mined under different levels of granularity? We first discuss the hierarchical encoded decision table and some of its properties, and analyze the parallel and serial operations of the classical attribute reduction algorithms; we then design proper < key, value > pairs and implement map/reduce functions for computing the hierarchical encoded decision table; finally, we propose hierarchical attribute reduction algorithms, executed in data and task parallel using MapReduce on Hadoop, which are applicable to big data. Experimental results demonstrate that the proposed algorithms can efficiently deal with big data.

The rest of this paper is organized as follows. Section 2 reviews the necessary concepts of rough set theory and the MapReduce framework. Section 3 proposes an approach to computing the hierarchical encoded decision table and the hierarchical attribute reduction algorithms. Section 4 presents an empirical evaluation of the proposed parallel algorithms using MapReduce. Finally, the paper is concluded in Section 5.
2 Basic notions

In this section, we will review some notions of the Pawlak rough set model [24,25,27] and the MapReduce programming model [5].

2.1 Rough set theory
In Pawlak's rough set model, the indiscernibility relation and the equivalence class are important concepts. The indiscernibility relation expresses the fact that, due to lack of information (or knowledge), we are unable to discern some objects by employing the available information. It determines a partition of U and is used to construct the equivalence classes.

Let S = (U, At = C ∪ D, {V_a | a ∈ At}, {I_a | a ∈ At}) be a decision table, where U is a finite non-empty set of objects, At is a finite non-empty set of attributes, C = {c_1, c_2, ..., c_m} is a set of conditional attributes describing the objects, and D is a set of decision attributes that indicates the classes of objects. V_a is a non-empty set of values of a ∈ At, and I_a is an information function that maps an object in U to exactly one value in V_a; I_a(x) = v means that object x has value v on attribute a. For simplicity, we assume D = {d} in this paper, where d is a decision attribute which describes the decision for each object. A table with multiple decision attributes can easily be transformed into a table with a single decision attribute by coding the corresponding Cartesian product of attribute values.

An indiscernibility relation with respect to A ⊆ C is defined as:

IND(A) = {(x, y) ∈ U × U | ∀a ∈ A, I_a(x) = I_a(y)}.    (1)

The partition generated by IND(A) is denoted as π_A. [x]_A denotes the block of the partition π_A containing x; moreover, [x]_A = ∩_{a∈A} [x]_a.
Consider a partition π_D = {D_1, D_2, ..., D_k} of the universe U with respect to the decision attribute D and another partition π_A = {A_1, A_2, ..., A_r} defined by a set of conditional attributes A. The equivalence classes induced by the partitions are the basic blocks used to construct the Pawlak rough set approximations.

Definition 1 For a decision class D_i ∈ π_D, the lower and upper approximations of D_i with respect to a partition π_A are defined as:

\underline{apr}_A(D_i) = {x ∈ U | [x]_A ⊆ D_i};   \overline{apr}_A(D_i) = {x ∈ U | [x]_A ∩ D_i ≠ ∅}.    (2)

Definition 2 For a decision table S, the positive region and the boundary region of a partition π_D with respect to a partition π_A are defined as:

POS(D|A) = ∪_{1≤i≤k} \underline{apr}_A(D_i);   BND(D|A) = ∪_{1≤i≤k} (\overline{apr}_A(D_i) − \underline{apr}_A(D_i)).    (3)

Definition 3 For a decision table S, let A ⊆ C and π_A = {A_1, A_2, ..., A_r}. The indiscernibility object pair set with respect to A (DOP_A^D) and the number of the corresponding pairs of objects that A cannot discern (DIS(D|A)) are defined as

DOP_A^D = {< x, y > | x ∈ D_i, y ∈ D_j, ∀a ∈ A, I_a(x) = I_a(y)},    (4)

where D_i ∈ π_D, D_j ∈ π_D, 1 ≤ i < j ≤ k, and

DIS(D|A) = \sum_{1 ≤ p ≤ r} \sum_{1 ≤ i < j ≤ k} n_p^i n_p^j,    (5)

where n_p^j (n_p^i) denotes the number of objects equal to j (i) on d in an equivalence class A_p.
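To make Definition 3 concrete, here is a tiny worked instance; the numbers are ours and are chosen only for illustration, they do not come from the paper's tables.

```latex
% Suppose \pi_A = \{A_1, A_2\}, where A_1 contains 3 objects of decision class 1 and
% 1 object of class 2, and A_2 contains 2 objects of class 2 only. Then
% n_1^1 = 3,\; n_1^2 = 1,\; n_2^1 = 0,\; n_2^2 = 2, and
\mathrm{DIS}(D|A) \;=\; \sum_{1 \le p \le 2}\;\sum_{1 \le i < j \le 2} n_p^{i}\, n_p^{j}
                 \;=\; 3\cdot 1 + 0\cdot 2 \;=\; 3 .
```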
Definition 4 For a decision table S, the information entropy of D is given by:

Info(D) = − \sum_{j=1}^{k} \frac{n^j}{n} \log_2 \frac{n^j}{n}.    (6)

The conditional entropy Info(D|A) of D conditioned on A is given by

Info(D|A) = − \sum_{p=1}^{r} \frac{n_p}{n} \sum_{j=1}^{k} \frac{n_p^j}{n_p} \log_2 \frac{n_p^j}{n_p},    (7)

where n_p, n_p^j and n^j denote the number of objects in A_p, the number of objects equal to j on d in A_p, and the number of objects equal to j on d, respectively.

Definition 5 For a decision table S, an attribute set A ⊆ C is a reduct of C with respect to D if it satisfies the following two conditions: (1) Φ(D|A) = Φ(D|C); (2) for any attribute c ∈ A, Φ(D|[A − {c}]) ≠ Φ(D|A). An attribute c ∈ C is a core attribute of C with respect to D if it satisfies the following condition: (3) Φ(D|[C − {c}]) ≠ Φ(D|C); where Φ(·|·) denotes the classification ability and Φ ∈ {POS, BND, Info, DIS} denotes the attribute reduction method based on the positive region, boundary region, information entropy and discernibility matrix, respectively.

Definition 6 [7] For a decision table S and an attribute set A ⊆ C, if A → D ⊆ U/A × U/D implies {A_i, D_j} ∈ A → D ⇐⇒ A_i ⊆ \underline{apr}_A(D_j), then {A_i, D_j} ∈ A → D is called a rule of A as to D. The approximation quality γ(A → D) of a rule A → D is defined as:

γ(A → D) = \frac{|∪ \{\underline{apr}_A(D_j) : D_j ∈ π_D\}|}{|U|}.    (8)

2.2 MapReduce programming model

MapReduce [5], introduced by Google, is a programming model. The user expresses the computation by designing < key, value > pairs and implementing the map and reduce functions. Specifically, the map and reduce functions are of the form:

map: < K1, V1 > → [< K2, V2 >]
reduce: < K2, [V2] > → [< K3, V3 >]

where all Ki and Vi (i = 1, ..., 3) are user-defined data types, and the convention [...] is used throughout this paper to denote a list. The MapReduce programming model consists of three phases: map, shuffle and reduce. The map phase takes a < K1, V1 > pair as input and produces a set of intermediate < K2, V2 > pairs which are stored on the local machine. In the shuffle phase, the MapReduce library groups together all intermediate values V2 associated with the same key K2 and passes them to the same machine. The reduce phase accepts a key K2 and the set of values for that key, merges these values to form a possibly smaller set of values, and finally outputs < K3, V3 > pairs.
Since MapReduce provides a robust and scalable framework for executing parallel computation, we mainly focus on designing the data structures for < key, value > pairs and on implementing the map and reduce functions for the hierarchical encoded decision table and the parallel hierarchical attribute reduction algorithms.
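To make the < key, value > conventions above concrete, the following self-contained Java sketch simulates the three phases on a toy list of encoded objects. It illustrates the programming model only; the class and variable names are ours and it is not the paper's Hadoop implementation.

```java
import java.util.*;

public class MapReduceToy {
    // map: <K1,V1> -> [<K2,V2>]  (here: one input line -> <condition string, decision value>)
    static List<Map.Entry<String, String>> map(String line) {
        String[] parts = line.split(";");                       // "12 11 11;22"
        return List.of(Map.entry(parts[0], parts[1]));
    }

    // reduce: <K2,[V2]> -> [<K3,V3>]  (here: frequency of decision values inside one class)
    static Map.Entry<String, Map<String, Integer>> reduce(String key, List<String> values) {
        Map<String, Integer> freq = new HashMap<>();
        for (String v : values) freq.merge(v, 1, Integer::sum);
        return Map.entry(key, freq);
    }

    public static void main(String[] args) {
        List<String> split = List.of("12 11 11;22", "21 12 12;21", "21 12 12;22");
        List<Map.Entry<String, String>> intermediate = new ArrayList<>();
        for (String line : split) intermediate.addAll(map(line));        // map phase
        Map<String, List<String>> grouped = new HashMap<>();             // shuffle phase
        for (var e : intermediate)
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        for (var e : grouped.entrySet())                                 // reduce phase
            System.out.println(reduce(e.getKey(), e.getValue()));
    }
}
```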
3 Hierarchical attribute reduction algorithms using MapReduce

In this section, we first discuss the hierarchical encoded decision table, then design the parallel computation algorithms for transforming the original decision table into the hierarchical encoded decision table together with the hierarchical attribute reduction algorithms for big data, and finally give the time complexity of these parallel algorithms.
3.1 Hierarchical encoded decision table
Most previous studies on rough sets focused on mining certain rules and possible rules at a single concept level. However, the value set of some attributes can form a generalization hierarchy in real-world applications, such as Product < Brand < Category for a product attribute, and can be represented by a concept hierarchy tree. The formal use of concept hierarchies as the most important background knowledge in data mining was introduced by Han, Cai and Cercone [12,13]. A concept hierarchy is often related to a specific attribute and is a partial order according to a general-to-specific ordering. The most general node is the null node (any), and the most specific nodes correspond to the specific values of an attribute in the original decision table. Thus, one can reduce the data table by replacing low-level concepts with higher-level concepts using the concept hierarchy, and may mine more general and important knowledge from a decision table under different levels of granularity [8]. To this end, it is necessary to develop hierarchical attribute reduction algorithms for mining hierarchical decision rules.
[Fig. 1. Concept hierarchy tree for each attribute. (a) Concept trees for the conditional attributes {Age, Education level, Occupation}: Age = Youth(1){Below 30(11), 31-35(12)}, Middle_aged(2){36-40(21), 41-50(22)}, The old(3){51-60(31), 61-70(32)}; Education level = High(1){Doctoral Student(11), Postgraduate student(12)}, Low(2){Undergraduate(21), Others(22)}; Occupation = Enterprise(1){State-owned Enterprise(11), Private Enterprise(12)}, Institution(2){Education(21), Civil Servant(22)}. (b) Concept tree for the decision attribute Salary = High(1){Above 100000(11), 50001-100000(12)}, Middle(2){25001-50000(21), 10001-25000(22)}, Low(3){5001-10000(31), Below 5000(32)}. The root of every tree is Any(*).]
In general, hierarchical attribute values can be represented by a concept hierarchy tree. Terminal leaf nodes of the trees represent actual attribute values appearing in the original decision table; internal nodes represent value clusters formed from their lower-level nodes. In a concept hierarchy, each level can be denoted by a digit. Usually the level number of the root node is assigned the digit 0, while the level number of any other concept node is one plus its father's level number. In this way, the concepts at each level are called concepts at level w (see Fig. 1). We employ an integer encoding mode to encode the concept hierarchy: for any concept v at level w, the corresponding encoding string is l_{v_1}/.../l_{v_w}, where "/" is the list separator and l_{v_1}/.../l_{v_{w-1}} is the encoding of v's father concept. As indicated in [21], we can conclude the following two propositions as well.
Proposition 1 For any two concept nodes A and B with encoding strings l_{A_1}/.../l_{A_i} and l_{B_1}/.../l_{B_j}, respectively, A is a brother node of B if and only if i = j and l_{A_w} = l_{B_w} for w = 1, ..., i−1.

Proposition 2 For any two concept nodes A and B with encoding strings l_{A_1}/.../l_{A_i} and l_{B_1}/.../l_{B_j}, respectively, A is a child node of B if and only if i = j+1 and l_{A_w} = l_{B_w} for w = 1, ..., j.

For simplicity, we also denote l_{v_1}/.../l_{v_w} by l_{v_1}...l_{v_w} when there is no ambiguity. By Propositions 1 and 2, we can determine the relationship between any two concepts in a concept hierarchy tree. Attribute c_i is formed along l(c_i)+1 hierarchy levels 0, 1, ..., l(c_i), with level 0 being the special value ANY(*). Given a decision table S and the concept hierarchies of all attributes, we denote the depth of the concept hierarchy of conditional attribute c_i and of decision attribute d as l(c_i)+1 (i = 1, ..., m) and l(d)+1, respectively. Thus, our decision table spans the (m+1) attributes considered in this paper. Without loss of generality, we define a generalized hierarchical decision table as follows.

Definition 7 Let S = (U, At = C ∪ D, {V_a | a ∈ At}, {I_a | a ∈ At}) be a decision table. S_{k_1 k_2 ... k_m k_d} = (U_{k_1 ... k_m k_d}, At = C ∪ D, H_{At}, V^{k_1 k_2 ... k_m k_d}, I_{k_1 ... k_m k_d}) is called the (k_1 k_2 ... k_m k_d)-th hierarchical decision table induced by S, where U_{k_1 ... k_m k_d} is a finite non-empty set of objects, H_{At} = {H_a | a ∈ At} denotes the concept hierarchy trees of the set of attributes, V^{k_1 k_2 ... k_m k_d} = ∪_{t=1}^{m} V^{k_t} ∪ V^{k_d} is the domain of c_t (t = 1, 2, ..., m) at the k_t-th level and of d at the k_d-th level of their concept hierarchies, and I_{k_1 ... k_m k_d} is the information function from U_{k_1 ... k_m k_d} to V^{k_1 k_2 ... k_m k_d}.

Definition 8 For a decision table S, denote the domain of attribute c_t at the i-th level of its concept hierarchy as V^i_t. We say that V^i_t is coarser than V^j_t if and only if for any b ∈ V^j_t there always exists a ∈ V^i_t such that b is a subconcept of a, denoted V^i_t ⪯ V^j_t. In general, given two concept levels i and j for attribute c_t, if i ≤ j, the values of c_t at concept level i are more generalized.

Definition 9 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, denote the domains of the conditional attribute set C in the two hierarchical decision tables as V^{i_1 i_2 ... i_m i_d} and V^{j_1 j_2 ... j_m j_d}. We say that V^{i_1 ... i_m i_d} is coarser than V^{j_1 ... j_m j_d} if and only if for any t ∈ {1, 2, ..., m} and for the decision attribute d we have V^{i_t} ⪯ V^{j_t} and V^{i_d} ⪯ V^{j_d}, denoted by V^{i_1 i_2 ... i_m i_d} ⪯ V^{j_1 j_2 ... j_m j_d}.

Definition 10 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, if V^{i_t} ⪯ V^{j_t} for any t ∈ {1, 2, ..., m}, then U/c^{i_t} is coarser than U/c^{j_t}, or U/c^{j_t} is finer than U/c^{i_t}, denoted by U/c^{i_t} ⪯ U/c^{j_t}. Correspondingly, if V^{i_1 i_2 ... i_m} ⪯ V^{j_1 j_2 ... j_m}, then U/C^{i_1 i_2 ... i_m} is coarser than U/C^{j_1 j_2 ... j_m}, or U/C^{j_1 j_2 ... j_m} is finer than U/C^{i_1 i_2 ... i_m}, denoted by U/C^{i_1 i_2 ... i_m} ⪯ U/C^{j_1 j_2 ... j_m}.

Example 1: Table 1 summarizes the characteristics of a decision table. With the concept hierarchies of all attributes {Age(A), Education Level(EL), Occupation(O), Salary(Sa)} in Fig. 1, Table 1 can be transformed into different hierarchical encoded decision tables, as shown in Fig. 2. Table 2, Table 3 and Table 4 illustrate the details of the S2222, S1122, S1112, S1121 and S1111 hierarchical decision tables, in which A2, A1, EL2, EL1, O1, O2, Sa1 and Sa2 represent the concept levels of the conditional attributes and the decision attribute, respectively.

Table 1. Description of the datasets
| U  | Age      | Education Level      | Occupation             | Salary       |
|----|----------|----------------------|------------------------|--------------|
| 1  | 31-35    | Doctoral Student     | State-owned Enterprise | 10001-25000  |
| 2  | Below 30 | Others               | Civil Servant          | 5001-10000   |
| 3  | 31-35    | Postgraduate Student | State-owned Enterprise | 10001-25000  |
| 4  | 36-40    | Postgraduate Student | Private Enterprise     | 25001-50000  |
| 5  | 41-50    | Undergraduate        | Education              | 25001-50000  |
| 6  | 51-60    | Others               | Private Enterprise     | Below 5000   |
| 7  | 41-50    | Doctoral Student     | State-owned Enterprise | 25001-50000  |
| 8  | 36-40    | Postgraduate Student | Private Enterprise     | 10001-25000  |
| 9  | 61-70    | Postgraduate Student | State-owned Enterprise | Above 100000 |
| 10 | 41-50    | Undergraduate        | Education              | 50001-100000 |
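As an illustration of how a raw record from Table 1 is mapped to the encoded strings used in Tables 2-4, the following Java sketch hard-codes a small part of the Age hierarchy of Fig. 1 and truncates the code to a requested level. It is a hypothetical helper of our own, not the paper's implementation.

```java
import java.util.Map;

public class HierarchyEncoder {
    // Leaf value -> full hierarchical code, taken from the Age tree in Fig. 1.
    static final Map<String, String> AGE = Map.of(
            "Below 30", "11", "31-35", "12", "36-40", "21",
            "41-50", "22", "51-60", "31", "61-70", "32");

    // Keep only the first `level` digits of the code (level 1 = coarsest non-root level).
    static String encode(String rawValue, int level) {
        String full = AGE.get(rawValue);
        return full.substring(0, Math.min(level, full.length()));
    }

    public static void main(String[] args) {
        System.out.println(encode("31-35", 2)); // "12" -> the value used in S2222 (Table 2)
        System.out.println(encode("31-35", 1)); // "1"  -> the value used in S1122 (Table 3)
    }
}
```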
[Fig. 2. Different hierarchical decision tables under different levels of granularity: the sixteen tables S1111, S1112, ..., S2222 arranged as a lattice, with S1111 at the top and S2222 at the bottom.]

In Fig. 2, all hierarchical decision tables can be represented as a lattice, in which the arrows illustrate the possible generalization paths. The top node S1111 represents the most generalized hierarchical decision table, while the bottom node S2222 denotes the raw decision table. A path from S2222 to S1222, to S1122, S1112 and S1111 is a generalization strategy for different level ascensions of the conditional attributes and the decision attribute.

3.1.1 Conditional attribute level ascension in hierarchical decision table

In what follows, we first discuss some properties of these hierarchical decision tables under different levels of granularity for the conditional attributes, from the perspectives of uncertainty, core attributes, reducts and decision rules.
Table 2. S2222 (original) decision table

| U  | Age(A2) | Education level(EL2) | Occupation(O2) | Salary(Sa2) |
|----|---------|----------------------|----------------|-------------|
| 1  | 12      | 11                   | 11             | 22          |
| 2  | 11      | 22                   | 22             | 31          |
| 3  | 12      | 12                   | 11             | 22          |
| 4  | 21      | 12                   | 12             | 21          |
| 5  | 22      | 21                   | 21             | 21          |
| 6  | 31      | 22                   | 12             | 32          |
| 7  | 22      | 11                   | 11             | 21          |
| 8  | 21      | 12                   | 12             | 22          |
| 9  | 32      | 12                   | 11             | 11          |
| 10 | 22      | 21                   | 21             | 12          |

Theorem 1 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, if i_t ≤ j_t (t = 1, 2, ..., m) and i_d = j_d, then

(1) |POS_{i_1 i_2 ... i_m i_d}(D|C)| ≤ |POS_{j_1 j_2 ... j_m j_d}(D|C)|;
(2) DIS_{i_1 i_2 ... i_m i_d}(D|C) ≥ DIS_{j_1 j_2 ... j_m j_d}(D|C);
(3) Info_{i_1 i_2 ... i_m i_d}(D|C) ≥ Info_{j_1 j_2 ... j_m j_d}(D|C).
Table 3. S1122 and S1112 hierarchical decision tables

S1122 decision table:

| U  | A1 | EL1 | O2 | Sa2 |
|----|----|-----|----|-----|
| 1  | 1  | 1   | 11 | 22  |
| 2  | 1  | 2   | 22 | 31  |
| 3  | 1  | 1   | 11 | 22  |
| 4  | 2  | 1   | 12 | 21  |
| 5  | 2  | 2   | 21 | 21  |
| 6  | 3  | 2   | 12 | 32  |
| 7  | 2  | 1   | 11 | 21  |
| 8  | 2  | 1   | 12 | 22  |
| 9  | 3  | 1   | 11 | 11  |
| 10 | 2  | 2   | 21 | 12  |

S1112 decision table:

| U  | A1 | EL1 | O1 | Sa2 |
|----|----|-----|----|-----|
| 1  | 1  | 1   | 1  | 22  |
| 2  | 1  | 2   | 2  | 31  |
| 3  | 1  | 1   | 1  | 22  |
| 4  | 2  | 1   | 1  | 21  |
| 5  | 2  | 2   | 2  | 21  |
| 6  | 3  | 2   | 1  | 32  |
| 7  | 2  | 1   | 1  | 21  |
| 8  | 2  | 1   | 1  | 22  |
| 9  | 3  | 1   | 1  | 11  |
| 10 | 2  | 2   | 2  | 12  |
Table 4. S1121 and S1111 hierarchical decision tables

S1121 decision table:

| U  | A1 | EL1 | O2 | Sa1 |
|----|----|-----|----|-----|
| 1  | 1  | 1   | 11 | 2   |
| 2  | 1  | 2   | 22 | 3   |
| 3  | 1  | 1   | 11 | 2   |
| 4  | 2  | 1   | 12 | 2   |
| 5  | 2  | 2   | 21 | 2   |
| 6  | 3  | 2   | 12 | 3   |
| 7  | 2  | 1   | 11 | 2   |
| 8  | 2  | 1   | 12 | 2   |
| 9  | 3  | 1   | 11 | 1   |
| 10 | 2  | 2   | 21 | 1   |

S1111 decision table:

| U  | A1 | EL1 | O1 | Sa1 |
|----|----|-----|----|-----|
| 1  | 1  | 1   | 1  | 2   |
| 2  | 1  | 2   | 2  | 3   |
| 3  | 1  | 1   | 1  | 2   |
| 4  | 2  | 1   | 1  | 2   |
| 5  | 2  | 2   | 2  | 2   |
| 6  | 3  | 2   | 1  | 3   |
| 7  | 2  | 1   | 1  | 2   |
| 8  | 2  | 1   | 1  | 2   |
| 9  | 3  | 1   | 1  | 1   |
| 10 | 2  | 2   | 2  | 1   |
Proof: Since i_t ≤ j_t (t = 1, 2, ..., m) and i_d = j_d, we have U/C^{i_1 i_2 ... i_m} ⪯ U/C^{j_1 j_2 ... j_m} and π_{D^{i_d}} = π_{D^{j_d}}. Assume that π_{C^{j_1 j_2 ... j_m}} = {P_1, ..., P_l} and π_{C^{i_1 i_2 ... i_m}} = {Q_1, ..., Q_w}; it is obvious that l > w. There exists a subset E_i of the index set {1, 2, ..., l} such that E_i ∩ E_h = ∅ for i ≠ h (i, h = 1, 2, ..., w) and Q_i = ∪_{j∈E_i} P_j (i = 1, 2, ..., w).

(1) For any equivalence class D_q ∈ π_{D^{i_d}}, since \underline{apr}_{Q_i}(D_q) ⊆ ∪_{j∈E_i} \underline{apr}_{P_j}(D_q), we have POS_{C^{j_1 j_2 ... j_m}}(D|C) ⊇ POS_{C^{i_1 i_2 ... i_m}}(D|C), and thus |POS_{C^{j_1 j_2 ... j_m}}(D|C)| ≥ |POS_{C^{i_1 i_2 ... i_m}}(D|C)|.

(2) DIS_{j_1 j_2 ... j_m}(D|C) = \sum_{1≤j≤l} \sum_{1≤k_1<k_2≤k} n_j^{k_1} n_j^{k_2}
= \sum_{1≤i≤w} \sum_{j∈E_i} \sum_{1≤k_1<k_2≤k} n_j^{k_1} n_j^{k_2}
≤ \sum_{1≤i≤w} \sum_{1≤k_1<k_2≤k} (\sum_{j∈E_i} n_j^{k_1})(\sum_{j∈E_i} n_j^{k_2})
= \sum_{1≤i≤w} \sum_{1≤k_1<k_2≤k} n_i^{k_1} n_i^{k_2}
= DIS_{i_1 i_2 ... i_m}(D|C).

(3) p(Q_i) = \sum_{j∈E_i} p(P_j), where i = 1, 2, ..., w. As \sum_{j=1}^{l} p(P_j) = \sum_{i=1}^{w} \sum_{j∈E_i} p(P_j), we have

Info_{j_1 j_2 ... j_m}(D|C) = − \sum_{j=1}^{l} p(P_j) \sum_{q=1}^{k} p(D_q|P_j) \log_2 p(D_q|P_j)
≤ − \sum_{i=1}^{w} \sum_{j∈E_i} p(P_j) \sum_{q=1}^{k} p(D_q|P_j) \log_2 p(D_q|Q_i)
≤ − \sum_{i=1}^{w} p(Q_i) \sum_{q=1}^{k} p(D_q|Q_i) \log_2 p(D_q|Q_i)
= Info_{i_1 i_2 ... i_m}(D|C).
Corollary 1 If {\underbrace{1 1 ... 1}_{m} i_d, ..., i_1 i_2 ... i_m i_d, ..., l(c_1) l(c_2) ... l(c_m) i_d} is a total ordering relation, we have

(1) |POS_{\underbrace{1...1}_{m} i_d}(D|C)| ≤ ... ≤ |POS_{i_1 i_2 ... i_m i_d}(D|C)| ≤ ... ≤ |POS_{l(c_1) l(c_2) ... l(c_m) i_d}(D|C)| ≤ |U|;
(2) DIS_{\underbrace{1...1}_{m} i_d}(D|C) ≥ ... ≥ DIS_{i_1 i_2 ... i_m i_d}(D|C) ≥ ... ≥ DIS_{l(c_1) l(c_2) ... l(c_m) i_d}(D|C) ≥ 0;
(3) Info_{\underbrace{1...1}_{m} i_d}(D|C) ≥ ... ≥ Info_{i_1 i_2 ... i_m i_d}(D|C) ≥ ... ≥ Info_{l(c_1) l(c_2) ... l(c_m) i_d}(D|C) ≥ 0.
Theorem 1 and Corollary 1 indicate that the positive region decreases monotonically as the concept levels of the conditional attributes ascend, while the number of indiscernibility object pairs and the information entropy increase.

Theorem 2 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, let POS_{j_1 j_2 ... j_m j_d}(D|C) = POS_{i_1 i_2 ... i_m i_d}(D|C), where i_t ≤ j_t (t = 1, 2, ..., m) and i_d = j_d. If c_k is a core attribute of the (j_1 j_2 ... j_m j_d)-th decision table, then c_k is a core attribute of the (i_1 i_2 ... i_m i_d)-th decision table.

Proof: Since c_k is a core attribute of the (j_1 j_2 ... j_m j_d)-th decision table, POS_{j_1 j_2 ... j_m j_d}(D|A) ⊂ POS_{j_1 j_2 ... j_m j_d}(D|C), where A = C − {c_k}. Since i_t ≤ j_t (t = 1, 2, ..., m) and i_d = j_d, we have π_{A^{i_1 i_2 ... i_m}} ⪯ π_{A^{j_1 j_2 ... j_m}} and π_{D^{i_d}} = π_{D^{j_d}}, thus POS_{i_1 i_2 ... i_m i_d}(D|A) ⊆ POS_{j_1 j_2 ... j_m j_d}(D|A) ⊂ POS_{j_1 j_2 ... j_m j_d}(D|C) = POS_{i_1 i_2 ... i_m i_d}(D|C). Therefore,
c_k is a core attribute of the (i_1 i_2 ... i_m i_d)-th decision table.

Theorem 3 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, let POS_{j_1 j_2 ... j_m j_d}(D|C) = POS_{i_1 i_2 ... i_m i_d}(D|C), where i_t ≤ j_t (t = 1, 2, ..., m) and i_d = j_d. If Red_j ⊆ C is a reduct of the (j_1 j_2 ... j_m j_d)-th decision table, then there exists a reduct Red_i of the (i_1 i_2 ... i_m i_d)-th decision table such that Red_i ⊇ Red_j.

Proof: Since Red_j is a reduct of the (j_1 j_2 ... j_m j_d)-th decision table, |POS_{j_1 j_2 ... j_m j_d}(D|Red_j)| = |POS_{j_1 j_2 ... j_m j_d}(D|C)|. Since i_t ≤ j_t (t = 1, 2, ..., m) and i_d = j_d, we have |POS_{i_1 i_2 ... i_m i_d}(D|Red_j)| ≤ |POS_{j_1 j_2 ... j_m j_d}(D|Red_j)|. Thus there exist the following two cases.

Case 1: If |POS_{i_1 i_2 ... i_m i_d}(D|Red_j)| = |POS_{i_1 i_2 ... i_m i_d}(D|C)|, then Red_j is a reduct of the (i_1 i_2 ... i_m i_d)-th decision table, so there exists a reduct Red_i such that Red_i = Red_j.
Case 2: If |POS_{i_1 i_2 ... i_m i_d}(D|Red_j)| < |POS_{i_1 i_2 ... i_m i_d}(D|C)|, we must add some attribute set A such that |POS_{i_1 i_2 ... i_m i_d}(D|[Red_j ∪ A])| = |POS_{i_1 i_2 ... i_m i_d}(D|C)|. Thus Red_j ∪ A is a reduct of the (i_1 i_2 ... i_m i_d)-th decision table, and there exists a reduct Red_i such that Red_i = Red_j ∪ A.

From the above two cases, we have Red_i ⊇ Red_j.
Theorem 4 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, if i_t ≤ j_t (t = 1, 2, ..., m) and i_d = j_d, then γ(C^{i_1 i_2 ... i_m} → D^{i_d}) ≤ γ(C^{j_1 j_2 ... j_m} → D^{j_d}).

Proof: Since i_t ≤ j_t (t = 1, 2, ..., m) and i_d = j_d, we have U/C^{i_1 i_2 ... i_m} ⪯ U/C^{j_1 j_2 ... j_m} and π_{D^{i_d}} = π_{D^{j_d}}. Assume that π_{C^{j_1 j_2 ... j_m}} = {P_1, ..., P_l} and π_{C^{i_1 i_2 ... i_m}} = {Q_1, ..., Q_w}; it is obvious that l > w. There exists a subset E_i of {1, 2, ..., l} such that Q_i = ∪_{j∈E_i} P_j (i = 1, 2, ..., w). For any equivalence class D_l ∈ π_{D^{i_d}}, \underline{apr}_{Q_i}(D_l) ⊆ ∪_{j∈E_i} \underline{apr}_{P_j}(D_l). Therefore, we have γ(C^{i_1 i_2 ... i_m} → D^{i_d}) ≤ γ(C^{j_1 j_2 ... j_m} → D^{j_d}).
3.1.2 Decision attribute level ascension in hierarchical decision table
In what follows, we consider the case that keeps the concept levels of the conditional attributes unchanged and ascends the level of the decision attribute. In this case, we further discuss the relationships among hierarchical decision tables from the perspectives of uncertainty, attribute core, reduct and decision rule.

Theorem 5 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, if i_t = j_t (t = 1, 2, ..., m) and i_d ≤ j_d, then

(1) |POS_{i_1 i_2 ... i_m i_d}(D|C)| ≥ |POS_{j_1 j_2 ... j_m j_d}(D|C)|;
(2) DIS_{i_1 i_2 ... i_m i_d}(D|C) ≤ DIS_{j_1 j_2 ... j_m j_d}(D|C);
(3) Info_{i_1 i_2 ... i_m i_d}(D|C) ≤ Info_{j_1 j_2 ... j_m j_d}(D|C).

Proof: Since i_t = j_t (t = 1, 2, ..., m), π_{C^{i_1 i_2 ... i_m}} = π_{C^{j_1 j_2 ... j_m}}. As i_d ≤ j_d, π_{D^{i_d}} ⪯ π_{D^{j_d}}. Let π_{D^{i_d}} = {D_1, ..., D_l} and π_{D^{j_d}} = {D'_1, ..., D'_w}. Obviously, l < w. There exists a subset E_i of {1, 2, ..., w} such that E_i ∩ E_h = ∅ for i ≠ h (i, h = 1, 2, ..., l) and, for any equivalence class, D_i = ∪_{j∈E_i} D'_j, where i = 1, 2, ..., l.
(1) For any equivalence class C_q ∈ π_{C^{i_1 i_2 ... i_m}}, \underline{apr}_{C_q}(D_i) ⊇ ∪_{j∈E_i} \underline{apr}_{C_q}(D'_j). Thus, |POS_{i_1 i_2 ... i_m i_d}(D|C)| ≥ |POS_{j_1 j_2 ... j_m j_d}(D|C)|.

(2) For any equivalence class C_q ∈ π_{C^{i_1 i_2 ... i_m}}, we have

\sum_{1≤j_1<j_2≤w} n_q^{j_1} n_q^{j_2} = [|C_q|^2 − (n_q^1)^2 − ... − (n_q^j)^2 − ... − (n_q^w)^2]/2,
\sum_{1≤i_1<i_2≤l} n_q^{i_1} n_q^{i_2} = [|C_q|^2 − (n_q^1)^2 − ... − (n_q^i)^2 − ... − (n_q^l)^2]/2.

Since D_i = ∪_{j∈E_i} D'_j, we have n_q^i = \sum_{j∈E_i} n_q^j and (n_q^i)^2 = (\sum_{j∈E_i} n_q^j)^2 ≥ \sum_{j∈E_i} (n_q^j)^2. Thus, \sum_{1≤j_1<j_2≤w} n_q^{j_1} n_q^{j_2} ≥ \sum_{1≤i_1<i_2≤l} n_q^{i_1} n_q^{i_2}. Therefore, we can conclude that DIS_{i_1 i_2 ... i_m i_d}(D|C) ≤ DIS_{j_1 j_2 ... j_m j_d}(D|C).
(3) For any equivalence class C_q ∈ π_{C^{i_1 i_2 ... i_m}}, we have p(D_i|C_q) = \sum_{j∈E_i} p(D'_j|C_q), where i = 1, 2, ..., l. Thus,

Info_{i_1 i_2 ... i_m i_d}(D|C) = − \sum_{q=1}^{r} p(C_q) \sum_{i=1}^{l} p(D_i|C_q) \log_2 p(D_i|C_q)
= − \sum_{q=1}^{r} p(C_q) \sum_{i=1}^{l} \sum_{j∈E_i} p(D'_j|C_q) \log_2 \sum_{j∈E_i} p(D'_j|C_q)
≤ − \sum_{q=1}^{r} p(C_q) \sum_{i=1}^{l} \sum_{j∈E_i} p(D'_j|C_q) \log_2 p(D'_j|C_q)
= − \sum_{q=1}^{r} p(C_q) \sum_{j=1}^{w} p(D'_j|C_q) \log_2 p(D'_j|C_q)
= Info_{j_1 j_2 ... j_m j_d}(D|C).

Corollary 2 If {i_1 i_2 ... i_m l(d), ..., i_1 i_2 ... i_m i_d, ..., i_1 i_2 ... i_m 1} is a total ordering relation, we have
(1) |POS_{i_1 i_2 ... i_m l(d)}(D|C)| ≤ ... ≤ |POS_{i_1 i_2 ... i_m i_d}(D|C)| ≤ ... ≤ |POS_{i_1 i_2 ... i_m 1}(D|C)| ≤ |U|;
(2) DIS_{i_1 i_2 ... i_m l(d)}(D|C) ≥ ... ≥ DIS_{i_1 i_2 ... i_m i_d}(D|C) ≥ ... ≥ DIS_{i_1 i_2 ... i_m 1}(D|C) ≥ 0;
(3) Info_{i_1 i_2 ... i_m l(d)}(D|C) ≥ ... ≥ Info_{i_1 i_2 ... i_m i_d}(D|C) ≥ ... ≥ Info_{i_1 i_2 ... i_m 1}(D|C) ≥ 0.

Theorem 5 and Corollary 2 indicate that the size of the positive region increases monotonically as the concept level of the decision attribute ascends, while the number of indiscernibility object pairs and the information entropy decrease.

Theorem 6 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, let POS_{j_1 j_2 ... j_m j_d}(D|C) = POS_{i_1 i_2 ... i_m i_d}(D|C), where i_t = j_t (t = 1, 2, ..., m) and i_d ≤ j_d. If c_k is a core attribute of the (i_1 i_2 ... i_m i_d)-th decision table, then c_k is also a core attribute of the (j_1 j_2 ... j_m j_d)-th decision table.

Proof: Since c_k is a core attribute of the (i_1 i_2 ... i_m i_d)-th decision table, for A = C − {c_k} we have POS_{i_1 i_2 ... i_m i_d}(D|A) ⊂ POS_{i_1 i_2 ... i_m i_d}(D|C). As i_t = j_t (t = 1, 2, ..., m) and i_d ≤ j_d, π_{A^{i_1 i_2 ... i_m}} = π_{A^{j_1 j_2 ... j_m}} and π_{D^{i_d}} ⪯ π_{D^{j_d}}, thus POS_{j_1 j_2 ... j_m j_d}(D|A) ⊆ POS_{i_1 i_2 ... i_m i_d}(D|A) ⊂ POS_{i_1 i_2 ... i_m i_d}(D|C) = POS_{j_1 j_2 ... j_m j_d}(D|C). Therefore, c_k is a core attribute of the (j_1 j_2 ... j_m j_d)-th decision table.

Theorem 7 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, let POS_{j_1 j_2 ... j_m j_d}(D|C) = POS_{i_1 i_2 ... i_m i_d}(D|C) and let Red_j ⊆ C be a reduct of the (j_1 j_2 ... j_m j_d)-th decision table. If i_p = j_p (p = 1, 2, ..., m) and i_d ≤ j_d, then there exists a reduct Red_i of the (i_1 i_2 ... i_m i_d)-th decision table such that Red_i ⊆ Red_j.

Proof: Since Red_j is a reduct of the (j_1 j_2 ... j_m j_d)-th decision table, |POS_{j_1 j_2 ... j_m j_d}(D|Red_j)| = |POS_{j_1 j_2 ... j_m j_d}(D|C)|. Since i_t = j_t (t = 1, 2, ..., m) and i_d ≤ j_d, we have |POS_{i_1 i_2 ... i_m i_d}(D|Red_j)| = |POS_{j_1 j_2 ... j_m j_d}(D|Red_j)|. Thus, Red_j is a super reduct of the (i_1 i_2 ... i_m i_d)-th decision table. We can delete some attribute set A such that |POS_{i_1 i_2 ... i_m i_d}(D|[Red_j − A])| = |POS_{i_1 i_2 ... i_m i_d}(D|C)|. If Red_j − A is a reduct of the (i_1 i_2 ... i_m i_d)-th decision table, then there exists a reduct Red_i such that Red_i = Red_j − A. Thus, Red_i ⊆ Red_j.

Theorem 8 Given the (i_1 i_2 ... i_m i_d)-th decision table and the (j_1 j_2 ... j_m j_d)-th decision table, if i_t = j_t (t = 1, 2, ..., m) and i_d ≤ j_d, then γ(C^{i_1 i_2 ... i_m} → D^{i_d}) ≥ γ(C^{j_1 j_2 ... j_m} → D^{j_d}).

Proof: Since i_t = j_t (t = 1, 2, ..., m), π_{C^{i_1 i_2 ... i_m}} = π_{C^{j_1 j_2 ... j_m}}. As i_d ≤ j_d, π_{D^{i_d}} ⪯ π_{D^{j_d}}. Let π_{D^{i_d}} = {D_1, ..., D_l} and π_{D^{j_d}} = {D'_1, ..., D'_w}. Obviously, l < w. There exists a subset E_i of {1, 2, ..., w} such that D_i = ∪_{j∈E_i} D'_j (i = 1, 2, ..., l). For any equivalence class C_q ∈ π_{C^{i_1 i_2 ... i_m}}, \underline{apr}_{C_q}(D_i) ⊇ ∪_{j∈E_i} \underline{apr}_{C_q}(D'_j).
Therefore, we have γ(C^{i_1 i_2 ... i_m} → D^{i_d}) ≥ γ(C^{j_1 j_2 ... j_m} → D^{j_d}).

Example 2: (Continued from Example 1) Different hierarchical decision tables under different levels of granularity are shown in Fig. 2 and Tables 2-4. We here illustrate some cases at different levels of granularity in detail.

(1) For the S1112 and S2222 hierarchical decision tables, since |POS_1112(D|C)| = 5 and |POS_2222(D|C)| = 6, |POS_1112(D|C)| ≤ |POS_2222(D|C)| holds. For the S1122 and S1121 hierarchical decision tables, since |POS_1122(D|C)| = 6 and |POS_1121(D|C)| = 8, |POS_1121(D|C)| ≥ |POS_1122(D|C)| holds.
(2) For the S1122 and S2222 hierarchical decision tables, {c_1} is a core attribute of S2222, so it is also a core attribute of S1122. Meanwhile, {c_1, c_2} is a reduct of S2222, while {c_1, c_2, c_3} is a reduct of S1122, thus {c_1, c_2} ⊆ {c_1, c_2, c_3} holds.
(3) For the S1112 and S2222 hierarchical decision tables, γ(C^{111} → D^2) = 0.5 and γ(C^{222} → D^2) = 0.6, thus γ(C^{111} → D^2) ≤ γ(C^{222} → D^2). For the S1111 and S1112 hierarchical decision tables, γ(C^{111} → D^1) = 0.8 and γ(C^{111} → D^2) = 0.5, thus γ(C^{111} → D^1) ≥ γ(C^{111} → D^2) holds.
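The positive-region value quoted in case (1) can be checked mechanically. The small Java program below (our own verification sketch, not part of the paper's implementation) recomputes |POS_2222(D|C)| from Table 2:

```java
import java.util.*;

public class PositiveRegionCheck {
    public static void main(String[] args) {
        // Rows of Table 2 (S2222): conditional part "A2 EL2 O2" and decision value Sa2.
        String[][] rows = {
            {"12 11 11","22"}, {"11 22 22","31"}, {"12 12 11","22"}, {"21 12 12","21"},
            {"22 21 21","21"}, {"31 22 12","32"}, {"22 11 11","21"}, {"21 12 12","22"},
            {"32 12 11","11"}, {"22 21 21","12"}};
        // Group objects into equivalence classes on C and collect their decision values.
        Map<String, Set<String>> decisions = new HashMap<>();
        Map<String, Integer> sizes = new HashMap<>();
        for (String[] r : rows) {
            decisions.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[1]);
            sizes.merge(r[0], 1, Integer::sum);
        }
        // An equivalence class lies in the positive region iff it is decision-consistent.
        int pos = 0;
        for (String key : decisions.keySet())
            if (decisions.get(key).size() == 1) pos += sizes.get(key);
        System.out.println("|POS_2222(D|C)| = " + pos);   // prints 6, as in Example 2(1)
    }
}
```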
3.2 The parallelism of hierarchical attribute reduction algorithms
In what follows, we discuss the parallelism in hierarchical attribute reduction algorithms using MapReduce.

Definition 11 For a hierarchical decision table S_{k_1 k_2 ... k_m k_d}, let S^i_{k_1 k_2 ... k_m k_d} (i = 1, 2, ..., N) denote a sub-decision table. If it satisfies (1) S_{k_1 k_2 ... k_m k_d} = ∪_{1≤i≤N} S^i_{k_1 k_2 ... k_m k_d}; and (2) S^i_{k_1 k_2 ... k_m k_d} ∩ S^j_{k_1 k_2 ... k_m k_d} = ∅, where i, j = 1, 2, ..., N and i ≠ j, then S^i_{k_1 k_2 ... k_m k_d} is called a data split.

By Definition 11, S^i_{k_1 k_2 ... k_m k_d} corresponds to a data split in the MapReduce framework. Thus, we can split the whole dataset into many data splits using MapReduce.

Theorem 9 For a decision table S_{k_1 k_2 ... k_m k_d}, let π_A = {A_1, A_2, ..., A_r}, let S^i_{k_1 k_2 ... k_m k_d} (i = 1, 2, ..., N) be a data split of S_{k_1 k_2 ... k_m k_d}, and let S^i_{k_1 k_2 ... k_m k_d}/A = {A^i_1, A^i_2, ..., A^i_r}. Then A_p = ∪_{1≤i≤N} A^i_p (p = 1, 2, ..., r).

Proof. Assume that x ∈ A^i_p in S^i_{k_1 k_2 ... k_m k_d} and y ∈ A^j_p in S^j_{k_1 k_2 ... k_m k_d}. Since the equivalence classes A^i_p and A^j_p with respect to A agree with A_p, both x and y belong to A_p; that is, identical objects from different data splits can be combined into one equivalence class. Thus, for π_A = {A_1, A_2, ..., A_r} and S^i_{k_1 k_2 ... k_m k_d}/A = {A^i_1, A^i_2, ..., A^i_r}, we get A_p = ∪_{1≤i≤N} A^i_p (p = 1, 2, ..., r).
Begin Compute hierarchical encoded decision table under different levels using concept hierarchy tree S
S2
S1
EC1 ...ECr
Ă
EC1 ...ECr
c1
SN
cm
EC1 ...ECr
d
Ă
( EC1 , EC2 ,..., ECr ) S k1k2 ...km kd Generate a list of candidate attribute subset { A1 ,
A2 , Ă , AM}
Decompose the hierarchical encoded decision table and compute equivalence classes
Sk1k2 ...kmkd
S1 A1 A AM 2 EC A1 ...EC AM
EC A1
S2 A1 A AM 2 EC A1 ...EC AM ECA2
Ă
SN A1 A AM 2 EC A1 ...EC AM
Ă
EC AM
Compute attribute significance for each equivalence class
Sum the attribute significance for each candidate attribute subset Choose the best candidate attribute N Reduct Y End
Fig. 3. The parallelism of a hierarchical attribute reduction algorithm
3.3 Parallel computation algorithms for the hierarchical encoded decision table using MapReduce
Given a decision table S and the corresponding concept hierarchy trees (CHT), we can acquire a hierarchical encoded decision table. However, when we view the data under different levels of granularity, the hierarchical decision table may have to be regenerated from the original decision table S using the concept hierarchy trees. To address this problem, we employ a general encoding technique to acquire a hierarchical encoded decision table. Encoding can be performed during the collection of the relevant data, so no extra "encoding pass" is required. In addition, an encoded string, which represents a position in a concept hierarchy tree, requires fewer bits than the corresponding attribute value. Therefore, it is often beneficial to use an encoded decision table, although our approach does not rely on the derivation of such a table because the encoding can always be performed on the fly. To simplify our discussion, we assume that the given concept hierarchy trees are balanced. We transform a large, low-level, detailed table into a compressed, high-level, generalized one. Clearly, the encoding of one object is independent of that of another object, so the encoding computations for all objects can be executed in parallel. The large-scale data is stored on the Hadoop Distributed File System as a sequence file of < key, value > pairs, in which the key is each object and the value is null. The MapReduce framework partitions the whole dataset into many data splits and assigns them to the corresponding mappers. For each map task, the map function receives the encoded string array of the concept hierarchy trees as a global parameter, encodes each object in its data split, and writes the results into local files on each node. These files consist of two parts: the encoded string of the conditional attributes (for brevity, ConStr) and the counterpart of the decision attribute (DecStr). Here we design a Parallel Computation Algorithm for the Hierarchical Encoded Decision Table (PCAHEDT) using MapReduce. The pseudocode of the map function of Algorithm PCAHEDT is shown in Algorithm 1.

Algorithm 1: PCAHEDT-Map(key, value) // Map phase of transforming the objects into encoded strings using the concept hierarchy trees
Input: The encoded string array of the concept hierarchy trees, ESA; a data split, S^i
Output: < ConStr, DecStr >, where ConStr is an encoded conditional string and DecStr is an encoded decision string
Begin
533
Step 1. ConStr = "";
Step 2. for each x ∈ S^i do
Step 3. { for any attribute a ∈ C do
Step 4. { Acquire the encoded string of object x on attribute a from ESA, es(x, a);
Step 5. ConStr = ConStr + es(x, a) + " "; }
Step 6. Acquire the encoded string of object x on attribute d from ESA, es(x, d), and assign es(x, d) to DecStr;
Step 7. Emit < ConStr, DecStr > }.
534
End.
535
By Algorithm 1, we can acquire the encoded objects for each data split and store them into the local files. The MapReduce framework copies all the encoded objects from the Mapper nodes to the corresponding Reducer nodes and sort these objects. We can store the objects with the same encoded string of the conditional attributes by the following reduce function. The pseudocode of the reduce function of Algorithm PCAHEDT is outlined in Algorithm 2.
525 526 527 528 529 530 531 532
536 537 538 539 540
541 542 543 544 545 546
Step Step Step Step
1. 2. 3. 4.
Algorithm 2: PCAHEDT-Reduce(key, V) // Reduce phase of computing the hierarchical encoded decision table
Input: an encoded conditional string, key; the list of decision values, V
Output: < key, DecStr >, where key is an encoded conditional string ConStr and DecStr is an encoded decision string
Begin
Step 1. for each DecStr ∈ V do
Step 2. { Emit < key, DecStr >. }
End.
550
By Algorithm 2, we can compute the encoded decision table. Note that if some objects with the same ConStr have different DecStrs, we still emit each object with its own DecStr, instead of only one object carrying the DecStr "**...*" (a string of l(d) asterisks), since such objects may be needed for different hierarchical decision tables. All the encoded objects form a hierarchical encoded decision table. The Parallel Computation Algorithm for the Hierarchical Encoded Decision Table (PCAHEDT) requires one kind of MapReduce job executing the map and reduce functions, so that we can transform an original decision table into a hierarchical encoded decision table, as outlined in Algorithm 3.
559 560 561
since such objects may be used for the different hierarchical decision tables. All the encoded objects form a hierarchical encoded decision table. Parallel Computation Algorithm for the Hierarchical Encoded Decision Table (PCAHEDT) requires one kind of MapReduce job for executing the map and reduce function so that we can transform an original decision table into a hierarchical encoded decision table as outlined in Algorithm 3. Algorithm 3: Parallel computation algorithm for the hierarchical encoded decision table (PCAHEDT) Input: A decision table, S; the encoded string array of the concept hierarchy 20
562 563 564
tree, ESA Output: A hierarchical encoded decision table, HEDT Begin
568
Step 1. Compute the encoded objects by executing the PCAHEDT-Map function of Algorithm 1;
Step 2. Acquire the hierarchical encoded decision table by executing the PCAHEDT-Reduce function of Algorithm 2.
569
End
570
572
By Algorithm 3, we acquire a hierarchical encoded decision table. Thus, we can construct the hierarchical attribute reduction algorithms for a hierarchical encoded decision table under different levels of granularity.
573
3.4 Hierarchical attribute reduction algorithms using MapReduce
565 566 567
571
574 575 576 577 578 579 580 581
582 583
584 585 586 587 588 589 590 591
592 593 594 595 596
In this section, we mainly discuss the parallel computations of the hierarchical attribute reduction algorithms for big data using MapReduce, which use a hierarchical encoded decision table instead of the original decision table. As we all know, it is obvious that the computations of the equivalence classes are independent of each other. Thus, different computations of equivalence classes from a subset of attributes can be executed in parallel. In what follows, we first illustrate how to compute the equivalence classes and attribute significance of different candidate attributes in parallel.
3.4.1 Parallel computations of the equivalence classes for candidate attribute subset As discussed in [27], three classical algorithms often iteratively compute the equivalence classes from different candidate attribute set, so we use a candidate subset of the attributes as a global parameter in Map function. The pseudocode of Map function is illustrated in Algorithm 4, where es(x, a) denotes the encoded string on attribute a for object x, substr(1, lna ) function acquires the substring of the length lna from the beginning, c EquivalenceClass is an equivalence class with a flag of attribute c, and < DecStr, 1 > is a pair where the DecStr is the encoded decision string. Algorithm 4:Map(key, value) //Map phase of computing equivalence classes Input: Selected attributes, A; any candidate attribute, c ∈ C − A; a data split, S i ; different level number of all the attributes, lna (a ∈ C ∪ {d}) Output:< c EquivalenceClass, < DecStr, 1 >>, 21
597
Begin
598
Step Step Step Step Step Step
606
EquivalenceClass=“”, DecStr=“”; for each x ∈ S i do {for any attribute a ∈ A do {EquivalenceClass=EquivalenceClass + es(x, a).substr(1,lna ) + “ ”;} for any attribute c ∈ C − A do { c EquivalenceClass =c + “ ” + es(x, c).substr(1,lnc )+ “ ” + EquivalenceClass; Step 7. DecStr =es(x,d).substr(1,lnd ); Step 8. Emit < c EquivalenceClass, < DecStr, 1 >>.}}
607
End
608 609
By Algorithm 4, we can compute the equivalence classes from different candidate attribute subset in task parallel for each data split.
610
3.4.2 Parallel computations of attribute significance for the equivalence classes
599 600 601 602 603 604 605
611 612 613 614 615 616 617
1. 2. 3. 4. 5. 6.
The computation starts with a map phase in which the map function is applied in parallel on different partitions of the input dataset. The MapReduce framework will shuffle and group the same equivalence classes into a node, that is, all the pair values that share a certain key are passed to a single reduce function. Thus the reduce function can compute the attribute significance among the same equivalence classes in parallel and store < c, c AttrSig > into a file in the HDFS. We can define the different attribute significance as follows. Definition 12 [27] For a hierarchical encoded decision table S, let A ⊆ C and c ∈ C − A, then the significance of attribute c is defined by: sigP OS (c, A, D) =
|P OS(D|[A ∪ {c}])| − |P OS(D|A)| |U|
|BND(D|A)| − |BND(D|[A ∪ {c}])| |U| sigInf o (c, A, D) = Inf o(D|A) − Inf o(D|[A ∪ {c}])
sigBN D (c, A, D) =
sigDIS (c, A, D) =
− DIS(D|[A ∪ {c}]) DIS(D|A) ni nj
(9) (10) (11) (12)
1≤i
621 622
where sig (c, A, D)(={POS,BND,DIS,Info}) denotes the attribute significance in positive region, boundary region, discernibility matrix and information entropy algorithms, respectively. Note that although sigP OS (c, A, D) and sigBN D (c, A, D) here are identical, but they have different performances for big data with high dimensions in cloud 22
623 624 625
626 627 628 629 630 631 632 633
computing (See Fig. 5). Therefore, we only give the pseudocode of reduce function for three different algorithms in view of boundary region as follows (Algorithm 5). Algorithm 5:Reduce(key,V) //Reduce phase of computing the attribute significance of single attribute among the same equivalence classes Input: An equivalence class, c EquivalenceClass; the list of pairs < DecStr, F requencies > ,V Output: < c, c AttrSig >, where c is an attribute name and c AttrSig is the value of attribute significance Begin
640
Step 1. c AttrSig = 0; Step 2. for each v ∈ V do { Step 3. Compute the frequencies (n1 ,n2 , . . .,nk ) of the different encoded decision string;} Step 4. Compute c AttrSig according to Definition 12. Step 5. if all the decision values are different, then Step 6. {Emit < c, c AttrSig >. }
641
End
634 635 636 637 638 639
650
For boundary region algorithm, when computing sigBN D (c, A, D) for c ∈ C −A, we only calculate the number of boundary region objects, |BND(D|[A∪ {c}])|, in reduce phase. For information entropy algorithm, when computing sigInf o (c, A, D) for c ∈ C − A, one can check that the value of Inf o(D|A) is the same in this iteration, thus we only calculate the information entropy Inf o(D|[A ∪ {c}]). For discernibility matrix algorithm, when computing sigDIS (c, A, D) for c ∈ C −A, one can check that DIS(D|A) is the same in this iteration, thus we only calculate the number of indiscernibility object pairs, ∪ {c}]), in reduce phase. DIS(D|[A
651
3.4.3 Parallel hierarchical attribute reduction algorithms using MapReduce
642 643 644 645 646 647 648 649
652 653 654 655 656 657 658 659 660
Parallel hierarchical attribute reduction algorithms based on boundary region(positive region), discernibility matrix and information entropy requires one kind of MapReduce job. The MapReduce driver program reads the attribute significance values of different candidate attribute subset from the output files in reduce phase, sums up the value of each candidate attribute, and determines some attribute to be added into a reduct. Thus, we can get the newly candidate attributes which are used for the next iteration. This procedure must be executed serially in each iteration. A general parallel hierarchical attribute reduction algorithm can be summarized in Algorithm 6. 23
661 662 663 664 665
666 667 668 669
Algorithm 6: Parallel hierarchical attribute reduction algorithms using MapReduce Input: a hierarchical encoded decision table, HEDT Output: a reduct, Red Begin Step Step Step Step
1. 2. 3. 4.
670 671 672
Step 5. Step 6.
673 674 675 676
Step 7. Step 8.
677 678
Step 9.
679 680
Step 10.
Let Red = ∅; If (D|Red) = (D|C) , then turn to Step 8; For each attribute c ∈ C − Red do {Initate a MapReduce job and execute map function of Algorithm 4 and reduce function of Algorithm 5; Read the output files and acquire sig (c, Red, D)}; Choose an attribute c with sig (c , Red, D) = best(sig (c, Red, D)) from C − Red (if the attribute like that is not only one, select one attribute arbitrarily ); Red = Red ∪{c }, turn to Step 2; If (D|[Red − {c}]) = (D|Red) for every c ∈ Red, then go to Step 10. If there exists an attribute c ∈ Red such that (D|[Red − {c }]) = (D|Red), then Red = Red - {c } and go to Step 8. Output Red.
681
End
682
By Algorithm 6, we can acquire each reduct for different hierarchical attribute algorithms based on boundary region, discernibility matrix and information entropy. It can be known from Steps 2–7 that (D|Red) = (D|C). On the other hand, according to Steps 8 and 9 of Algorithm 6, we have (D|[Red − {c}]) = (D|Red) for every c ∈ Red. By Definition 5, Algorithm 6 is complete, that is, the output set Red is a reduct with certainty [19].
683 684 685 686 687
688 689 690 691
692 693 694 695 696 697 698 699
Example 3: (Continued from Example 1) Different hierarchical decision tables under different levels of granularity are shown in Fig. 2. We illustrate some reducts under different levels of granularity for conditional attributes and decision attribute in Fig. 4. (1) For S2222 , we first get a reduct {A2 , EL2 }. In order to mine the hierarchical decision rules, we begin to ascend the levels of conditional attributes A2 or/and EL2 . In the end, we can acquire an attribute-generalization reduct {A2 , EL1 } as shown in Fig. 4(a). Thus, S2222 can be generalized into S21∗2 . (2) For S2221 , we first get a reduct {A2 , EL2 } as well. Then, we ascend the levels of conditional attributes A2 or/and EL2 . Finally, we can acquire an attribute-generalization reduct {A1 , EL1 } as shown in Fig. 4(b). Thus, S2221 can be generalized into S11∗1 . 24
S1,1,*,1 { A1 , EL1}
S1,2,*,2
S 2,1,*,2
S1,2,*,1
S 2,1,*,1
{ A1 , EL2 }
{ A2 , EL1}
{ A1 , EL2 }
{ A2 , EL1}
u S2,2,2,2
S 2,2,2,1
{ A2 , EL2 } (a) attribute-generalization reduction for
{ A2 , EL2 } (b) attribute-generalization reduction for S 2,2,2,1
S2,2,2,2
Fig. 4. An illustration of attribute-generalization reduction process 700
701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717
3.5 Time complexity analysis of parallel algorithms
To demonstrate the efficiency of our proposed parallel attribute reduction algorithms, i.e. Algorithms 1–6, we compare these parallel algorithms which are implemented on multiple nodes using MapReduce with their corresponding serial algorithms implemented on single node. Given the same decision table, Table 5 lists the computational complexities of Algorithms 1–6 and their serialversion algorithms. In Table 5, m denotes the number of conditional attributes, n is the total number of objects and N is the number of slave nodes. To simplify our discussion, we assume the number of the objects in each data split is Nn . k is the number of decision values, n is the number of the objects in the maximal equivalence class. l(ct ), lnct (t=1, 2, . . ., m), l(d) and lnd denote the concept hierarchy depths and the hierarchical encoded string of conditional attribute ct and decision attribute d respectively. From Table 5, we can find that the time complexities of the parallel attributes reduction algorithms are far less than the traditional serial algorithms. Under the parallel strategy, the computation task of attribute reduction is divided and distributed into N nodes and thus the computational complexity is greatly reduced N times. Generally speaking, the more slave nodes we have, the faster attribute reduction processing can achieve. Table 5 Time complexity analysis of parallel algorithms Algorithm Nos. Algorithm 1
Algorithm | Serial | Parallel
Algorithm 1 | $O(n(\sum_{t=1}^{m} l(c_t)+l(d)))$ | $O(n(\sum_{t=1}^{m} l(c_t)+l(d))/N)$
Algorithm 2 | $O(n)$ | $\max(O(n'), O(n/N))$
Algorithm 3 | $O(n(\sum_{t=1}^{m} l(c_t)+l(d)))$ | $\max(O(n'), O(n(\sum_{t=1}^{m} l(c_t)+l(d))/N))$
Algorithm 4 | $O(n(\sum_{t=1}^{m} l_{nc_t}+l_{nd}))$ | $O(n(\sum_{t=1}^{m} l_{nc_t}+l_{nd})/N)$
Algorithm 5 | $O(kn)$ | $\max(O(kn'), O(kn/N))$
Algorithm 6 | $\max(O(m^2 n(\sum_{t=1}^{m} l_{nc_t}+l_{nd})), O(m^2 kn))$ | $\max(O(m^2 n(\sum_{t=1}^{m} l_{nc_t}+l_{nd})/N), O(m^2 kn'), O(m^2 kn/N))$
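As a concrete reading of Table 5 (an illustration only), consider Algorithm 1 on a cluster with $N = 16$ slave nodes. Its parallel cost is
\[
O\!\left(\frac{n\bigl(\sum_{t=1}^{m} l(c_t)+l(d)\bigr)}{16}\right),
\]
i.e., roughly one sixteenth of its serial counterpart, ignoring communication and scheduling overheads.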
4 Experimental evaluation

This section presents experimental results of running four hierarchical attribute reduction algorithms in data and task parallel, based on positive region, boundary region, discernibility matrix and information entropy (HARAPOS, HARABND, HARADIS and HARAInfo, respectively). We are interested in performance measures such as the speedup and scaleup [37] of the parallel attribute reduction algorithms. We do not compare the relative performance of these algorithms, since the reducts they produce may differ.

4.1 Experiment setup
We run the four parallel algorithms on a cluster of 17 nodes. For the distributed experiments, one node is set as the master and the rest are configured as slave nodes. Each node has 2 GB of main memory, uses an Intel Pentium Dual-Core processor (2 cores in all, each with a clock frequency of 2.6 GHz), and is connected via 100 Mbit/s Ethernet. On each node, we install Cygwin 2.697 (a Linux-like environment for Windows), Hadoop 0.20.2 and Java 1.6.20. We make the following changes to the default Hadoop configuration: we run two map and two reduce tasks in parallel on each node, and set the replication factor to 3. We conduct an extensive series of experiments on one commonly used machine learning dataset, Mushroom, from the UCI Machine Learning Repository [9], and three synthetic big datasets (DS1, DS3, DS4). Each dataset has only one decision attribute. We duplicate the Mushroom dataset to generate a big dataset (DS2). Table 6 summarizes the characteristics of each dataset.
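For reference, a sketch of the two configuration changes described above is given below using the standard Hadoop 0.20 property names. In practice the task-slot limits are daemon-side settings placed in each node's mapred-site.xml and the replication factor in hdfs-site.xml; the Configuration API is used here purely to make the names and values explicit.

```java
import org.apache.hadoop.conf.Configuration;

// Illustration only: documents the property names and values used in our setup.
public class ClusterSettingsSketch {
    public static Configuration settings() {
        Configuration conf = new Configuration();
        conf.set("mapred.tasktracker.map.tasks.maximum", "2");    // two map tasks per node
        conf.set("mapred.tasktracker.reduce.tasks.maximum", "2"); // two reduce tasks per node
        conf.set("dfs.replication", "3");                         // HDFS replication factor
        return conf;
    }
}
```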
Table 6. Description of the datasets

Datasets | Objects | Attributes | Classes | Memo | Size (GB)
DS1 | 10,000 | 5,000 | 2 | single level | 1.3
DS2 | 40,620,000 | 22 | 2 | single level | 2.2
DS3 | 40,000,000 | 30 | 5*5*5=125 | multiple levels | 2.6
DS4 | 40,000,000 | 50 | 9*9*9=729 | multiple levels | 4.3
4.2 The running time on different datasets
For big data, we partition the dataset into many splits in data parallel using MapReduce, and deal with each data split in task parallel. Fig. 5 shows the running time and the ratio of selecting the most important attribute in each iteration for HARAPOS and HARABND. From Fig. 5, one can see that the running time of HARAPOS is longer than that of HARABND; moreover, the ratio of HARAPOS to HARABND grows as the number of selected attributes increases for big data with high dimensions. Note that the first value in each subgraph denotes the running time of computing the classification capability for the original dataset. Hence we focus on the performance of HARABND, HARADIS and HARAInfo on DS2, DS3 and DS4.
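The sketch below illustrates, in greatly simplified form, the data- and task-parallel pattern underlying these measurements: each map task processes one data split and emits an equivalence-class key per object, and the reduce tasks aggregate the class sizes in parallel. It is not the exact Mapper/Reducer code of Algorithms 1–6; the comma-separated record layout and the class names are assumptions made for the example.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class EquivalenceClassCount {

    // Each map task handles one data split: it emits, for every object, the
    // concatenated attribute values (the equivalence-class key) with a count of 1.
    public static class EqMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eqKey = new Text();

        @Override
        protected void map(LongWritable offset, Text record, Context context)
                throws IOException, InterruptedException {
            // Assumed record layout: comma-separated values, decision value last.
            String[] values = record.toString().split(",");
            StringBuilder key = new StringBuilder();
            for (int t = 0; t < values.length - 1; t++) {   // condition attributes
                key.append(values[t]).append('#');
            }
            key.append(values[values.length - 1]);          // decision value
            eqKey.set(key.toString());
            context.write(eqKey, ONE);
        }
    }

    // The reduce tasks aggregate the size of each equivalence class in task parallel.
    public static class EqReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text eqKey, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(eqKey, new IntWritable(sum));     // size of the class
        }
    }
}
```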
Fig. 5. The comparisons of HARAPOS and HARABND on two datasets: (a) running time on DS1 and DS2; (b) the ratio of HARAPOS to HARABND.
Fig. 6 shows the running time of selecting the most important attribute in each iteration and the total running time for the three datasets. In Fig. 6, one can see that the three parallel algorithms exhibit a similar pattern of increase in the running time, since they all compute the attribute significance from the boundary regions. Note that the running time of HARADIS is longer because it uses the BigInteger type of the Java language, as shown in Fig. 6(d).
For the two datasets with multiple levels, we compute the running time of different hierarchical decision tables with $c_i d_j$ (where $c_i d_j$ denotes the i-th level of all conditional attributes and the j-th level of the decision attribute), as shown in Fig. 7. From Fig. 7, the running time of the decision table at higher levels is much longer than that at lower levels, because a parallel hierarchical attribute reduction algorithm must select more attributes at a higher level. However, the running times remain almost the same when only the decision attribute ascends, because the sizes of the reducts obtained by the different parallel algorithms are the same.
4.3 The performance evaluations on different datasets
In what follows, we examine the speedup and scaleup of our proposed algorithms in data and task parallel.
Fig. 6. The running times of three parallel algorithms: (a) DS2; (b) DS3-2; (c) DS4-2; (d) total running time for hierarchical attribute reduction.
Fig. 7. The running times on two datasets under different levels of granularity: (a) condition level ascension; (b) decision level ascension; (c) condition/decision level ascension.
4.3.1 Speedup

In order to measure the speedup, we keep the dataset constant and increase the number of computers in the system. A perfect parallel algorithm demonstrates linear speedup: a system with m times the number of computers yields a speedup of m. However, linear speedup is difficult to achieve because of the serial portion of the computation, the communication costs, faults, and the overheads of job scheduling, monitoring and control. We evaluate the speedup on the datasets with different numbers of nodes, varying the number of nodes from 1 to 16. Fig. 8 shows the speedup of the three parallel hierarchical attribute reduction algorithms on DS2, DS3-2 and DS4-2.
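In symbols, with $T_1$ denoting the running time on a single node and $T_m$ the running time on $m$ nodes,
\[
\mathrm{speedup}(m) = \frac{T_1}{T_m},
\]
so the ideal linear case corresponds to $\mathrm{speedup}(m) = m$.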
Fig. 8. The speedup of different parallel algorithms: (a) DS2; (b) DS3-2; (c) DS4-2.
4.3.2 Scaleup

Scaleup is defined as the ability of an m-times larger cluster to perform an m-times larger job in the same running time as the original system. To demonstrate how well the three parallel algorithms handle larger datasets when more slave nodes are available, we conduct scaleup experiments in which the size of the dataset grows in proportion to the number of slave nodes in the cluster. Fig. 9 shows the scaleup of the three parallel hierarchical attribute reduction algorithms.
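In symbols, if $T_1(D)$ is the running time of a single node on dataset $D$ and $T_m(mD)$ is the running time of $m$ nodes on an $m$-times larger dataset,
\[
\mathrm{scaleup}(m) = \frac{T_1(D)}{T_m(mD)},
\]
and values close to 1 indicate that the algorithms handle proportionally larger workloads well.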
Fig. 9. The scaleup of different parallel algorithms: (a) DS2; (b) DS3-2; (c) DS4-2.
5 Conclusions

In this paper, we first introduce granular computing for concept ascension, then define the hierarchical encoded decision table and discuss its corresponding properties, and finally propose hierarchical attribute reduction algorithms in data and task parallel for big data using MapReduce, based on boundary region, discernibility matrix and information entropy. The experimental results demonstrate that the proposed hierarchical attribute reduction algorithms using MapReduce scale well and can efficiently process big data on commodity computers in a cloud computing environment.
Furthermore, the parallelization of other attribute reduction algorithms and of extended rough set models will be considered in future work.
Acknowledgments
The research is supported by the National Natural Science Foundation of China under Grant Nos: 61103067, 61305052, the Natural Science Foundation of Jiangsu Province under Grant No.BK20141152, the Key Laboratory of Cloud Computing and Intelligent Information Processing of Changzhou City under Grant No: CM20123004, Qing Lan Project of Jiangsu Province of China, the Natural Science Foundation of Universities of Jiangsu Province under Grant No. 13KJB520005, the Key Laboratory of Embedded System and Service Computing, Ministry of Education under Grant No: ESSCKF201303, the Natural Science Foundation and Doctoral Research Foundation of Jiangsu University of Technology under Grant Nos: kyy12018, kyy13003.
References
[1] V. Abhishek, L. Xavier, E. David, H. Roy, Scaling genetic algorithms using MapReduce, In: Proceedings of the 2009 Ninth International Conference on Intelligent Systems Design and Applications, IEEE, 2009, pp. 13-18.
[2] A. Bargiela, W. Pedrycz, Toward a theory of granular computing for human centered information processing, IEEE Transactions on Fuzzy Systems 16(2) (2008) 320-330.
[3] C.T. Chu, S. Kim, Y.A. Lin, Y.Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun, MapReduce for machine learning on multicore, In: Proceedings of the 20th Conference on Advances in Neural Information Processing Systems (NIPS 2006), Vol. 6, 2006, pp. 281-288.
[4] J.H. Dai, W.T. Wang, H.W. Tian, L. Liu, Attribute selection based on a new conditional entropy for incomplete decision systems, Knowledge-Based Systems 39 (2013) 207-213.
[5] J. Dean, S. Ghemawat, MapReduce: Simplified data processing on large clusters, Communications of the ACM 51(1) (2008) 107-114.
[6] D.Y. Deng, D.X. Yan, J.Y. Wang, Parallel reducts based on attribute significance, In: J. Yu, S. Greco, P. Lingras, et al. (Eds.), Rough Set and Knowledge Technology, Lecture Notes in Computer Science, vol. 6401, Springer, Berlin/Heidelberg, 2010, pp. 336-343.
[7] I. Duntsch, G. Gediga, Simple data filtering in rough set systems, International Journal of Approximate Reasoning 18(1) (1998) 93-106.
[8] Q.R. Feng, D.Q. Miao, Y. Cheng, Hierarchical decision rules mining, Expert Systems with Applications 37(3) (2010) 2081-2091.
[9] A. Frank, A. Asuncion, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science, 2010.
[10] S. Ghemawat, H. Gobioff, S.T. Leung, The Google file system, SIGOPS Operating Systems Review 37(5) (2003) 29-43.
[11] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (2003) 1157-1182.
[12] J. Han, Y. Cai, N. Cercone, Data-driven discovery of quantitative rules in relational databases, IEEE Transactions on Knowledge and Data Engineering 5(1) (1993) 29-40.
[13] J. Han, Y. Fu, Mining multiple-level association rules in large databases, IEEE Transactions on Knowledge and Data Engineering 11(5) (1999) 798-805.
[14] L.X. Han, C.S. Liew, J.V. Hemert, M. Atkinson, A generic parallel processing model for facilitating data mining and integration, Parallel Computing 37 (2011) 157-171.
[15] T.P. Hong, C.E. Lin, J.H. Lin, S.L. Wang, Learning cross-level certain and possible rules by rough sets, Expert Systems with Applications 34(3) (2008) 1698-1706.
[16] X.H. Hu, N. Cercone, Discovering maximal generalized decision rules through horizontal and vertical data reduction, Computational Intelligence 17(4) (2001) 685-702.
[17] Q.H. Hu, W. Pedrycz, D.R. Yu, J. Lang, Selecting discrete and continuous features based on neighborhood decision error minimization, IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 40(1) (2010) 137-150.
[18] J.Y. Liang, F. Wang, C.Y. Dang, Y.H. Qian, An efficient rough feature selection algorithm with a multi-granulation view, International Journal of Approximate Reasoning 53(6) (2012) 912-926.
[19] J.H. Li, C.L. Mei, Y.J. Lv, A heuristic knowledge-reduction method for decision formal contexts, Computers and Mathematics with Applications 61 (2011) 1096-1106.
[20] C.H. Liu, D.Q. Miao, N. Zhang, Graded rough set model based on two universes and its properties, Knowledge-Based Systems 33 (2012) 65-72.
[21] Y.J. Lu, Concept hierarchy in data mining: specification, generation and implementation, Master Degree Dissertation, Simon Fraser University, Canada, 1997.
[22] D.Q. Miao, G.Y. Wang, Q. Liu, et al., Granular Computing: Past, Nowadays and Future, Science Press, Beijing, 2007.
[23] D.Q. Miao, Y. Zhao, Y.Y. Yao, F.F. Xu, H.X. Li, Relative reducts in consistent and inconsistent decision tables of the Pawlak rough set model, Information Sciences 179(24) (2009) 4140-4150.
[24] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11(5) (1982) 341-356.
[25] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Boston, 1991.
[26] Y.H. Qian, J.Y. Liang, W. Pedrycz, C.Y. Dang, Positive approximation: An accelerator for attribute reduction in rough set theory, Artificial Intelligence 174(9) (2010) 597-618.
[27] J. Qian, D.Q. Miao, Z.H. Zhang, W. Li, Hybrid approaches to attribute reduction based on indiscernibility and discernibility relation, International Journal of Approximate Reasoning 52(2) (2011) 212-230.
[28] J. Qian, D.Q. Miao, Z.H. Zhang, Knowledge reduction algorithms in cloud computing, Chinese Journal of Computers 34(12) (2011) 2332-2343. in Chinese.
[29] A. Skowron, C. Rauszer, The discernibility matrices and functions in information systems, In: R. Słowiński (Ed.), Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, Kluwer, Dordrecht, 1992.
[30] A. Srinivasan, T.A. Faruquie, S. Joshi, Data and task parallelism in ILP using MapReduce, Machine Learning 86(1) (2012) 141-168.
[31] R. Susmaga, Tree-like parallelization of reduct and construct computation, In: S. Tsumoto et al. (Eds.), RSCTC 2004, LNAI 3066, Springer, Berlin/Heidelberg, 2004, pp. 455-464.
[32] G.Y. Wang, H. Yu, D.C. Yang, Decision table reduction based on conditional information entropy, Chinese Journal of Computers 25(7) (2002) 760-766. in Chinese.
[33] L.H. Wang, G.F. Wu, Attribute reduction based on parallel symbiotic evolution, Chinese Journal of Computers 26(5) (2003) 630-635. in Chinese.
[34] X.Z. Wang, T.T. Wang, J.H. Zhai, An attribute reduction algorithm based on instance selection, Journal of Computer Research and Development 49(11) (2012) 2305-2310. in Chinese.
[35] F. Wang, J.Y. Liang, Y.H. Qian, Attribute reduction: A dimension incremental strategy, Knowledge-Based Systems 39 (2013) 95-108.
[36] W.Z. Wu, Y. Leung, Theory and applications of granular labelled partitions in multi-scale decision tables, Information Sciences 181(18) (2011) 3878-3897.
[37] X. Xu, J. Jager, H.P. Kriegel, A fast parallel clustering algorithm for large spatial databases, Data Mining and Knowledge Discovery 3 (1999) 263-290.
[38] Z.Y. Xu, Z.P. Liu, B.R. Yang, et al., A quick attribute reduction algorithm with complexity of max(O(|C||U|), O(|C|^2|U/C|)), Chinese Journal of Computers 29(3) (2006) 611-615. in Chinese.
[39] Y. Yang, Z. Chen, Z. Liang, G. Wang, Attribute reduction for massive data based on rough set theory and MapReduce, In: J. Yu, S. Greco, P. Lingras, et al. (Eds.), Rough Set and Knowledge Technology, Lecture Notes in Computer Science, vol. 6401, Springer, Berlin/Heidelberg, 2010, pp. 672-678.
[40] X.B. Yang, M. Zhang, H.L. Dou, J.Y. Yang, Neighborhood systems-based rough sets in incomplete information system, Knowledge-Based Systems 24(6) (2011) 858-867.
[41] Y.Y. Yao, Stratified rough sets and granular computing, In: R.N. Dave, T. Sudkamp (Eds.), Proceedings of the 18th International Conference of the North American Fuzzy Information Processing Society, New York, USA, IEEE Press, 1999, pp. 800-804.
[42] Y.Y. Yao, Y. Zhao, Discernibility matrix simplification for constructing attribute reducts, Information Sciences 7 (2009) 867-882.
[43] M.Q. Ye, X.D. Wu, X.G. Hu, D.H. Hu, Knowledge reduction for decision tables with attribute value taxonomies, Knowledge-Based Systems 56 (2014) 68-78.
[44] J.P. Yuan, D.H. Zhu, A hierarchical reduction algorithm for concept hierarchy, In: Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications (ISDA'06), IEEE Computer Society, Vol. 1, 2006, pp. 724-729.
[45] L.A. Zadeh, Fuzzy sets and information granularity, In: M. Gupta, R. Ragade, R. Yager (Eds.), Advances in Fuzzy Set Theory and Applications, North-Holland Publishing, Amsterdam, 1979, pp. 3-18.
[46] J.B. Zhang, T.R. Li, D. Ruan, et al., A parallel method for computing rough set approximations, Information Sciences 194 (2012) 209-223.
[47] W.Z. Zhao, H.F. Ma, Q. He, Parallel K-Means clustering based on MapReduce, In: M.G. Jaatun, G. Zhao, C. Rong (Eds.), Cloud Computing (CloudCom 2009), Springer, Berlin/Heidelberg, 2009, pp. 674-679.
[48] X. Zhang, C.L. Mei, D.G. Chen, J.H. Li, Multi-confidence rule acquisition oriented attribute reduction of covering decision systems via combinatorial optimization, Knowledge-Based Systems 50 (2013) 187-197.
[49] W. Ziarko, Acquisition of hierarchy-structured probabilistic decision tables and rules from data, Expert Systems 20(5) (2003) 305-310.
[50] D. Zinn, S. Bowers, S. Köhler, B. Ludäscher, Parallelizing XML data-streaming workflows via MapReduce, Journal of Computer and System Sciences 76(6) (2010) 447-463.