ESWA 9310
No. of Pages 14, Model 5G
12 May 2014 Expert Systems with Applications xxx (2014) xxx–xxx 1
Contents lists available at ScienceDirect
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
Dynamic tolerant skyline operation for decision making
Junyi Chai a,⇑, Eric W.T. Ngai a, James N.K. Liu b
a Department of Management and Marketing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
b Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
Article info
Keywords: Decision making; Skyline; Preference analysis; Personnel selection
Abstract

Skyline operation is a typical multicriteria decision-making technique that is well documented in data engineering. It assumes that human preference is settled, an assumption that is severely challenged in practical decision-making applications because it simplifies preference scenarios that are usually dynamic. This study establishes a mathematical formulation of dynamic preference in real settings. A decision approach called tolerant skyline operation (T-skyline) is developed, including its conceptual modeling, computation methods, and a skyline maintenance mechanism on a database. The method and its computation mechanism are evaluated through an empirical study of personnel selection and evaluation. We also analyze computation efficiency and system stability. The decision targets are achieved, the computation results are satisfactory, and the computation efficiency is reasonable. The effectiveness and advantages of the approach are illustrated in different real-world settings. Experiments on real, public datasets that evaluate players in the National Basketball Association in the United States were used to examine the design and development of T-skyline operation. The experimental results validate the practical viability of our decision model, which can inspire discussions in sport industries. The methodology used in this study is valuable for further academic research, particularly for the interdisciplinary investigation of decision making and data engineering.

© 2014 Published by Elsevier Ltd.
1. Introduction
Multicriteria decision making (MCDM) aims to provide decision makers (DMs) with a knowledge recommendation from a finite number of objects (also known as alternatives, actions, or candidates) evaluated from multiple viewpoints called criteria (also known as dimensions, attributes, or features). MCDM covers four issues: criteria analysis, sorting, ranking, and choice (Figueira, Greco, & Ehrgott, 2005). We are particularly interested in studies of preference MCDM, an issue well documented in various research fields. Representative research topics include preference learning in machine learning (e.g., Hüllermeier, Fürnkranz, Cheng, & Brinker, 2008), preference relations in cognitive science (e.g., Herrera-Viedma, Herrera, Chiclana, & Luque, 2004; Xu, 2007), preference query in data engineering (e.g., Stefanidis, Koutrika, & Pitoura, 2011), and preference programming or modeling in decision making (e.g., Greco, Matarazzo, & Slowinski, 2001; Salo & Punkka, 2011). By contrast, interdisciplinary studies of preference MCDM are limited despite the significance of their complementary advantages. In the current study, we examine a data-engineering preference query technique for solving MCDM problems, namely, skyline operation.

⇑ Corresponding author. Tel.: +852 5138 0601. E-mail addresses: [email protected] (J. Chai), [email protected] (E.W.T. Ngai), [email protected] (J.N.K. Liu).
1.1. Skyline operation
Preference query (Adomavicius & Tuzhilin, 2005; Stefanidis et al., 2011) aims to retrieve several objects from a database in which all outputted objects must fulfill one or several preferences preset by DMs. This issue has been studied from two aspects: top-K operation (Mamoulis, Yiu, Cheng, & Cheung, 2007) and skyline operation (Borzsonyi, Kossmann, & Stocker, 2001). Skyline operation aims to retrieve a set of objects from multidimensional datasets in which all outputted objects conform to the dominance principle: with objects A and B in a multidimensional dataset, A dominates B if A’s values are not inferior to B’s values in all dimensions and superior to B’s values in at least one dimension. Skyline is an elementary set in which objects cannot be dominated by other objects of the dataset. In other words, skyline operation is the acquisition of an object subset in which chosen objects fulfill all preference requirements and no object can dominate a skylining object under defined preferences.
http://dx.doi.org/10.1016/j.eswa.2014.04.041
0957-4174/© 2014 Published by Elsevier Ltd.
Please cite this article in press as: Chai, J., et al. Dynamic tolerant skyline operation for decision making. Expert Systems with Applications (2014), http://dx.doi.org/10.1016/j.eswa.2014.04.041
We illustrate skyline operation through a simple example of personnel selection (Robertson & Smith, 2001). National Basketball Association (NBA) technical statistics of the 2008–2009 regular season (retrieved from http://espn.go.com/nba/) for 10 basketball players are shown in Table 1. The dimensions of the dataset include the index, the last name of the player, the affiliated team, and several key technical criteria, namely games played, minutes per game, number of assists (A), number of turnovers (T), and number of steals (S). In the database context, skyline operation accepts two types of user preference: GAIN (large values are preferred) and COST (small values are preferred). If the preference is preset to the GAIN type for A and S and to the COST type for T, the skyline of Table 1 computed according to the above definition is the object set {P1, P2, P3, P4, P5, P10}. The performance of these six skyline players is thus recognized because no other object in the dataset dominates any of them. Meanwhile, the skyline objects are non-comparable with each other and are therefore deemed to have equal performance. The computed skyline may vary with the preference settings. If only A and S are considered, both with the GAIN type, for example, the corresponding skyline is {P1, P2, P3}.
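The running case can be checked with a short sketch. The snippet below is illustrative only: the names `dominates` and `skyline` and the sign-based encoding of GAIN and COST are assumptions introduced here, and the data are the A, T, and S columns of Table 1.

```python
# A, T, S columns of Table 1 (assists, turnovers, steals).
PLAYERS = {
    "P1": (786, 264, 56), "P2": (672, 218, 78), "P3": (645, 188, 117),
    "P4": (495, 143, 109), "P5": (451, 170, 114), "P6": (535, 215, 93),
    "P7": (562, 200, 99), "P8": (362, 193, 96), "P9": (417, 238, 112),
    "P10": (460, 132, 78),
}
GAIN, COST = 1, -1  # multiplying by the sign makes "larger" always "better"


def dominates(a, b, kinds):
    """True if a is not inferior to b on any dimension and superior on one."""
    sa = [k * v for k, v in zip(kinds, a)]
    sb = [k * v for k, v in zip(kinds, b)]
    not_inferior = all(x >= y for x, y in zip(sa, sb))
    superior_once = any(x > y for x, y in zip(sa, sb))
    return not_inferior and superior_once


def skyline(data, columns, kinds):
    """Names of the objects that no other object dominates."""
    proj = {n: tuple(v[c] for c in columns) for n, v in data.items()}
    return [n for n, v in proj.items()
            if not any(dominates(u, v, kinds)
                       for m, u in proj.items() if m != n)]


print(skyline(PLAYERS, (0, 1, 2), (GAIN, COST, GAIN)))
print(skyline(PLAYERS, (0, 2), (GAIN, GAIN)))
```

The first call uses A and S as GAIN and T as COST and reproduces {P1, P2, P3, P4, P5, P10}; the second call drops T and reproduces {P1, P2, P3}.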
1.2. Related works
We review the literature on skyline operation from two aspects: the concepts and the computation methods. The initial concept of skyline operation was proposed by Borzsonyi et al. (2001). Since this primary study, its conceptual extensions have been comprehensively studied. Representative works include subspace skyline operation (Pei et al., 2006), R-tree-based skyline operation (Papadias, Tao, Fu, & Seeger, 2003), and constrained skyline operation (Lu, Jensen, & Zhang, 2011). Studies since 2010 have tended toward extending skyline-based applications, such as using skyline operation as an aggregation function to build data cubes for fast online analytical processing (Yiu, Lo, & Yung, 2012) and extending skyline processing to peer-to-peer systems (Hose & Vlachou, 2011). As to the computation methods of skyline, Borzsonyi et al. (2001) pioneered two baseline algorithms, block-nested-loops (BNL) and divide-and-conquer (D&C). BNL compares every object with the other objects and identifies it as a skyline member if it cannot be dominated. D&C retrieves partial skylines from several subsets of the dataset and merges all obtained partial skylines into a final result. Chomicki, Godfrey, Gryz, and Liang (2003) proposed the sort–filter skyline (SFS), which improved BNL by first sorting objects by a monotone function. Godfrey, Shipley, and Gryz (2005) proposed the linear elimination sort for skyline (LESS), which improved SFS by removing a part of the objects in the sorting process. Papadias, Tao, Fu, and Seeger (2005) developed progressive skyline computation. Zhang and Chomicki (2011) developed a framework for skyline preference queries in which preferences are set over the whole profile of the dataset instead of just over the dimensions.

Table 1
The running case: extracted technical statistics of the NBA.

No.   LN         TM         G    MPG    A     T     S
P1    Nash       Suns       68   35.3   786   264   56
P2    Williams   Jazz       71   37.5   672   218   78
P3    Kidd       Nets       71   36.9   645   188   117
P4    Paul       Hornets    57   36.5   495   143   109
P5    Davis      Warriors   55   35.5   451   170   114
P6    Ford       Raptors    67   30.4   535   215   93
P7    Miller     76ers      71   36.9   562   200   99
P8    Wade       Heat       46   38.9   362   193   96
P9    Iverson    Nuggets    57   42.7   417   238   112
P10   Billups    Pistons    64   36.7   460   132   78

LN: last name; TM: team; G: games played; MPG: minutes per game; A: assists; T: turnovers; S: steals.
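The sort-then-filter idea behind SFS can be sketched as follows. This is a simplified illustration rather than the published algorithm: the function name `sfs_skyline` is hypothetical, the sum of the preference-ordered values is just one possible monotone sorting function, and the tuples are the (A, −T, S) values of Table 1 so that larger is always better.

```python
# Preference-ordered values (A, -T, S) taken from Table 1.
POINTS = {
    "P1": (786, -264, 56), "P2": (672, -218, 78), "P3": (645, -188, 117),
    "P4": (495, -143, 109), "P5": (451, -170, 114), "P6": (535, -215, 93),
    "P7": (562, -200, 99), "P8": (362, -193, 96), "P9": (417, -238, 112),
    "P10": (460, -132, 78),
}


def sfs_skyline(points):
    """Sort by a monotone score, then filter against the accepted objects."""
    # A dominating object always has a strictly larger sum, so it is visited
    # before every object it dominates.
    ordered = sorted(points, key=lambda n: sum(points[n]), reverse=True)
    window = []  # skyline objects accepted so far
    for name in ordered:
        p = points[name]
        dominated = any(
            points[s] != p and all(a >= b for a, b in zip(points[s], p))
            for s in window
        )
        if not dominated:
            window.append(name)
    return window


print(sorted(sfs_skyline(POINTS)))
```

Because of the presorting, each object is compared only with objects already accepted into the skyline, never with the whole dataset, which is the saving over the pairwise BNL comparison.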
The development of skyline operation since 2013 has shown a wider perspective. Huang, Jiang, Pei, Chen, and Tang (2013) examined the skyline distance, which measures the minimum cost of upgrading a query point to the skyline. Hu, Sheng, Tao, Yang, and Zhou (2013) examined time efficiency in external memory for retrieving the skyline of N points in multidimensional space. Trimponias, Bartolini, Papadias, and Yang (2013) studied skyline query processing when a dataset is vertically decomposed across different servers. Zhang, Li, Hassan, Rajasekaran, and Das (2014) investigated the new problem of searching for skyline groups.
1.3. Research motivations
Despite being well documented, existing studies are grounded in data engineering rather than in practical decision making. Preference query processing can be regarded as a classical MCDM problem (Figueira et al., 2005). Specifically, top-K operations aim to generate K preference-ordered objects and are therefore multicriteria ranking approaches, whereas skyline operations aim to find an object subset that consists of all superior objects and are therefore multicriteria sorting approaches (Chai & Liu, 2014). Studying skyline operation in the MCDM paradigm is important for the following reasons. In decision-making fields, problems can be addressed under flexible assumptions for practical purposes, but this approach is usually ineffective for large numbers of objects (Slowinski, Greco, & Matarazzo, 2009) unless particular decision models (Chai, Liu, & Ngai, 2013) or specialized decision support systems (e.g., Ngai, Peng, Alexander, & Moon, 2014; Ngai et al., 2011) are used. On the other hand, studying skyline operation in data engineering can facilitate the processing of large datasets through database technologies, albeit under restrictive assumptions or conditions. Few studies have conducted an interdisciplinary investigation into skyline operation to fully utilize the complementary advantages of different fields, although Huang et al. (2013) have emphasized the competence of skyline operation in MCDM applications. In the current study, we develop a skyline operation that is applicable to a database environment but still capable of solving practical decision problems. Under the conventional mechanism, the output size of skyline operation is completely determined by the dataset once query preferences are identified.
The operation faces considerable challenges in practical decision making because of the following scenarios: (1) the skyline size might be undesirable (e.g., too large or too small) and therefore requires mechanisms to control the outputs, and (2) the preference of DMs might be imperfect and therefore requires approaches that support dynamic preference settings and adjustable outputs. Several studies have addressed related issues. Some studies (Lin, Yuan, Zhang, & Zhang, 2007; Lu et al., 2011; Tao, Ding, Lin, & Pei, 2009) have provided methods to find a representative subset of skylines. These methods can control the skyline size and remain valid when this size is larger than desired. For an outputted skyline that is too small to fulfill the DM's needs, Chai, Liu, and Li (2013) designed mechanisms that accept marginal objects hierarchically. However, these mechanisms still failed to adjust the skyline along with dynamic settings of preference. Wong, Pei, Fu, and Wang (2009) argued that the most straightforward solution is to transfer dynamic skylines to conventional skylines, thus making existing skyline operations feasible. However, this solution requires the full materialization of datasets and time-consuming preprocessing and is thus prohibitive; they therefore provided a semi-materialization method that considers implicit preferences rather than ordered values. Jiang, Pei, Lin, Cheung, and Han (2008) studied preference relations in a specific problem domain. Yiu, Lu, Mamoulis, and Vaitis (2011) provided preference query techniques for a spatial database. These studies partly address the dynamic preference of skyline operations, but their methods remain unsatisfactory for practical decision making.
1.4. Our solution
The main technical challenge for the current study is the development of skyline operation in the database context while fulfilling two practical purposes for decision making: (1) the skyline size can be controlled to fit the DMs' preferences, and (2) the outputted skyline can be adjusted along with dynamic preference. The study proposes a skyline operation called tolerant skyline (T-skyline) operation as the solution to this problem. First, we reformulate skyline operation from the perspective of decision modeling. After the introduction of the preference intensity concept, the dominance relations of conventional skyline operation are redefined. We accordingly establish, via mathematical modeling, a type of decision-oriented skyline that includes preferred and non-preferred skylines. For real implementation in the database, two polynomial-time algorithms for computing T-skyline and the database maintenance mechanism of T-skyline are examined in detail. The definition and computational methods are validated and evaluated through an empirical study of personnel selection and evaluation in a real-world setting. The whole approach proves satisfactory for decision making in multiple data experiments on real datasets of the 2010–2011 technical statistics on all NBA players.

This paper is organized as follows. We formulate the general skyline operation problem in Section 2. Section 3 models and defines T-skyline. Section 4 develops a series of algorithms to compute T-skyline in the database context. Section 5 addresses continuous T-skyline maintenance in the database. Section 6 describes an empirical study evaluating actual NBA players. We evaluate computation efficiency in Section 7. Section 8 concludes the paper, emphasizes implications, and identifies limitations and potential future work.
2. Problem description
Skyline operation has attracted considerable attention because of its ability to retrieve useful information from a database. A data table stored in a database contains finite objects (also called tuples, alternatives, points, or items) described by multiple dimensions (also called attributes, criteria, or features). Each cell of the table corresponds to one object in the row and one dimension in the column. The values of cells are called attribute values (or information functions or records). In this section, we formulate the general skyline operation problem. Table 2 shows the major notations frequently used in this paper.

Formally, a data table (DT) can be represented as a 4-tuple $DT = (U, Q, V, g)$ with a finite object set $U$ for $x \in U$ and an attribute set $Q$ for $q \in Q$. The scale of values of attribute $q$ is denoted by $V_q$ for $V = \{V_q : q \in Q\}$. Attribute values can be represented as $g_q(x): U \times Q \to V$ for $g_q(x) \in V_q$. In general, attribute values can have various forms, such as numbers, symbols, and linguistic terms. However, these values should be homogeneous within a specific attribute.

The object of skyline operations is multicriteria DTs. Each criterion is defined by a preference function. Formally, a preference table can be represented as a 3-tuple $PT = (U, P, f)$. This table includes a set $U$ of objects $x$, a set $P$ of criteria $p$, and a set of preference functions $f$. Criterion values related to $x$ and $p$ are the values of preference functions with attribute values $g_q(x)$ as independent variables. Criterion values can thus be denoted as $f_P[g_q(x)]$. Each singleton criterion $p_j$ is associated with its own preference function $f_j$. We call the set of $f_j$ a preference system. For instance, denoting attribute values $g_{q_i}(x) = v(x, q_i)$ and criterion values $f_{p_j}[g_{q_i}(x)] = w(x, p_j)$, we understand that $w(x, p_j)$ is the value of preference function $f$ with independent variable $v(x, q_i)$; hence, $w(x, p_j) = f(v(x, q_i))$.

The simplest preference system is a set of unary linear functions. If the function monotonically increases, we call it the GAIN type, such as $f(q) = q$. Otherwise, we call it the COST type, such as $f(q) = -q$. In other words, the values are preference-ordered. In our running case, criteria A and S have $f(q) = q$, and criterion T has $f(q) = -q$. Most previous studies in data engineering assume the simplest preference system before constructing skyline operations.
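As a small illustration of how a preference table is derived from a data table, the sketch below applies one preference function per criterion. It assumes the simplest preference system, with COST encoded as a sign change so that larger criterion values are always preferred, and it assumes that each criterion is built from the attribute of the same name; the names `gain`, `cost`, and `preference_table` are hypothetical.

```python
def gain(q):
    """GAIN type: monotonically increasing, f(q) = q."""
    return q


def cost(q):
    """COST type: monotonically decreasing, f(q) = -q."""
    return -q


# Attribute values v(x, q_i) for three objects of the running case.
data_table = {
    "P1": {"A": 786, "T": 264, "S": 56},
    "P2": {"A": 672, "T": 218, "S": 78},
    "P3": {"A": 645, "T": 188, "S": 117},
}

# Preference system: one preference function f_j per criterion p_j.
preference_system = {"A": gain, "T": cost, "S": gain}


def preference_table(dt, system):
    """Criterion values w(x, p_j) = f_j(v(x, q_i)) for every object x."""
    return {x: tuple(f(row[p]) for p, f in system.items())
            for x, row in dt.items()}


print(preference_table(data_table, preference_system))
```

After this transformation every criterion is preference-ordered in the same direction, so dominance tests reduce to plain comparisons of criterion values.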
3. Tolerant skyline operation
3.1. Dynamic preference modeling
Skyline operation strongly relies on subjective judgments. When the preference system is imprecise, the obtained skyline can hardly satisfy the DMs. The present study therefore defines preference intensity. By specifying preference intensity on the criteria, DMs can adjust the settled preference system while controlling the outputted skylines. We define the preference intensity function as follows.

Definition 1 (preference intensity). Considering the dominance relations of two values $w(x, p_j)$ and $w(y, p_j)$ on criterion $p_j$, preference intensity can be defined as

$$G(x, y) = q(d) \quad \text{s.t.} \quad d = w(x, p_j) - w(y, p_j).$$

Variable $d$ is the difference (D-value) of the criterion value of $x$ over that of $y$. The function value $G(x, y)$, which lies between 0 and 1, requires DMs to preset certain intensity functions that reflect their preference. We recommend six generalized intensity functions (Brans, Vincke, & Mareschal, 1986; Xu, 2001) as the commonly used types for selection by DMs (Table 3).
Table 2
The frequently used notations in this paper.

Notations                          Meanings
$DT = (U, Q, V, g)$                Data table DT with the objects $x \in U$, the attributes $q \in Q$, the attribute value $g_q(x)$, and the scale of attribute values $V_q$
$v(x, q_i)$                        Attribute value of object $x$ with respect to attribute $q_i$, also denoted $g_{q_i}(x)$
$w(x, p_j)$                        Criterion value of object $x$ with respect to criterion $p_j$, also denoted $f_{p_j}[g_q(x)]$
$f_j$                              The DM-specified preference function with respect to criterion $p_j$; $w(x, p_j) = f_{p_j}(v(x, q_i))$
$PT = (U, P, f)$                   Preference table PT with objects $x \in U$, criteria $p \in P$, and preference function $f$
$x D_P y$                          Dominance relation: for $x, y \in U$, $x$ is superior or equal to $y$ on criteria set $P$
$G(x, y) = q(d)$                   Preference intensity of $x$ over $y$; the value of function $G(x, y)$, where $d = w(x, p_j) - w(y, p_j)$
$d = w(x, p_j) - w(y, p_j)$        The D-value of $x$ over $y$ under criterion $p_j$; $d_0(x, y)$ is the expected D-value of $x$ over $y$
$D_P^+(x)$ & $D_P^-(x)$            Dominance granules: superiority set $D_P^+(x)$ and inferiority set $D_P^-(x)$ with respect to the singleton object $x$
$S_P^+(k)$ & $S_P^-(k)$            T-skyline with respect to criteria set $P$: preferred T-skyline $S_P^+(k)$ and non-preferred T-skyline $S_P^-(k)$, where the DM-specified tolerant degree $k \in [1, \infty)$
$C_P(x)$ & $I_P(x)$                The comparable set $C_P(x)$ and the non-comparable set $I_P(x)$ with respect to the singleton object $x$; properties: $D_P^+(x) \cup D_P^-(x) = C_P(x)$; $C_P(x) \cup I_P(x) = U$; $C_P(x) \cap I_P(x) = \emptyset$
Table 3
The six types of generalized criteria.

Type   Name                    Intensity function
I      Usual criterion         $q(d) = 1$ if $d \neq 0$; $q(d) = 0$ if $d = 0$
II     Quasi criterion         $q(d) = 1$ if $d < -e$ or $d > e$; $q(d) = 0$ if $-e \leq d \leq e$
III    Gaussian criterion      $q(d) = 1 - \exp(-d^2 / 2e^2)$
IV     Level criterion         $q(d) = 1$ if $|d| > e_2$; $q(d) = 1/2$ if $e_2 \geq |d| > e_1$; $q(d) = 0$ if $|d| \leq e_1$
V      Linear criterion (i)    $q(d) = 1$ if $d < -e$ or $d > e$; $q(d) = |d|/e$ if $-e \leq d \leq e$
VI     Linear criterion (ii)   $q(d) = 1$ if $|d| > e_2$; $q(d) = (|d| - e_1)/(e_2 - e_1)$ if $e_2 \geq |d| > e_1$; $q(d) = 0$ if $|d| \leq e_1$
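The six generalized criteria of Table 3 can be written as small functions of the D-value. The sketch below is one reading of the table, not a reference implementation: the function names are hypothetical, the parameters `e`, `e1`, and `e2` are the thresholds of the table, and the linear type (i) is taken as $|d|/e$ so that every function returns a value between 0 and 1.

```python
import math


def usual(d):
    """Type I: any non-zero difference counts fully."""
    return 0.0 if d == 0 else 1.0


def quasi(d, e):
    """Type II: differences inside [-e, e] are ignored."""
    return 1.0 if abs(d) > e else 0.0


def gaussian(d, e):
    """Type III: smooth growth controlled by e > 0."""
    return 1.0 - math.exp(-(d * d) / (2.0 * e * e))


def level(d, e1, e2):
    """Type IV: three levels separated by e1 < e2."""
    a = abs(d)
    if a > e2:
        return 1.0
    return 0.5 if a > e1 else 0.0


def linear_i(d, e):
    """Type V: linear growth up to e (assumed |d|/e), then 1."""
    return 1.0 if abs(d) > e else abs(d) / e


def linear_ii(d, e1, e2):
    """Type VI: 0 up to e1, linear between e1 and e2, then 1."""
    a = abs(d)
    if a > e2:
        return 1.0
    return (a - e1) / (e2 - e1) if a > e1 else 0.0


# Gaussian type with e = 0.5: G reaches 0.8 at |d| = 0.5 * sqrt(2 ln 5).
d_expected = 0.5 * math.sqrt(2.0 * math.log(5.0))
print(round(d_expected, 4), round(gaussian(d_expected, 0.5), 4))
```

Inverting a chosen intensity function at the DM's threshold on $G(x, y)$, as in the last two lines, yields the D-value at which one criterion value starts to count as superior to another.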
Preference intensity is used to model DMs' imprecise preference. In real applications, conventional skyline operation corresponds to Type I, in which $q(d) = 1$ indicates a difference between two criterion values. For the other five types, a threshold on $G(x, y)$ needs to be specified, which offers an opportunity to measure similarity via various preference intensity functions $q(d)$. For example, the quasi type can tolerate the similarity of continuous values. The Gaussian type can integrate group preference by controlling parameter $e$ ($e > 0$) (e.g., $e$ can be obtained from the normal distribution of preference intensities when multiple participants are involved). Types IV, V, and VI can be used in various problem domains by setting a threshold on $q(d)$. Preference intensity is used to define the T-skyline operation approach.

3.2. T-skyline formulation

Dominance relations are the core of the construction of T-skyline operation. Dominance relations are presented in terms of two aspects. First, for $x, y \in U$, object $x$ dominates object $y$ under singleton criterion $p_j$ if $w(x, p_j)$ is superior or equal to $w(y, p_j)$ on preference function $f_j$; this relation is denoted as $x D_{p_j} y$. Second, for $x, y \in U$, object $x$ is dominated by object $y$ under singleton criterion $p_j$ if $w(x, p_j)$ is inferior or equal to $w(y, p_j)$ on preference function $f_j$; this relation is denoted as $y D_{p_j} x$. Three terms need to be noted: "equal" means that the criterion values are the same, and "superior" and "inferior" are typical outranking relations on the value of the preference function.

Dominance relations are established based on preference intensities. If generalized criteria are used for identification, an expected D-value $d_0$ can be found. This value is subject to the threshold of $G(x, y)$ and/or parameter $e$. Supposing that $G(x, y) = q(d)$ s.t. $d = w(x, p_j) - w(y, p_j)$, the dominance relation can be given as follows: if $d \geq d_0$, then $w(x, p_j)$ is superior to $w(y, p_j)$ or, equivalently, $w(y, p_j)$ is inferior to $w(x, p_j)$. For example, DMs can specify $d_0 = e$ of Type III or $d_0 = 0.2e_1 + 0.7e_2$ of Type IV. If DMs specify a group-agreed threshold $G(x, y) \geq 0.8$ on the Gaussian criterion with $e = 0.5$, then solving $1 - \exp(-d^2/2e^2) \geq 0.8$ gives $|d| \geq e\sqrt{2 \ln 5} = 0.5\sqrt{2 \ln 5} \approx 0.90$. Therefore, the dominance relation can be defined as follows: if $d \geq d_0$ or $w(x, p_j) = w(y, p_j)$, then $x D_{p_j} y$.

Under established dominance relations, dominance granules can be defined as the superiority set $D_P^+(x)$, where $D_P^+(x) = \{y \in U : \forall p_j \in P,\ y D_{p_j} x\}$, and the inferiority set $D_P^-(x)$, where $D_P^-(x) = \{y \in U : \forall p_j \in P,\ x D_{p_j} y\}$. Based on this definition, our T-skyline definition includes two parts: (1) the preferred skyline $S_P^+(k)$, where $S_P^+(k) = \{x \in U : |D_P^+(x)| \leq k\}$, and (2) the non-preferred skyline $S_P^-(k)$, where $S_P^-(k) = \{x \in U : |D_P^-(x)| \leq k\}$. The number of objects in a dominance granule is denoted by $|\cdot|$. The natural number $k$, where $k \in [1, \infty)$, is called the tolerant degree. This coefficient is set by the DMs and can usually be altered according to the requirements and the size of the datasets.

3.3. Discussion

We discuss T-skyline operation from three perspectives. First, the dominance relation of T-skyline is based on relaxed outranking relations. This relation eliminates the restriction of previous skyline operations, that is, "criterion values should strictly be superior/inferior to those of any other object at least on one criterion." Second, each object $x$ from $U$ possesses two dominance granules. The superiority set includes the object $x$ itself and all dominating objects with superior values on at least one criterion and without inferior values on any criterion. Meanwhile, the inferiority set includes the object $x$ itself and all dominated objects with inferior values on at least one criterion and without superior values on any criterion. Thus, we can obtain the property $D_P^+(x) \cap D_P^-(x) = \{x\}$. Third, the tolerant degree $k$ is DM-specific. It is used to control the hierarchy of outputted skylines to meet DMs' needs. In particular, $|D_P^+(x)| \leq 1$ and $|D_P^-(x)| \leq 1$ when $k = 1$, implying that $D_P^+(x) = \{x\}$ and $D_P^-(x) = \{x\}$.

4. Tolerant skyline computation

This section provides a series of algorithms for computing T-skyline. We consider only the preferred T-skyline; the non-preferred T-skyline can be computed similarly.

4.1. Update–approaching method

Pairwise comparison is the naïve method to compute skylines. Each object compares the values of its preference functions with those of all other objects on all criteria to obtain the dominance granules of each object. In a table $PT: n \times k$ (where $k$ here denotes the number of criteria), this operation can be done under time complexity $O(n^2)$. For any $x \in U$, one step of the iteration partitions the object set $U$ into two subsets. We define these subsets as the comparable set $C_P(x)$ and the non-comparable set $I_P(x)$. The properties include $C_P(x) \cup I_P(x) = U$ and $C_P(x) \cap I_P(x) = \emptyset$. Set $C_P(x)$ consists of a superiority set $D_P^+(x)$ and an inferiority set $D_P^-(x)$. We thus obtain the properties $D_P^+(x) \cup D_P^-(x) = C_P(x)$ and $D_P^+(x) \cap D_P^-(x) = \{x\}$. The following assertions can then be easily proved: (i) For objects $x, y \in U$, if $y \in D_P^+(x)$ is satisfied, then $D_P^+(y) \subseteq D_P^+(x)$ and $D_P^-(y) \supseteq D_P^-(x)$. (ii) For objects $x, y \in U$, if $y \in D_P^-(x)$ is satisfied, then $D_P^-(y) \subseteq D_P^-(x)$ and $D_P^+(y) \supseteq D_P^+(x)$. With regard to the tolerant degree $k$, the following assertions can be easily proved for $x \in U$:

(1) If $|D_P^+(x)| \leq k$ is satisfied, then $\bigcup_{y \in D_P^+(x)} D_P^+(y) \subseteq S_P^+$.
(2) If $|D_P^-(x)| \leq k$ is satisfied, then $\bigcup_{y \in D_P^-(x)} D_P^-(y) \subseteq S_P^-$.
(3) If $|D_P^+(x)| > k$ is satisfied, then $\bigcup_{y \in D_P^+(x)} D_P^-(y) \not\subseteq S_P^+$.
(4) If $|D_P^-(x)| > k$ is satisfied, then $\bigcup_{y \in D_P^-(x)} D_P^+(y) \not\subseteq S_P^-$.
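The naïve pairwise computation of dominance granules and of the preferred T-skyline can be sketched as follows. This is an illustration under stated assumptions: the names `dominates_on`, `granules`, and `preferred_t_skyline` are hypothetical, the criterion values are the preference-ordered (A, −T, S) values of Table 1, and `d0` is a tuple holding one expected D-value per criterion, where `d0 = 0` reduces to plain "superior or equal" dominance.

```python
# Criterion values w(x, p_j) of the running case, ordered so larger is better.
W = {
    "P1": (786, -264, 56), "P2": (672, -218, 78), "P3": (645, -188, 117),
    "P4": (495, -143, 109), "P5": (451, -170, 114), "P6": (535, -215, 93),
    "P7": (562, -200, 99), "P8": (362, -193, 96), "P9": (417, -238, 112),
    "P10": (460, -132, 78),
}


def dominates_on(wx, wy, d0):
    """x D_pj y on one criterion: d = wx - wy reaches d0, or the values tie."""
    return wx - wy >= d0 or wx == wy


def dominates(wx, wy, d0):
    """x D_P y: the single-criterion relation holds on every criterion."""
    return all(dominates_on(a, b, t) for a, b, t in zip(wx, wy, d0))


def granules(x, table, d0):
    """Superiority set and inferiority set of x by pairwise comparison."""
    up = {y for y in table if dominates(table[y], table[x], d0)}
    down = {y for y in table if dominates(table[x], table[y], d0)}
    return up, down


def preferred_t_skyline(table, k, d0):
    """Objects whose superiority set holds at most k objects (x included)."""
    return {x for x in table if len(granules(x, table, d0)[0]) <= k}


plain = (0, 0, 0)
print(sorted(preferred_t_skyline(W, 1, plain)))  # conventional skyline
print(sorted(preferred_t_skyline(W, 2, plain)))  # one dominator tolerated
```

With `k = 1` the result is the conventional skyline of the running case; with `k = 2` player P7, whose only dominator is P3, is admitted as well. Passing a positive `d0` makes small differences on a criterion non-decisive, which is how the preference intensity enters the computation in this sketch.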
Based on this analysis, we compute T-skyline by frequently updating the object set U. For $x e U and given k, if jDþ P ðxÞj > k is satisfied in an iteration, then the inferiority set D ðxÞ can be elimP inated from U for the next iteration. If jDþ P ðxÞj < k is satisfied in an iteration, then the superiority set Dþ P ðxÞ can be eliminated from U for the next iteration and Dþ P ðxÞ can be accepted as the T-skyline þ Sþ P ðkÞ. If jDP ðxÞj ¼ k is satisfied in an iteration, then the comparable set C P ðxÞ ¼ Dþ P ðxÞ [ DP ðxÞ can be eliminated from U for the next þ iteration and DP ðxÞ can be accepted as the T-skyline Sþ P ðkÞ. The update–approaching (UA) method aims to minimize the number of iteration steps by frequently updating the sets of
Please cite this article in press as: Chai, J., et al. Dynamic tolerant skyline operation for decision making. Expert Systems with Applications (2014), http:// dx.doi.org/10.1016/j.eswa.2014.04.041
372 373 374 375 376 377 378 379 380 381 382
ESWA 9310
No. of Pages 14, Model 5G
12 May 2014 5
J. Chai et al. / Expert Systems with Applications xxx (2014) xxx–xxx 383 384 385 386 387
objects. We present the UA method via the following pseudocode. Algorithm I calculates the dominance granule of singleton objects in a dynamic preference system. This operation can be frequently called by other algorithms. Algorithm II conducts the UA computation. Algorithm I: Calculation of dominance granules Input: The singleton object x; its criteria value w(x, pj); each criterion pj is with preference function fj and preference intensity q. Output: Dominance granules Dþ P ðxÞ and DP ðxÞ. Description: 1: for $pj e P 2: for $y e U 3: compare w(x, pj) with w(y, pj) on fj and q 4: if w(x, pj)is superior or equal to w(y, pj) 5: D y pj ðxÞ 6: 7:
if w(x, pj)is inferior or equal to w(y, pj) Dþ y pj ðxÞ
8: 9: 10: 11:
end if end for compute dominance granule T þ T Dþ Dpj ðxÞ and D Dpj ðxÞ P ðxÞ ¼ P ðxÞ ¼ pj 2P
409
pj 2P
12: end for 13: return Dþ P ðxÞ and DP ðxÞ.
Input: Database and the DM-specified tolerant degree k. Output: Preferred T-skyline Sþ P with degree k. Description: 1: Initialization: Sþ P ¼ ; and goal set D = U 2: for $x e D 3: call Algorithm I on $y e D 4: if jDþ P ðxÞj > k update D ¼ D DP ðxÞ 5: then go to 2 þ 6: else if jDþ P ðxÞj < k update D ¼ D DP ðxÞ and þ þ Sþ ¼ S [ D ðxÞ P P P 7: then go to 2 þ 8: else jDþ P ðxÞj ¼ k update D ¼ D DP ðxÞ [ DP ðxÞ and þ þ þ SP ¼ SP [ DP ðxÞ 9: then go to 2 10: end for
430
4.2. Extremum–UA method
431
The UA method eliminates the superiority set or inferiority set in each iteration step and processes the next iteration under the updated object set. UA is effective for general T-skyline computation. However, with k = 1, we can convert the UA method into an Extremum–UA (EUA) method to increase efficiency. First, we define extreme objects and extreme sets with respect to preference function fpj in criterion pj.
Definition 2 (extreme objects). In preference table PT: U × P, for $x \in U$ and $p_j \in P$, the extreme objects with respect to $p_j$ are defined as follows:

Maximum object $x_{p_j}^{+}$: $x_{p_j}^{+} = \{x : \max[f_{p_j}(x)],\ \forall x \in U,\ p_j \in P\}$;
Minimum object $x_{p_j}^{-}$: $x_{p_j}^{-} = \{x : \min[f_{p_j}(x)],\ \forall x \in U,\ p_j \in P\}$.

Definition 3 (extreme sets). In preference table PT: U × P, for $x \in U$ and $p_j \in P$, the extreme sets with respect to P are defined as follows:

Maximum set $X_P^{+}$: $X_P^{+} = \bigcup_{p_j \in P} x_{p_j}^{+}$;
Minimum set $X_P^{-}$: $X_P^{-} = \bigcup_{p_j \in P} x_{p_j}^{-}$.

Objects with extreme values are called extreme objects. An extreme set includes all extreme objects over all criteria. Clearly, $X_P^{+} \subseteq U$ and $X_P^{-} \subseteq U$. In addition, with respect to the tolerant degree λ, the following assertions are valid for $\forall x \in U$.

(1) If $|D_P^{+}(x)| \neq 1$ is satisfied in an iteration, then the inferiority set $D_P^{-}(x)$ can be eliminated from the object set for the next iteration.
(2) If $|D_P^{+}(x)| = 1$ is satisfied in an iteration, then the inferiority set $D_P^{-}(x)$ can be eliminated from the object set for the next iteration and object x can be accepted as the T-skyline $S_P^{+}(\lambda = 1)$.

We require that the iteration processes begin from extreme objects ($\exists x \in X_P^{+}$). Meanwhile, updating is applied to both $X_P^{+}$ and U. Once $X_P^{+} = \emptyset$, we have obtained a portion of $S_P^{+}(\lambda = 1)$. The next iterations continue on the rest of the object set until it becomes empty. Compared with non-extreme objects, an extreme object is more likely to exhibit skyline membership. Therefore, the inferiority set of an extreme object usually contains the largest number of non-skyline objects. If the extreme set is used as the starting point, frequent updating can eliminate most of the non-skyline objects. Such a mechanism can dramatically improve the efficiency of computation, compared with the UA method.

Algorithm III is the pseudocode of the EUA method. The extreme set $X_P^{+}$ is first calculated, and iterations then start from $\exists x \in X_P^{+}$ by calling Algorithm I. Finally, the remaining iterations are conducted on the goal set D, initialized such that $D \subseteq U - X_P^{+}$.

Algorithm III: The EUA method
Input: Database; the tolerant degree is specified to be 1.
Output: T-skyline $S_P^{+}(\lambda = 1)$.
Description:
1: Initialization: $S_P^{+} = \emptyset$ and goal set D = U
2: for $\forall p_j \in P$
3:   compute extreme set
4:   $X_P^{+} = \bigcup_{p_j \in P} x_{p_j}^{+}$, where $x_{p_j}^{+} = \{x : \max[f_{p_j}(x)],\ \forall x \in U,\ p_j \in P\}$
5: end for %% obtain extreme set $X_P^{+}$
6: for $\exists x \in X_P^{+}$
7:   $D_P^{+}(x), D_P^{-}(x)$ ← Algorithm I on $\forall y \in D$
8:   if $|D_P^{+}(x)| \neq 1$, update $D = D - D_P^{-}(x)$
9:     then go to 6
10:  else $|D_P^{+}(x)| = 1$, update $D = D - D_P^{-}(x)$ and $S_P^{+}$
11:    then go to 6
12: end for
13: for $\exists x \in D$ %% initially $D \subseteq U - X_P^{+}$
14:   $D_P^{+}(x), D_P^{-}(x)$ ← Algorithm I on $\forall y \in D$
15:   if $|D_P^{+}(x)| \neq 1$, update $D = U - D_P^{-}(x)$
16:     then go to 13
17:   else $|D_P^{+}(x)| = 1$, update $D = U - D_P^{-}(x)$ and $S_P^{+}$
18:     then go to 13
19: end for
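The extreme-set idea behind the EUA method can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes gain-type criteria, it substitutes plain Pareto dominance for the paper's preference-function-based dominance relation, and the helper names (`dominates`, `granules`, `maximum_set`, `eua_skyline`) and the toy data are hypothetical. The function `granules` only plays the role that Algorithm I plays in the pseudocode.

```python
from typing import Dict, Set, Tuple

Obj = Tuple[float, ...]


def dominates(a: Obj, b: Obj) -> bool:
    """Plain Pareto dominance on gain criteria (an assumed stand-in relation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def granules(x: str, goal: Set[str], data: Dict[str, Obj]) -> Tuple[Set[str], Set[str]]:
    """Superiority and inferiority sets of x inside the goal set (both contain x)."""
    sup = {y for y in goal if y == x or dominates(data[y], data[x])}
    inf = {y for y in goal if y == x or dominates(data[x], data[y])}
    return sup, inf


def maximum_set(data: Dict[str, Obj]) -> Set[str]:
    """Union, over all criteria, of the objects attaining the maximum value."""
    dims = len(next(iter(data.values())))
    extreme: Set[str] = set()
    for j in range(dims):
        best = max(v[j] for v in data.values())
        extreme |= {k for k, v in data.items() if v[j] == best}
    return extreme


def eua_skyline(data: Dict[str, Obj]) -> Set[str]:
    """Skyline for tolerant degree 1, visiting extreme objects first."""
    if not data:
        return set()
    goal = set(data)
    skyline: Set[str] = set()
    extremes = sorted(maximum_set(data))
    while goal:
        # Prefer an extreme object that is still in the goal set.
        x = next((e for e in extremes if e in goal), None)
        if x is None:
            x = min(goal)  # deterministic choice among the remaining objects
        sup, inf = granules(x, goal, data)
        if len(sup) == 1:  # only x itself: accept x
            skyline.add(x)
        goal -= inf  # the inferiority set is eliminated in both cases
    return skyline


TOY = {"a": (5, 1), "b": (4, 4), "c": (1, 5), "d": (3, 3), "e": (2, 2), "f": (4, 1)}
print(sorted(eua_skyline(TOY)))  # ['a', 'b', 'c']
```

On this toy data the two extreme objects `a` and `c` are processed first, and a single further iteration on `b` empties the goal set, so three iterations suffice for six objects.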
In preference table PT: n × k, the naïve method needs $k \cdot n^2$ steps of iteration. If the calculated objects can be memorized during frequent updating, it needs $0.5 \cdot k \cdot n(n - 1)$ steps of iteration. Both the UA and EUA methods run in time complexity $O(n^2)$. In the EUA method, the runtime depends on the route of object selection. The worst case happens if the selected object satisfies $D_P^{-}(x) = \{x\}$ in each iteration, so that each iteration eliminates only x itself. The best case happens if the selected objects fulfill $D_P^{+}(x) = \{x\}$ in each iteration. Then, $|S_P^{+}(\lambda = 1)|$ steps of iteration are required.
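As a worked instance of these two counts, the block below plugs in n = 383 objects and k = 3, the sizes of the preprocessed dataset and the criterion set used in Section 6. Reading k as the number of criteria is an assumption made here for illustration only.

```latex
\[
\underbrace{k\,n^{2}}_{\text{na\"ive}} = 3 \cdot 383^{2} = 440{,}067,
\qquad
\underbrace{\tfrac{1}{2}\,k\,n(n-1)}_{\text{with memorization}}
  = \tfrac{1}{2} \cdot 3 \cdot 383 \cdot 382 = 219{,}459 .
\]
```

Memorization roughly halves the count, but both expressions remain quadratic in n for a fixed k.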
4.3. An analysis of UA and EUA methods
In this section, we use Fig. 1 for illustration. We suppose that two attributes of the GAIN type constitute a two-dimensional space. The scatterplot area represents the overall object set. Fig. 1(a) illustrates the superiority set $D_{\{a,b\}}^{+}(A)$, the inferiority set $D_{\{a,b\}}^{-}(A)$, and the non-comparable set $I_{\{a,b\}}(A)$ with respect to point A. Curve BC roughly represents the conventional skyline. Fig. 1(b) gives three preferred T-skylines with respect to different λ: $S_{\{a,b\}}^{+}(\lambda = 1)$, $S_{\{a,b\}}^{+}(\lambda = 2)$, and $S_{\{a,b\}}^{+}(\lambda = 3)$. Curve EF roughly represents the non-preferred skyline $S_{\{a,b\}}^{-}(\lambda = 1)$.

The UA method is illustrated in Fig. 1(c). The conventional skyline is given as $S_{\{a,b\}}^{+}(\lambda = 1)$. Supposing that λ = 3, the points in area $S_{\{a,b\}}^{+}(\lambda \le 3)$ should be the outputted skyline. The middle curve $S_{\{a,b\}}^{+}(\lambda = 3)$ represents the set of points with $|D_P^{+}(x)| = 3$. Fig. 1(c) represents five iterations from point $i_1$ to point $i_5$. For $i_1$ and $i_4$, the inferiority sets $D_P^{-}(i_1)$ and $D_P^{-}(i_4)$ are eliminated because both $|D_P^{+}(i_1)|$ and $|D_P^{+}(i_4)|$ are greater than 3. For $i_3$ and $i_5$, the superiority sets $D_P^{+}(i_3)$ and $D_P^{+}(i_5)$ are accepted by $S_{\{a,b\}}^{+}(\lambda \le 3)$ but are eliminated from U because both $|D_P^{+}(i_3)|$ and $|D_P^{+}(i_5)|$ are less than 3. Point $i_2$ is accepted as the skyline, so its whole comparable set $D_P^{+}(i_2) \cup D_P^{-}(i_2)$ is eliminated. After the first round of iteration, the scatterplot area shrinks to the hatched area, which will be the object set for the next iteration.

Fig. 1(d) illustrates the EUA method. In contrast to Fig. 1(c), the skyline operation starts from the extreme points $i_1$ and $i_2$. After the inferiority sets of $i_1$ and $i_2$ are eliminated in the first two iterations, the object set shrinks to area $i_1 O i_2$. After the iterations on $i_3$, $i_4$, and $i_5$, the object set shrinks to the hatched area. Fig. 1(c) and (d) suggest that the EUA method is more efficient than the UA method. We verify this observation through the empirical case in Section 7 (1).

The presented UA/EUA method requires the calculation of dominance granules and the prompt updating of goal sets by eliminating objects. Although the method seems similar to classical computation methods, such as SFS, LESS, BNL, and D&C, it is fundamentally different. It uses the DM-specified tolerant degree as the threshold value to decide whether the superiority set or the inferiority set should be eliminated. T-skyline operations are based on relaxed dominance relations, which are established on preference functions and preference intensities. To improve computational efficiency in the special case λ = 1, the EUA method is developed as an alternative to the UA method.
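The hierarchical sets that the UA method targets can be made concrete with a small Python sketch. It is deliberately not the UA elimination procedure: it counts dominators directly in quadratic time, it assumes plain Pareto dominance on gain criteria in place of the paper's relaxed dominance relation, and it assumes that an object belongs to level λ exactly when its superiority set (which contains the object itself) has size λ. All names and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Obj = Tuple[float, ...]


def dominates(a: Obj, b: Obj) -> bool:
    """Plain Pareto dominance on gain criteria (an assumed stand-in relation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def superiority_size(x: str, data: Dict[str, Obj]) -> int:
    """Size of the superiority set of x, counting x itself."""
    return 1 + sum(dominates(v, data[x]) for k, v in data.items() if k != x)


def t_skyline_levels(data: Dict[str, Obj], max_degree: int) -> Dict[int, Set[str]]:
    """Map each degree up to max_degree to the objects whose superiority set has that size."""
    levels: Dict[int, Set[str]] = {d: set() for d in range(1, max_degree + 1)}
    for x in data:
        size = superiority_size(x, data)
        if size <= max_degree:
            levels[size].add(x)
    return levels


TOY = {"a": (5, 1), "b": (4, 4), "c": (1, 5), "d": (3, 3), "e": (2, 2), "f": (4, 1)}
levels = t_skyline_levels(TOY, 3)
print({d: sorted(s) for d, s in levels.items()})
# {1: ['a', 'b', 'c'], 2: ['d'], 3: ['e', 'f']}
```

Under these assumptions, the output for a tolerant degree of 3 is the union of levels 1 to 3, and level 1 coincides with the conventional skyline.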
5. Tolerant skyline maintenance
Fig. 1. Illustrations of T-skyline and the UA/EUA method: (a) the conventional skyline and dominance granules; (b) the concept of T-skyline; (c) the UA method for general T-skyline; (d) the EUA method for λ = 1 T-skyline.

Table 4. The results of T-skyline with the comparisons under the general assumption (i.e., PI-1 in Table 5).

| T-skyline: λ | T-skyline: objects | T-skyline: size | SO-based: S_j | SO-based: size |
|---|---|---|---|---|
| λ = 1 | T. Chandler, K. Durant, N. Hilario, D. Howard, L. James, L. Jordan, S. Novak, D. Nowitzki, S. O'Neal | 9 | j = 1 | 9 |
| λ = 2 | R. Allen, S. Curry, A. Horford, K. Love, L. Odom, P. Pierce, D. Wade | 7 | j = 2 | 18 |
| λ = 3 | A. Afflalo, C. Anthony, A. Bynum, J. Evans, P. Gasol, B. Griffin | 6 | j = 3 | 23 |

Continuous skyline maintenance on a database aims to update the calculated skyline after deleting "old" data. In general, this computation covers two issues: (1) object deletion when the criterion set is fixed and (2) criterion deletion when the objects are fixed. Many studies (e.g., Pei et al., 2006) have analyzed subspace skylines. Such concepts as skyline groups and decisive subspaces have been provided to explore relations among original criterion sets, criterion subsets, and criterion supersets. These studies have only partly resolved the second issue, and very few studies have addressed the first issue. In this section, we provide solutions for continuously maintaining the T-skyline when objects are deleted from the dataset with respect to a fixed criterion set.

Suppose that the deleted object set is denoted as $V_P$ ($V_P \neq \emptyset$) and that $S_P^{+}(\lambda)$ is the calculated T-skyline. The updated skyline should still be $S_P^{+}(\lambda)$ if $S_P^{+}(\lambda) \cap V_P = \emptyset$. However, the update is more involved if $S_P^{+}(\lambda) \cap V_P \neq \emptyset$. We suppose the set $N = S_P^{+}(\lambda) \cap V_P$. After deletion, the rest of the T-skyline is $S_P^{+}(\lambda) - N$. The dominance granules of the objects in N then change correspondingly. Specifically, the inferiority sets of the objects in N need to be considered for the updated skyline. The union of these sets is $\bigcup_{x \in N} D_P^{-}(x)$. Therefore, the updated skyline can be obtained by considering the new object set $\bigcup_{x \in N} D_P^{-}(x) \cup (S_P^{+}(\lambda) - N)$. Algorithm IV provides the pseudocode to maintain this T-skyline.
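Algorithm IV: Continuous T-skyline maintenance
Input: Known preferred skyline $S_P^{+}(\lambda)$; the DM-specified tolerant degree $\tilde{\lambda}$, where $\tilde{\lambda} \le \lambda$; deleted object set $V_P$.
Output: Updated preferred skyline $\tilde{S}_P^{+}(\tilde{\lambda})$.
Description:
1: Initialization: $\tilde{S}_P^{+}(\tilde{\lambda}) = \emptyset$; goal set $D = \emptyset$; let set $N = S_P^{+}(\lambda) \cap V_P$
2: for $\forall x \in N$
3:   $\hat{D}_P^{+}(x), \hat{D}_P^{-}(x)$ ← Algorithm I on $\forall y \in U$
4:   compute goal set $D = \bigcup_{x \in N} \hat{D}_P^{-}(x) \cup (S_P^{+}(\lambda) - N)$
5: end for %% denoted $\hat{D}_P^{+}(x)$ and $\hat{D}_P^{-}(x)$ for distinguishing.
6: for $\exists x \in D$
7:   $D_P^{+}(x), D_P^{-}(x)$ ← Algorithm I on $\forall y \in D$
8:   if $|D_P^{+}(x)| > \tilde{\lambda}$, update $D = D - D_P^{-}(x)$
9:     then go to 6
10:  else if $|D_P^{+}(x)| < \tilde{\lambda}$, update $D = D - D_P^{+}(x)$ and do $\tilde{S}_P^{+}(\tilde{\lambda}) = \tilde{S}_P^{+}(\tilde{\lambda}) \cup D_P^{+}(x)$
11:    then go to 6
12:  else $|D_P^{+}(x)| = \tilde{\lambda}$, update $D = D - (D_P^{+}(x) \cup D_P^{-}(x))$ and do $\tilde{S}_P^{+}(\tilde{\lambda}) = \tilde{S}_P^{+}(\tilde{\lambda}) \cup D_P^{+}(x)$
13:    then go to 6
14: end for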
This method has a prerequisite. Suppose that the known T-skyline is $S_P^{+}(\lambda \le 3)$. The updated skyline can be obtained if $\tilde{\lambda} = 1$, $\tilde{\lambda} = 2$, or $\tilde{\lambda} = 3$. In other words, the setting degree $\tilde{\lambda}$ of the updated skylines should not be greater than the degree λ of the known skylines, that is, $\tilde{\lambda} \le \lambda$.
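The candidate-set idea of the maintenance step can be sketched in Python for the simplest setting. The sketch is restricted to a tolerant degree of 1 for both the known and the updated skyline, and it assumes plain Pareto dominance on gain criteria rather than the paper's relaxed dominance relation; the general case with a larger tolerant degree is not covered. The function names and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Obj = Tuple[float, ...]


def dominates(a: Obj, b: Obj) -> bool:
    """Plain Pareto dominance on gain criteria (an assumed stand-in relation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(ids: Set[str], data: Dict[str, Obj]) -> Set[str]:
    """Conventional skyline of the objects named in ids."""
    return {x for x in ids
            if not any(dominates(data[y], data[x]) for y in ids if y != x)}


def maintain_skyline(data: Dict[str, Obj], known: Set[str], deleted: Set[str]) -> Set[str]:
    """Update a known degree-1 skyline after the objects in deleted are removed."""
    removed = known & deleted  # the set N of deleted skyline objects
    if not removed:
        return set(known)  # no skyline object was deleted: nothing changes
    remaining = set(data) - deleted
    candidates = known - removed  # surviving skyline objects
    for x in removed:
        # Inferiority set of each deleted skyline object, limited to survivors.
        candidates |= {y for y in remaining if dominates(data[x], data[y])}
    return skyline(candidates, data)


TOY = {"a": (5, 1), "b": (4, 4), "c": (1, 5), "d": (3, 3), "e": (2, 2), "f": (4, 1)}
known = skyline(set(TOY), TOY)
print(sorted(maintain_skyline(TOY, known, {"b", "e"})))  # ['a', 'c', 'd']
```

In this restricted setting the result equals the skyline recomputed from scratch on the surviving objects, while only the candidate set is scanned rather than the whole dataset.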
6. Empirical study
The performance evaluation of NBA players is a classic scenario of personnel selection and evaluation. It is an important and frequent activity in the sports industry, and its results are crucial to the salary, career, and status of players. Thus, a decision approach for the quantitative analysis of technical statistics on player performance is necessary. This study employs the NBA dataset of the 2010–2011 regular season. The original dataset contains 468 players with 26 attributes (http://espn.go.com/nba/). In accordance with practice, we consider a player if and only if his number of games played (G) is greater than or equal to 25. A total of 383 NBA players are then enrolled. All computations are conducted on a PC configured with an Intel Core 2 Duo (T5750 @ 2.00 GHz) CPU and 4.00 GB of memory. This empirical study examines two independent cases. Case (i) is used to examine the performance of T-skyline through comparison with other existing skyline operations. Case (ii) is used to illustrate T-skyline maintenance on the database. Both cases illustrate the problem-solving process using the proposed decision approach.
6.1. T-skyline with preference intensities on Case (i)
In this section, we use Case (i) to demonstrate the T-skyline results with respect to various preference intensities. Suppose that DMs are interested in knowing: Who is (are) the most efficient player(s)? The DMs first establish a dynamic preference system that involves three preference functions $f_I$, $f_{II}$, and $f_{III}$. A criterion set is defined as P = {A, B, C} as follows:
(1) Criterion A: Personal efficiency per game: EFF = $f_I$(PV, NV, G) = (PV − NV)/G, where
Table 5. Three assemblies of preference intensities.

| PI | Criterion A | Criterion B | Criterion C |
|---|---|---|---|
| PI-1 | Type I and DM-specified G(x, y) = 1 | Type I and DM-specified G(x, y) = 1 | Type I and DM-specified G(x, y) = 1 |
| PI-2 | Type III with e = 0.5, and DM-specified G(x, y) ≥ 0.8 | Type II with e = 0.05, and DM-specified G(x, y) = 1 | Type V with e = 0.5, and DM-specified G(x, y) ≥ 0.8 |
| PI-3 | Type III with e = 0.5, and DM-specified G(x, y) ≥ 0.9 | Type II with e = 0.08, and DM-specified G(x, y) = 1 | Type V with e = 0.5, and DM-specified G(x, y) ≥ 0.9 |
Table 6. The results of T-skyline with the preference intensities PI-2 and PI-3.

| PI | 1-Skylines | 2-Skylines | 3-Skylines |
|---|---|---|---|
| PI-2 | N. Hilario, D. Howard, L. James, L. Jordan | R. Allen, T. Chandler, A. Horford, S. O'Neal, D. Wade | A. Bynum, P. Gasol, D. Nowitzki, L. Odom |
| PI-3 | T. Chandler, N. Hilario, D. Howard, L. James, D. Jordan | R. Allen, D. Nowitzki, S. O'Neal | A. Bynum, S. Curry, L. Odom, P. Pierce |
Fig. 2. T-skyline with different preference intensities in Case (i).
Table 7. The comparison of T-skyline after object deletion.

| T-skyline | 1-Skylines | 2-Skylines | 3-Skylines |
|---|---|---|---|
| n = 383 | B. Ronnie, J. Lin, R. Rajon, J. Williams | A. Tony, K. Jason, C. Paul | B. Corey, D. Carlos, S. Thabo |
| n = 332 | A. Tony, B. Ronnie, C. Paul, R. Rajon | B. Corey, D. Carlos, K. Jason, S. Thabo | A. Ron, E. Monta, J. Jared, W. Julian |
Positive values: PV = PTS + REB + STL + AST + BLK
Negative values: NV = (FGA − FG) + (FTA − FT) + TO

(2) Criterion B: Shot efficiency: SE = $f_{II}$(PTS, FT, FGA) = (PTS − FT × 1)/FGA
(3) Criterion C: Score per game AVG: $f_{III}$(AVG) = AVG
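The three Case (i) criteria can be computed from one season line as in the following Python sketch. The `SeasonStats` container and the sample numbers are hypothetical and do not come from the NBA dataset; the field names simply mirror the attribute abbreviations used in the formulas, and AVG is taken as a given attribute rather than recomputed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SeasonStats:
    """One player's season totals (hypothetical container, not the dataset schema)."""
    G: int        # games played
    PTS: float    # points
    REB: float    # rebounds
    STL: float    # steals
    AST: float    # assists
    BLK: float    # blocks
    FGA: float    # field goal attempts
    FG: float     # field goals made
    FTA: float    # free throw attempts
    FT: float     # free throws made
    TO: float     # turnovers
    AVG: float    # points per game, taken as given


def case_i_criteria(s: SeasonStats) -> Tuple[float, float, float]:
    """Return (EFF, SE, AVG) for Criteria A, B, and C."""
    pv = s.PTS + s.REB + s.STL + s.AST + s.BLK            # positive values
    nv = (s.FGA - s.FG) + (s.FTA - s.FT) + s.TO           # negative values
    eff = (pv - nv) / s.G                                 # Criterion A
    se = (s.PTS - s.FT * 1) / s.FGA                       # Criterion B
    return eff, se, s.AVG                                 # Criterion C is AVG itself


# Made-up numbers for illustration only.
sample = SeasonStats(G=10, PTS=200, REB=50, STL=10, AST=30, BLK=10,
                     FGA=150, FG=70, FTA=50, FT=40, TO=20, AVG=20.0)
eff, se, avg = case_i_criteria(sample)
print(eff, round(se, 4), avg)  # 19.0 1.0667 20.0
```

Each player then contributes one row (EFF, SE, AVG) to the 3D preference table on which the skyline operations are run.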
Criterion A, which measures the overall efficiency of basketball players, is commonly used in the NBA (i.e., in the real scenario). Criterion B measures the efficiency of shots, that is, the average non-free-throw score per field goal attempt. These criterion values should be around one. Criterion C is the total score per game with the GAIN type.

The first advantage of the T-skyline is its hierarchical skyline results. By using the UA/EUA methods, we can compute the T-skyline with respect to λ = 1, 2, 3, ... (Table 4). The setting λ = 1 returns nine players, identical to the result of the conventional skyline operation after materializing a 3D preference table. The T-skyline also provides seven players with the setting λ = 2 (e.g., R. Allen) and six players with the setting λ = 3 (e.g., C. Anthony). Such hierarchical results benefit decision making in two aspects. On the one hand, the T-skyline can reveal why some favored NBA players can or cannot become skyline members. For example, suppose that DMs are interested in the performance of Kobe Bryant. The proposed operation returns the result that K. Bryant is 6-skylines.¹ In addition to the 22 players shown in Table 4 (from λ = 1 to λ = 3), 9 players are 4-skylines (e.g., S. Nash) and 4 players are 5-skylines (e.g., D. Rose). All of these players are more efficient than K. Bryant in the 2010–2011 NBA regular season. On the other hand, the hierarchical T-skyline results facilitate the detection of imperfections in preference systems. In Case (i), for example, obtaining one rebound, assist, or blocked shot is more difficult than scoring one point from a shot. In this sense, the player-evaluation system is unfair to excellent defensive players. Thus, DMs will learn that two players in our empirical study, A. Bynum and P. Gasol, have been underrated even though both qualify as 3-skylines.
Such merit can be valuable for real decision-making applications, particularly for personnel evaluation.

¹ For simplicity, the objects in each hierarchical T-skyline set with respect to the tolerant degree λ can be represented as λ-skylines. Thus, 6-skylines refers to the objects with skyline membership when λ = 6, excluding the objects with skyline membership when λ = 5.

The second advantage of the T-skyline is that the output size of the hierarchical T-skyline sets is highly controllable. In addition to the conventional skyline operation as the baseline competitor, we examine another representative skyline operation. Lu et al. (2011) provided a skyline order (SO)-based skyline operation to obtain a hierarchical skyline result. This operation is defined as a skyline sequence $S = \{S_1, S_2, \ldots, S_n\}$ with respect to the entire object set U, where $S_j$ ($j = 1, \ldots, n$) can be understood as the conventional skyline set with respect to the object set $U - \sum_{i=1}^{j-1} S_i$. Under the same settings, we calculate the hierarchical SO-based skyline sets. The sizes of the outputted skyline sets are provided in Table 4. When λ = 1 and j = 1, both operations return nine players, who are also the conventional skyline objects. When j = 2, the SO set $S_2$ contains 18 players, of which, if T-skyline is used, 7 players are 2-skylines, 6 players are 3-skylines, and 5 players exhibit skyline membership only when λ > 3. $S_2$ thus contains more than two kinds of players (2-skylines, 3-skylines, and λ-skylines with λ > 3), which should be differentiated according to the DMs' preference. By setting the tolerant degree λ via T-skyline, DMs can control the output size of the skyline results more precisely and flexibly than when they use the SO-based and baseline conventional skyline operations. In other words, the results of the T-skyline are more controllable.

The third advantage of the proposed operation is its particular setting of preference intensity, which can effectively manage the dynamic preference of DMs. This characteristic is a breakthrough because T-skyline computation is no longer restricted to the general assumption, which can be represented as the simplest assembly of preference intensities, PI-1 (Table 5). For comparison, the experiments above are set to PI-1. We now conduct T-skyline with dynamic preference settings specified as two assemblies of preference intensities, PI-2 and PI-3 (Table 5), which the competitors cannot use. Table 6 shows the detailed computation results of using the T-skyline with PI-2 and PI-3. The preferred T-skyline with PI-2 is illustrated in Fig. 2(a). We show the first three hierarchical T-skylines as 1-skylines (points), 2-skylines (stars), and 3-skylines (circles). The name of the corresponding player is also marked in this figure. Fig. 2(b) illustrates the preferred T-skyline with PI-3. Five players (T. Chandler, N. Hilario, D. Howard, L. James, and D. Jordan) are marked as 1-skylines, three players as 2-skylines, and four players as 3-skylines. Comparing the two panels, we can clearly view the changes with respect to the different PIs. The same setting of the preference system (i.e., Criteria A, B, and C; Fig. 2) can produce different skyline results because of the changes of the intensity function q(d). Compared with the T-skyline objects with PI-1 in Table 4, the results under both PI-2 and PI-3 narrow down the number of qualified skyline objects in each hierarchy.
Specifically, four players (i.e., K. Durant, S. Novak, D. Nowitzki, and S. O'Neal) are no longer 1-skylines. Several players also exhibit altered skyline membership with respect to different PIs. For example, D. Nowitzki is 1-skylines in PI-1, 2-skylines in PI-3, and 3-skylines in PI-2. In the same preference system, different settings of PIs can generate multiple skyline results, which enhance decision performance. Generally, skyline operations have an inherent property: an object with the most preferred value in at least one criterion tends to be 1-skylines. Such a property is adverse in several decision-making situations. For instance, DMs may have difficulty accepting the experimental result that S. Novak is as good as D. Nowitzki (who is 1-skylines in PI-1) because S. Novak has the most preferred values in both Criterion B and Criterion C. Such matters make the DM-specific evaluation system questionable. In T-skyline, the expected D-value d0 can control similarity in dominance relations by setting preference intensity functions. Thus, various human factors (e.g., group opinions and statistical results) can be accounted for in player evaluation, which reduces the strong influence of extreme values on the final outputted skyline. In our experiments on PI-2 and PI-3, S. Novak no longer has any skyline membership when λ = 1, λ = 2, or λ = 3 (Fig. 2). This mechanism benefits many real-world applications.
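The SO-based skyline sequence used as a competitor in this subsection can be sketched in Python as repeated skyline extraction on the remaining objects. This is a generic layer-peeling illustration under plain Pareto dominance on gain criteria, not the implementation of Lu et al. (2011); the function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Obj = Tuple[float, ...]


def dominates(a: Obj, b: Obj) -> bool:
    """Plain Pareto dominance on gain criteria (an assumed stand-in relation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_sequence(data: Dict[str, Obj]) -> List[Set[str]]:
    """Return [S_1, S_2, ...]: S_j is the skyline of what is left after S_1..S_{j-1}."""
    remaining = set(data)
    layers: List[Set[str]] = []
    while remaining:
        layer = {x for x in remaining
                 if not any(dominates(data[y], data[x]) for y in remaining if y != x)}
        layers.append(layer)
        remaining -= layer  # peel the current skyline off the object set
    return layers


TOY = {"a": (5, 1), "b": (4, 4), "c": (1, 5), "d": (3, 3), "e": (2, 2), "f": (4, 1)}
print([sorted(layer) for layer in skyline_sequence(TOY)])
# [['a', 'b', 'c'], ['d', 'f'], ['e']]
```

In this toy data, `f` falls into the second layer together with `d`, although two objects (`a` and `b`) dominate `f` and only one (`b`) dominates `d`. This is the kind of mixing within one SO layer that the comparison above points out.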
6.2. Continuous T-skyline maintenance on Case (ii)
In this section, we use Case (ii) to demonstrate continuous T-skyline maintenance. Suppose that DMs are interested in knowing: Who is (are) the most efficient ball stealer(s)? We first establish a dynamic preference system that includes three preference functions {$f_I$, $f_{II}$, $f_{III}$}. A criterion set is defined as P = {A, B, C}:

(1) Criterion A: $f_I$(STL, TO) = STL/TO;
(2) Criterion B: $f_{II}$(STL, PF) = STL/PF;
(3) Criterion C: $f_{III}$(STL, MIN) = (STL/MIN) × 48.

Criterion A is the ratio of the number of steals (STL) to the number of turnovers (TO). Criterion B is the ratio of the number of steals to the number of personal fouls (PF). Criterion C is the number of steals per 48 min. In practice, this preference system is commonly used to evaluate a guard's ability to steal the ball.

Fig. 3. Continuous T-skyline maintenance in Case (ii).
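The three Case (ii) criteria translate directly into code. The Python sketch below uses a hypothetical function name and made-up totals; it does not handle players with zero turnovers, fouls, or minutes, which a real pipeline would have to treat separately.

```python
from typing import Tuple


def case_ii_criteria(stl: float, to: float, pf: float, minutes: float) -> Tuple[float, float, float]:
    """Return (STL/TO, STL/PF, steals per 48 min) for Criteria A, B, and C."""
    a = stl / to                 # Criterion A: steals per turnover
    b = stl / pf                 # Criterion B: steals per personal foul
    c = (stl / minutes) * 48     # Criterion C: steals per 48 minutes
    return a, b, c


# Made-up season totals for illustration only.
a, b, c = case_ii_criteria(stl=96, to=120, pf=80, minutes=1920)
print(round(a, 3), round(b, 3), round(c, 3))  # 0.8 1.2 2.4
```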
In this experiment, we update the T-skyline after several objects are deleted from the original dataset (n = 383). Suppose that DMs want to find the T-skyline when considering only game starters. Fifty-one objects are then deleted because their values on the attribute game starting (GS) are equal to zero. The resulting T-skyline obtained with Algorithm IV is shown in Table 7. Fig. 3 compares the T-skyline results. The preferred T-skyline without object deletion is illustrated in Fig. 3(a). We show the first three hierarchical T-skylines as 1-skylines (star), 2-skylines (box), and 3-skylines (circle). The names of the corresponding players are also marked in this figure. Several players that are non-preferred T-skylines are also marked. Fig. 3(b) illustrates the preferred T-skyline after object deletion. The deleted non-game-starters (51 players) are represented by "+". We also mark the names of the 1-skylines players, namely A. Tony, B. Ronnie, C. Paul, and R. Rajon, and highlight four players as 2-skylines and another four players as 3-skylines. Jeremy Lin and Jason Williams, as non-game-starters, no longer have any skyline membership.
7. Evaluating computation efficiency
This section provides the running time analysis in terms of (i) the efficiency of the EUA and UA methods and (ii) the stability of the tolerant degree, particularly the influence of the tolerant degree on the different nature of datasets (i.e., the various dimensions and sizes of objects). We employ Case (i) to conduct this experiment.
(1) Efficiency of UA and EUA methods

We examine the running time of UA and EUA with respect to five object sets (i.e., 3k, 6k, 9k, 12k, and 15k objects) in consideration of five dimensions (i.e., 3D, 9D, 15D, 21D, and 24D). Generally, the curves in Fig. 4(a) are smooth, and those in Fig. 4(b) are steep, suggesting that the running time of both algorithms increases linearly with the number of objects or dimensions. However, EUA is more sensitive to the increment of dimensions than UA, whereas UA is more sensitive to the increment of objects. Fig. 4(c) and (d) show that the running time of EUA is very stable (around 7–10 s) although the objects have various dimensions and sizes. By contrast, the running time of UA increases distinctly. Combining these results, we find that EUA is more efficient than UA in computing 1-skylines when the object set is large and of small dimensions. However, EUA cannot be used for all T-skylines (i.e., λ ≥ 2). UA can effectively conduct all T-skyline computations, although its efficiency leaves room for improvement.

(2) Stability of tolerant degree

The tolerant degree is an important coefficient that controls the output size of the skyline. In this section, we examine the stability of the tolerant degree in datasets of varying nature. We first employ the preprocessed dataset (n = 383) in Fig. 5(a) and (b). Fig. 5(a) shows the running time under three preference intensities (fixed in 3D), and Fig. 5(b) shows another four dimensions (i.e., 9D, 15D, 21D, and 24D) with the same PI. In the small dataset (n = 383), the running time is stable across tolerant degrees λ with varying PIs (6 ± 0.8 s in Fig. 5(a)) or dimensions (7.3 ± 1.1 s in Fig. 5(b)). We vary the size of objects from 3k to 15k and the size of dimensions from 3D to 24D (Fig. 5(c) and (d)) to further examine the stability of the tolerant degree under large dimensions and object sets. This experiment also shows that the tolerant degree is stable in (i) various object sizes (±2 for n ≥ 3k) and (ii) various dimensions (±2 for dimensions ≥ 3D). Therefore, the tolerant degree has no notable effect on the efficiency of T-skyline computation.
Fig. 4. Running time comparison of EUA and UA (λ = 1): (a) testing of EUA in diverse sizes of dimension and object set; (b) testing of UA (λ = 1) in diverse sizes of dimension and object set; (c) comparison in diverse sizes of dimensions (n = 9000); (d) comparison in diverse sizes of object sets (d = 15D).
Fig. 5. Running time with various tolerant degrees: (a) testing in diverse preference intensities (n = 383, 3D); (b) testing in diverse dimensions (n = 383, PI-1); (c) testing in diverse sizes of object set (15D, PI-1); (d) testing in diverse dimensions (n = 9k, PI-1).
8. Implications and conclusions
Skyline operation, as a preference query technology, retrieves an object subset from the dataset in which all skyline objects are in a dominant position with respect to the DMs' preference of selection. Although previous studies have comprehensively provided various definitions of operations, computation methods, and application directions, almost all of them complied with the strict preference assumption and thus could not fulfill the needs of practical decision making, that is, to have an adjustable preference system and controllable skyline outputs. The solution in the present paper is a skyline operation approach for the general multicriteria sorting problem in the database context. This study comprehensively investigates a decision approach, T-skyline operation. First, T-skyline operation is defined by modeling DMs' dynamic preference and formulating a T-skyline rationale. Second, we develop two database computation algorithms, the UA method for general T-skylines and the EUA method for particular T-skylines. Third, an algorithm is developed to maintain a computed T-skyline in the database. We measure the system running time to examine computational efficiency. To justify our approach overall, we empirically study a classical decision problem, namely, personnel selection and evaluation. The data experiments use historical data and real evidence. The results validate the practical viability of the proposed solution. The effectiveness and advantages of our approach are demonstrated by illustrations of the problem-solving process and comparisons with other existing operations.
This study has methodological and practical implications. In terms of methodology, this study is the first interdisciplinary study to develop skyline operation for MCDM problems. Such technology transfer, which adopts data engineering technology for classic decision problems, has great value for two reasons. First, preference query techniques can process large datasets through a database, but they cannot meet flexible settings and the requirements of practical decisions. Second, MCDM approaches are valid on relaxed assumptions, but they have to be confined to a reasonable number of objects or otherwise incur exponentially increasing computation time (Slowinski et al., 2009). As a query technology, the proposed approach can process large datasets. For example, the running time of T-skyline is about 10 s for a 15,000-object and 3-dimension dataset under normal PC configurations according to our experiments. On the other hand, T-skyline operation as a modeled decision approach can effectively control and adjust outputs with DMs' dynamic preferences. Interdisciplinary studies between the data-processing and decision-making fields, such as the present study, can inspire more thought and discussion.

On the practical side, T-skyline operation aims to output a series of hierarchical subsets of mutually independent objects ordered by the DMs' preference. Therefore, this operation is a classic multicriteria sorting process that aims to assign a collection of objects into several preference-ordered classes (Figueira et al., 2005; Zopounidis & Doumpos, 2002). Although our empirical study illustrates proper applications in personnel selection, this approach is also valid and significant for wider application, as long as the arising problems are essentially sorting problems. Such problems include supplier selection (e.g., Chai & Liu, 2014), warehouse evaluation (e.g., Chai, Liu, & Xu, 2013), financial analysis (e.g., Doumpos & Zopounidis, 2001), R&D project evaluation (e.g., Fernandez, Navarro, & Bernal, 2009), and many other areas (e.g., Kadzinski & Tervonen, 2013; Morais, de Almeida, & Figueira, in press; Silva, Costa, & De Gusmao, 2014). Moreover, the data experiments adopt real datasets from NBA technical statistics on all basketball players of the 2010–2011 regular season. Therefore, this study provides a complete consultation report of personnel evaluation based on actual evidence. The skyline results show certain practical significance for the NBA sports industry. This model can be used in personalized recommendation (Koutrika & Ioannidis, 2004) and in similar athlete evaluations (Lewis, 2004; Young, 2008).

As for our theoretical conclusions, the feature of the T-skyline formulation is twofold. On the one hand, it considers two boundaries with regard to DMs' preferences: preferred skylines for providing available objects and non-preferred skylines for providing adverse objects. In contrast to existing operations, this mechanism allows DMs to avoid negative decision results. On the other hand, it adopts DM-specific parameters as thresholds to control the level of skyline membership and thus facilitates the discovery of marginal skyline objects. The T-skyline formulation makes the skyline size adjustable and partly controllable. Moreover, the developed polynomial-time algorithms are particularly appropriate for T-skyline computation because the operations described in the literature are not valid for it. The evaluation of computation efficiency shows that the running time on an ordinary PC configuration is satisfactory for most scenarios of practical decisions, albeit leaving room for further improvement. Further, in contrast to previous techniques with fixed outputs, our operation generates hierarchical skylines, and the outputs can be adjusted by parameters.
Given the modeled preference systems, T-skyline can adjust its outputs to follow the dynamic preferences of DMs. Future studies can proceed in three directions. In terms of methodology, interdisciplinary studies of technology transfer can be carried out. Specifically, top-K queries, another preference query technology, can be regarded as multicriteria ranking problems and should be examined for flexible decision settings. In terms of theory, approaches that enhance the controllability and flexibility of skyline outputs should be developed. In terms of practice, our model and approach can be adapted to a wider range of decision-making areas.
Acknowledgments
The authors are grateful for the referees’ constructive comments on an earlier version of this paper. This research was supported in part by a grant from The Hong Kong Polytechnic University (Grant number G-YN71). The first author would like to express special thanks to Dr. Man Lung YIU for valuable discussions on related issues.
References
Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749.
Borzsonyi, S., Kossmann, D., & Stocker, K. (2001). The skyline operator. In Proceedings of the international conference on data engineering (ICDE) (pp. 421–432).
Brans, J. P., Vincke, P., & Mareschal, B. (1986). How to select and how to rank projects: The PROMETHEE method? European Journal of Operational Research, 24(2), 228–238.
Chai, J. Y., & Liu, J. N. K. (2014). A new believable rough set approach for supplier selection. Expert Systems with Applications, 41(1), 92–104.
Chai, J. Y., Liu, J. N. K., & Li, A. M. (2013). A novel tolerant skyline operation for multicriteria decision support. Journal of Decision Systems, 22(3), 151–167.
Chai, J. Y., Liu, J. N. K., & Xu, Z. S. (2013). A rule-based group decision model for warehouse evaluation under interval-valued intuitionistic fuzzy environments. Expert Systems with Applications, 40(6), 1959–1970.
Chomicki, J., Godfrey, P., Gryz, J., & Liang, D. (2003). Skyline with presorting. In Proceedings of the international conference on data engineering (ICDE) (pp. 717–816).
Doumpos, M., & Zopounidis, C. (2001). Assessing financial risks using a multicriteria sorting procedure: The case of country risk assessment. Omega, 29(1), 97–109.
Fernandez, E., Navarro, J., & Bernal, S. (2009). Multicriteria sorting using a valued indifference relation under a preference disaggregation paradigm. European Journal of Operational Research, 198(2), 602–609.
Figueira, J., Greco, S., & Ehrgott, M. (2005). Multiple criteria decision analysis: State of the art surveys. London: Springer-Verlag.
Godfrey, P., Shipley, R., & Gryz, J. (2005). Maximal vector computation in large data sets. In Proceedings of the international conference on very large data bases (VLDB) (pp. 229–240).
Greco, S., Matarazzo, B., & Slowinski, R. (2001). Rough sets theory for multicriteria decision analysis. European Journal of Operational Research, 129(1), 1–47.
Herrera-Viedma, E., Herrera, F., Chiclana, F., & Luque, M. (2004). Some issue on consistency of fuzzy preference relations. European Journal of Operational Research, 154(1), 98–109.
Hose, K., & Vlachou, A. (2011). A survey of skyline processing in highly distributed environments. The Very Large Data Bases Journal, 21(3), 359–384.
Hu, X., Sheng, C., Tao, Y., Yang, Y., & Zhou, S. (2013). Output-sensitive skyline algorithms in external memory. In Proceedings of ACM-SIAM symposium on discrete algorithms (pp. 887–900).
Huang, J., Jiang, B., Pei, J., Chen, J., & Tang, Y. (2013). Skyline distance: A measure of multidimensional competence. Knowledge and Information Systems, 34(2), 373–396.
Hüllermeier, E., Fürnkranz, J., Cheng, W., & Brinker, K. (2008). Label ranking by learning pairwise preferences. Artificial Intelligence, 172(16–17), 1897–1916.
Jiang, B., Pei, J., Lin, X., Cheung, D. W., & Han, J. W. (2008). Mining preferences from superior and inferior examples. In Proceedings of the international conference on knowledge discovery and data mining (SIGKDD) (pp. 390–398).
Kadzinski, M., & Tervonen, T. (2013). Stochastic ordinal regression for multiple criteria sorting problems. Decision Support Systems, 55(1), 55–66.
Koutrika, G., & Ioannidis, Y. (2004). Personalization of queries in database systems. In Proceedings of the international conference on data engineering (pp. 597–608).
Lewis, M. (2004). Moneyball: The art of winning an unfair game. New York: Norton.
Lin, X., Yuan, Y., Zhang, Q., & Zhang, Y. (2007). Selecting stars: The k most representative skyline operator. In Proceedings of the international conference on data engineering (ICDE) (pp. 86–95).
Lu, H., Jensen, C. S., & Zhang, Z. (2011). Flexible and efficient resolution of skyline query size constraints. IEEE Transactions on Knowledge and Data Engineering, 23(7), 991–1005.
Mamoulis, N., Yiu, M. L., Cheng, K. H., & Cheung, D. W. (2007). Efficient Top-K aggregation of ranked inputs. ACM Transactions on Database Systems, 32(3), 1–47. Article 19.
Morais, D. C., de Almeida, A. T., & Figueira, J. R. (in press). A sorting model for group decision making: A case study of water losses in Brazil. Group Decision and Negotiation, 1–24.
Ngai, E. W. T., Li, C. L., Cheng, T. C. E., Lun, V. Y. H., Lai, K. H., Cao, J. N., et al. (2011). Design and development of intelligent context-aware decision support system for real-time monitoring of container terminal operations. International Journal of Production Research, 49(12), 3501–3526.
Ngai, E. W. T., Peng, S., Alexander, P., & Moon, K. K. L. (2014). Decision support and intelligent systems in the textile and apparel supply chain: An academic review of research articles. Expert Systems with Applications, 41(1), 81–91.
Papadias, D., Tao, Y., Fu, G., & Seeger, B. (2003). An optimal and progressive algorithm for skyline queries. In Proceedings of the ACM conference on the management of data (SIGMOD) (pp. 443–454).
Papadias, D., Tao, Y., Fu, G., & Seeger, B. (2005). Progressive skyline computation in database systems. ACM Transactions on Database Systems, 30(1), 41–82.
Pei, J., Yuan, Y., Lin, X., Jin, W., Ester, M., Liu, Q., et al. (2006). Towards multidimensional subspace skyline analysis. ACM Transactions on Database Systems, 31(4), 1335–1381.
Robertson, I. T., & Smith, M. (2001). Personnel selection. Journal of Occupational and Organizational Psychology, 74(4), 441–472.
Salo, A., & Punkka, A. (2011). Ranking intervals and dominance relations for ratio-based efficiency analysis. Management Sciences, 57(1), 200–214.
Silva, M. M., Costa, A. P. C. S., & De Gusmao, A. P. H. (2014). Continuous cooperation: A proposal using a fuzzy multicriteria sorting method. International Journal of Production Economics, 151, 67–75.
Slowinski, R., Greco, S., & Matarazzo, B. (2009). Rough sets in decision making. Encyclopedia of complexity and systems science. Springer, pp. 7753–7787.
Stefanidis, K., Koutrika, G., & Pitoura, E. (2011). A survey on representation, composition and application of preferences in database systems. ACM Transactions on Database Systems, 36(3). Article No. 19.
Tao, Y., Ding, L., Lin, X., & Pei, J. (2009). Distance-based representative skyline. In Proceedings of the international conference on data engineering (ICDE) (pp. 892–903).
Trimponias, G., Bartolini, I., Papadias, D., & Yang, Y. (2013). Skyline processing on distributed vertical decompositions. IEEE Transactions on Knowledge and Data Engineering, 25(4), 850–862.
Wong, R. C. W., Pei, J., Fu, A. W. C., & Wang, K. (2009). Online skyline analysis with dynamic preferences on nominal attributes. IEEE Transactions on Knowledge and Data Engineering, 21(1), 35–49.
Please cite this article in press as: Chai, J., et al. Dynamic tolerant skyline operation for decision making. Expert Systems with Applications (2014), http://dx.doi.org/10.1016/j.eswa.2014.04.041
Xu, X. Z. (2001). The SIR method: A superiority and inferiority ranking method for multiple criteria decision making. European Journal of Operational Research, 131(3), 587–602.
Xu, Z. S. (2007). A survey of preference relations. International Journal of General Systems, 36(2), 179–203.
Yiu, M. L., Lo, E., & Yung, D. (2012). Measuring the sky: On computing data cubes via skylining the measures. IEEE Transactions on Knowledge and Data Engineering, 24(3), 492–505.
Yiu, M. L., Lu, H., Mamoulis, N., & Vaitis, M. (2011). Ranking spatial data by quality preferences. IEEE Transactions on Knowledge and Data Engineering, 23(3), 433–446.
Young, M. E. (2008). Nonlinear judgment analysis: Comparing policy use by those who draft and those who coach. Psychology of Sport and Exercise, 9, 760–774.
Zhang, X., & Chomicki, J. (2011). Preference queries over sets. In Proceedings of international conference on data engineering (pp. 1019–1030).
Zhang, N., Li, C., Hassan, N., Rajasekaran, S., & Das, G. (2014). On skyline groups. IEEE Transactions on Knowledge and Data Engineering, 26(4), 942–956.
Zopounidis, C., & Doumpos, M. (2002). Multicriteria classification and sorting methods: A literature review. European Journal of Operational Research, 138(2), 229–246.