On fuzzy approaches for enlarging skyline query results



Accepted Manuscript On fuzzy approaches for enlarging skyline query results Djamal Belkasmi, Allel Hadjali, Hamid Azzoune

PII: S1568-4946(18)30570-2
DOI: https://doi.org/10.1016/j.asoc.2018.10.013
Reference: ASOC 5131

To appear in: Applied Soft Computing Journal

Received date: 26 October 2017
Revised date: 11 July 2018
Accepted date: 4 October 2018

Please cite this article as: D. Belkasmi, et al., On fuzzy approaches for enlarging skyline query results, Applied Soft Computing Journal (2018), https://doi.org/10.1016/j.asoc.2018.10.013

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ON FUZZY APPROACHES FOR ENLARGING SKYLINE QUERY RESULTS

Djamal Belkasmi (a, c), Allel Hadjali (b), Hamid Azzoune (c)

(a) DIF-FS-UMBB, Boumerdes 35000, Algeria, [email protected]
(b) LIAS, ENSMA, France, [email protected]
(c) LRIA/USTHB, Algiers 16000, Algeria, [email protected]

Abstract

In the last decade, skyline queries have gained much attention and have proved valuable for multi-criteria decision making. Based on the concept of Pareto dominance, they return the non-dominated points, called the skyline points. In practice, the skyline may contain only a small number of points, which could be insufficient for the user's needs. In this paper, we discuss two fuzzy-set-based approaches to enriching a small skyline with particular points that could serve the decision makers' needs. The basic idea consists in identifying the most interesting points among the non-skyline ones. On the one hand, we introduce a novel fuzzy dominance relationship that makes dominance between the points of interest more demanding. As a result, many more points are considered incomparable and thus become elements of the new relaxed skyline. On the other hand, we leverage an appropriate fuzzy closeness relation to retrieve non-skyline points that are fuzzily close to some skyline points. Furthermore, we develop efficient algorithms to compute the relaxed variants of the skyline. Extensive experiments are conducted to demonstrate the effectiveness of our approaches and to analyze the performance of the proposed algorithms. A comparative study of the presented approaches is carried out as well.

Keywords: Databases; Preference queries; Skyline queries; Pareto dominance; Skyline relaxation; Fuzzy relations.


1. Introduction

Preference queries have gained much attention in the database field in recent years. Skyline queries [1] are a good example of SQL extensions that allow users to express their preferences in queries. Based on the Pareto dominance relationship, skyline queries select all non-dominated points according to a multi-criteria comparison. Let U be a set of d-dimensional points; a skyline query returns the skyline S, the set of points of U that are not dominated by any other point of U. A point p dominates, in the sense of Pareto, another point q iff p is better than or equal to q in all dimensions and strictly better than q in at least one dimension. One can observe that skyline points are pairwise incomparable.

It is worth noting that skyline queries have benefited many types of database applications since their introduction in the database field. They are widely used in non-trivial real applications, including multi-criteria decision making [2, 3], Web services such as hotel recommenders [4] and restaurant finders [5], and peer-to-peer network databases [6]. With this widening use of skyline queries across real-life database applications, many research studies have been devoted to computing the skyline efficiently and to introducing multiple variants of skyline queries [7, 8, 9, 3, 10].

However, querying d-dimensional data sets using a skyline operator may lead to two possible scenarios: (i) a large number of skyline points is returned, which could be less informative for the user; (ii) a small number of skyline points is returned, which could be insufficient for the user's needs. To solve the problem stemming from the first scenario, various approaches have been proposed to refine the skyline, thereby reducing its size [11, 12, 13, 14, 15, 16, 17, 2, 18]. For the second scenario, only very few works exist that relax the skyline in order to increase the number of skyline results [13, 19, 20].

In this paper, we address the problem of small skylines and propose advanced fuzzy-set-based solutions to enlarge them with a set of particularly interesting (non-skyline) points. Such solutions exhibit a cooperative behavior in the sense that they assist users in obtaining the desired results for their skyline queries. User preferences and user control are two key elements of our solutions. The former are leveraged to choose specific skyline relaxation parameters, and the latter allows the relaxation process to end once the results are satisfactory. In summary, the main new contributions made in this paper¹ are as follows:

1. We address the skyline relaxation problem by proposing two efficient fuzzy approaches, MP2R² and C2R³. The former relies on a novel fuzzy dominance relationship that makes dominance between two points more demanding. The latter leverages an appropriate fuzzy closeness relation to retrieve non-skyline points that are fuzzily close to skyline points. Both approaches allow adding new points to the skyline result.
2. For each approach, the semantic basis of the relaxed variant of the skyline is discussed in depth. Then, optimized algorithms are developed to efficiently compute each variant of the skyline. A theoretical complexity analysis of the proposed algorithms is also provided.
3. We conduct a set of thorough experiments to study and analyze the relevance and effectiveness of the proposed approaches. A comparative study between these approaches and the Goncalves and Tineo approach [19] is performed as well.

The paper is structured as follows. In Section 2, we introduce some basic notions about fuzzy set theory and skyline queries. Section 3 describes the approaches MP2R and C2R for relaxing the skyline and discusses the semantic basis of each of them. The computation of the relaxed variants of the skyline is presented in Section 4. In Section 5, we provide an overview of existing works. Section 6 is devoted to the experimental study. Finally, we conclude and discuss some perspectives in Section 7.

¹ Which substantially extends, revises and improves our earlier conference works [21, 22].
² Much Preferred Relation for Relaxation.
³ Closeness Relation for Relaxation.

2. Preliminary notions

2.1. Fuzzy sets

The concept of fuzzy sets was developed by Zadeh [23] in 1965 to represent classes or sets whose limits are imprecise. Fuzzy sets can describe gradual transitions between total membership and rejection. Typical examples of such fuzzy classes are those described with natural-language adjectives or adverbs, such as not expensive, fast and very close. Formally, a fuzzy set F on the universe X is described by a membership function µF : X → [0, 1], where µF(x) represents the degree of membership of x in F. By definition, if µF(x) = 0 then the element x does not belong to F; if µF(x) = 1 then x completely belongs to F, and these elements form the core of F, denoted by Cor(F) = {x ∈ X | µF(x) = 1}. When 0 < µF(x) < 1, we speak of partial membership. The elements with a non-zero degree form the support of F, denoted by Supp(F) = {x ∈ X | µF(x) > 0}. Moreover, the closer µF(x) is to 1, the more x belongs to F. Let x, y ∈ X; we say that x is preferred to y iff µF(x) > µF(y). If µF(x) = µF(y), then x and y have the same preference. Let F and G be two fuzzy sets on the universe X; we say that F ⊆ G iff µF(x) ≤ µG(x), ∀x ∈ X. The complement of F, denoted F^c, is defined by µF^c(x) = 1 − µF(x). Furthermore, F ∩ G (resp. F ∪ G) is defined such that µF∩G(x) = min(µF(x), µG(x)) (resp. µF∪G(x) = max(µF(x), µG(x))). In practice, F can be represented by a trapezoidal membership function (t.m.f.) (α, β, ϕ, ψ), where [β, ϕ] is the core and ]α, ψ[ is the support (see Figure 1).

Figure 1: Trapezoidal fuzzy set.
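The trapezoidal representation and the min/max/complement operations above can be illustrated with a minimal Python sketch. The function names (`trapezoid`, `f_not`, `f_and`, `f_or`) and the "not expensive" price set are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def trapezoid(alpha, beta, phi, psi):
    """Membership function of the t.m.f. (alpha, beta, phi, psi):
    core [beta, phi], support ]alpha, psi[."""
    def mu(x):
        if beta <= x <= phi:            # inside the core: full membership
            return 1.0
        if x <= alpha or x >= psi:      # outside the support: no membership
            return 0.0
        if x < beta:                    # rising edge on ]alpha, beta[
            return (x - alpha) / (beta - alpha)
        return (psi - x) / (psi - phi)  # falling edge on ]phi, psi[
    return mu


def f_not(mu):
    """Complement: 1 - mu(x)."""
    return lambda x: 1.0 - mu(x)


def f_and(mu_f, mu_g):
    """Intersection: pointwise minimum."""
    return lambda x: min(mu_f(x), mu_g(x))


def f_or(mu_f, mu_g):
    """Union: pointwise maximum."""
    return lambda x: max(mu_f(x), mu_g(x))


# Hypothetical fuzzy set "not expensive" over prices:
# fully satisfied up to 50, gradually rejected up to 100.
cheap = trapezoid(0, 0, 50, 100)
print(cheap(40), cheap(75), cheap(120))
```

Returning a closure keeps each fuzzy set a plain function of x, so the set operations compose without any extra data structure.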


2.2. Skyline queries

The notion of skyline queries was pioneered in [1]. Subsequently, interest in this area has exploded: [1] has garnered over 2340 citations since 2001 (Google Scholar, September 2017). Skyline queries retrieve all non-dominated points, i.e., the best points according to a crisp multi-criteria comparison. A point dominates another one, in the sense of Pareto, if it is at least as good in all criteria and strictly better in at least one criterion. Let D = (D1, D2, ..., Dd) be a d-dimensional space, where Di is the domain of attribute Ai. We define a relation R(A1, A2, ..., Ad) in D and we assume the existence of a total order on each domain Di.

Definition 1. Let U = (u1, u2, ..., un) be a set of d-dimensional data points (corresponding to a set of database tuples). Let ui and uj be two points of U. We say that ui dominates uj, in the Pareto sense, iff ui is better than or equal to uj in all dimensions and strictly better than uj in at least one dimension. Formally, we write:

$$u_i \succ u_j \Leftrightarrow (\forall k \in \{1, \dots, d\},\ u_i[k] \geq u_j[k]) \wedge (\exists l \in \{1, \dots, d\},\ u_i[l] > u_j[l]) \tag{1}$$

where each point ui = (ui[1], ui[2], ui[3], ..., ui[d]) and ui[k] stands for the value of the point ui for the attribute Ak. In (1), and without loss of generality, we assume that the larger the value, the better. We then say that ui is preferred to (resp. dominates) uj and we denote this by ui ≻ uj.

Definition 2. The skyline of U, denoted by S, is the set of points which are not dominated by any other point:

$$u \in S \Leftrightarrow \nexists u' \in U,\ u' \succ u \tag{2}$$

As can be seen, skyline queries compute the set of Pareto-optimal points in a relation, i.e., those points that are not dominated by any other point in the same relation.
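Definitions 1 and 2 translate almost literally into code. The following Python sketch is a naive quadratic illustration, not the algorithm used in the paper; the function names and the five sample points are hypothetical.

```python
def dominates(p, q):
    """True iff p Pareto-dominates q (larger values are better), as in Eq. (1)."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))


def skyline(points):
    """Keys of the points that no other point dominates, as in Eq. (2)."""
    return [name for name, p in points.items()
            if not any(dominates(q, p) for q in points.values())]


# Hypothetical two-dimensional points; both dimensions are to be maximised.
pts = {"a": (8, 10), "b": (6, 13), "c": (7, 9), "d": (5, 12), "e": (6, 6)}
print(skyline(pts))  # ['a', 'b']
```

Here a dominates c and e, b dominates d and e, while a and b are incomparable, so both belong to the skyline.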

Example 1. To illustrate the concept of the skyline, let us consider a database containing information on candidates, as shown in Table 1. The list of candidates includes the following information: code, age, management experience (man_exp, in years), technical experience (tec_exp, in years) and distance from work to home (dist_wh, in km).

Table 1: List of candidates.

| code | age | man_exp | tec_exp | dist_wh |
|------|-----|---------|---------|---------|
| M1   | 32  | 5       | 10      | 35      |
| M2   | 41  | 7       | 5       | 19      |
| M3   | 37  | 5       | 12      | 45      |
| M4   | 36  | 4       | 11      | 39      |
| M5   | 40  | 8       | 10      | 18      |
| M6   | 30  | 4       | 6       | 27      |
| M7   | 31  | 3       | 4       | 56      |
| M8   | 36  | 6       | 13      | 12      |
| M9   | 33  | 6       | 6       | 95      |
| M10  | 40  | 7       | 9       | 20      |

Ideally, the personnel manager is looking for a candidate with the largest management and technical experience (Max man_exp and Max tec_exp), ignoring the other pieces of information. The traditional skyline returns the following candidates: M5 and M8 (i.e., S = {M5, M8}) (see Figure 2).

Figure 2: Skyline of candidates

3. Fuzzy skyline relaxation

We discuss here our fuzzy approaches to skyline relaxation. Let Srelax and SFE be the relaxed skylines returned respectively by the approaches MP2R and C2R, described formally in subsections 3.1 and 3.2. Both approaches rely on the same main idea, which consists of computing the extent to which a point discarded by the Pareto dominance relationship may belong to the relaxed skyline. To this end, and as will be illustrated further, we associate with each skyline attribute Ai (i ∈ {1, ..., d}) a pair of parameters (γi1, γi2), where γi1 and γi2 respectively denote the bounds of the relaxation zone allowed for the attribute Ai. A vector of pairs of parameters, denoted γ, is then defined as γ = ((γ11, γ12), ..., (γd1, γd2)). In the following, γ is called the relaxation parameter vector.

Definition 3. Let γ and γ' be two vectors of parameters. We say that γ ≥ γ' if and only if ∀i ∈ {1, ..., d}, (γi1, γi2) ≥ (γ'i1, γ'i2) (i.e., γi1 ≥ γ'i1 ∧ γi2 ≥ γ'i2).

3.1. MP2R relaxation approach

The MP2R approach (see [21] for more details) relies on a particular fuzzy dominance relationship that allows enlarging the skyline with the most interesting points among those ruled out when computing the initial skyline S. This dominance relationship uses the relation "Much Preferred (MP)" (defined below) to compare two points u and u'. So, u ∈ U is an element of the relaxed variant of S, denoted Srelax, if there is no point u' ∈ U such that u' is much preferred to u (denoted MP(u', u)) in all skyline attributes. Formally, we write:

$$u \in S_{relax} \Leftrightarrow \nexists u' \in U,\ \forall i \in \{1, \dots, d\},\ MP_i(u'_i, u_i) \tag{3}$$

where MPi stands for the relation MP defined on the domain Di of the attribute Ai, and MPi(u'i, ui) expresses the extent to which the value u'i is much preferred to the value ui. Due to the gradual nature of the relation MP, each element u of Srelax is associated with a degree (∈ ]0, 1]) expressing the extent to which u belongs to Srelax. In fuzzy set terms, one can write:

$$\mu_{S_{relax}}(u) = 1 - \max_{u' \in U} \min_{i} \mu_{MP_i}(u'_i, u_i) = \min_{u' \in U} \max_{i} \left(1 - \mu_{MP_i}(u'_i, u_i)\right) \tag{4}$$

The semantics of the relation MPi can be expressed by formula (5) (see also Figure 3).

$$\mu_{MP_i^{(\gamma_{i1}, \gamma_{i2})}}(u'_i, u_i) = \begin{cases} 0 & \text{if } u'_i - u_i \leq \gamma_{i1} \\ 1 & \text{if } u'_i - u_i \geq \gamma_{i2} \\ \dfrac{(u'_i - u_i) - \gamma_{i1}}{\gamma_{i2} - \gamma_{i1}} & \text{otherwise} \end{cases} \tag{5}$$

Figure 3: The membership function µ of MPi^(γi1, γi2)

For instance, if u'i − ui ≥ γi2 then u'i is completely much preferred to ui. One can see that if u'i − ui > γi1, u'i is not only preferred but much preferred to ui to some extent. In terms of t.m.f., the fuzzy set associated with MPi writes (γi1, γi2, ∞, ∞) and is denoted MPi^(γi1, γi2). It is easy to check that MPi^(0,0) corresponds to the crisp preference relation expressed by means of the regular relation "greater than". Now, let Srelax^(γ) be the relaxed skyline Srelax computed on the basis of the relaxation vector γ (= ((γ11, γ12), ..., (γd1, γd2)) in the case of d skyline attributes). One can easily check that the classical skyline S = Srelax^(0) with 0 = ((0, 0), ..., (0, 0)). One can check that the following monotonicity property holds.
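As a worked illustration of Eqs. (4) and (5), take the candidates of Table 1 with the skyline attributes man_exp and tec_exp, and assume the relaxation vector γ = ((0, 2), (0, 2)). This vector is a hypothetical choice made only for this illustration; it is not used in the paper. With it, µ_MPi(u'i, ui) = (u'i − ui)/2 for differences between 0 and 2. For M10 = (7, 9), the only candidate that is strictly better on both attributes is M5 = (8, 10):

```latex
\begin{aligned}
\mu_{MP_1}(8, 7) &= \frac{(8 - 7) - 0}{2 - 0} = 0.5,
\qquad
\mu_{MP_2}(10, 9) = \frac{(10 - 9) - 0}{2 - 0} = 0.5, \\
\min_i \mu_{MP_i}\big((M_5)_i, (M_{10})_i\big) &= 0.5, \\
\mu_{S_{relax}}(M_{10}) &= 1 - \max_{u' \in U} \min_i \mu_{MP_i}\big(u'_i, (M_{10})_i\big) = 1 - 0.5 = 0.5 .
\end{aligned}
```

Every other candidate is no better than M10 on man_exp, so its minimum is 0 and the maximum is reached by M5. By contrast, M9 = (6, 6) is completely much-dominated by M5 (differences 2 and 4, both at least γi2 = 2), so its degree is 1 − 1 = 0 and it stays out of the relaxed skyline under this γ.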

Proposition 1. Let γ and γ' be two vectors of parameters. Then, the following property holds ([21]):

$$\gamma' \leq \gamma \Rightarrow S_{relax}^{(\gamma')} \subseteq S_{relax}^{(\gamma)}$$

Lemma 1. Let γ = ((0, γ12), ..., (0, γd2)) and γ' = ((γ'11, γ'12), ..., (γ'd1, γ'd2)); the following result holds as well [21]:

$$S_{relax}^{(0)} \subseteq S_{relax}^{(\gamma')} \subseteq S_{relax}^{(\gamma)}$$

3.2. C2R relaxation approach

Another way of relaxing the skyline S is to make it more flexible by adding points that, strictly speaking, do not belong to it but are close to belonging to it. The idea is to identify interesting points that are in the neighborhood of skyline points and add them to the skyline S. Let u be a point of U − S and u' a point of S. Then, u is an element of the relaxed variant of S, denoted SFE, if u is semantically close to u'. We write:

$$u \in S_{FE} \Leftrightarrow \exists u' \in S \text{ such that } \forall i \in \{1, \dots, d\},\ (u_i, u'_i) \in C_i \tag{6}$$

where Ci is a reflexive, symmetric approximate indifference (or equality) relation defined on the domain Di of the attribute Ai, and Ci(ui, u'i) expresses the extent to which the value ui is close to the value u'i. Since Ci is a fuzzy relation (defined below), each element u of SFE is associated with a degree (∈ ]0, 1]) expressing the extent to which u belongs to SFE. In fuzzy set terms, we write:

$$\mu_{S_{FE}}(u) = \max_{u' \in S} \min_{i} \mu_{C_i}(u_i, u'_i) \tag{7}$$

The semantics of the relation Ci is given by formula (8) (see also Figure 4). In terms of t.m.f., Ci writes (0, 0, γi1, γi2) and is denoted Ci^(γi1, γi2). It is easy to check that Ci^(0,0) corresponds to the classical equality "=".

$$\mu_{C_i^{(\gamma_{i1}, \gamma_{i2})}}(u_i, u'_i) = \begin{cases} 0 & \text{if } |u_i - u'_i| \geq \gamma_{i2} \\ 1 & \text{if } |u_i - u'_i| \leq \gamma_{i1} \\ \dfrac{\gamma_{i2} - |u_i - u'_i|}{\gamma_{i2} - \gamma_{i1}} & \text{otherwise} \end{cases} \tag{8}$$

Figure 4: The membership function µ of Ci^(γi1, γi2)

Now, let SFE^(γ) be the relaxed skyline SFE computed on the basis of the relaxation vector γ (= ((γ11, γ12), ..., (γd1, γd2)) in the case of d skyline attributes). One can easily check that the classical skyline S = SFE^(0), where 0 = ((0, 0), ..., (0, 0)).

Example 2. Let us illustrate the computation of SFE by considering the skyline S = {M5, M8} of Example 1. Assume that the fuzzy "Closeness" relations Ci corresponding to the skyline attributes (man_exp and tec_exp) are respectively given by:

$$\mu_{C_{man\_exp}^{(1/2, 2)}}(u, u') = \begin{cases} 1 & \text{if } |u - u'| \leq 1/2 \\ 0 & \text{if } |u - u'| \geq 2 \\ (-2|u - u'| + 4)/3 & \text{otherwise} \end{cases} \tag{9}$$

$$\mu_{C_{tec\_exp}^{(1/2, 4)}}(u, u') = \begin{cases} 1 & \text{if } |u - u'| \leq 1/2 \\ 0 & \text{if } |u - u'| \geq 4 \\ (-2|u - u'| + 8)/7 & \text{otherwise} \end{cases} \tag{10}$$

Now, applying the C2R approach leads to the following relaxed skyline (with γ = ((1/2, 2), (1/2, 4))): SFE^(γ) = {(M5, 1), (M8, 1), (M3, 0.66), (M10, 0.66), (M1, 0.28)}, see Table 2 and Figure 5.

Table 2: Degrees of the elements of SFE^(γ).

| code  | M5 | M8 | M3   | M10  | M1   |
|-------|----|----|------|------|------|
| µ_SFE | 1  | 1  | 0.66 | 0.66 | 0.28 |

Figure 5: Points retrieved by SFE^(γ)
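The degrees reported for Example 2 can be retraced by hand from Eqs. (7), (9) and (10), using the (man_exp, tec_exp) values of Table 1. For M1 = (5, 10), the skyline point M5 = (8, 10) contributes 0 because |5 − 8| = 3 ≥ 2 on man_exp, so only M8 = (6, 13) matters. For M3 = (5, 12), the same holds:

```latex
\begin{aligned}
\mu_{S_{FE}}(M_1) &= \min\left(\frac{-2\,|5 - 6| + 4}{3},\ \frac{-2\,|10 - 13| + 8}{7}\right)
                   = \min\left(\frac{2}{3},\ \frac{2}{7}\right) = \frac{2}{7} \approx 0.28, \\
\mu_{S_{FE}}(M_3) &= \min\left(\frac{-2\,|5 - 6| + 4}{3},\ \frac{-2\,|12 - 13| + 8}{7}\right)
                   = \min\left(\frac{2}{3},\ \frac{6}{7}\right) = \frac{2}{3} \approx 0.66 .
\end{aligned}
```

The values 0.28 and 0.66 in Table 2 are thus the truncated forms of 2/7 and 2/3.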

One can see that some candidates that were not in S are now elements of SFE (such as M3, M10 and M1), see Figure 5. SFE is therefore larger than S (i.e., S ⊆ SFE). Looking at the content of SFE, one can observe that (i) the skyline elements of S are still elements of SFE with a degree equal to 1; (ii) SFE includes new elements whose degrees are less than 1 (such as M3, M10 and M1). Interestingly, the user can select from SFE:

1. the Top-k elements (k is a user-defined parameter): the elements of SFE with the highest degrees, or
2. the subset of elements, denoted (SFE)σ, with a degree higher than a threshold σ provided by the user.

In the context of Example 2, it is easy to check that Top-3 = {(M5, 1), (M8, 1), (M3, 0.66)} and (SFE)0.5 = {(M5, 1), (M8, 1), (M3, 0.66), (M10, 0.66)}.

Proposition 2. Let γ and γ' be two vectors of parameters. The following monotonicity property holds:

$$\gamma \leq \gamma' \Rightarrow S_{FE}^{(\gamma)} \subseteq S_{FE}^{(\gamma')}$$

Proof 1. Let γ ≤ γ'; one can deduce that ∀i, Ci^(γ) ⊆ Ci^(γ'). Let u ∈ SFE^(γ). Then:

$$\begin{aligned}
u \in S_{FE}^{(\gamma)} &\Rightarrow \exists u' \in S,\ \forall i \in \{1, \dots, d\},\ (u_i, u'_i) \in C_i^{(\gamma_{i1}, \gamma_{i2})} \\
&\Rightarrow \exists u' \in S,\ \forall i \in \{1, \dots, d\},\ \mu_{C_i^{(\gamma_{i1}, \gamma_{i2})}}(u_i, u'_i) > 0 \\
&\Rightarrow \exists u' \in S,\ \forall i \in \{1, \dots, d\},\ \mu_{C_i^{(\gamma'_{i1}, \gamma'_{i2})}}(u_i, u'_i) \geq \mu_{C_i^{(\gamma_{i1}, \gamma_{i2})}}(u_i, u'_i) > 0 \\
&\Rightarrow \exists u' \in S,\ \forall i \in \{1, \dots, d\},\ (u_i, u'_i) \in C_i^{(\gamma'_{i1}, \gamma'_{i2})} \\
&\Rightarrow u \in S_{FE}^{(\gamma')}
\end{aligned}$$

So we have SFE^(γ) ⊆ SFE^(γ').

Lemma 2. Let γ = ((0, γ12), ..., (0, γd2)) and γ' = ((γ'11, γ'12), ..., (γ'd1, γ'd2)); the following holds:

$$S_{FE}^{(0)} \subseteq S_{FE}^{(\gamma')} \subseteq S_{FE}^{(\gamma)}$$

Example 3. For γ = ((1/2, 2), (1/2, 4)), SFE^(γ) = {M5, M8, M3, M10, M1} as shown in Example 2. Now it is easy to check that for γ' = ((1/2, 1), (3/2, 3)) where γ' ≤ γ, SFE^(γ') = {M5, M8, M3, M10}, see Table 3 and Figure 6. This result is in agreement with Proposition 2 (i.e., SFE^(γ') ⊆ SFE^(γ)). Furthermore, the points retrieved by SFE^(γ') have lower degrees than those recovered by SFE^(γ).

Table 3: Degrees of the elements of SFE^(γ').

| code  | M5 | M8 | M3  | M10 |
|-------|----|----|-----|-----|
| µ_SFE | 1  | 1  | 0.5 | 0.5 |

Figure 6: Points retrieved by SFE^(γ) and SFE^(γ')

Let us now look at the results provided by the MP2R and C2R approaches. Clearly, Srelax is larger than SFE (i.e., SFE ⊆ Srelax), see Figure 7. The following proposition then holds:

Figure 7: Points recovered by SFE and Srelax

Proposition 3. Let γ be a vector of parameters. We have: SFE^(γ) ⊆ Srelax^(γ).

Proof 2. Let us prove that assuming u ∉ Srelax^(γ) and u ∈ SFE^(γ) leads to a contradiction.

Since u ∉ Srelax^(γ), ∃v ∈ U, ∀i, (vi, ui) ∈ MPi^(γ).

Besides, since u ∈ SFE^(γ), ∃w ∈ S, ∀i, (wi, ui) ∈ Ci^(γ).

Observe that w does not dominate v (since ∀i, wi […] MPi^(γ), due to ∀i, (vi, ui) ∈ MPi^(γ), which contradicts ∀i, (wi, ui) ∈ Ci^(γ)). Furthermore, v does not dominate w either (since w ∈ S). Therefore, v and w are incomparable, implying that ∃j, ∃k, wj […]; in particular, (vj, uj) ∈ MPj^(γ) and (wj, uj) ∈ Cj^(γ).

Let us show by reductio ad absurdum that this entails vj ≻j wj. Indeed, assume vj ≼j wj; (vj, wj) ∈ MPj^(γ) entails vj […]. Hence a contradiction occurs since wj […] (w.r.t. a […] Cj^(γ)).

But in turn, vj ≻j wj contradicts that ∃j, wj […], the hypothesis we started with.

Thus, assuming u ∉ Srelax^(γ) leads to a contradiction. Accordingly, ∀u ∈ SFE^(γ) we have u ∈ Srelax^(γ), implying that SFE^(γ) ⊆ Srelax^(γ).


4. Srelax and SFE Computation

To compute the two relaxed variants of the skyline, i.e., Srelax and SFE, we propose a two-step procedure (see Figure 8): (i) the skyline computation step; and (ii) the skyline relaxation step.

Figure 8: Relaxing skyline process

In the first step, we calculate the regular skyline S using a slightly improved version, called LIBNL (see Algorithm 1), of the BNL algorithm proposed in [1]. The LIBNL algorithm uses a function named SkylineCompare(ui, uj) to evaluate the dominance, in the sense of Pareto, between ui and uj on all skyline dimensions and returns the result in the variable status. The result may be equal to: 0 if ui = uj, 1 if ui ≻ uj, 2 if ui ≺ uj, 3 if they are incomparable. If the value of ui.dominated is equal to false, then the point ui belongs to the skyline. In the second step, we propose two algorithms, named CRS (see Algorithm 2) and FES (see Algorithm 3), to calculate respectively the relaxed skylines Srelax and SFE.

Algorithm 1: LIBNL
Data: A set of n points U
Result: A skyline S; a set of dominated points U − S

begin
    S = ∅;
    for i := 1 to n do
        if ¬(ui.dominated) then
            for j := i + 1 to n do
                status = 0;
                if ¬(uj.dominated) then
                    status = SkylineCompare(ui, uj);
                    switch status do
                        case 1 do uj.dominated = true;    /* ui ≻ uj */
                        case 2 do ui.dominated = true;    /* ui ≺ uj */
            if ¬(ui.dominated) then
                S = S ∪ {ui};
    return S and U − S;
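Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode under two assumptions: larger values are better, and the flag updates follow the status codes given in the text (1 means ui dominates uj, 2 means uj dominates ui). The names `skyline_compare` and `libnl` are hypothetical, and the sample data are the (man_exp, tec_exp) pairs of Table 1.

```python
def skyline_compare(u, v):
    """Return 0 if u == v, 1 if u dominates v, 2 if v dominates u,
    3 if the two points are incomparable (larger values are better)."""
    u_better = any(a > b for a, b in zip(u, v))
    v_better = any(b > a for a, b in zip(u, v))
    if u_better and v_better:
        return 3
    if u_better:
        return 1
    if v_better:
        return 2
    return 0


def libnl(points):
    """Split a list of points into (skyline, dominated points)."""
    n = len(points)
    dominated = [False] * n
    for i in range(n):
        if dominated[i]:
            continue                      # skip points already ruled out
        for j in range(i + 1, n):
            if dominated[j]:
                continue
            status = skyline_compare(points[i], points[j])
            if status == 1:
                dominated[j] = True       # u_i dominates u_j
            elif status == 2:
                dominated[i] = True       # u_j dominates u_i
    sky = [p for p, flag in zip(points, dominated) if not flag]
    rest = [p for p, flag in zip(points, dominated) if flag]
    return sky, rest


# (man_exp, tec_exp) of candidates M1..M10 from Table 1.
candidates = [(5, 10), (7, 5), (5, 12), (4, 11), (8, 10),
              (4, 6), (3, 4), (6, 13), (6, 6), (7, 9)]
sky, rest = libnl(candidates)
print(sky)  # [(8, 10), (6, 13)], i.e. M5 and M8
```

Each pair is compared at most once, which matches the n(n−1)/2 pair count used in the complexity discussion.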

The main idea of the CRS algorithm is to identify the much preferred points among those ruled out by Pareto dominance when computing the skyline S. The key element of this algorithm is the fuzzy relation MPk (introduced in subsection 3.1), expressed by the function Compute(µ_MPk^(γ)). Note that lines 11, 12 and 14, 15 serve to reduce the execution time.

The principle of the FES algorithm is to retrieve particular non-skyline points that are the closest to the skyline points in the sense of the fuzzy proximity relation C (introduced in subsection 3.2), expressed by µ_Ck^(γ). In the same way, lines 10, 11 and 13, 14 of FES are used to reduce the execution time. We note that FES is faster than CRS in terms of execution time.

Algorithm 2: CRS
Data: U − S set of n − m points; S skyline of m points; γ vector of parameters;
Result: A relaxed skyline Srelax;

 1  begin
 2      Srelax = S;
 3      for i = 1 to n − m do
 4          Vmax = 0;
 5          for j = 1 to n − m do
 6              if i ≠ j then
 7                  Vmin = 1;
 8                  for k = 1 to d do
 9                      Compute(µ_MPk^(γ)(uj[k], ui[k]));
10                      Vmin = MIN(Vmin, µ_MPk^(γ)(uj[k], ui[k]));
11                      if Vmin = 0 then    /* allows stopping the scan of the dimensions */
12                          break;
13                  Vmax = MAX(Vmax, Vmin); µ_Srelax(ui) = 1 − Vmax;
14                  if Vmax = 1 then    /* avoids scanning all dominated objects */
15                      break;
16          if µ_Srelax(ui) > 0 then
17              Srelax = Srelax ∪ {ui};
18      rank Srelax in decreasing order w.r.t. µ_Srelax(ui);
19      return Srelax or Top-k or (Srelax)σ;
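The CRS idea can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' implementation: each candidate is compared against every other point of U, as Eq. (4) prescribes, the names `mu_mp` and `crs` are hypothetical, and the relaxation vector γ = ((0, 2), (0, 2)) is an arbitrary illustrative choice that does not appear in the paper.

```python
def mu_mp(diff, g1, g2):
    """Degree to which a value is 'much preferred' to another one,
    given diff = u'_k - u_k and the bounds (g1, g2), as in Eq. (5)."""
    if diff <= g1:
        return 0.0
    if diff >= g2:
        return 1.0
    return (diff - g1) / (g2 - g1)


def crs(non_skyline, skyline, gamma):
    """Ranked list of (name, degree) for the relaxed skyline of Eq. (4)."""
    everyone = {**skyline, **non_skyline}
    degrees = {name: 1.0 for name in skyline}        # Srelax starts as S
    for name, u in non_skyline.items():
        v_max = 0.0
        for other, v in everyone.items():
            if other == name:
                continue
            v_min = 1.0
            for k, (g1, g2) in enumerate(gamma):
                v_min = min(v_min, mu_mp(v[k] - u[k], g1, g2))
                if v_min == 0.0:                     # no need to scan more dimensions
                    break
            v_max = max(v_max, v_min)
            if v_max == 1.0:                         # u is fully much-dominated
                break
        if 1.0 - v_max > 0.0:
            degrees[name] = 1.0 - v_max
    return sorted(degrees.items(), key=lambda kv: kv[1], reverse=True)


# (man_exp, tec_exp) values from Table 1, split according to S = {M5, M8}.
S = {"M5": (8, 10), "M8": (6, 13)}
rest = {"M1": (5, 10), "M2": (7, 5), "M3": (5, 12), "M4": (4, 11),
        "M6": (4, 6), "M7": (3, 4), "M9": (6, 6), "M10": (7, 9)}
ranked = crs(rest, S, gamma=((0, 2), (0, 2)))
print(ranked)
```

With this hypothetical γ, the candidates M1, M2, M3 and M10 enter the relaxed skyline with degree 0.5, while M4, M6, M7 and M9 are completely much-dominated by a skyline point and stay out.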

Algorithm 3: FES
Data: U − S set of n − m points; S skyline of m points; γ vector of parameters;
Result: A relaxed skyline SFE;

 1  begin
 2      SFE = S;
 3      for i = 1 to n − m do    /* ui ranges over U − S */
 4          Vmax = 0;
 5          for j = 1 to m do    /* uj ranges over S */
 6              Vmin = 1;
 7              for k = 1 to d do
 8                  Compute(µ_Ck^(γ)(ui[k], uj[k]));
 9                  Vmin = MIN(Vmin, µ_Ck^(γ)(ui[k], uj[k]));
10                  if Vmin = 0 then    /* allows stopping the scan of the dimensions */
11                      break;
12              Vmax = MAX(Vmax, Vmin); µ_SFE(ui) = Vmax;
13              if Vmax = 1 then    /* avoids scanning the remaining skyline points */
14                  break;
15          if µ_SFE(ui) > 0 then
16              SFE = SFE ∪ {ui};
17      rank SFE in decreasing order w.r.t. µ_SFE(ui);
18      return SFE or Top-k or (SFE)σ;
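A Python sketch of the FES idea, including the final ranking and the two selection modes (Top-k and σ-cut), is given below. The function names are hypothetical; the data and the relaxation vector γ = ((1/2, 2), (1/2, 4)) are those of Example 2, so the output can be compared with Table 2.

```python
def mu_close(dist, g1, g2):
    """Closeness degree for an absolute difference `dist`, as in Eq. (8)."""
    if dist <= g1:
        return 1.0
    if dist >= g2:
        return 0.0
    return (g2 - dist) / (g2 - g1)


def fes(non_skyline, skyline, gamma):
    """Ranked list of (name, degree) for the relaxed skyline of Eq. (7)."""
    degrees = {name: 1.0 for name in skyline}        # SFE starts as S
    for name, u in non_skyline.items():
        v_max = 0.0
        for s in skyline.values():
            v_min = 1.0
            for k, (g1, g2) in enumerate(gamma):
                v_min = min(v_min, mu_close(abs(u[k] - s[k]), g1, g2))
                if v_min == 0.0:                     # no need to scan more dimensions
                    break
            v_max = max(v_max, v_min)
            if v_max == 1.0:                         # cannot get any closer
                break
        if v_max > 0.0:
            degrees[name] = v_max
    return sorted(degrees.items(), key=lambda kv: kv[1], reverse=True)


def top_k(ranked, k):
    """The k elements with the highest degrees."""
    return ranked[:k]


def sigma_cut(ranked, sigma):
    """The elements whose degree is higher than the threshold sigma."""
    return [(name, deg) for name, deg in ranked if deg > sigma]


# (man_exp, tec_exp) values from Table 1, split according to S = {M5, M8}.
S = {"M5": (8, 10), "M8": (6, 13)}
rest = {"M1": (5, 10), "M2": (7, 5), "M3": (5, 12), "M4": (4, 11),
        "M6": (4, 6), "M7": (3, 4), "M9": (6, 6), "M10": (7, 9)}
ranked = fes(rest, S, gamma=((0.5, 2), (0.5, 4)))
print([(name, round(deg, 2)) for name, deg in ranked])
print(sigma_cut(ranked, 0.5))
```

M3 and M10 obtain the same degree (2/3), so which of the two appears in a Top-3 answer depends on the tie-breaking rule; here the stable sort keeps the input order.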

4.1. Complexity issue

Here, we provide a complexity analysis of the algorithms mentioned above.

4.1.1. Complexity of the LIBNL algorithm

The LIBNL algorithm computes the skyline by applying the SkylineCompare function to pairs of points of U. SkylineCompare has a complexity of O(d), where d is the number of skyline dimensions. So, it is easy to check that LIBNL performs at most n(n−1)/2 · d elementary comparisons. As, in general, d is negligible w.r.t. n (i.e., d ≪ n), this complexity boils down to O(n²) (quadratic complexity).

4.1.2. Complexity of the CRS algorithm

The complexity of the CRS algorithm involves the time of two functions. The first one computes the membership degree of each point of U − S using three nested loops. Its complexity is O((n − m)·(n − m)·d); as, in general, d and m are negligible w.r.t. n (i.e., d, m ≪ n), one can deduce that this complexity is approximately O(n²). The second one is a ranking function that rank-orders the relaxed skyline points. It requires O(r·log(r)), where r = |Srelax|. Thus, the complexity of the CRS algorithm is O(n²) + O(r·log(r)) ≃ O(n²) (quadratic complexity).

4.1.3. Complexity of the FES algorithm

To compute the membership degree of each point of U − S, the FES algorithm proceeds in three nested loops. The complexity of these steps is O((n − m)·m·d). Knowing that d and m are negligible w.r.t. n (i.e., d ≪ n and m ≪ n), this complexity reduces to O(n). So, the complexity of the FES algorithm is O(n) + O(r·log(r)) = O(n + r·log(r)), where O(r·log(r)) stands for the complexity of the final ranking instruction and r = |SFE|. Since r ≤ n, this is bounded by O(n·log(n)) (quasi-linear complexity). One can observe that the FES algorithm is more efficient than the CRS algorithm.
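To give a feel for the orders of magnitude involved, the counts above can be instantiated on the smallest benchmark size, n = 5000, with d = 2 skyline dimensions and a skyline of m = 2 points. The values of d and m are assumptions chosen only for this illustration, and the figures are worst-case numbers of elementary comparisons that ignore the early exits of the algorithms:

```latex
\begin{aligned}
\text{LIBNL:}\quad & \frac{n(n-1)}{2} \cdot d = \frac{5000 \cdot 4999}{2} \cdot 2 = 24\,995\,000, \\
\text{CRS:}\quad   & (n - m) \cdot (n - m) \cdot d = 4998^2 \cdot 2 = 49\,960\,008, \\
\text{FES:}\quad   & (n - m) \cdot m \cdot d = 4998 \cdot 2 \cdot 2 = 19\,992 .
\end{aligned}
```

Under these assumptions the relaxation step of FES is about three orders of magnitude cheaper than that of CRS, which is consistent with the O(n) versus O(n²) behaviour stated above when m is small.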

5. Related work

Our study relates to previous works on skyline computation and on controlling the skyline size. In this section, we review the major existing approaches on these two topics.

5.1. Skyline computation

Skyline computation was first investigated by Kung et al. [24] in computational geometry. Bentley et al. [25] propose an efficient algorithm with an expected linear runtime if the data distribution on each dimension is independent. Borzsonyi et al. [1] pioneered the concept of skyline in the context of databases and proposed an SQL syntax for skyline queries. They also developed skyline computation techniques based on the block-nested-loop and divide-and-conquer paradigms. To take advantage of the idea of pre-sorting, Chomicki et al. [26] provide another block-nested-loop based computation technique, SFS (sort-filter-skyline). The SFS algorithm was further significantly improved by Godfrey et al. [27]. The first progressive technique that can output skyline points without scanning the whole dataset was developed by Tan et al. [28]. Kossmann et al. [29] discuss another progressive algorithm based on the nearest neighbor search technique, which adopts a divide-and-conquer paradigm on the dataset. Papadias et al. [18] investigate a branch-and-bound algorithm (BBS) to progressively output skyline points on datasets indexed by an R-tree. One of the most important properties of BBS is that it minimizes the I/O cost.

On the other hand, many variations of skyline computation have been explored. Pei et al. [30, 31] and Yuan et al. [32] study a skyline cube data structure that completely precomputes the skyline of all possible subspaces for a given data set. Xia and Zhang [33] address the incremental maintenance of skyline cubes. Tao et al. [34] develop the SUBSKY algorithm to answer subspace skyline queries efficiently in any subspace. To tackle the problem of skyline in high dimensional spaces, Chan et al. [17] relax the notion of dominance to k-dominance and propose the k-dominant skyline. Dellis and Seeger [35] propose the reverse skyline query, which consists of the objects whose dynamic skyline contains a given query point q. The dynamic skyline of an object p corresponds to a transformed data space where p becomes the origin and all other points are represented by their distance vectors to p. Denis Mindolin [36] investigates the skyline computation issue in the case where some attributes are considered to be more important than the others. Jiang and Pei [37] apply skyline analysis to time series data, where every data object is a time series. Balke et al. [38] and Wu et al. [39] address skyline computation in distributed systems. Let us also note the work done in [16], where the problem called top-k representative skyline points is addressed. The idea is to compute k skyline points such that the total number of (distinct) data points dominated by one of the k skyline points is maximized.

On the other hand, Khalefa et al. [8] introduce a new variant of skyline on incomplete data. In [9], the authors tackle the problem of skyline analysis on uncertain data and propose a novel probabilistic skyline model where an uncertain object has a given probability of being in the skyline. They define the concept of p-skyline, a set of objects whose skyline probabilities are at least p. This work was also extended in [40]. In [7], the authors propose a possibility theory-based approach to the treatment of missing user preferences in skyline queries. They define an uncertain dominance relationship and introduce the notion of possibilistic contextual skyline. See also [41, 42].

5.2. Controlling the skyline size

Other works have focused on controlling the size of the skyline. These works can be divided into two categories: skyline refinement and skyline relaxation. Table 4 provides a comparison between these works and our approaches w.r.t. five criteria: (i) type of work (fuzzy or not), (ii) iterative nature, (iii) work with controlling, (iv) preference-based work, (v) transformation nature (RL: relaxation or RF: refinement). Balke et al. [11] provide representative subsets (restricted skyline) of the skyline that can be used in an interactive query process. These subsets have to be both manageable in size and representative of the Pareto skyline.
Moreover, they introduce the focused skyline as a highly selective subset of the restricted skyline biasing on points that show the best performance. In [13], Hadjali et al. introduce some ideas to define other novel variants of skyline. The first idea consists in refining the skyline by introducing some ordering between its 20

Table 4: Comparaing the different approaches

Criteria Approaches

(i)

(ii)

(iii)

(iv)

(v)

Jin et al.[20]

Not

No

No

Yes

RL

Balk et al.[11]

Not

Yes

No

Yes

RF

Goncalves et al.[19]

Fuzzy

No

Yes

Yes

RL

Abbaci et al.[12]

Fuzzy

Yes

Yes

Yes

RF

Not

Yes

No

Yes

RF

MP2R

Fuzzy

Yes

Yes

Yes

RL

C2R

Fuzzy

Yes

Yes

Yes

RL

Loyer et al.[43]

points in order to single out the most interesting ones. The second one tries to simplify the skyline by granulating the scales of the criteria, which may enable us to cluster points that are somewhat similar. In [43], Loyer et al. propose a flexible approach to classify and refine the skyline. The idea is to apply successive relaxations of the dominance conditions in accordance with the user's preferences. The approach, called θ-skyline, is based on decision theory, which deals with decision-making in the presence of conflicting choices. To rank the skyline points, the authors define a global ranking method.

As for skyline relaxation, only very few works have tackled this problem. Jin et al. [20] propose a novel concept, called thick skyline, which recommends not only skyline points but also their nearby neighbors within ε-distance. In this work, the recovered neighboring points are not associated with degrees that would allow them to be rank-ordered. On the contrary, in our approaches each recovered point is associated with a degree. In doing so, an order is defined between the relaxed skyline points and the user can then select only the desired results. Goncalves and Tineo [19] propose a flexible dominance relationship using fuzzy comparison operators in order to retrieve interesting dominated points. This enlarges the skyline with points that are only weakly dominated by any other point. Let us note that this approach acts on skyline queries by making the query conditions more flexible, whereas our approaches take another view of skyline relaxation: they take the skyline query results as their starting point. Beyond the features pointed out above, our skyline relaxation approaches are fundamentally user-centric: users' preferences are at the heart of the relaxation process. They are also endowed with efficient control mechanisms to guarantee a size of the relaxed skyline that really serves the user's decisions.
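To make the contrast with the thick skyline of Jin et al. [20] concrete, the following minimal sketch computes a regular skyline and then adds, without any degree, the non-skyline points lying within ε of a skyline point. It is only an illustration: the function names, the naive quadratic skyline computation, the use of the Euclidean distance and the convention that smaller values are preferred are assumptions made here, not details taken from [20].

```python
from math import dist


def dominates(u, v):
    """Pareto dominance, assuming smaller values are preferred on every dimension."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline(points):
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def thick_skyline(points, eps):
    """Return (skyline, neighbors): neighbors are non-skyline points within eps
    (Euclidean distance, an assumption) of at least one skyline point.
    Neighbors carry no degree, so they cannot be rank-ordered."""
    sky = skyline(points)
    neighbors = [
        p for p in points
        if p not in sky and any(dist(p, s) <= eps for s in sky)
    ]
    return sky, neighbors


points = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (2.5, 2.5), (5.0, 5.0)]
sky, near = thick_skyline(points, eps=1.0)
print(sky)   # the non-dominated points
print(near)  # dominated points close to the skyline
```

In this toy dataset the point (2.5, 2.5) is dominated by (2.0, 2.0) but lies close to it, so it is recovered, whereas (5.0, 5.0) is too far from every skyline point.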

6. Experimental study

In this section, we present the experimental study that we have conducted. The aim of this study is to demonstrate the effectiveness of the proposed approaches (MP2R and C2R) and their ability to relax a small skyline with the most interesting points. In addition, this study allows us to develop a comparative assessment of the quantitative and qualitative aspects of the relaxation process between our two approaches and the approach proposed by Goncalves and Tineo [19] (denoted GT approach), since it is the closest work to our proposal.

6.1. Experimental environment

All experiments were performed under a Linux operating system, on a machine with an Intel Core i7 processor, 8 GB of RAM and a 500 GB disk. All algorithms were implemented in Java (a framework called SKY Relax has been developed). Benchmark datasets are generated using the method described in [1], following three conventional distribution schemas (correlated, anti-correlated and independent). For each dataset, we consider different sizes (small: 5K and 50K, medium: 100K and 250K, large: 500K and 750K). Each point occupies 110 bytes: an integer identifier (4 bytes), 12 decimal fields (96 bytes) with values in [0,1], and a string field of 10 characters.

6.2. Experimental results

We vary a collection of parameters (see table 5) that could impact the process of relaxation.

Table 5: Set of parameters

Parameters                                                 Values                                     Default values
Data size [D]                                              5K, 50K, 100K, 250K, 500K, 750K            5K
Dataset distribution schema [DIS]                          Correlated, Independent, Anticorrelated    Correlated
Number of skyline dimensions [d]                           2, 4, 6, 8, 10, 12                         2
Relaxation thresholds [γ = (γi1, γi2), i ∈ {1, . . . , d}]   γi1 ∈ [0, 1], γi2 ∈ [0, 1]                 γ = (0, 0.25)
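The experimental setup of Section 6.1 can be sketched as follows. This is a deliberately simplified stand-in and not the generator of [1]: the function `generate`, the noise level 0.05 and the way correlated and anti-correlated points are produced are assumptions chosen only to illustrate the three distribution schemas on values normalized to [0, 1]. The record size check reproduces the 110 bytes stated in the text.

```python
import random

# 4-byte identifier + 12 decimal fields of 8 bytes each + 10-character string
RECORD_BYTES = 4 + 12 * 8 + 10


def clip(x):
    """Keep a coordinate inside the normalized range [0, 1]."""
    return min(1.0, max(0.0, x))


def generate(n, d, schema, seed=0):
    """Return n points with d coordinates in [0, 1] (simplified, assumed generator)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        raw = [rng.random() for _ in range(d)]
        if schema == "independent":
            p = raw
        elif schema == "correlated":
            # All coordinates stay close to one shared base value.
            base = raw[0]
            p = [clip(base + rng.gauss(0.0, 0.05)) for _ in range(d)]
        elif schema == "anticorrelated":
            # Coordinates are spread around a common mean of about 0.5, so a
            # good value on one dimension tends to be paid for on another.
            mean = sum(raw) / d
            offset = rng.gauss(0.5, 0.05)
            p = [clip(offset + r - mean) for r in raw]
        else:
            raise ValueError(f"unknown schema: {schema}")
        points.append(tuple(p))
    return points


# Default configuration of Table 5: 5K correlated points, 2 skyline dimensions.
sample = generate(n=5000, d=2, schema="correlated")
print(RECORD_BYTES, len(sample))
```

Fixing the seed keeps runs reproducible, which matters when the same dataset has to be relaxed by several approaches.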

In summary, the following points are considered:
1. C2R vs. MP2R and GT approach on the size of the relaxed skyline and on the computation time
   • Impact of the variation of [DIS], [D] and [d]
2. C2R vs. MP2R w.r.t. [γ]: impact of changes in the value of γ = (γi1, γi2) on the size of the relaxed skyline.

6.2.1. C2R vs. MP2R and GT approach on the size of relaxed skyline and on the computation time

a. Impact of the variation of [DIS]

Figure 9 shows that correlated data yield a smaller relaxed skyline than anti-correlated and independent data. We also observe that the C2R approach retrieves fewer points than MP2R and the GT approach because it is more demanding in the relaxation process. The computation time curve shows that, for the three distributions, the execution time of C2R is much lower than that of MP2R and the GT approach (10.05 ms versus 283.79 ms and 2200 ms for correlated data, 42.51 ms versus 655.57 ms and 2020 ms for anti-correlated data, and 26.71 ms versus 649.89 ms and 2010 ms for independent data).
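As a reading aid, the execution times quoted above can be turned into relative slowdown factors with respect to C2R. The short sketch below only restates the reported figures; the assignment of each triple to C2R, MP2R and the GT approach follows the order in which the approaches are named in the sentence, and the dictionary layout and function name are illustrative.

```python
# Execution times in milliseconds, as quoted in the text for each distribution.
times_ms = {
    "correlated": {"C2R": 10.05, "MP2R": 283.79, "GT": 2200.0},
    "anticorrelated": {"C2R": 42.51, "MP2R": 655.57, "GT": 2020.0},
    "independent": {"C2R": 26.71, "MP2R": 649.89, "GT": 2010.0},
}


def slowdown(times, baseline="C2R"):
    """How many times slower each approach is than the baseline."""
    return {name: t / times[baseline] for name, t in times.items() if name != baseline}


for schema, times in times_ms.items():
    ratios = slowdown(times)
    print(f"{schema}: MP2R/C2R = {ratios['MP2R']:.1f}, GT/C2R = {ratios['GT']:.1f}")
```

On these figures MP2R is roughly 15 to 28 times slower than C2R depending on the distribution, and the GT approach is slower still.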

Figure 9: C2R vs. MP2R and GT approach [19] w.r.t. [DIS]. (a) Size of relaxed skyline; (b) execution time in ms.

b. Impact of the variation of [d]

It is well known that the size of the regular skyline increases with the number of dimensions; this phenomenon is known as "the problem of dimensionality". C2R and MP2R exhibit the same behavior when the data are highly correlated (see figure 10). By analyzing the curve representing the computation time, we note that the C2R approach outperforms MP2R and the GT approach (from 8.67 ms to 176.88 ms for C2R, versus from 116 ms to 1240.74 ms for MP2R and from 232 ms to 3480.74 ms for the GT approach, when the dimensionality [d] varies from 2 to 12).

Figure 10: C2R vs. MP2R and GT approach [19] w.r.t. [d]. (a) Size of relaxed skyline; (b) execution time in ms.

c. Impact of the variation of [D]

The analysis of figure 11 shows that the sizes of the relaxed skylines delivered by C2R and MP2R grow in proportion to the size of the dataset. In terms of execution time, C2R is far faster (for example, to relax a dataset with 500K tuples, C2R takes 79.44 ms while MP2R and the GT approach consume respectively 1302.64 s ≈ 21.7 min and 1602.64 s ≈ 26.7 min). As can be seen, this first part of the experimentation shows that the C2R approach performs better than the MP2R approach. We have also shown that our proposed approaches are more efficient than the GT approach [19].

Figure 11: C2R vs. MP2R and GT approach [19] w.r.t. [D]. (a) Size of relaxed skyline; (b) execution time in ms.

6.2.2. C2R vs. MP2R w.r.t. [γ]

In this subsection, we study the impact of the variation of the (γi1, γi2) values on the size and the computation time of the relaxed skylines given by C2R and MP2R (note that the GT approach is not considered in this part of the experimentation). The idea is to vary both thresholds γi1 and γi2. For the sake of simplicity, we use the same vector of parameters γ for both approaches and, since the data are normalized, we also use the same values of (γi1, γi2) for all skyline dimensions (in sub-section 6.2.2.3, we discuss the use of different vectors of parameters). We start with an initial skyline of size 2 and we analyze the variation of the number of points whose degrees µSrelax(u) and µSFE(u) are greater than 0. The following scenarios are worth discussing:

6.2.2.1. Scenario 1: In this scenario, we fix γi1 and vary γi2. The idea is to analyze the influence of the relaxation zone on the effectiveness of the C2R and MP2R approaches. We observe the following cases:

Case 1: γi1 = 0 and γi2 ∈ {0, 0.25, 0.5, 0.75, 1}. In this case, the analysis of figure 12 shows that the size of the relaxed skyline increases when the value of γi2 grows (this is due to the fact that the relaxation area becomes larger). We note that MP2R covers more points than C2R, which confirms that C2R is more restrictive. The analysis of the curve related to the computation time of both approaches (see figure 12) shows that C2R has a much lower execution time than MP2R. Figure 13 illustrates in detail the dispersion of the points retrieved by both approaches when γi1 = 0 and γi2 = 0.75. We remark that no point (in the two relaxed skylines) is retrieved with a degree equal to 1 (this is due to the value γi1 = 0).

Figure 12: Scenario 1: Fix γi1 and vary γi2 (case 1). (a) Size of relaxed skyline; (b) execution time in ms.

Figure 13: γi1 = 0 and γi2 = 0.75 (case 1). (a) Points retrieved by MP2R; (b) points retrieved by C2R.

Case 2: γi1 = 0.25 and γi2 ∈ {0.25, 0.5, 0.75, 1}. In this case, we observe the appearance of some retrieved points with degrees equal to 1 (this is due to the fact that in this case γi1 > 0). The analysis of figure 14 shows that the value of γi2 controls the size of the relaxation. One can also see that, whatever the values of γi1 and γi2, C2R is more efficient than MP2R in terms of computation time. The dispersion of the points retrieved by both approaches (C2R and MP2R) is illustrated in figure 15.
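The role of the two thresholds in cases 1 and 2 can be illustrated with a small sketch. The exact membership functions of the two approaches are defined earlier in the paper and are not reproduced here; the code below assumes a trapezoidal closeness function (degree 1 up to a difference of γi1, decreasing linearly to 0 at γi2) and an aggregation by minimum over dimensions and maximum over skyline points. Both the shape and the aggregation, as well as all names, are assumptions made for illustration only; they are merely consistent with the observations reported above.

```python
def closeness(x, y, g1, g2):
    """Assumed trapezoidal closeness degree of two values in [0, 1]."""
    diff = abs(x - y)
    if diff <= g1:
        return 1.0          # fully close
    if diff >= g2:
        return 0.0          # not close at all
    return (g2 - diff) / (g2 - g1)  # linear decrease, only reached when g1 < g2


def relaxation_degree(u, sky, g1, g2):
    """Assumed degree of u: best skyline point (max), worst dimension (min).
    The same (g1, g2) is used for all dimensions, as in the experiments."""
    return max(
        min(closeness(a, b, g1, g2) for a, b in zip(u, s))
        for s in sky
    )


sky = [(0.2, 0.8), (0.7, 0.3)]   # an initial skyline of size 2
u = (0.3, 0.9)                   # a non-skyline point near (0.2, 0.8)

print(relaxation_degree(u, sky, 0.0, 0.25))   # partial degree: g1 = 0
print(relaxation_degree(u, sky, 0.25, 0.75))  # full degree: difference within g1
```

Under these assumptions, a point gets degree 1 only if it differs from some skyline point by at most γi1 on every dimension, which is why no distinct point can reach degree 1 when γi1 = 0, and why increasing γi2 only widens the zone of partial degrees.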

Figure 14: Scenario 1: Fix γi1 and vary γi2 (case 2). (a) Size of relaxed skyline; (b) execution time in ms.

Figure 15: γi1 = 0.25 and γi2 = 0.75 (case 2). (a) Points retrieved by MP2R; (b) points retrieved by C2R.

Case 3: γi1 = 0.5 and γi2 ∈ {0.5, 0.75, 1}. In this case, the values of γi1 and γi2 are close to 1, which makes C2R and MP2R more permissive. We note that the size of the relaxed skyline increases significantly when the value of γi2 grows (see figure 16). In terms of computation time, C2R is still more efficient than MP2R.

Figure 16: Scenario 1: Fix γi1 and vary γi2 (case 3). (a) Size of relaxed skyline; (b) execution time in ms.

However, we observe that the relaxation process provides more points with a degree equal to 1 (see figure 17 for the dispersion). This is due to the fact that the value of γi1 moves away from 0.

Figure 17: γi1 = 0.5 and γi2 = 0.75 (case 3). (a) Points retrieved by MP2R; (b) points retrieved by C2R.

Case 4: γi1 = 0.75 and γi2 ∈ {0.75, 1}. In figure 18, we note that the value of γi2 allows controlling the size of the relaxation. However, we observe that the number of retrieved points with a degree equal to 1 is larger in this case. One can also see that, whatever the values of γi1 and γi2, C2R is more efficient than MP2R in terms of computation time.

Figure 18: Scenario 1: Fix γi1 and vary γi2 (case 4). (a) Size of relaxed skyline; (b) execution time in ms.

Figure 19 shows the dispersion of the retrieved points. We observe that both approaches retrieve all dataset points (with different degrees) when the value of γi2 is equal to 1. The dispersion also confirms that the number of retrieved points with a degree equal to 1 is larger in this case.

Figure 19: γi1 = 0.75 and γi2 = 1 (case 4). (a) Points retrieved by MP2R; (b) points retrieved by C2R.

6.2.2.2. Scenario 2: In this scenario, we vary both thresholds γi1 and γi2. The obtained results are shown in figure 20. The analysis of these curves shows that the relaxation process becomes more permissive when the thresholds move away from the origin. Nevertheless, C2R is always more selective than MP2R in the number of points retrieved (i.e., |SFE| < |Srelax|). In this context (i.e., using the same parameter vector), the relation SFE ⊆ Srelax still holds. We also note that, in terms of computation time, C2R is more efficient than MP2R. An example of the dispersion of points is illustrated in figure 21.

Figure 20: Scenario 2: Varying γi1 and γi2. (a) Size of relaxed skyline; (b) execution time in ms.

Figure 21: γi1 = 0.5 and γi2 = 0.75. (a) Points retrieved by MP2R; (b) points retrieved by C2R.

6.2.2.3. Scenario 3: In the previous experiments, we have considered the same relaxation parameter vector for both approaches. We discuss here the relaxation results when using different parameter vectors γ and γ′ to compute the relaxed skyline (note that the same parameter γ (resp. γ′) is applied to all skyline dimensions). We observe the three following cases:
• γ < γ′, i.e., (γi1 < γ′i1) ∧ (γi2 < γ′i2)
• γ > γ′, i.e., (γi1 > γ′i1) ∧ (γi2 > γ′i2)
• γ and γ′ are incomparable. In this case, two situations can be observed:
  – (γi1 < γ′i1) ∧ (γi2 > γ′i2), denoted by γ ⊏ γ′.
  – (γi1 > γ′i1) ∧ (γi2 < γ′i2), denoted by γ ⊐ γ′.
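The case analysis above amounts to a componentwise comparison of two threshold pairs. A minimal sketch is given below; the function name and the returned labels are hypothetical, and the final "tie" branch (pairs sharing at least one component) is an addition made here for completeness, since the text only considers strict inequalities.

```python
def compare_thresholds(gamma, gamma_prime):
    """Classify two threshold pairs (g1, g2) and (g1', g2') as in Scenario 3."""
    g1, g2 = gamma
    p1, p2 = gamma_prime
    if g1 < p1 and g2 < p2:
        return "less"           # gamma < gamma'
    if g1 > p1 and g2 > p2:
        return "greater"        # gamma > gamma'
    if g1 < p1 and g2 > p2:
        return "sqsubset"       # incomparable, first situation
    if g1 > p1 and g2 < p2:
        return "sqsupset"       # incomparable, second situation
    return "tie"                # at least one equal component (not covered in the text)


print(compare_thresholds((0.0, 0.25), (0.25, 0.5)))   # less
print(compare_thresholds((0.0, 0.75), (0.25, 0.5)))   # sqsubset
```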

6.2.2.4. Case 1: γ < γ′. Figure 22 shows all the scenarios where γ < γ′. One can see that in this case the size of SFE is larger than the size of Srelax (i.e., Srelax ⊆ SFE).

6.2.2.5. Case 2: γ > γ′. When γ > γ′, contrary to the previous case, the size of SFE becomes smaller than the size of Srelax (i.e., Srelax ⊇ SFE). Figure 23 illustrates the different possible scenarios.

6.2.2.6. Case 3: γ and γ′ incomparable. In this case, we study the behavior of the relaxation using unordered parameter vectors. Figures 24a and 24b show the various possible scenarios. The two following behaviors are observed:
1. γ ⊏ γ′ ⇒ |SFE| < |Srelax|
2. γ ⊐ γ′ ⇒ |SFE| > |Srelax|

This means that the inclusion relation between Srelax and SFE does not always hold in the same direction.

Figure 22: Points retrieved by C2R and MP2R, Case 1: γ < γ′.

Figure 23: Points retrieved by C2R and MP2R, Case 2: γ > γ′.

Figure 24: Case 3: γ and γ′ incomparable. (a) Points retrieved by C2R, case γ ⊏ γ′; (b) points retrieved by MP2R, case γ ⊐ γ′.

7. Conclusion and Perspectives

In this paper, we have addressed the problem of skyline relaxation in a controlled way. The basic idea is to make the skyline more permissive by adding points that, strictly speaking, do not belong to the skyline but are not far from belonging to it. We have explored two strategies: first, we propose to recover the most interesting points among those discriminated by the Pareto dominance relationship by leveraging a novel fuzzy dominance relationship, Much Preferred (MP). Then, we advocate enlarging the skyline with (non-skyline) points that are the closest to the skyline points in the sense of a particular fuzzy absolute closeness relation defined in a convenient way. In addition, we have implemented three approaches (C2R, MP2R and the GT approach) to compute different variants of the relaxed skyline. The experimental study we have conducted has shown that, on the one hand, depending on the relaxation parameters, the C2R approach is more restrictive than the MP2R approach when relaxing the classic skyline and, on the other hand, the execution time of C2R (which is better than that of MP2R) always remains acceptable. The experiments have also shown that our proposed approaches outperform the GT approach in terms of execution time. It is worth noting that C2R, like MP2R, involves various parameters which can be used to control the size and the quality of the relaxed skyline. Through this study, end-users have at their disposal a tool that can help them choose the skyline relaxation approach that meets their needs and expectations.

As for future work, we plan to investigate: (i) the impact of dataset changes (inserting, updating or deleting points) on the behavior and effectiveness of our approaches; (ii) the issue of skyline relaxation in the context of categorical attributes; and (iii) the idea of using the concept of fuzzy linguistic quantifiers in skyline relaxation, in the spirit of the study done in [12].

References

[1] S. Börzsönyi, D. Kossmann, K. Stocker, The skyline operator, in: Proceedings of the 17th International Conference on Data Engineering, April 2-6, 2001, Heidelberg, Germany, 2001, pp. 421–430.

[2] C. Y. Chan, H. V. Jagadish, K. Tan, A. K. H. Tung, Z. Zhang, Finding k-dominant skylines in high dimensional space, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006, 2006, pp. 503–514.

[3] M. L. Yiu, N. Mamoulis, Efficient processing of top-k dominating queries on multi-dimensional data, in: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007, 2007, pp. 483–494.

[4] M. D. Morse, J. M. Patel, W. I. Grosky, Efficient continuous skyline computation, Inf. Sci. 177 (17) (2007) 3411–3437.

[5] J. J. Levandoski, M. F. Mokbel, M. E. Khalefa, Flexpref: A framework for extensible preference evaluation in database systems, in: Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, March 1-6, 2010, Long Beach, California, USA, 2010, pp. 828–839.

[6] A. Vlachou, C. Doulkeridis, Y. Kotidis, M. Vazirgiannis, SKYPEER: efficient subspace skyline computation over distributed data, in: Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, 2007, pp. 416–425.

[7] A. Hadjali, O. Pivert, H. Prade, Possibilistic contextual skylines with incomplete preferences, in: Second International Conference of Soft Computing and Pattern Recognition, SoCPaR 2010, Cergy Pontoise / Paris, France, December 7-10, 2010, 2010, pp. 57–62.

[8] M. E. Khalefa, M. F. Mokbel, J. J. Levandoski, Skyline query processing for incomplete data, in: Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, México, 2008, pp. 556–565.

[9] J. Pei, B. Jiang, X. Lin, Y. Yuan, Probabilistic skylines on uncertain data, in: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007, 2007, pp. 15–26.

[10] A. A. Alwan, H. Ibrahim, N. I. Udzir, F. Sidi, Processing skyline queries in incomplete distributed databases, J. Intell. Inf. Syst. 48 (2) (2017) 399–420.

[11] W. Balke, U. Güntzer, W. Siberski, Restricting skyline sizes using weak pareto dominance, Inform., Forsch. Entwickl. 21 (3-4) (2007) 165–178.

[12] K. Abbaci, A. Hadjali, L. Lietard, D. Rocacher, A linguistic quantifier-based approach for skyline refinement, in: Joint IFSA World Congress and NAFIPS Annual Meeting, IFSA/NAFIPS 2013, Edmonton, Alberta, Canada, June 24-28, 2013, 2013, pp. 321–326.

[13] A. Hadjali, O. Pivert, H. Prade, On different types of fuzzy skylines, in: Foundations of Intelligent Systems - 19th International Symposium, ISMIS 2011, Warsaw, Poland, June 28-30, 2011. Proceedings, 2011, pp. 581–591.

[14] M. Endres, W. Kießling, Skyline snippets, in: Flexible Query Answering Systems - 9th International Conference, FQAS 2011, Ghent, Belgium, October 26-28, 2011 Proceedings, 2011, pp. 246–257.

[15] E. Hüllermeier, I. Vladimirskiy, B. Prados-Suárez, E. Stauch, Supporting case-based retrieval by similarity skylines: Basic concepts and extensions, in: Advances in Case-Based Reasoning, 9th European Conference, ECCBR 2008, Trier, Germany, September 1-4, 2008. Proceedings, 2008, pp. 240–254.

[16] X. Lin, Y. Yuan, Q. Zhang, Y. Zhang, Selecting stars: The k most representative skyline operator, in: Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, 2007, pp. 86–95.

[17] C. Y. Chan, H. V. Jagadish, K. Tan, A. K. H. Tung, Z. Zhang, On high dimensional skylines, in: Advances in Database Technology - EDBT 2006, 10th International Conference on Extending Database Technology, Munich, Germany, March 26-31, 2006, Proceedings, 2006, pp. 478–495.

[18] D. Papadias, Y. Tao, G. Fu, B. Seeger, An optimal and progressive algorithm for skyline queries, in: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9-12, 2003, 2003, pp. 467–478.

[19] M. Goncalves, L. Tineo, Fuzzy dominance skyline queries, in: Database and Expert Systems Applications, 18th International Conference, DEXA 2007, Regensburg, Germany, September 3-7, 2007, Proceedings, 2007, pp. 469–478.

[20] W. Jin, J. Han, M. Ester, Mining thick skylines over large databases, in: Knowledge Discovery in Databases: PKDD 2004, 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, September 20-24, 2004, Proceedings, 2004, pp. 255–266.

[21] D. Belkasmi, A. Hadjali, MP2R: A human-centric skyline relaxation approach, in: Flexible Query Answering Systems 2015 - Proceedings of the 11th International Conference FQAS 2015, Cracow, Poland, October 26-28, 2015, 2015, pp. 227–241.

[22] D. Belkasmi, A. Hadjali, H. Azzoune, Making the skyline larger: A fuzzy-neighborhood-based approach, in: Information Processing and Management of Uncertainty in Knowledge-Based Systems - 16th International Conference, IPMU 2016, Eindhoven, The Netherlands, June 20-24, 2016, Proceedings, Part II, 2016, pp. 341–354.

[23] L. A. Zadeh, Fuzzy sets, Information and Control 8 (3) (1965) 338–353.

[24] H. T. Kung, F. Luccio, F. P. Preparata, On finding the maxima of a set of vectors, J. ACM 22 (4) (1975) 469–476.

[25] J. L. Bentley, H. T. Kung, M. Schkolnick, C. D. Thompson, On the average number of maxima in a set of vectors and applications, J. ACM 25 (4) (1978) 536–543.

[26] J. Chomicki, P. Godfrey, J. Gryz, D. Liang, Skyline with presorting, in: Proceedings of the 19th International Conference on Data Engineering, March 5-8, 2003, Bangalore, India, 2003, pp. 717–719.

[27] P. Godfrey, R. Shipley, J. Gryz, Maximal vector computation in large data sets, in: Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005, 2005, pp. 229–240.

[28] K. Tan, P. Eng, B. C. Ooi, Efficient progressive skyline computation, in: VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11-14, 2001, Roma, Italy, 2001, pp. 301–310.

[29] D. Kossmann, F. Ramsak, S. Rost, Shooting stars in the sky: An online algorithm for skyline queries, in: VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, August 20-23, 2002, Hong Kong, China, 2002, pp. 275–286.

[30] J. Pei, W. Jin, M. Ester, Y. Tao, Catching the best views of skyline: A semantic approach based on decisive subspaces, in: Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005, 2005, pp. 253–264.

[31] J. Pei, A. W. Fu, X. Lin, H. Wang, Computing compressed multidimensional skyline cubes efficiently, in: Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15-20, 2007, 2007, pp. 96–105.

[32] Y. Yuan, X. Lin, Q. Liu, W. Wang, J. X. Yu, Q. Zhang, Efficient computation of the skyline cube, in: Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30 - September 2, 2005, 2005, pp. 241–252.

[33] T. Xia, D. Zhang, Refreshing the sky: the compressed skycube with efficient support for frequent updates, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006, 2006, pp. 491–502.

[34] Y. Tao, X. Xiao, J. Pei, SUBSKY: efficient computation of skylines in subspaces, in: Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, 3-8 April 2006, Atlanta, GA, USA, 2006, p. 65.

[35] E. Dellis, B. Seeger, Efficient computation of reverse skyline queries, in: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007, 2007, pp. 291–302.

[36] D. Mindolin, J. Chomicki, Discovering relative importance of skyline attributes, PVLDB 2 (1) (2009) 610–621.

[37] B. Jiang, J. Pei, Online interval skyline queries on time series, in: Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China, 2009, pp. 1036–1047.

[38] W. Balke, U. Güntzer, J. X. Zheng, Efficient distributed skylining for web information systems, in: Advances in Database Technology - EDBT 2004, 9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, March 14-18, 2004, Proceedings, 2004, pp. 256–273.

[39] P. Wu, C. Zhang, Y. Feng, B. Y. Zhao, D. Agrawal, A. El Abbadi, Parallelizing skyline queries for scalable distribution, in: Advances in Database Technology - EDBT 2006, 10th International Conference on Extending Database Technology, Munich, Germany, March 26-31, 2006, Proceedings, 2006, pp. 112–130.

[40] B. Jiang, J. Pei, X. Lin, Y. Yuan, Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods, J. Intell. Inf. Syst. 38 (1) (2012) 1–39.

[41] P. Bosc, A. Hadjali, O. Pivert, On possibilistic skyline queries, in: Flexible Query Answering Systems - 9th International Conference, FQAS 2011, Ghent, Belgium, October 26-28, 2011 Proceedings, 2011, pp. 412–423.

[42] S. Elmi, K. Benouaret, A. Hadjali, M. A. B. Tobji, B. B. Yaghlane, Computing skyline from evidential data, in: Scalable Uncertainty Management - 8th International Conference, SUM 2014, Oxford, UK, September 15-17, 2014. Proceedings, 2014, pp. 148–161.

[43] Y. Loyer, I. Sadoun, K. Zeitouni, Personalized progressive filtering of skyline queries in high dimensional spaces, in: 17th International Database Engineering & Applications Symposium, IDEAS '13, Barcelona, Spain, October 09-11, 2013, 2013, pp. 186–191.

Highlights for Review

- We discuss two relaxation strategies: first, we propose to recover the most interesting points among those discriminated by the Pareto dominance relationship by leveraging a novel fuzzy dominance relationship.

- Then, we advocate enlarging the skyline with points that are the closest to the skyline points in the sense of a particular fuzzy absolute closeness relation.

- A set of properties of the proposed approaches is investigated, and efficient optimized algorithms and their complexity are provided as well.

- We develop and design a user-friendly framework for skyline relaxation, endowed with rich and advanced functionalities.

- A thorough experimental evaluation of our approaches is performed on large datasets.