Applied Soft Computing 10 (2010) 361–366
Contents lists available at ScienceDirect
Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc
Computing mean absolute deviation under uncertainty
Barbara Gładysz, Adam Kasperski *
Institute of Industrial Engineering and Management, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
ARTICLE INFO
Article history: Received 30 April 2008; received in revised form 1 April 2009; accepted 2 August 2009; available online 11 August 2009.
Keywords: Absolute deviation; Median; Interval; Fuzzy interval; Possibility theory
ABSTRACT
In this paper the problem of computing the mean absolute deviation in a set of uncertain variables is discussed. The uncertainty is modeled by closed intervals and fuzzy intervals. Some polynomial algorithms for determining the lower and upper bounds for the mean absolute deviation under interval uncertainty are proposed. Possibility theory is then applied to generalize the interval uncertainty representation to the fuzzy one. © 2009 Elsevier B.V. All rights reserved.
1. Preliminaries
Statistical analysis using survey sampling is a powerful tool to estimate the most common population parameters such as measures of central tendency or measures of deviation. In a classical statistical experiment we use a sample $x = [x_1, x_2, \dots, x_n]$, where $x_1, x_2, \dots, x_n$ are measurement results of independent identically distributed real random variables $X_1, X_2, \dots, X_n$. We then estimate a parameter of a population by computing a function $f(x_1, \dots, x_n)$ or, using some analytical methods, we describe a probability distribution of $f(X_1, \dots, X_n)$ [5]. In this paper we discuss the class of functions that measure a deviation of $X_1, \dots, X_n$. The following functions are well known and commonly used:
$$\mathrm{var}(X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \overline{X})^2, \tag{1}$$
$$d(X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^{n} |X_i - \overline{X}|, \tag{2}$$
$$d_k(X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^{n} |X_i - X_{[k]}|, \tag{3}$$
where $X_{[k]}$ is the $k$th smallest element among $X_1, \dots, X_n$ and $\overline{X} = (1/n) \sum_{i=1}^{n} X_i$ is the mean of $X_1, \dots, X_n$. Obviously, (1) is the
* Corresponding author. E-mail address:
[email protected] (A. Kasperski). 1568-4946/$ – see front matter ß 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2009.08.012
variance, (2) is the mean absolute deviation from the mean and (3) is the mean absolute deviation from the $k$th element. For the latter function, if $n$ is odd, then fixing $k = \lceil n/2 \rceil$ we get the mean absolute deviation from the median (if $n$ is even, then $k = n/2$ or $k = (n/2) + 1$).
In this paper we investigate an alternative approach to data analysis, which is based on interval computations [9,13,14]. Suppose that $X_1, \dots, X_n$ are uncertain variables and we only know that the value of variable $X_i$ will fall within an interval $\tilde{x}_i = [\underline{x}_i, \overline{x}_i]$, independently of the values of the remaining variables. No probability distribution for $X_i$ is known. Let us define $\Gamma = \tilde{x}_1 \times \dots \times \tilde{x}_n$. A vector $x = [x_1, \dots, x_n]$ such that $x \in \Gamma$ is called a configuration and it represents a possible realization of the variables such that $X_i = x_i$ for $i = 1, \dots, n$. We will also represent a configuration $x \in \Gamma$ as an ordered sequence $(x_{[1]}, x_{[2]}, \dots, x_{[n]})$, where $x_{[1]} \le x_{[2]} \le \dots \le x_{[n]}$ and $([1], \dots, [n])$ is a permutation of the set of indices $\{1, \dots, n\}$. We will denote by $f(x) = f(x_1, \dots, x_n)$ the value of function $f$ under a fixed configuration $x$.
In interval computations, the basic problem is to compute the range of the values of function $f$, that is the smallest value $\underline{f} = \min_{x \in \Gamma} f(x)$ and the largest value $\overline{f} = \max_{x \in \Gamma} f(x)$ attained by this function over all configurations in $\Gamma$ [13]. These two quantities form the tightest closed interval $\Delta_f = [\underline{f}, \overline{f}]$ containing all possible values of function $f$. If $f$ is the mean or the $k$th smallest element (in particular the median) of $X_1, \dots, X_n$, then computing $\Delta_f$ can be done in polynomial time [8,10]. However, if $f$ is the variance, then computing the largest value of $f$ becomes NP-hard [11]. In this paper we discuss alternative measures of deviation, that is functions (2) and (3). We show that under the interval uncertainty
representation, both the smallest and the largest values attained by these functions can be determined in polynomial time. We thus get some measures of deviation for which the interval computations can be performed efficiently.
Closed intervals are perhaps the simplest form of modeling uncertainty. A more sophisticated uncertainty representation can be adopted by applying the concept of a fuzzy interval. A fuzzy interval is a special case of a fuzzy set in $\mathbb{R}$, whose membership function is a possibility distribution for the values of an unknown quantity. Using possibility theory [7] we can describe a possibility distribution for the values of function $f(X_1, \dots, X_n)$, where $X_1, \dots, X_n$ are fuzzy variables with specified possibility distributions. In the literature, the characteristics of a single fuzzy variable such as mean, variance and median were described in [1,3,6,12,17]. The mean and variance of a set of fuzzy variables were discussed in [8,9,15]. In this paper the class of functions (3) for a set of fuzzy variables is considered.
This paper is organized as follows. In Sections 2 and 3 we describe the interval case. We show that, contrary to the variance, both the smallest and the largest value attained by the mean absolute deviation from the $k$th element (or mean) can be computed in polynomial time. In Section 4 we discuss an extension of the interval computations to the fuzzy case. Namely, we show how to obtain a possibility distribution for the values of the mean absolute deviations if $X_1, \dots, X_n$ are fuzzy variables.
2. Interval mean absolute deviation from kth element
In this section we focus on computing the interval $\Delta_{d_k} = [\underline{d}_k, \overline{d}_k]$ for the mean absolute deviation from the $k$th element, that is for the function (3). Let $\varphi_k(x)$ be the $k$th smallest element in configuration $x$, where $1 \le k \le n$. If we use the notation $x = (x_{[1]}, \dots, x_{[n]})$, then $\varphi_k(x) = x_{[k]}$. In particular, $\varphi_1(x)$ is the minimal and $\varphi_n(x)$ is the maximal element of $x$. Furthermore, if $n$ is odd, then $\varphi_{\lceil n/2 \rceil}(x)$ is the median of $x$ and if $n$ is even, then the median is either $\varphi_{n/2}(x)$ or $\varphi_{(n/2)+1}(x)$. For a given configuration $x = [x_1, \dots, x_n]$ we have:
$$d_k(x) = \frac{1}{n} \sum_{i=1}^{n} |x_i - \varphi_k(x)|. \tag{4}$$
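As a minimal illustration of formula (4), the following Python sketch evaluates the deviation from the $k$th smallest element for one fixed configuration. The function name `deviation_from_kth` is an assumption made for this example only, and sorting is used for brevity, so the sketch runs in $O(n \log n)$ time rather than the $O(n)$ achievable with linear-time selection [2,4].

```python
from fractions import Fraction


def deviation_from_kth(x, k):
    """Mean absolute deviation of x from its k-th smallest element (k is 1-based).

    Hypothetical helper for illustration; it is not taken from the paper.
    """
    values = [Fraction(v) for v in x]   # exact arithmetic, no rounding noise
    kth = sorted(values)[k - 1]         # k-th smallest element of the configuration
    return sum(abs(v - kth) for v in values) / len(values)


# Configuration [2, 3, 2, 5, 4, 2, 8] with k = 3: the third smallest element is 2.
print(deviation_from_kth([2, 3, 2, 5, 4, 2, 8], 3))  # prints 12/7
```

Exact rational arithmetic is chosen here so that results can be compared directly with fractions such as 12/7.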
From now on we will assume that $k \le \lceil n/2 \rceil$. If $k > \lceil n/2 \rceil$, then we can transform the problem in the following way. Let $M = \max_{i=1,\dots,n} \{\overline{x}_i\}$ be the maximal upper bound of the uncertainty intervals. Define $\tilde{x}'_i = [M - \overline{x}_i, M - \underline{x}_i]$ for $i = 1, \dots, n$ and $k' = n - k + 1$. Denote by $\Gamma'$ the Cartesian product of all $\tilde{x}'_i$. Every configuration $x \in \Gamma$ has the corresponding configuration $x' \in \Gamma'$ such that $x'_i = M - x_i$ for $i = 1, \dots, n$. Furthermore, $\varphi_{k'}(x') = M - \varphi_k(x)$. We thus have:
$$d_{k'}(x') = \frac{1}{n} \sum_{i=1}^{n} |(M - x_i) - (M - \varphi_k(x))| = \frac{1}{n} \sum_{i=1}^{n} |x_i - \varphi_k(x)| = d_k(x).$$
Therefore, the problem with $\Gamma'$ and $k'$ is equivalent to the original one. Notice that $k' \le \lceil n/2 \rceil$ and the transformation can be performed in $O(n)$ time.
For a given $x$ the value of $d_k(x)$ can be computed in $O(n)$ time, which follows from the fact that the $k$th smallest element in $x$ can be found in $O(n)$ time [2,4]. If $x = (x_{[1]}, \dots, x_{[n]})$, then the value of $d_k(x)$ can also be expressed in the following way:
$$d_k(x) = \frac{1}{n} \left[ \sum_{i=1}^{k-1} (x_{[k]} - x_{[i]}) + \sum_{i=k+1}^{n} (x_{[i]} - x_{[k]}) \right]. \tag{5}$$
We now show that both bounds of the interval $\Delta_{d_k} = [\underline{d}_k, \overline{d}_k]$ can be computed in $O(n^2)$ time. Consider first two configurations $\underline{x} = [\underline{x}_1, \dots, \underline{x}_n]$ and $\overline{x} = [\overline{x}_1, \dots, \overline{x}_n]$. The following fact holds:
Proposition 1 ([10]). For any configuration $x \in \Gamma$ it holds $\varphi_k(x) \in [\varphi_k(\underline{x}), \varphi_k(\overline{x})]$.
Define $W = \{\underline{x}_1, \dots, \underline{x}_n\} \cup \{\overline{x}_1, \dots, \overline{x}_n\}$ and let $W_k = W \cap [\varphi_k(\underline{x}), \varphi_k(\overline{x})]$. Observe that $W_k$ is the subset of bounds of the uncertainty intervals which belong to $[\varphi_k(\underline{x}), \varphi_k(\overline{x})]$. Furthermore, $|W_k| \le n + 1$ since $W_k$ contains up to $k$ upper bounds and up to $n - k + 1$ lower bounds of the uncertainty intervals. The following proposition characterizes a configuration that minimizes the value of $d_k(x)$.
Proposition 2. There is a configuration $x^{\min}$ such that $d_k(x^{\min}) = \underline{d}_k$, which fulfills the following conditions:
1. $\varphi_k(x^{\min}) \in W_k$,
2. If $\overline{x}_i < \varphi_k(x^{\min})$, then $x^{\min}_i = \overline{x}_i$,
3. If $\underline{x}_i > \varphi_k(x^{\min})$, then $x^{\min}_i = \underline{x}_i$,
4. If $\varphi_k(x^{\min}) \in [\underline{x}_i, \overline{x}_i]$, then $x^{\min}_i = \varphi_k(x^{\min})$.
Proof. Assume that $x = (x_{[1]}, \dots, x_{[n]})$ minimizes the value of $d_k(x)$. Observe first that for all $i \in \{1, \dots, k-1\}$ either $x_{[i]} = \overline{x}_{[i]}$ or $x_{[i]} = x_{[k]}$. Otherwise, we could decrease the value of $d_k(x)$ by increasing $x_{[i]}$ (see formula (5)). Similarly, for all $i \in \{k+1, \dots, n\}$ either $x_{[i]} = \underline{x}_{[i]}$ or $x_{[i]} = x_{[k]}$. Hence configuration $x$ is of the following form:
$$\overline{x}_{[1]} \le \dots \le \overline{x}_{[p-1]} < x_{[p]} = \dots = x_{[k]} = \dots = x_{[r]} < \underline{x}_{[r+1]} \le \dots \le \underline{x}_{[n]},$$
where $1 \le p \le k$ and $k \le r \le n$. Consider now the values $x_{[p]}, \dots, x_{[r]}$ that are all equal to $x_{[k]}$ in configuration $x$. Because $x \in \Gamma$ we have $\tilde{x}_{[p]} \cap \dots \cap \tilde{x}_{[r]} = [\underline{x}, \overline{x}] \ne \emptyset$. Observe that $\underline{x}, \overline{x} \in W$. Suppose that $p - 1 \ge n - r$. We can then transform $x$ into $x' \in \Gamma$ by setting $x_{[p]} = \dots = x_{[r]} = \max\{\underline{x}, \overline{x}_{[p-1]}\}$. By Eq. (5) we have $d_k(x') \le d_k(x)$. Hence $x'$ also minimizes $d_k(x)$. Similarly, if $p - 1 < n - r$, then we obtain $x'$ by setting $x_{[p]} = \dots = x_{[r]} = \min\{\overline{x}, \underline{x}_{[r+1]}\}$ and in this case we also have $d_k(x') \le d_k(x)$. Configuration $x'$ fulfills condition 1, because $\varphi_k(x') \in W_k$. It is easy to check that it also satisfies the remaining conditions 2–4. □
Assuming that the $k$th element takes a given value from $W_k$, we can construct a unique configuration that corresponds to this value, using conditions 2–4 from Proposition 2. We can check all such configurations and one of them must minimize the value of the deviation. Because $W_k$ contains no more than $n + 1$ distinct values and $d_k(x)$ for a fixed $x$ can be computed in $O(n)$ time, the value of $\underline{d}_k$ can be obtained in $O(n^2)$ time. Let us point out that the method of computing the quantity $\underline{d}_k$ is very similar to that proposed for the variance [11].
We now address the problem of computing the bound $\overline{d}_k$. The following proposition characterizes a configuration that maximizes the value of $d_k(x)$.
Proposition 3. There is a configuration $x^{\max}$ such that $d_k(x^{\max}) = \overline{d}_k$ and $x^{\max}$ has the following form:
$$x^{\max} = (\underline{x}_{[1]}, \dots, \underline{x}_{[k-1]}, \underline{x}_{[k]}, \overline{x}_{[k+1]}, \dots, \overline{x}_{[n]}). \tag{6}$$
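The two bound computations of this section can be sketched in Python as follows. The lower bound enumerates the candidate values in $W_k$ and applies conditions 2–4 of Proposition 2. The upper bound uses the form (6) of Proposition 3 together with the greedy selection that the text derives for problem (8). The function names, the list-based interface and the restriction to integer endpoints are assumptions made for this illustration, and the sketch presumes $k \le \lceil n/2 \rceil$ as in the text. Sorting replaces linear-time selection for brevity.

```python
from fractions import Fraction


def d_k(x, k):
    """Formula (4): mean absolute deviation from the k-th smallest element."""
    kth = sorted(x)[k - 1]
    return Fraction(sum(abs(v - kth) for v in x), len(x))


def kth_range(lo, hi, k):
    """Proposition 1: range of the k-th smallest element over all configurations."""
    return sorted(lo)[k - 1], sorted(hi)[k - 1]


def lower_bound_dk(lo, hi, k):
    """Proposition 2: try every candidate value v in W_k for the k-th element."""
    a, b = kth_range(lo, hi, k)
    candidates = {v for v in list(lo) + list(hi) if a <= v <= b}  # the set W_k
    best = None
    for v in candidates:
        # Conditions 2-4: push every x_i as close to v as its interval allows.
        x = [min(max(v, l), h) for l, h in zip(lo, hi)]
        val = d_k(x, k)
        best = val if best is None or val < best else best
    return best


def upper_bound_dk(lo, hi, k):
    """Proposition 3 with the greedy rule for problem (8); assumes k <= ceil(n/2)."""
    a, b = kth_range(lo, hi, k)
    n, best = len(lo), None
    for j in range(n):
        xj = lo[j]                      # candidate k-th element: a lower bound in W_k
        if not a <= xj <= b:
            continue
        x, free, fixed_low = [None] * n, [], 1   # X_j itself sits at its lower bound
        x[j] = xj
        for i in range(n):
            if i == j:
                continue
            if hi[i] < xj:              # interval entirely to the left of x_j
                x[i], fixed_low = lo[i], fixed_low + 1
            elif lo[i] > xj:            # interval entirely to the right of x_j
                x[i] = hi[i]
            else:                       # interval contains x_j: index belongs to V
                free.append(i)
        need = k - fixed_low            # number of t_i set to 1 in problem (8)
        if need < 0 or need > len(free):
            continue
        free.sort(key=lambda i: lo[i] + hi[i])   # smallest lower + upper first
        for pos, i in enumerate(free):
            x[i] = lo[i] if pos < need else hi[i]
        val = d_k(x, k)
        best = val if best is None or val > best else best
    return best


# Intervals of Example 1 in the text, with k = 3.
LO = [1, 3, 2, 5, 4, 1, 8]
HI = [3, 6, 5, 7, 9, 2, 10]
print(lower_bound_dk(LO, HI, 3), upper_bound_dk(LO, HI, 3))  # prints 8/7 26/7
```

Each candidate is evaluated with `d_k` on an explicit configuration, so every value returned is attained by some configuration in $\Gamma$; the propositions are what guarantee that the best candidate is optimal.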
Proof. Suppose that configuration $x = (x_{[1]}, \dots, x_{[n]})$ maximizes deviation $d_k(x)$. From (5) it follows that $x_{[i]} = \underline{x}_{[i]}$ for $i = 1, \dots, k-1$ and $x_{[i]} = \overline{x}_{[i]}$ for $i = k+1, \dots, n$. Otherwise, there would exist a configuration for which the deviation was strictly greater. Hence configuration $x$ is of the following form:
$$(\underline{x}_{[1]}, \dots, \underline{x}_{[k-1]}, x_{[k]}, \overline{x}_{[k+1]}, \dots, \overline{x}_{[n]}).$$
It remains to be shown that we can decrease $x_{[k]}$ to $\underline{x}_{[k]}$ without decreasing the value of the deviation. Denote $\delta = x_{[k]} - \underline{x}_{[k]}$ and consider two cases:
1. $\underline{x}_{[k]} \ge \underline{x}_{[k-1]}$. Let us replace $x_{[k]}$ with $\underline{x}_{[k]}$ and denote the resulting configuration by $x'$. It is clear that $\underline{x}_{[k]}$ is the $k$th element in $x'$. From (5) we get $d_k(x') = d_k(x) + \frac{1}{n}\left[(n - k)\delta - (k - 1)\delta\right]$. Thus
$d_k(x') = d_k(x) + \frac{1}{n}(n - 2k + 1)\delta$. By our assumption $k \le \lceil n/2 \rceil \le (n + 1)/2$, which implies $(n - 2k + 1)\delta \ge 0$ and, consequently, $d_k(x') \ge d_k(x)$. Configuration $x'$ is of form (6) and the proof is completed.
2. $\underline{x}_{[k]} < \underline{x}_{[k-1]}$. Let us first fix $x_{[k]} = \underline{x}_{[k-1]}$. Using the same reasoning as in point 1 we get that the resulting configuration still maximizes the deviation. Furthermore, it is of the following form:
$$\underline{x}_{[1]} \le \dots \le \underline{x}_{[k-1]} = x_{[k]} \le \overline{x}_{[k+1]} \le \dots \le \overline{x}_{[n]}.$$
Now it is clear that we can decrease the value of $x_{[k]}$ to $\underline{x}_{[k]}$ without decreasing the value of the deviation. The resulting configuration is of form (6) and the proof is completed. □
We now show how to compute $x^{\max}$ using Proposition 3. Observe first that the $k$th element in $x^{\max}$, say $\underline{x}_j$, must belong to $W_k$. Consider an interval $\tilde{x}_i$ for $i \ne j$. If $\overline{x}_i < \underline{x}_j$, then the value of $X_i$ must lie to the left of $\underline{x}_j$ in (6) and, consequently, we can fix $x_i = \underline{x}_i$. Similarly, if $\underline{x}_i > \underline{x}_j$ then we can fix $x_i = \overline{x}_i$. Suppose that the number of variables $X_i$ whose values have been set to their lower bounds equals $l$. Observe that $l \le k$ since otherwise $\underline{x}_j \notin W_k$. In order to obtain a configuration of form (6) it remains to fix the values of $X_i$, $i \ne j$, such that $\underline{x}_j \in [\underline{x}_i, \overline{x}_i]$ to their upper or lower bounds. Let $V \subseteq \{1, \dots, n\}$ be the subset of indices that correspond to such unfixed variables. In order to find a configuration of the form (6) that maximizes the deviation under the assumption that $\varphi_k(x^{\max}) = \underline{x}_j$ we need to solve the following problem:
$$\begin{aligned} \max\ & \sum_{i \in V} \left[ (\underline{x}_j - \underline{x}_i) t_i + (\overline{x}_i - \underline{x}_j)(1 - t_i) \right] \\ & \sum_{i \in V} t_i = k - l \\ & t_i \in \{0, 1\},\ i \in V \end{aligned}$$
If $t_i = 1$, then we fix the value of $X_i$ to its lower bound and if $t_i = 0$, then we fix the value of $X_i$ to its upper bound, where $i \in V$. The problem can be rewritten as follows:
$$\begin{aligned} \max\ & a - \sum_{i \in V} (\underline{x}_i + \overline{x}_i) t_i \\ & \sum_{i \in V} t_i = k - l \\ & t_i \in \{0, 1\},\ i \in V \end{aligned} \tag{7}$$
where $a = 2(k - l)\underline{x}_j + \sum_{i \in V} (\overline{x}_i - \underline{x}_j)$ is a constant. Hence model (7) has the same optimal solution as the following one:
$$\begin{aligned} \min\ & \sum_{i \in V} (\underline{x}_i + \overline{x}_i) t_i \\ & \sum_{i \in V} t_i = k - l \\ & t_i \in \{0, 1\},\ i \in V \end{aligned} \tag{8}$$
Problem (8) is easy to solve. The optimal solution can be obtained by fixing $t_i = 1$ for the $k - l$ smallest values of $\underline{x}_i + \overline{x}_i$ among $i \in V$, which can be done in $O(n)$ time. We can repeat the procedure for all $j \in \{1, \dots, n\}$ such that $\underline{x}_j \in W_k$ and choose the resulting configuration that gives the largest deviation. It is evident that the overall running time required is $O(n^2)$.
Example 1. We now illustrate the computation of $\Delta_{d_k} = [\underline{d}_k, \overline{d}_k]$ by an example. Suppose that we have seven variables $X_1, \dots, X_7$ with uncertainty intervals $\tilde{x}_1 = [1, 3]$, $\tilde{x}_2 = [3, 6]$, $\tilde{x}_3 = [2, 5]$, $\tilde{x}_4 = [5, 7]$, $\tilde{x}_5 = [4, 9]$, $\tilde{x}_6 = [1, 2]$, $\tilde{x}_7 = [8, 10]$. Let us fix $k = 3$. From Proposition 1 we get that the $k$th element in every configuration must belong to the interval $[2, 5]$ and, consequently, $W_k = \{2, 3, 4, 5\}$. We first compute $\underline{d}_k$. If we fix $\varphi_k(x) = 2$, then, according to Proposition 2, we get configuration $x = [2, 3, 2, 5, 4, 2, 8]$, which gives $d_k(x) = 12/7$. Repeating this procedure for all the remaining values in $W_k$ we get $x^{\min} = [3, 4, 4, 5, 4, 2, 8]$ and $d_k(x^{\min}) = 8/7$. Hence $\underline{d}_k = 8/7$.
We now compute $\overline{d}_k$. Assume that $\varphi_k(x^{\max}) = \underline{x}_2 = 3$. Then $V = \{1, 3\}$ and $l = 2$ since $X_2$ and $X_6$ are fixed to their lower bounds and $X_4$, $X_5$ and $X_7$ are fixed to their upper bounds. Now in order to fix $X_1$ and $X_3$ we solve problem (8). As a result we obtain configuration $x = [1, 3, 5, 7, 9, 1, 10]$, which gives deviation $d_k(x) = 23/7$. Repeating this procedure for all the remaining candidates $\underline{x}_j$, that is $\underline{x}_3$, $\underline{x}_4$, $\underline{x}_5$, we obtain $x^{\max} = [1, 6, 2, 7, 9, 1, 10]$, which gives deviation $d_k(x^{\max}) = 26/7$. In consequence $\overline{d}_k = 26/7$. We thus get that the mean absolute deviation from the third element in this sample problem is in the interval $[8/7, 26/7]$.
3. Interval mean absolute deviation from mean
In this section we focus on computing the interval $\Delta_d$ for the function (2). Let us denote by $m(x) = \frac{1}{n} \sum_{i=1}^{n} x_i$ the mean of configuration $x = [x_1, \dots, x_n]$. We thus have:
$$d(x) = \frac{1}{n} \sum_{i=1}^{n} |x_i - m(x)|.$$
In this section we address the optimization problems $\underline{d} = \min_{x \in \Gamma} d(x)$ and $\overline{d} = \max_{x \in \Gamma} d(x)$. The quantities $\underline{d}$ and $\overline{d}$ form an interval $\Delta_d = [\underline{d}, \overline{d}]$ containing all values of the mean absolute deviation from the mean. The lower bound $\underline{d}$ can be computed efficiently by solving the following linear programming problem:
$$\begin{aligned} \min\ & \sum_{i=1}^{n} (e_i^+ + e_i^-) \\ & x_i - \overline{x} = e_i^+ - e_i^- && \text{for } i = 1, \dots, n \\ & \sum_{i=1}^{n} x_i - n\overline{x} = 0 \\ & \underline{x}_i \le x_i \le \overline{x}_i && \text{for } i = 1, \dots, n \\ & e_i^+, e_i^- \ge 0 && \text{for } i = 1, \dots, n \end{aligned}$$
Unfortunately, the corresponding linear model for computing the upper bound $\overline{d}$ requires additional binary variables. We now give a polynomial combinatorial algorithm for computing the upper bound $\overline{d}$. It is clear that for any configuration $x = (x_{[1]}, \dots, x_{[n]})$ it holds $x_{[1]} \le m(x) \le x_{[n]}$. Let $k \in \{1, \dots, n\}$ be a number such that $m(x) \in [x_{[k]}, x_{[k+1]}]$. We get:
$$d(x) = \frac{1}{n} \left[ \sum_{i=1}^{k} (m(x) - x_{[i]}) + \sum_{i=k+1}^{n} (x_{[i]} - m(x)) \right],$$
which after substituting $m(x) = \frac{1}{n} \sum_{i=1}^{n} x_{[i]}$ and easy computations leads to:
$$d(x) = \frac{2k}{n^2} \left[ \sum_{i=k+1}^{n} x_{[i]} - \frac{n-k}{k} \sum_{i=1}^{k} x_{[i]} \right] \le \frac{2k}{n^2} \left[ \sum_{i=k+1}^{n} \overline{x}_{[i]} - \frac{n-k}{k} \sum_{i=1}^{k} \underline{x}_{[i]} \right]. \tag{9}$$
We now use (9) to compute a configuration $x \in \Gamma$ that maximizes $d(x)$. Consider the following mathematical programming problem parametrized by the value of $k \in \{1, \dots, n\}$:
$$\begin{aligned} \phi(k) = \max\ & \frac{2k}{n^2} \sum_{i=1}^{n} \left[ (1 - t_i)\overline{x}_i - \frac{n-k}{k} t_i \underline{x}_i \right] \\ & t_1 + \dots + t_n = k \\ & t_i \in \{0, 1\} \quad \text{for } i = 1, \dots, n. \end{aligned}$$
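A minimal Python sketch of this parametrized problem is given below. Because the objective is separable in the binary variables, the sketch solves each $\phi(k)$ by sorting, which is the argument the text formalizes in Observation 1, and then takes the maximum over $k$. The function names and the brute-force cross-check over interval endpoints are assumptions added for illustration; they are not part of the paper. Integer endpoints are assumed so that `Fraction` arithmetic stays exact.

```python
from fractions import Fraction
from itertools import product


def mean_abs_dev(x):
    """d(x): mean absolute deviation of a configuration from its mean."""
    n = len(x)
    m = Fraction(sum(x), n)
    return sum(abs(v - m) for v in x) / n


def phi(lo, hi, k):
    """Optimal value of the problem parametrized by k (exactly k variables at lower bounds)."""
    n = len(lo)
    ratio = Fraction(n - k, k)
    # Setting t_i = 1 costs hi_i + ratio * lo_i, so pick the k smallest costs.
    costs = sorted(h + ratio * l for l, h in zip(lo, hi))
    return Fraction(2 * k, n * n) * (sum(hi) - sum(costs[:k]))


def upper_bound_mean_dev(lo, hi):
    """Maximum of phi(k) over k = 1, ..., n."""
    return max(phi(lo, hi, k) for k in range(1, len(lo) + 1))


def brute_force_upper(lo, hi):
    """Cross-check only: d is convex, so its maximum over the box is at a vertex."""
    return max(mean_abs_dev(v) for v in product(*zip(lo, hi)))


# Two variables, both in [0, 1]: the largest deviation is reached at (0, 1).
print(upper_bound_mean_dev([0, 0], [1, 1]))  # prints 1/2
```

The brute-force helper enumerates $2^n$ endpoint configurations, so it is useful only as a sanity check on small instances.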
Problem $\phi(k)$ can be expressed equivalently in the following way:
$$\begin{aligned} \phi(k) = \max\ & \frac{2k}{n^2} \sum_{i=1}^{n} \left[ \overline{x}_i - \left( \overline{x}_i + \frac{n-k}{k} \underline{x}_i \right) t_i \right] \\ & t_1 + \dots + t_n = k \\ & t_i \in \{0, 1\} \quad \text{for } i = 1, \dots, n. \end{aligned} \tag{10}$$
Let us define
$$\phi^* = \max_{k=1,\dots,n} \phi(k). \tag{11}$$
Observation 1. The value of $\phi^*$ can be obtained in $O(n^2 \log n)$ time.
Proof. For a fixed $k$, first order the variables in $O(n \log n)$ time with respect to nondecreasing values of $\overline{x}_i + \frac{n-k}{k} \underline{x}_i$. Denote this order as $(X_{[1]}, \dots, X_{[n]})$. We then fix $t_{[i]} = 1$ for $i = 1, \dots, k$ and $t_{[i]} = 0$ for $i = k+1, \dots, n$. This gives the maximal value of $\phi(k)$. Repeating this procedure for all $k$ requires $O(n^2 \log n)$ time. □
Observation 2. It holds $\overline{d} \le \phi^*$.
Proof. It follows directly from (9). Indeed, if $x$ maximizes $d(x)$ and $m(x) \in [x_{[k]}, x_{[k+1]}]$, then $\overline{d} = d(x) \le \phi(k) \le \phi^*$. □
Let $k^*$ and $[t_1^*, \dots, t_n^*]$ be an optimal solution to (11). In configuration $x^*$, which corresponds to this solution, we fix $X_i = \underline{x}_i$ if $t_i^* = 1$ and $X_i = \overline{x}_i$ otherwise. Notice that we fix the values of precisely $k^*$ variables to their lower bounds and the values of the remaining $n - k^*$ variables to their upper bounds. Clearly, $x^* = (\underline{x}_{[1]}, \dots, \underline{x}_{[k^*]}, \overline{x}_{[k^*+1]}, \dots, \overline{x}_{[n]})$, where $[1], \dots, [k^*]$ are indices of the variables set to their lower bounds or, equivalently, for which $t^*_{[1]} = \dots = t^*_{[k^*]} = 1$. The fact that $\underline{x}_{[k^*]} \le \overline{x}_{[k^*+1]}$ follows directly from the method of solving problem $\phi(k^*)$ (see the proof of Observation 1). Our aim is to show that $x^*$ maximizes function $d(x)$. We first prove the following proposition:
Proposition 4. If $x^* = (\underline{x}_{[1]}, \dots, \underline{x}_{[k^*]}, \overline{x}_{[k^*+1]}, \dots, \overline{x}_{[n]})$, then $m(x^*) \in [\underline{x}_{[k^*]}, \overline{x}_{[k^*+1]}]$.
Proof. Let $x^*$ correspond to an optimal solution $k^*$, $[t_1^*, \dots, t_n^*]$ to (11). Using (10) we get:
$$\phi^* = \phi(k^*) = \frac{2k^*}{n^2} \left[ \sum_{i=1}^{n} \overline{x}_{[i]} - \sum_{i=1}^{k^*} \left( \overline{x}_{[i]} + \frac{n-k^*}{k^*} \underline{x}_{[i]} \right) \right]. \tag{12}$$
Suppose that $m(x^*) < \underline{x}_{[k^*]}$. This means that
$$\underline{x}_{[1]} + \dots + \underline{x}_{[k^*]} + \overline{x}_{[k^*+1]} + \dots + \overline{x}_{[n]} < n\underline{x}_{[k^*]}. \tag{13}$$
Let us set $t^*_{[k^*]} = 0$. We obtain in this way a feasible solution to problem $\phi(k^* - 1)$ and
$$\phi(k^* - 1) = \frac{2(k^* - 1)}{n^2} \left[ \sum_{i=1}^{n} \overline{x}_{[i]} - \sum_{i=1}^{k^*-1} \left( \overline{x}_{[i]} + \frac{n-k^*+1}{k^*-1} \underline{x}_{[i]} \right) \right]. \tag{14}$$
Subtracting (12) from (14) yields:
$$\phi(k^* - 1) - \phi(k^*) = \frac{2}{n^2} \left[ k^* \overline{x}_{[k^*]} + (n - k^*)\underline{x}_{[k^*]} - \left( \underline{x}_{[1]} + \dots + \underline{x}_{[k^*-1]} + \overline{x}_{[k^*]} + \dots + \overline{x}_{[n]} \right) \right].$$
Now, using (13) we get:
$$\phi(k^* - 1) - \phi(k^*) > \frac{2}{n^2} \left( k^* \overline{x}_{[k^*]} + (n - k^*)\underline{x}_{[k^*]} - \overline{x}_{[k^*]} + \underline{x}_{[k^*]} - n\underline{x}_{[k^*]} \right).$$
Hence
$$\phi(k^* - 1) - \phi(k^*) > \frac{2}{n^2} (k^* - 1)(\overline{x}_{[k^*]} - \underline{x}_{[k^*]}) \ge 0,$$
which contradicts the assumption that $x^*$ corresponds to an optimal solution to (11). The case $m(x^*) > \overline{x}_{[k^*+1]}$ is symmetric and the proof goes very similarly. □
Theorem 1. It holds $\overline{d} = \phi^*$.
Proof. If $x^*$ corresponds to an optimal solution to (11), then Proposition 4 implies that $m(x^*) \in [\underline{x}_{[k^*]}, \overline{x}_{[k^*+1]}]$. So, using (9) we can see that $d(x^*) = \phi(k^*) = \phi^*$. Hence the upper bound given in Observation 2 is attained and $x^*$ maximizes the deviation. □
So, according to Observation 1, the upper bound $\overline{d}$ can be computed in $O(n^2 \log n)$ time.
4. Fuzzy mean absolute deviation
In this section we show how the results obtained in the previous section can be generalized to the fuzzy case. The key idea is to replace a classical closed interval with a fuzzy one and apply possibility theory to extend the notion of the interval $\Delta_f$.
4.1. Fuzzy intervals
The concept of a fuzzy set was introduced by Zadeh [16]. A fuzzy set $\tilde{A}$ is a reference set $\Omega$ and a mapping $\mu_{\tilde{A}}$ from $\Omega$ into $[0, 1]$, the unit interval. The value of $\mu_{\tilde{A}}(\omega)$, for $\omega \in \Omega$, is interpreted as the degree of membership of $\omega$ in the fuzzy set $\tilde{A}$. A fuzzy quantity $\tilde{a}$ is a fuzzy set in which $\Omega$ is the set of reals $\mathbb{R}$ and $\mu_{\tilde{a}}$ is a mapping from $\mathbb{R}$ into $[0, 1]$. If additionally $\mu_{\tilde{a}}$ is normal, quasiconcave, upper semicontinuous and has a bounded support (i.e. the set $\{x : \mu_{\tilde{a}}(x) > 0\}$), then $\tilde{a}$ is called a fuzzy interval [7].
Recall that every closed interval $\tilde{a} = [\underline{a}, \overline{a}]$ can be represented by its characteristic function, that is a mapping $\mu_{\tilde{a}}$ from $\mathbb{R}$ into the set $\{0, 1\}$, where $\mu_{\tilde{a}}(x) = 1$ if and only if $x \in \tilde{a}$. It is easy to verify that the closed interval $\tilde{a}$ with membership function $\mu_{\tilde{a}}$ can be viewed as a special case of a fuzzy interval.
A $\lambda$-cut of a fuzzy interval $\tilde{a}$ is a subset of $\mathbb{R}$ defined as follows:
$$\tilde{a}^{\lambda} = \{x : \mu_{\tilde{a}}(x) \ge \lambda\}, \quad \lambda \in (0, 1]. \tag{15}$$
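We will additionally assume that $\tilde{a}^0$ is the smallest closed set containing the support of $\tilde{a}$. It can be shown that for every $\lambda \in [0, 1]$, $\tilde{a}^{\lambda}$ is a closed interval. Thus a fuzzy interval can be seen as a family of closed intervals $\tilde{a}^{\lambda} = [\underline{a}^{\lambda}, \overline{a}^{\lambda}]$, parametrized by the value of $\lambda \in [0, 1]$. This family is monotone, that is if $\lambda_1 \ge \lambda_2$, then $\tilde{a}^{\lambda_1} \subseteq \tilde{a}^{\lambda_2}$. Clearly, if $\tilde{a}$ is a closed interval, then $\tilde{a}^{\lambda} = [\underline{a}, \overline{a}]$ for all $\lambda \in [0, 1]$. We can obtain the membership function $\mu_{\tilde{a}}$ from the family of $\lambda$-cuts of $\tilde{a}$ in the following way:
$$\mu_{\tilde{a}}(x) = \sup \{\lambda \in [0, 1] : x \in \tilde{a}^{\lambda}\} \tag{16}$$
and $\mu_{\tilde{a}}(x) = 0$ if $x \notin \tilde{a}^0$.
In practical applications trapezoidal fuzzy intervals are often used. A trapezoidal fuzzy interval $\tilde{a}$ is shown in Fig. 1a. Every trapezoidal fuzzy interval $\tilde{a}$ can be described by a quadruple $(\underline{a}, \overline{a}, \alpha, \beta)$, where $\underline{a} \le \overline{a}$ and $\alpha, \beta \ge 0$. It can also be represented as the following family of $\lambda$-cuts:
$$\tilde{a}^{\lambda} = [\underline{a} - (1 - \lambda)\alpha,\ \overline{a} + (1 - \lambda)\beta], \quad \lambda \in [0, 1]. \tag{17}$$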
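Formulas (16) and (17) can be illustrated with a short Python sketch for trapezoidal fuzzy intervals. The tuple layout `(a_lo, a_hi, alpha, beta)` and the function names are assumptions made for this example. The membership function below is the closed form obtained by solving (16) for the family (17).

```python
def lambda_cut(trap, lam):
    """Formula (17): the lambda-cut of a trapezoidal fuzzy interval as (left, right)."""
    a_lo, a_hi, alpha, beta = trap
    return (a_lo - (1 - lam) * alpha, a_hi + (1 - lam) * beta)


def membership(trap, x):
    """Formula (16) in closed form: the largest lambda whose cut still contains x."""
    a_lo, a_hi, alpha, beta = trap
    if a_lo <= x <= a_hi:
        return 1.0                      # x lies in the core of the interval
    if x < a_lo:                        # left slope of width alpha
        return max(0.0, 1 - (a_lo - x) / alpha) if alpha > 0 else 0.0
    return max(0.0, 1 - (x - a_hi) / beta) if beta > 0 else 0.0   # right slope


trap = (1, 3, 1, 9)                     # core [1, 3], left spread 1, right spread 9
print(lambda_cut(trap, 0))              # prints (0, 12)
print(lambda_cut(trap, 0.5))            # prints (0.5, 7.5)
print(membership(trap, 7.5))            # prints 0.5
```

A closed interval corresponds to `alpha = beta = 0`, in which case every cut equals the interval itself and the membership function takes only the values 0 and 1.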
Notice that a closed interval $\tilde{a}$ can be described as $(\underline{a}, \overline{a}, 0, 0)$ and a real number $a$ (degenerate interval) as $(a, a, 0, 0)$ or $(a, 0, 0)$.
4.2. Computing $\tilde{\Delta}_f$
Assume that for every uncertain variable $X_i$, $i = 1, \dots, n$, a fuzzy interval $\tilde{x}_i$ with membership function $\mu_{\tilde{x}_i}(x)$ is specified. This membership function is a possibility distribution for the values of $X_i$.
Fig. 1. (a) A trapezoidal fuzzy interval $\tilde{a} = (\underline{a}, \overline{a}, \alpha, \beta)$ and (b) a closed interval $\tilde{a} = [\underline{a}, \overline{a}]$, which can be viewed as a special case of a fuzzy interval.
In the possibilistic interpretation $\mu_{\tilde{x}_i}(x)$ denotes the possibility of the event that $X_i$ will take the value of $x$, that is
$$\Pi(X_i = x) = \mu_{\tilde{x}_i}(x).$$
An interpretation of possibility distribution as well as some methods of obtaining it from the possessed knowledge are described in a book [7], which is entirely devoted to possibility theory. In general, interval $\tilde{x}_i^0$ should contain all possible values of $X_i$ while interval $\tilde{x}_i^1$ should contain the most plausible values of $X_i$ (see [7]). Let $A$ be a subset of $\mathbb{R}$. Then the possibility of the event "$X_i \in A$" is defined as follows:
$$\Pi(X_i \in A) = \sup_{x \in A} \mu_{\tilde{x}_i}(x), \tag{18}$$
and the necessity of the event "$X_i \in A$" is defined as follows:
$$N(X_i \in A) = 1 - \Pi(X_i \notin A) = \inf_{x \notin A} (1 - \mu_{\tilde{x}_i}(x)). \tag{19}$$
Consider a vector $x = [x_1, \dots, x_n]$ that represents values assigned to unrelated variables $X_1, \dots, X_n$. The possibility of the event that $x$ will occur is
$$\pi(x) = \Pi(X_1 = x_1 \wedge \dots \wedge X_n = x_n) = \min_{i=1,\dots,n} \mu_{\tilde{x}_i}(x_i). \tag{20}$$
Hence $\pi(x)$ is a possibility distribution for configurations $x \in \mathbb{R}^n$. Observe that if all $\tilde{x}_i$ are specified as closed intervals, then $\pi(x)$ takes the value 0 or 1 and it is the characteristic function of the configuration set $\Gamma$.
In the interval case, discussed in the previous section, the value of function $f$ falls within a closed interval $\Delta_f$. In the fuzzy case the value of $f$ falls within a fuzzy interval $\tilde{\Delta}_f$ whose membership function $\mu_{\tilde{\Delta}_f}$ is a possibility distribution for the values of $f$. Having this possibility distribution we can obtain the whole information about function $f$ in the fuzzy case. In particular, we can compute the possibility and necessity of the event "$f \in A$", where $A$ is a subset of $\mathbb{R}$, using formulae similar to (18) and (19). According to possibility theory, the possibility distribution $\mu_{\tilde{\Delta}_f}$ is defined as follows:
$$\mu_{\tilde{\Delta}_f}(y) = \sup_{\{x :\, f(x) = y\}} \pi(x).$$
Hence $\mu_{\tilde{\Delta}_f}(y)$ is the possibility of the event that function $f$ will take the value of $y$.
We now show a method of computing $\tilde{\Delta}_f$. Let $\Gamma^{\lambda}$, $\lambda \in (0, 1]$, denote the set of all configurations $x \in \mathbb{R}^n$ such that $\pi(x) \ge \lambda$. In other words, $\Gamma^{\lambda}$ contains all configurations whose possibility of occurrence is not less than $\lambda$. Using (20) we can see that $\pi(x) \ge \lambda$ if and only if $\mu_{\tilde{x}_i}(x_i) \ge \lambda$ for all $i = 1, \dots, n$. So, according to (15), we conclude that $\pi(x) \ge \lambda$ if and only if $x_i \in \tilde{x}_i^{\lambda}$ for all $i = 1, \dots, n$. Therefore:
$$\Gamma^{\lambda} = \tilde{x}_1^{\lambda} \times \dots \times \tilde{x}_n^{\lambda}.$$
We will also assume that $\Gamma^0 = \tilde{x}_1^0 \times \dots \times \tilde{x}_n^0$. Now, having fixed $\lambda \in [0, 1]$, we can compute the interval $\tilde{\Delta}_f^{\lambda} = [\underline{f}^{\lambda}, \overline{f}^{\lambda}]$ that contains all values of function $f$ under configuration set $\Gamma^{\lambda}$. In order to do this we can use the results shown in the previous sections. Clearly, interval $\tilde{\Delta}_f^{\lambda}$ contains all values of $f$ which may occur with possibility not less than $\lambda$. In particular $\tilde{\Delta}_f^0$ contains all possible values of $f$, while $\tilde{\Delta}_f^1$ contains the most plausible ones. In consequence, $\tilde{\Delta}_f^{\lambda}$, $\lambda \in [0, 1]$, is the family of $\lambda$-cuts of the fuzzy interval $\tilde{\Delta}_f$. According to formula (16), the possibility distribution $\mu_{\tilde{\Delta}_f}$ can be computed as follows:
$$\mu_{\tilde{\Delta}_f}(y) = \sup \{\lambda \in (0, 1] : y \in \tilde{\Delta}_f^{\lambda}\}, \tag{21}$$
and $\mu_{\tilde{\Delta}_f}(y) = 0$ if $y \notin \tilde{\Delta}_f^0$. The family of $\lambda$-cuts $\tilde{\Delta}_f^{\lambda}$, $\lambda \in [0, 1]$, is monotone. So, we can compute the value of $\mu_{\tilde{\Delta}_f}(y)$ for a given $y$ with accuracy $\epsilon$ by applying binary search on the interval $[0, 1]$. However, obtaining the precise shape of the possibility distribution $\mu_{\tilde{\Delta}_f}$ is not easy. In order to do this one can try to adopt a profile approach proposed in [8]. In general, if we use trapezoidal fuzzy intervals, then $\mu_{\tilde{\Delta}_f}$ is a piecewise linear function. Using the fact that $\tilde{\Delta}_f^{\lambda}$ can be easily computed for a fixed value of $\lambda$, we can approximate the function $\mu_{\tilde{\Delta}_f}$ with a given accuracy $\epsilon = 10^{-p}$, $p \ge 1$, in the following way. We first compute the bounds $\underline{f}^{\lambda_i}$ and $\overline{f}^{\lambda_i}$ for $\lambda_i = i \cdot 10^{-p}$, $i = 0, \dots, 10^p$, by applying the methods shown in the previous sections. We assume that the functions $\underline{f}^{\lambda}$ and $\overline{f}^{\lambda}$ between points $\lambda_i$ and $\lambda_{i+1}$ are linear. We can then retrieve the possibility distribution $\mu_{\tilde{\Delta}_f}$ using formula (21). Let us illustrate this method by an example.
Example 2. Assume that we have seven variables $X_1, \dots, X_7$. We wish to compute the possibility distribution for function $d_k(X_1, \dots, X_n)$. The following trapezoidal fuzzy intervals are associated with the variables: $\tilde{x}_1 = (1, 3, 1, 9)$, $\tilde{x}_2 = (3, 6, 3, 5)$, $\tilde{x}_3 = (2, 5, 1, 3)$, $\tilde{x}_4 = (5, 7, 2, 6)$, $\tilde{x}_5 = (4, 4, 3, 1)$, $\tilde{x}_6 = (1, 2, 1, 6)$, $\tilde{x}_7 = (8, 10, 4, 1)$. The membership functions of these fuzzy intervals are possibility distributions for the values of variables $X_1, \dots, X_7$. Assume that $k = 3$. Let us also fix the accuracy $\epsilon = 10^{-2}$. We can now compute the family of $\lambda$-cuts $\tilde{\Delta}_{d_3}^{\lambda}$ for $\lambda = 0, 0.01, 0.02, \dots, 1$ using the results obtained in Section 2. Applying formula (21) we get an approximate shape of the possibility distribution $\mu_{\tilde{\Delta}_{d_3}}$ shown in Fig. 2.
Fig. 2. Possibility distribution $\mu_{\tilde{\Delta}_{d_3}}(y) = \Pi(d_3 = y)$ for the values of $d_3(X_1, \dots, X_n)$.
We can now obtain some
information about the value of the function. For instance $\Pi(d_3 = 0) = 0.2$, $\Pi(d_3 \ge 4) = \Pi(d_3 \in [4, \infty)) = 0.58$, $N(d_3 \le 4) = 0.42$, etc. Furthermore, interval $[0, 6.28]$ contains all possible values of $d_3$ while interval $[1.14, 3]$ contains the most plausible ones.
5. Conclusions
In this paper we have shown how to calculate mean absolute deviations for a set of uncertain variables $X_1, \dots, X_n$. Contrary to the variance, the computations in the interval case can be performed in polynomial time. We thus have measures of deviation which can be efficiently characterized in the interval case. We have also shown how to compute mean absolute deviations in a more general case, when $X_1, \dots, X_n$ are possibilistic variables. In this case they can be characterized by a possibility distribution. We have shown an approximate method of constructing this possibility distribution via the use of $\lambda$-cuts.
References
[1] S. Bodjanova, Median value and median interval of a fuzzy number, Information Sciences 172 (1–2) (2005) 73–89.
[2] M. Blum, R.W. Floyd, V. Pratt, R.L. Rivest, R.E. Tarjan, Time bounds for selection, Journal of Computer and System Sciences 7 (4) (1973) 448–461.
[3] C. Carlsson, R. Fuller, On possibilistic mean value and variance of fuzzy numbers, Fuzzy Sets and Systems 122 (2) (2001) 315–326.
[4] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, 2nd edition, The MIT Press and McGraw-Hill Book Company, 2001.
[5] H. Cramer, Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ, 1946.
[6] D. Dubois, H. Prade, The mean value of a fuzzy number, Fuzzy Sets and Systems 24 (3) (1987) 279–300.
[7] D. Dubois, H. Prade, Possibility Theory: An Approach to Computerized Processing of Uncertainty, Plenum Press, New York, 1988.
[8] D. Dubois, H. Fargier, J. Fortin, The empirical variance of a set of fuzzy intervals, in: Proceedings of the 2005 IEEE International Conference on Fuzzy Systems FUZZIEEE'2005, Reno, Nevada, 2005, pp. 885–890.
[9] D. Dubois, E. Kerre, R. Mesiar, H. Prade, Fuzzy interval analysis, in: D. Dubois, H. Prade (Eds.), Fundamentals of Fuzzy Sets. The Handbooks of Fuzzy Sets Series, Kluwer, 2000, pp. 483–581.
[10] T. Feder, R. Motwani, R. Panigrahy, C. Olston, J. Widom, Computing the median with uncertainty, SIAM Journal on Computing 32 (2) (2003) 538–547.
[11] S. Ferson, L. Ginzburg, V. Kreinovich, M. Aviles, Computing variance for interval data is NP-hard, ACM SIGACT News 33 (2) (2002) 108–118.
[12] R. Korner, On the variance of fuzzy random variables, Fuzzy Sets and Systems 92 (1) (1997) 83–93.
[13] R. Moore, F. Bierbaum, Methods and Applications of Interval Analysis (SIAM Studies in Applied and Numerical Mathematics), Soc for Industrial & Applied Math, Philadelphia, 1979.
[14] R. Moore, Global Optimization Using Interval Analysis, 2nd edition, Marcel Dekker, Inc., New York, 2004 (revised and expanded).
[15] G. Xiang, V. Kreinovich, Estimating variance under interval and fuzzy uncertainty: case of hierarchical estimation, Foundations of fuzzy logic and technology, in: 12th International Fuzzy Systems Association World Congress, IFSA 2007, Cancun, Lecture Notes in Artificial Intelligence 4529, (2007), pp. 4–12.
[16] L. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338–353.
[17] M. Yamashiro, The median for a L–R fuzzy number, Microelectronics and Reliability 35 (2) (1995) 269–271.