Fuzzy Sets and Systems 18 (1986) 1-13 North-Holland
CONSTRUCTING MEMBERSHIP FUNCTIONS USING STATISTICAL DATA

M. Reha CIVANLAR
Center for Communication and Signal Processing, North Carolina State University, Box 7914, Raleigh, NC 27695-7914, USA
H. Joel TRUSSELL
Electrical and Computer Engineering Department, North Carolina State University, Box 7911, Raleigh, NC 27695-7911, USA

Received August 1984
Revised January 1985

Determination of the membership functions is vital in practical applications of fuzzy set theory. This paper presents a guideline for constructing the membership functions of fuzzy sets whose elements have a defining feature with a known probability density function (pdf) in the universe of discourse. The method finds the smallest fuzzy set that assigns high average membership values to those objects whose defining features are distributed according to the given pdf. It is shown that, for any pdf, the method is capable of generating membership functions in accordance with the possibility-probability consistency principle. Membership functions derived from some well-known pdfs and an application to solving noise-contaminated linear systems of equations are presented.
Keywords: Membership functions, Statistically based fuzzy sets.
0165-0114/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)

1. Introduction
The most crucial step in the design of a problem that is to be solved by fuzzy set methods is the determination of the membership functions of the sets. The methods used in the past are often heuristically based. Because of this subjective aspect of the problem, fuzzy set methods have often been dismissed prematurely by investigators who are unfamiliar with the area. This paper proposes criteria that result in a more rigorous method of defining membership functions for a certain group of fuzzy sets which are statistically based. This result puts a class of membership functions that had been previously proposed [4, 5, 8] on firmer theoretical ground. We define a statistically based fuzzy set as one whose membership function is based upon the probability density function (pdf) of a defining feature of its members in the universe of discourse. While this is a limited class of sets, it is very important and useful. There are many guidelines on developing the membership functions for fuzzy sets, as surveyed in [4]. The sets based on statistics are perhaps some of the most naturally fuzzy sets that can be used. Any measurement based on statistics must make some allowance for deviation from the value obtained by the measurement. Another property of statistically based sets is that they are naturally quantitative; that is, there is reason to believe that the membership function has a relationship to some physical property of the set.
To elaborate on the naturally quantitative remark, consider a communication system where the received signals are noise contaminated. In the presence of a transmitted signal, the energy of the received signal will be the sum of the energy of the transmitted signal and the energy of the noise. The energy of the received signal is used as an indicator for the presence of a transmitted signal. A fuzzy set that may be useful in this example is the set of received signals when a transmitted signal exists. The defining feature of the elements of this set is their energy. The energy of a signal is proportional to its variance. Sample estimates of the variance can be characterized by confidence limits, which in turn are related to the pdf of the sample estimate. It is natural to relate this pdf to a membership function.
The next section of the paper states the proposed criteria for relating the pdf and the membership function and discusses their implications. The relation is obtained through infinite-dimensional optimization theory. In Section 3, a requirement imposed on the developed technique by the possibility-probability consistency principle is discussed. Section 4 gives numerical solutions for some of the more common probability density functions. An application is presented in Section 5.
2. Defining the optimal membership function
2.1. Fuzzy sets for a better model
In constructing a set that defines a certain collection of objects, a feature of these objects, such as their length, weight or color, can be used. Frequently, the features are characterized by a probability density function or, perhaps, only a histogram. In such cases, a fuzzy set can easily be used to describe the collection. A case for which a fuzzy set based on a known pdf gives a good model of reality is the definition of confidence intervals. Consider the construction of a confidence interval for the sample mean of $N$ independent, zero-mean random numbers $x_i$, $i = 1, \ldots, N$, with variance $\sigma^2$. It is known that, for a large enough sample size $N$, the sample mean $m$ is approximately a zero-mean normal random variable with variance $\sigma^2/N$. In classical hypothesis testing this information is used to define a set of acceptable values for the sample mean in the universe of the possible values of the sample mean. The ordinary set defined by the confidence limits is given by
$$\{\, m : |m| \le \epsilon_N \,\},$$
where $\epsilon_N$ is the confidence limit. According to this approach, sample mean values whose magnitude is smaller than the confidence limit are acceptable, and those whose magnitude is larger are rejected. However, the belief in the acceptability of a particular sample mean value should not change abruptly. The transition from acceptability to unacceptability is gradual and can be modeled by a fuzzy confidence set. In this example, the membership function is a function of $|m|$ and decreases as $|m|$ increases.
The cases for which collected data are used for defining a property of a group of objects are numerous. In these cases, a histogram of the values of an identifying feature of the objects which are known to be in the collection can be used as an approximation to the pdf. Consequently, the sets constructed using this method are normal (i.e., $\max_x \mu(x) = 1$).
As an example, consider the set of acceptable grades on a certain exam such as the GRE. Since it is very difficult to predict the quality of a student from his score on a single exam, the set of acceptable grades should be fuzzy. In order to define the membership function for this set using the new method, the grades of a large enough group of successful students must be collected. These data can be used to construct a histogram approximating the pdf. Clearly, the larger the number of samples, the better this approximation will be. In pattern recognition terminology, the students who are known to be successful and whose grades are used in the histogram construction are called training samples. The membership level of a prospective student's grade in this set can be used in the final acceptance decision by combining it with other information about the student.
Another way of obtaining the histogram is through a poll. Using the acceptable grades example again, instead of making a histogram of the grades of successful students, professors may be asked to indicate an acceptable range of grades, and a histogram of the number of professors indicating acceptance versus grade can be made. This approach has been used in [5] for defining membership functions.
As stated in Section 1, the proposed membership function construction technique can handle many useful cases; however, it is not general.
There exist fuzzy sets whose members do not have a pdf. Examples include the set of long distances, large objects, etc., where the identifying feature has an infinite range of probable values. Another constraint is the underlying assumption about the existence of full members of the set. This is an important point in constructing membership functions from polls using the proposed method. If, in constructing a membership function for a fuzzy set such as pleasing houses, the subjects are asked to indicate the height/width ratio of a pleasing rectangular house, and some of the subjects do not find rectangular houses of any kind absolutely pleasing, as reported in [7], the method presented here will not generate a consistent membership function. In the following sections, the derivations are given using probability density functions. Since the procedures for estimating densities from histograms are well known [3], it is assumed that, when necessary, this step is carried out before defining the membership function.

2.2. Conditions for an optimal membership function
The problem addressed in this paper is how to use the pdf of a defining feature of the elements of a fuzzy set to construct its membership function. In order to define a reasonable membership function, there are certain conditions which can be imposed to make the set have properties consistent with the user's subjective judgement and the underlying pdf. From a heuristic viewpoint, the elements which are most likely should have high membership values; however, the set should be as selective as possible. These requirements are quantitatively described below.
(I) $E\{\mu(x) \mid x \text{ is distributed according to the underlying pdf}\} \ge c$, where the confidence level $c$ should be close to unity. Qualitatively, since the only available information is the pdf for the members of the set, the average membership value assigned to values distributed according to this pdf should be large.
(II) $0 \le \mu(x) \le 1$. An unbounded scale is not necessary for assigning grades of membership, and $[0, 1]$ is the classical interval over which membership functions are defined.
(III) $\int \mu^2(x)\,dx$ should be minimized. This condition is required to obtain a selective membership function; that is, the 'size' of the set should be as small as possible. The integral of the squared membership function is related to the size of a fuzzy set [1]. Thus, by minimizing it, the 'smallest' set satisfying the other requirements can be obtained.
The optimal membership function defined by these conditions can be derived using constrained optimization techniques for infinite-dimensional spaces [6]. This derivation is presented in Section 2.3, where the optimal membership function is shown to be
$$\mu(x) = \begin{cases} \lambda p(x) & \text{if } \lambda p(x) < 1, \\ 1 & \text{if } \lambda p(x) \ge 1, \end{cases} \tag{1}$$
where $p(x)$ is the pdf, or its estimate derived from the histogram of the feature used for defining the fuzzy set, and the constant $\lambda$ is to be solved from
$$\lambda \int_{\lambda p(x) < 1} p^2(x)\,dx + \int_{\lambda p(x) \ge 1} p(x)\,dx = c. \tag{2}$$
For a given pdf, $\lambda$ can be solved from (2) using numerical root-finding techniques. Numerical integration is also necessary if a closed form for the indefinite integral of the pdf or its square does not exist. It should be noted that a size measure other than $\int \mu^2(x)\,dx$ could be used in the third criterion. Clearly, this will affect the resulting membership function. For example, if the integral of the membership function is used as the measure, the optimal solution is an ordinary set, which is nothing but a classical confidence interval having the parameter $c$ of the first condition as the confidence level. However, as discussed in Section 2.1, imposing a sharp boundary between the nonmembers and the full members of a set based on statistical information is not realistic. Furthermore, it is demonstrated in [1] that, at least for decision applications, the integral of the squared membership function is a better size measure than the integral of the membership function itself. It is not possible to prove the necessity or sufficiency of the proposed criteria for constructing membership functions. However, they are reasonable requirements, and they generate reasonable membership functions.
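The recipe above (solve (2) for $\lambda$ by root finding, then apply (1)) can be sketched numerically. This is a minimal sketch, not the authors' procedure: it assumes the pdf is available on a uniform grid (for example, as a normalized histogram of training samples, as in the GRE example of Section 2.1), replaces the integrals in (2) by Riemann sums, and uses bisection, relying on the left side of (2) being nondecreasing in $\lambda$. The helper names `histogram_pdf` and `membership_from_pdf` are hypothetical.

```python
import math
import random

def histogram_pdf(samples, bins):
    """Normalized histogram of feature samples: (bin centers, density values)."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        k = min(int((s - lo) / width), bins - 1)  # the maximum sample goes in the last bin
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    density = [n / (len(samples) * width) for n in counts]
    return centers, density

def membership_from_pdf(xs, p, c, tol=1e-12):
    """Solve (2) for lambda on a uniform grid xs; return (lambda, mu) with mu = min(lambda*p, 1)."""
    dx = xs[1] - xs[0]
    total = sum(p) * dx  # Riemann-sum approximation of the integral of p
    if not 0.0 < c < total:
        raise ValueError("need 0 < c < integral of p")

    def expected_membership(lam):
        # Left side of (2): E{mu} = integral of min(lam*p, 1) * p
        return sum(min(lam * v, 1.0) * v for v in p) * dx

    lo, hi = 0.0, 1.0
    while expected_membership(hi) < c:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:           # bisection on a nondecreasing function
        mid = 0.5 * (lo + hi)
        if expected_membership(mid) < c:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, [min(lam * v, 1.0) for v in p]

# Standard normal pdf on a grid; for c = 1/sqrt(2) the root of (2) is sqrt(2*pi)
xs = [-10.0 + 0.001 * i for i in range(20001)]
p = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]
lam, mu = membership_from_pdf(xs, p, 1 / math.sqrt(2))
print(round(lam, 4))  # → 2.5066

# Histogram of simulated training samples (hypothetical data)
random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(5000)]
centers, dens = histogram_pdf(samples, 40)
lam_h, mu_h = membership_from_pdf(centers, dens, 0.9)
```

Bisection is used only because it needs nothing beyond monotonicity; any bracketing root finder would serve equally well.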
2.3. Finding the optimal membership function
Let $X$ be the universe, and let $S[\cdot]$ map $X$ into the reals. Assume that, for a certain collection of objects in $X$, the values of $S$ have a known probability density function $p(x)$. According to the conditions stated in Section 2.2, the optimal membership function for a fuzzy set describing this class of objects, using only this information, can be found by solving the following problem:
$$\min_{\mu}\; f(\mu) = \tfrac{1}{2}\int_{-\infty}^{\infty} \mu^2(x)\,dx \quad \text{such that} \quad G(\mu) = c - E\{\mu\} = c - \int_{-\infty}^{\infty} \mu(x)\,p(x)\,dx \le 0, \tag{3}$$
and $\mu \in \Omega = \{\mu(x) \mid 0 \le \mu(x) \le 1\}$, where $c < 1$. The $\tfrac{1}{2}$ factor in the cost function is for notational simplicity. The solution of the problem given in (3) is based on the following theorem.
Theorem [6, p. 221]. Let $X$ be a vector space, $Z$ a normed space, $\Omega$ a convex subset of $X$, and $P$ a closed positive cone with nonempty interior in $Z$. Let $f$ be a real-valued convex functional on $\Omega$. Let $G$ be a convex and regular mapping from $\Omega$ into $Z$ (i.e., there exists an $x_1 \in \Omega$ for which $G(x_1) < \theta$, where $\theta$ is the zero element of $Z$). Then $x^*$ is the optimal solution of the problem
$$\text{minimize } f(x) \quad \text{subject to } G(x) \le \theta,\; x \in \Omega, \tag{4}$$
if and only if the Lagrangian defined by
$$L(x, z^{\#}) = f(x) + \langle G(x), z^{\#} \rangle, \tag{5}$$
where $z^{\#}$ is an element of the dual space of $Z$, possesses a saddle point at $(x^*, z^{\#*})$, i.e.,
$$L(x^*, z^{\#}) \le L(x^*, z^{\#*}) \le L(x, z^{\#*}) \tag{6}$$
for all $x \in \Omega$ and $z^{\#} \ge \theta$.
For the problem given in (3), let $X$ be the space of piecewise continuous functions, $Z$ the real numbers, and $P$ the nonnegative real numbers. Let $f$, $G$, and $\Omega$ be as given in (3). Clearly, $\Omega$ is a convex set, and $G$ and $f$ are convex functionals. The regularity of $G$ can easily be seen by using $\mu = 1$ as the $x_1$ of the theorem, since $G(1) = c - 1 < 0$. Thus, all conditions for the application of the theorem hold, and the Lagrangian of the problem is
$$L(\mu, \lambda) = \tfrac{1}{2}\int_{-\infty}^{\infty} \mu^2(x)\,dx + \lambda\Big[c - \int_{-\infty}^{\infty} \mu(x)\,p(x)\,dx\Big], \tag{7}$$
where $\lambda \ge 0$ is the Lagrange multiplier. The Lagrangian is convex with respect to $\mu$. Thus, the necessary and sufficient condition for a function $\mu^*$ to minimize $L$ is [6]
$$\delta L(\mu^*; \mu - \mu^*) \ge 0 \tag{8}$$
for all $\mu \in \Omega$, where the quantity on the left of the inequality sign in (8) is the Gateaux derivative of $L(\cdot, \lambda)$, calculated at $\mu^*$, in the direction of $(\mu - \mu^*)$. The condition stated in (8) requires
$$\int_{-\infty}^{\infty} \big(\mu^*(x) - \lambda p(x)\big)\big(\mu(x) - \mu^*(x)\big)\,dx \ge 0 \tag{9}$$
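for all $\mu \in \Omega$. Since $\lambda p(x) \ge 0$, (9) is satisfied when, at every $x$, $\mu^*(x)$ is the point of $[0, 1]$ closest to $\lambda p(x)$; this is exactly the optimal solution given in (1). Substituting this optimal value into (7), the following form of the Lagrangian is obtained:
$$L(\mu^*, \lambda) = \tfrac{1}{2}\int_{-\infty}^{\infty} \big\{ I(\lambda p(x))\,(\lambda p(x) - 1)^2 - \lambda^2 p^2(x) \big\}\,dx + \lambda c, \tag{10}$$
where
$$I(x) = \begin{cases} 1 & \text{if } x \ge 1, \\ 0 & \text{otherwise.} \end{cases} \tag{11}$$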
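It remains to find the maximizing $\lambda$ value at the saddle point.
Lemma. The $\lambda$ for which
$$C(\lambda) = \int_{-\infty}^{\infty} \big\{ I(\lambda p(x))\,p(x) + [1 - I(\lambda p(x))]\,\lambda p^2(x) \big\}\,dx - c = 0 \tag{12}$$
maximizes the Lagrangian given in (10). (Note that $C(\lambda) = 0$ is simply (2) written with the indicator (11).)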
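Proof. (I) Existence and uniqueness: By simple substitution, $C(0) = -c < 0$ is obtained. Since, for any $\lambda_g > \lambda_s \ge 0$,
$$C(\lambda_g) - C(\lambda_s) = \int_{R_1} p(x)\big(1 - \lambda_s p(x)\big)\,dx + \int_{R_2} (\lambda_g - \lambda_s)\,p^2(x)\,dx > 0,$$
where $R_1 = \{x : \lambda_s p(x) < 1 \le \lambda_g p(x)\}$ and $R_2 = \{x : \lambda_g p(x) < 1\}$, $C(\lambda)$ is a monotonically increasing function. Additionally, $C(\lambda)$ is continuous for positive $\lambda$ values. This can be shown as follows. For two different values $\lambda_1, \lambda_2$, let $\lambda_s = \min\{\lambda_1, \lambda_2\}$, $\lambda_g = \max\{\lambda_1, \lambda_2\}$, $\epsilon = \lambda_g - \lambda_s$, and let $R_1$ and $R_2$ be as defined above. Since $1 - \lambda_s p(x) \le \epsilon p(x)$ on $R_1$ and $p(x) < 1/\lambda_s$ on $R_1 \cup R_2$, an upper bound for the absolute difference between the corresponding $C$ values is
$$D(\lambda_1, \lambda_2) = |C(\lambda_1) - C(\lambda_2)| \le \epsilon \int_{R_1 \cup R_2} p^2(x)\,dx \le \frac{\epsilon}{\lambda_s}.$$
Thus, for any $\lambda_0 > 0$ and any $0 < \delta \le 1$, the positive value $\epsilon = \tfrac{1}{2}\lambda_0\delta$ gives $D(\lambda_0, \lambda) < \delta$ whenever $|\lambda_0 - \lambda| < \epsilon$, because then $\lambda_s > \lambda_0/2$. Finally, since $C(\lambda) \to 1 - c > 0$ as $\lambda \to \infty$, there exists a large enough value of $\lambda$ for which $C(\lambda) \ge 0$. Thus, $C(\lambda)$ must have a unique root $\lambda^*$.
(II) Optimality of $\lambda^*$: Let $\lambda = \lambda^* + \eta$ be a Lagrange multiplier, where $\eta$ is an arbitrary number and $\lambda^*$ is the solution of (2). The difference between the Lagrangians corresponding to $\lambda$ and $\lambda^*$ is
$$L(\mu^*, \lambda) - L(\mu^*, \lambda^*) = \tfrac{1}{2}\int_{-\infty}^{\infty} \Big\{ I(\lambda^* p(x))\big(2\lambda^*\eta p^2(x) + \eta^2 p^2(x) - 2\eta p(x)\big) - 2\lambda^*\eta p^2(x) - \eta^2 p^2(x) + \big[I(\lambda p(x)) - I(\lambda^* p(x))\big](\lambda p(x) - 1)^2 \Big\}\,dx + \eta c. \tag{13}$$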
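Using (2), this can be simplified as
$$L(\mu^*, \lambda) - L(\mu^*, \lambda^*) = \tfrac{1}{2}\Big[\int_{I_1} \big\{(\lambda p(x) - 1)^2 - \eta^2 p^2(x)\big\}\,dx - \int_{I_2} (\lambda p(x) - 1)^2\,dx - \int_{I_3} \eta^2 p^2(x)\,dx\Big], \tag{14}$$
where $I_1 = \{x \mid \lambda p(x) \ge 1,\ \lambda^* p(x) < 1\}$, $I_2 = \{x \mid \lambda p(x) < 1,\ \lambda^* p(x) \ge 1\}$, and $I_3 = \{x \mid \lambda p(x) < 1,\ \lambda^* p(x) < 1\}$.
The expression given in (14) is never positive, because the last two terms are negative integrals of squared functions and the first term is equal to
$$\tfrac{1}{2}\int_{I_1} \big(\lambda^* p(x) - 1\big)\big(\lambda p(x) - 1 + \eta p(x)\big)\,dx,$$
which has a negative integrand, since $\lambda^* p(x) < 1 \le \lambda p(x)$ and hence $\eta p(x) > 0$ on $I_1$. Thus $\lambda^*$ maximizes the Lagrangian.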
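The lemma can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the standard normal pdf on a truncated uniform grid as the example density, evaluates the dual function (10) and $C(\lambda)$ of (12) with Riemann sums, finds the root of $C$ by bisection (relying on the monotonicity shown in part (I)), and compares (10) at nearby multipliers as (14) predicts. The grid, the confidence level 0.9 and all function names are arbitrary choices.

```python
import math

# Assumed example density: standard normal sampled on a truncated uniform grid
DX = 0.001
P = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
     for x in (-8.0 + DX * i for i in range(16001))]

def indicator(t):
    """I(t) of (11): 1 if t >= 1, otherwise 0."""
    return 1.0 if t >= 1.0 else 0.0

def dual(lam, c):
    """L(mu*, lambda) of (10), with the integral replaced by a Riemann sum."""
    s = sum(indicator(lam * v) * (lam * v - 1.0) ** 2 - (lam * v) ** 2 for v in P)
    return 0.5 * s * DX + lam * c

def C(lam, c):
    """C(lambda) of (12)."""
    s = sum(indicator(lam * v) * v + (1.0 - indicator(lam * v)) * lam * v * v for v in P)
    return s * DX - c

def root_of_C(c, tol=1e-10):
    """Bisection for the unique root of the increasing function C."""
    lo, hi = 0.0, 1.0
    while C(hi, c) < 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if C(mid, c) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = 0.9
lam_star = root_of_C(c)
for eta in (-0.5, -0.05, 0.05, 0.5):
    # (14) predicts a nonpositive difference for every eta
    print(eta, dual(lam_star + eta, c) - dual(lam_star, c))
```

In the discretized setting the derivative of the sum in (10) with respect to $\lambda$ is exactly $-C(\lambda)$, so the check mirrors the continuous argument rather than merely approximating it.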
3. Consistency principle
In [9] the possibility-probability consistency principle is stated as: the degree of possibility of an event is greater than or equal to its degree of probability. If the membership function is used as a grade for possibility, the consistency principle can be formulated as
$$\frac{\max_{x \in D} \mu(x)}{\max_{x} \mu(x)} \ge \int_D p(x)\,dx \tag{15}$$
for any set $D$ on the real line [4]. If the optimal membership function, defined by the conditions stated in Section 2, is required to satisfy the consistency principle, a lower bound must be imposed on the confidence level $c$ of the formulation.
Lemma. For every probability density function, there exists a lower bound for the confidence level $c$ of the formulation, above which the optimal membership function satisfies the consistency principle.
Proof. For any pdf $p(x)$, there exist a confidence level and a set of $x$'s which is not of measure zero for which $\mu(x) = 1$, because otherwise, from (2), $\lambda \int_{-\infty}^{\infty} p^2(x)\,dx = c$, which cannot be satisfied if $c$ is selected such that $1 > c > \lambda \max p(x)$. (This can easily be extended to the case where the maximum of $p(x)$ does not exist, using the limit superior instead of the maximum.) Assume that $c$ is such that there exists a set of nonzero measure over which $\mu(x) = 1$.
(I) If $D \cap \{x : \lambda p(x) \ge 1\} \ne \emptyset$, the maximum value of $\mu(x)$ over $D$ is unity, which is greater than or equal to the integral of $p(x)$ over any $D$.
(II) Otherwise, $D \subset \{x : \lambda p(x) < 1\}$. Define $D' = \{x : \lambda p(x) < 1\} - D$.
(i) If $D'$ is of measure zero, from (2),
$$\lambda \max_{x \in D} p(x) \int_D p(x)\,dx \ge c - 1 + \int_D p(x)\,dx,$$
which is equivalent to
$$\lambda \max_{x \in D} p(x) \ge 1 - \frac{1 - c}{\int_D p(x)\,dx}.$$
Thus, the consistency principle will hold for all $c$ such that
$$c \ge 1 - \int_{\lambda p(x) \ge 1} p(x)\,dx.$$
Obviously, there are such $c$ values.
(ii) Otherwise, using (2), the following can be written:
$$\lambda \int_D p^2(x)\,dx - \int_D p(x)\,dx = c - 1 + \int_{D'} \big(1 - \lambda p(x)\big)\,p(x)\,dx,$$
and
$$\Big(\lambda \max_{x \in D} p(x) - 1\Big)\int_D p(x)\,dx \ge c - 1 + \int_{D'} \big(1 - \lambda p(x)\big)\,p(x)\,dx.$$
Thus, the consistency principle will hold for all $c$ such that
$$c > 1 - \int_{D'} \big(1 - \lambda p(x)\big)\,p(x)\,dx$$
for all $D'$ which are not of measure zero. Clearly, such a $c$ exists.
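Condition (15) can also be probed numerically for a given membership function. The following minimal sketch is only a spot check, not a proof: it tests (15) for every interval $D = [u, v]$ with endpoints on a finite grid, and it assumes that $\mu$ is symmetric about 0 and nonincreasing in $|x|$, so that the maximum of $\mu$ over the line is $\mu(0)$ and its maximum over an interval is attained at the point of the interval closest to 0. The standard normal example uses $\mu(x) = \min\{\lambda p(x), 1\}$ with $\lambda \ge \sqrt{2\pi}$, which by Section 4.3 corresponds to $c \ge 1/\sqrt{2}$; the function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def clipped_gaussian(lam):
    """mu(x) = min(lam * phi(x), 1), the form (1) for the standard normal."""
    return lambda x: min(lam * phi(x), 1.0)

def crisp(x):
    """Indicator of the ordinary set [-1, 1], for comparison."""
    return 1.0 if abs(x) <= 1.0 else 0.0

def satisfies_15_on_intervals(mu, cdf, grid):
    """Check (15) for all intervals D = [u, v] with endpoints on `grid`.

    Assumes mu is symmetric about 0 and nonincreasing in |x|.
    """
    peak = mu(0.0)
    for i, u in enumerate(grid):
        for v in grid[i + 1:]:
            nearest = min(max(0.0, u), v)  # point of [u, v] closest to 0
            if mu(nearest) / peak < cdf(v) - cdf(u):
                return False
    return True

grid = [-5.0 + 0.25 * k for k in range(41)]
for lam in (math.sqrt(2 * math.pi), math.sqrt(2 * math.pi) * math.exp(0.5)):
    print(satisfies_15_on_intervals(clipped_gaussian(lam), Phi, grid))  # → True
print(satisfies_15_on_intervals(crisp, Phi, grid))  # → False
```

The crisp comparison fails because any interval lying entirely outside $[-1, 1]$ has positive probability but zero membership, which is the abrupt-boundary problem discussed in Section 2.1.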
4. Optimal membership functions for specific pdfs
In Section 2, the method for deriving an optimal membership function from a general pdf was described. The difficulty of applying this result depends upon the underlying pdf. In this section, the optimal membership function is derived for some of the most common pdfs.
Fig. 1. Partially uniform density ($p(x)$ plotted for $-4 \le x \le 4$).
4.1. Partially uniform
Finding the optimal membership function for a uniform density is trivial. For a partially uniform density, such as the one displayed in Fig. 1, the optimal membership function can be found in a few trials. The membership functions defined using the nonuniform density of Fig. 1 are shown in Fig. 2 for different confidence levels. The membership functions corresponding to $c$ levels larger than 0.78 satisfy the consistency principle for this example.

4.2. Exponential
If the pdf is
$$p(x) = K e^{-Kx}, \quad x \ge 0,$$
the membership function is determined by the parameter $a$, as demonstrated in Fig. 3. This parameter is a function of $c$ and is given by
$$a = \max\Big\{0,\; -\frac{1}{K}\ln\big(2(1 - c)\big)\Big\}.$$
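The closed form for $a(c)$ can be cross-checked against (2). In the sketch below (an illustration, not the authors' computation), the multiplier follows from $\lambda p(a) = 1$ when $a > 0$, i.e. $\lambda = e^{Ka}/K$, and from $\lambda \int p^2(x)\,dx = \lambda K/2 = c$ when $a = 0$; the left side of (2) is then evaluated with a midpoint rule on a truncated interval. The rate $K = 2$, the truncation point and the function names are arbitrary choices.

```python
import math

def exponential_params(K, c):
    """a(c) and lambda for p(x) = K exp(-K x), x >= 0, with 0 < c < 1."""
    a = max(0.0, -math.log(2.0 * (1.0 - c)) / K)
    if a > 0.0:
        lam = math.exp(K * a) / K  # lambda * p(a) = 1 at the edge of the full-membership region
    else:
        lam = 2.0 * c / K          # mu = lambda * p everywhere and lambda * K / 2 = c
    return a, lam

def expected_membership(K, lam, upper=40.0, n=200_000):
    """Left side of (2), E{min(lambda * p, 1)}, by the midpoint rule on [0, upper]."""
    h = upper / n
    total = 0.0
    for i in range(n):
        p = K * math.exp(-K * (i + 0.5) * h)
        total += min(lam * p, 1.0) * p
    return total * h

for c in (0.3, 0.5, 0.9, 0.99):
    a, lam = exponential_params(2.0, c)
    print(c, round(a, 4), round(expected_membership(2.0, lam), 6))
```

For $a > 0$ the left side of (2) works out to $1 - \tfrac{1}{2}e^{-Ka}$, which is why the threshold $c = 0.5$ separates the two branches.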
Fig. 2. Membership functions for different confidence levels (curves labeled $c = 1$, $c = 0.79$ and $c = 0.56$; $\mu(x)$ plotted for $-4 \le x \le 4$).
Fig. 3. A membership function corresponding to the exponential density ($\mu(x)$ versus $x$, with $a(c)$ marked on the $x$ axis).

Fig. 4. A membership function corresponding to the standard normal density ($\mu(x)$ versus $x$, with $-a$ and $a$ marked).

Fig. 5. The parameter $a$ of the membership function corresponding to the standard normal density versus the confidence level $c$.
The membership function derived from the exponential density satisfies the consistency principle for $c \ge 0.5$.

4.3. Gaussian
If the probability density function is
$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$
the optimal membership function, shown in Fig. 4, is determined by a single scalar $a$ which is a function of the confidence level $c$. Fig. 5 shows the values of the parameter $a$ versus the confidence level $c$. Since $\lambda^* p(a) = 1$ at the boundary of the region $|x| \le a$ where $\mu(x) = 1$, the value of the optimal Lagrange multiplier can be calculated using
$$\lambda^*(c) = \sqrt{2\pi}\, e^{a^2/2}.$$
If a value of $c$ larger than or equal to $1/\sqrt{2}$ is used, the membership function derived from the standard normal density will be in accordance with the consistency principle. The membership function corresponding to $c = 1/\sqrt{2}$ is a normalized Gaussian pdf with the highest value equal to one. Indeed, at $c = 1/\sqrt{2}$ one has $a = 0$ and $\lambda^* = \sqrt{2\pi}$, so $\mu(x) = e^{-x^2/2}$, and (2) gives $\sqrt{2\pi}\int_{-\infty}^{\infty} p^2(x)\,dx = \sqrt{2\pi}/(2\sqrt{\pi}) = 1/\sqrt{2}$. That is, for this important case, the membership function obtained by the pdf normalization technique suggested in [4] and the optimal membership function are the same.
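The curve of Fig. 5 can be reproduced, assuming the form of the solution stated above, from a closed-form version of (2). Taking $\mu(x) = 1$ for $|x| \le a$ and $\lambda^* = \sqrt{2\pi}\,e^{a^2/2}$, the left side of (2) becomes $\operatorname{erf}(a/\sqrt{2}) + e^{a^2/2}\operatorname{erfc}(a)/\sqrt{2}$, which equals $1/\sqrt{2}$ at $a = 0$ and has derivative $(a/\sqrt{2})\,e^{a^2/2}\operatorname{erfc}(a) > 0$ for $a > 0$. The sketch below (our own derivation and naming, not code from the paper) inverts this relation by bisection for $c > 1/\sqrt{2}$; for smaller $c$ it returns $a = 0$ and $\lambda^* = 2\sqrt{\pi}\,c$, the case in which $\mu = \lambda^* p$ never reaches one.

```python
import math

def c_of_a(a):
    """Left side of (2) for the standard normal when mu = 1 exactly on |x| <= a."""
    return math.erf(a / math.sqrt(2.0)) + math.exp(a * a / 2.0) * math.erfc(a) / math.sqrt(2.0)

def gaussian_params(c, iters=100):
    """Return (a(c), lambda*(c)) for the standard normal pdf, 0 < c < 1."""
    if c <= c_of_a(0.0):                    # c <= 1/sqrt(2): no full members
        return 0.0, 2.0 * math.sqrt(math.pi) * c
    lo, hi = 0.0, 1.0
    while c_of_a(hi) < c:                   # c_of_a increases toward 1
        hi *= 2.0
    for _ in range(iters):                  # bisection on the increasing function
        mid = 0.5 * (lo + hi)
        if c_of_a(mid) < c:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    return a, math.sqrt(2.0 * math.pi) * math.exp(a * a / 2.0)

for c in (0.75, 0.8, 0.9, 0.95, 0.99):
    a, lam = gaussian_params(c)
    print(f"c = {c:.2f}  a = {a:.4f}  lambda* = {lam:.4f}")
```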
5. Solution of noise-contaminated linear systems of equations
A very important problem in many areas of engineering is the solution of the linear system of equations given by
$$y = Ax + n,$$
where $A$ is an $N \times N$ matrix, $x$ is the unknown vector, $y$ is the noise-contaminated data, and $n$ is the noise vector. The vector $n$ is unknown, but joint statistics of its elements, such as the mean value, variance, power spectrum or pdf, are often available. Using this information and the residual signal, defined by
$$r = y - A\hat{x},$$
where $\hat{x}$ is an estimate of the solution, the set of feasible solutions can be defined as
$$\{\hat{x} \mid \text{sample statistics of } r \text{ are consistent with the available noise statistics}\}.$$
The sample statistics of the residual signal are random variables with known statistics. Thus, as discussed in Section 1, fuzzy sets can be defined for their acceptable values. As an example, assuming that the elements of the noise vector are zero-mean, uncorrelated, normal random variables with standard deviation $\sigma$, some of the fuzzy sets that can be defined using the residual signal are discussed below.
Mean value: The sample mean of a feasible residual is a normal random variable with zero mean and variance $\sigma^2/N$. Thus, a fuzzy set can be defined using the parameter
$$p_m = \frac{1}{\sigma\sqrt{N}} \sum_{i=1}^{N} [r]_i,$$
where $[r]_i$ is the $i$th element of the residual vector defined above, and $p_m$ is a standard normal random variable. The membership function can easily be constructed for any confidence level, and is given by
$$\mu_m(p_m) = \begin{cases} 1 & \text{if } |p_m| \le a(c), \\ \dfrac{\lambda(c)}{\sqrt{2\pi}}\, e^{-p_m^2/2} & \text{otherwise,} \end{cases}$$
where $c$ is the confidence level, and $a(c)$ and $\lambda(c)$ are the optimal parameters for normal pdfs, as given in Section 4.
Variance: The sum of the squares of a feasible residual divided by the noise variance has a chi-square distribution with $N$ degrees of freedom. However, for most applications the number of samples is large enough to approximate the chi-square distribution by the normal. The parameter used in defining this set is
$$p_v = \frac{1}{\sqrt{2N}}\Big(\frac{1}{\sigma^2}\sum_{i=1}^{N} [r]_i^2 - N\Big).$$
The distribution of this parameter is approximately standard normal. Thus, the membership function can be obtained from $\mu_m$ by replacing $p_m$ with $p_v$.
Maximum deviation from the mean: Individual entries of the residual have a normal distribution. The pdf of the maximum absolute value of the normalized residual entries, $r_m = \max_i |[r]_i|/\sigma$, is given by
$$f_0(r_m) = N\sqrt{\frac{2}{\pi}}\,\big[1 - 2Q(r_m)\big]^{N-1} e^{-r_m^2/2},$$
where $Q(x)$ is the probability that a standard normal random variable is larger than $x$. The function $f_0(x)$ is a unimodal pdf; thus, the form of the corresponding membership function is
$$\mu_0(r_m) = \begin{cases} 1 & \text{if } \lambda f_0(r_m) \ge 1, \\ \lambda f_0(r_m) & \text{otherwise,} \end{cases}$$
where $\lambda$ is obtained from (2) with $p = f_0$.
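The mean and variance tests can be sketched as follows. This is a minimal illustration under the assumptions of this section (zero-mean, uncorrelated normal noise with known $\sigma$), not an implementation from the paper: it forms the residual, computes $p_m$ and $p_v$, and evaluates the Gaussian membership function of Section 4.3 in the equivalent form $\min\{\lambda(c)e^{-t^2/2}/\sqrt{2\pi}, 1\}$. The solver for $a(c)$ and $\lambda(c)$ is our own closed-form reconstruction of (2) for the standard normal, and all names and the test data are hypothetical.

```python
import math

def gaussian_params(c):
    """(a(c), lambda(c)) for the standard normal pdf by bisection on (2)."""
    def c_of_a(a):
        return math.erf(a / math.sqrt(2.0)) + math.exp(a * a / 2.0) * math.erfc(a) / math.sqrt(2.0)
    if c <= c_of_a(0.0):
        return 0.0, 2.0 * math.sqrt(math.pi) * c
    lo, hi = 0.0, 1.0
    while c_of_a(hi) < c:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if c_of_a(mid) < c else (lo, mid)
    a = 0.5 * (lo + hi)
    return a, math.sqrt(2.0 * math.pi) * math.exp(a * a / 2.0)

def gaussian_membership(t, c):
    """mu(t) = min(lambda(c) * phi(t), 1), which equals 1 for |t| <= a(c)."""
    _, lam = gaussian_params(c)
    return min(lam * math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi), 1.0)

def residual_memberships(A, y, x_hat, sigma, c):
    """Memberships of the residual statistics p_m (mean) and p_v (variance)."""
    n = len(y)
    r = [y[i] - sum(A[i][j] * x_hat[j] for j in range(len(x_hat))) for i in range(n)]
    p_m = sum(r) / (sigma * math.sqrt(n))
    p_v = (sum(ri * ri for ri in r) / sigma ** 2 - n) / math.sqrt(2.0 * n)
    return gaussian_membership(p_m, c), gaussian_membership(p_v, c)

# Hypothetical data: an estimate that reproduces y exactly has a zero residual
n = 200
A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
x = [float(i) for i in range(n)]
y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
mu_m, mu_v = residual_memberships(A, y, x, sigma=0.5, c=0.9)
print(mu_m, mu_v)
```

Here the mean test gives full membership while the variance test gives a membership that is essentially zero: a residual that is much smaller than the assumed noise level is also inconsistent with the available noise statistics.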
6. Conclusion
An algorithm for constructing membership functions of statistically based fuzzy sets is presented. The resulting membership functions are optimum with respect to a set of reasonable criteria, and they can be adjusted to satisfy the possibility-probability consistency principle. It is not possible to prove the necessity or sufficiency of the proposed criteria; however, the constructed membership functions are reasonable, and the approach is extendable in that other criteria can be added if the need arises.
References
[1] E. Czogała, S. Gottwald and W. Pedrycz, Contribution to application of energy measure of fuzzy sets, Fuzzy Sets and Systems 8 (1982) 205-214.
[2] M.R. Civanlar, Digital signal restoration using projection and fuzzy set techniques, Ph.D. Dissertation, N.C. State University, Electrical and Computer Engineering Department, Raleigh, NC (August 1984).
[3] P.A. Devijver and J. Kittler, Pattern Recognition (Prentice-Hall, London, 1982).
[4] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications (Academic Press, New York, 1980).
[5] H.M. Hersh and A. Caramazza, A fuzzy-set approach to modifiers and vagueness in natural languages, J. Exp. Psychol. Gen. 105 (1976) 254-276.
[6] D.G. Luenberger, Optimization by Vector Space Methods (John Wiley and Sons, New York, 1969).
[7] A.M. Norwich and I.B. Turksen, A model for the measurement of membership and consequences of its empirical implementation, Fuzzy Sets and Systems 12 (1984) 1-25.
[8] M. Nowakowska, Fuzzy concepts in social sciences, Behav. Sci. 22 (1977) 107-115.
[9] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1 (1978) 3-28.