0306-4379/89 $3.00 + 0.00 Copyright © 1989 Pergamon Press plc
Information Systems Vol. 14, No. 6, pp. 501-505, 1989. Printed in Great Britain. All rights reserved.
FUZZY LINGUISTIC DATA BASES: AN APPLICATION
JOHN T. DOCKERY¹ and EDWARD MURRAY²†
¹Organization of the Joint Chiefs of Staff, J6, Pentagon, Washington, DC 20318, U.S.A. and ²Advanced Systems Development, Inc., 1701 North Ft Myer Drive, Arlington, VA 22209, U.S.A.

(Received 11 August 1989)
Abstract. This paper discusses the empirical development and use of a fuzzy linguistic data base which is relational in character. Unlike conventional fuzzy searches for crisp data, we search a data base for fuzzy information. The data base supports a separate program that performs fuzzy aggregation of the data. It has been applied to support decision making connected with the construction of military budgets for command, control and communications (C3).

Key words: Fuzzy relational data base, linguistic variables, applications of fuzzy sets.
PROBLEM STATEMENT

Data bases containing verbal information are exceedingly common, and the literature on their construction is enormous. Typical data base contents include such items as payroll records, credit records, etc. The information, although verbal, is considered to be crisply defined, e.g. John Doe. The problem usually treated is how to extract such crisp information in an approximate fashion. However, consider a change in viewpoint about the use and storage of verbal information: let it be treated as a collection of fuzzy linguistic variables. The literature now falls relatively silent. A promising paper by van Velthoven appeared early on, but follow-up work has been limited, although a paper on the same general subject was presented at the Second World Congress on Fuzzy Sets [1, 2].
An early and perhaps defining work on fuzzy relational data bases is due to Buckles [3]. One of the current works on fuzzy data bases is due to Zemankova and Kandel [4]. Their work is concerned primarily with the theoretical construction of techniques for the fuzzy search of crisp data. In fact, fuzzy search for crisp data seems to be the sum and substance of the literature; the search for fuzzy information is not considered. A collection of papers edited by Prade and Negoita contains contributions on extant and planned applications of fuzzy data bases from the vantage point of fuzzy knowledge engineering [5]. The volume also contains some of the later work by Buckles and his co-workers.
BACKGROUND

As part of a project to develop a fuzzy aggregation tool to aid in military decision making, we found ourselves confronted with the task of storing and using fuzzy linguistic variables. Our initial task was the ad hoc design of a data base containing a mix of crisp and fuzzy data elements that would support a data-driven aggregation process. The results of that work are reported separately in Dockery and Murray [6]. Dockery has considered military information systems which are a mix of crisp and fuzzy data [7]. The subject data base was also to be designed so that it would eventually support a goal-driven process in which the elements to be aggregated are first identified. Relations among data base variables are more than infrastructure; they are part of the data base. Emphasis in our work is always on computation with fuzzy linguistic variables. These computations serve to strengthen or weaken the original input relationships. Thus, unlike the usual case, the relational aspects of the data base are not static. In fact, the final set of relationships that survives the aggregation process is an important output: it serves as a fuzzy audit trail. Our approach in the current example was to treat the necessary data base construction problem in a very empirical manner. This brief contribution is about the results, and the lessons they may contain.

†Now an independent consultant.
DATA BASE DESCRIPTION AND CONTENT
The data to be manipulated in the aggregation process was the result of collecting informed military judgement. It was both linguistic and subjective in nature. It dealt with specific assessments of the current state of military preparedness within the so-called command, control and communications (C3) community. The assessment concerned the C3 support available to military operations at various levels of aggregation from local systems to global functioning.
The collection process asked military officers at field installations to provide answers to the following kinds of questions:
• Identify your C3 capabilities, e.g. “see 100 miles with a radar”.
• Evaluate these capabilities by color, e.g. “message passing is not good and therefore yellow”.
• Specify verbally how the capabilities are related to the functional missions they support (e.g. air defense), using terms such as “essential”.
[Figure panel: BEFORE AGGREGATION; stored in data base; linked to budget]
The foregoing was later combined with information from headquarters units. In summary, the final data set consisted of the following sorts of information.

Information category                      Content
• Perceived deficiencies in C3 systems    Textual information
• Hierarchical dependencies               Fuzzy linguistic variables
• Performance data by color               Possibility distributions
• Rule sets                               Fuzzy algorithms
• Programmatic data (budget/systems)      Mixed text and crisp numbers
• Decisions generated by user             Fuzzy linguistic variables
By usual standards it is a small data base, not exceeding 10,000 items worldwide, but nearly all of it is fuzzy in nature. The data base serves as a source for computations, which are selected by the user. Results of computations are stored back into the data base. Although queries can be made to the original data base, the usual query concerns the sensitivity of the results of the fuzzy computations. One distinct advantage of using fuzzy information is its treatment of goals and constraints. Whether something is seen as a goal or a constraint often depends upon the hierarchical level from which it is viewed. Fuzzy set theory puts both goals and constraints in the same decision space, permitting direct trade-offs between them.
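The goal/constraint point can be illustrated with the classic Bellman-Zadeh fuzzy decision. In that formulation, a goal and a constraint are defined over the same alternatives and intersected with min. What follows is a minimal Python sketch under that assumption. The paper does not say which decision rule its code uses, and the alternatives `option_a` to `option_c` and their membership grades are hypothetical.

```python
# Sketch of the Bellman-Zadeh fuzzy decision (an assumed formulation, not
# the authors' code). Alternatives and membership grades are hypothetical.

goal = {"option_a": 0.9, "option_b": 0.6, "option_c": 0.3}        # e.g. "readiness is high"
constraint = {"option_a": 0.2, "option_b": 0.7, "option_c": 1.0}  # e.g. "cost is acceptable"


def fuzzy_decision(goal, constraint):
    """Decision set D = G AND C, with mu_D(x) = min(mu_G(x), mu_C(x))."""
    return {x: min(goal[x], constraint[x]) for x in goal}


def best_alternative(decision):
    """Maximizing decision: the alternative with the largest mu_D(x)."""
    return max(decision, key=decision.get)


decision = fuzzy_decision(goal, constraint)
print(decision)                    # {'option_a': 0.2, 'option_b': 0.6, 'option_c': 0.3}
print(best_alternative(decision))  # option_b

# min is symmetric, so treating the goal as a constraint (and vice versa)
# yields the same decision set.
assert fuzzy_decision(constraint, goal) == decision
```

Because min is symmetric, relabelling the goal as a constraint leaves the decision set unchanged. This relabelling can happen when the same item is viewed from a different hierarchical level, and the final assertion checks it.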
[Figure: mission function F_k linked to capabilities C_1 to C_4]

Fig. 1. Stage 1: relationship of mission function (F_k) to supporting capability (C_i) in terms of dependency (δ_i).
RELATIONAL STRUCTURE OF THE AGGREGATION

The aggregation process induces a change in the relational data structure in four stages. The stages form a sequence in which sets of C3 capabilities that support military mission functions are combined, in two steps, to yield a judgement on the preparedness level of the mission itself to support still higher elements in the hierarchy, e.g. theatre operations. Further aggregation occurs which is not discussed in this contribution. An essential element of the aggregation process is the selective “forgetting” of many of the original relationships. The process appears Darwinian, since it is the “weaker” relationships which are forgotten! This pruning process mimics the actual decision maker’s approach to manual aggregation.

As we have indicated, capabilities themselves are defined in terms of colors from green to red (good to bad). They are linked to one of three mission functions by dependencies, which are in turn expressed by such fuzzy terms as essential or important. The three mission functions are then combined to give an overall color for the mission. The following four stages summarize the process.

Stage 1. The key initial data base relationships for the fuzzy aggregation are shown in Fig. 1. A mission function F_k is displayed. It is linked by dependencies δ_i to its supporting capabilities C_i. The insert shows additional linkages of a representative C_i to other functions F_l, etc., and to relevant budgetary information E_i.

Stage 2. After aggregation, few capability-dependency pairs survive; often only one. Fig. 2 shows a surviving pair for each mission function F_k. The insert depicts the reduction in possible relations from the original population. Which pairs survive depends upon the fuzzy computational algorithms selected from the data base by the user. The user can adjust the strength of fuzzy relationships by drawing upon a set of fuzzy connectives selected from Dubois and Prade [8]. These connectives reflect various possible pairs of T-norms and S-norms. Some express pessimism; others, optimism. In general, a pessimist will forget different relationships than an optimist, although at times operators reflecting these opposing viewpoints may produce the same results.

Fig. 2. Stage 2: relationship of total mission (M) to capabilities (C_i). [Only one dependency-capability (δ_i-C_i) combination for each function (F_k) of mission (M) survives aggregation Stage 2.]

Stage 3. After the mission functions are combined, a further reduction in relationships occurs. We have shown only a single survivor (Fig. 3). The insert shows the continuing reduction in relationships.

[Figure: AFTER AGGREGATION OF FUNCTIONS TO MISSION (insert shows reduction from original population)]

Fig. 3. Stage 3: aggregation of functions (F_k) into a single mission (M) assessment.

Stage 4. The final step is to store only the surviving relational chain(s), together with the associated external user decisions, in a scratch pad memory for use in aggregating to the next higher level, e.g. theatre (Fig. 4). We note that the user decisions represent crisper information than the original data. This serves to contain the increasing fuzziness inherent in the aggregation process. Decisions taken by the user thus work together with the selective forgetting of relationships during code execution to accomplish this end.
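The selective forgetting of Stage 2 can be illustrated with one common pessimistic/optimistic pair of connectives: weighted min and weighted max. The sketch below is hypothetical throughout. The paper does not state which connectives its code uses, and the numeric color scores, the dependency weights and the term "useful" are assumptions made only for illustration. For one mission function, each connective keeps only the dependency-capability pair that determines the function's score and forgets the rest.

```python
# Hypothetical encodings (not from the paper): color -> goodness in [0, 1],
# dependency term -> importance weight in [0, 1]. "useful" is invented.
COLOR_SCORE = {"red": 0.0, "yellow/red": 0.25, "yellow": 0.5,
               "yellow/green": 0.75, "green": 1.0}
DEPENDENCY_WEIGHT = {"essential": 1.0, "important": 0.6, "useful": 0.3}


def pessimistic(pairs):
    """Weighted min: score_i = max(1 - w_i, g_i); only the lowest-scoring pair is kept."""
    scores = {cap: max(1.0 - DEPENDENCY_WEIGHT[dep], COLOR_SCORE[color])
              for cap, (dep, color) in pairs.items()}
    survivor = min(scores, key=scores.get)
    return survivor, scores[survivor]


def optimistic(pairs):
    """Weighted max: score_i = min(w_i, g_i); only the highest-scoring pair is kept."""
    scores = {cap: min(DEPENDENCY_WEIGHT[dep], COLOR_SCORE[color])
              for cap, (dep, color) in pairs.items()}
    survivor = max(scores, key=scores.get)
    return survivor, scores[survivor]


# One mission function (e.g. air defense) with three supporting capabilities,
# each given as (dependency term, color). Values are hypothetical.
air_defense = {
    "C1": ("essential", "yellow/green"),
    "C2": ("important", "yellow/red"),
    "C3": ("useful", "yellow"),
}

print(pessimistic(air_defense))  # ('C2', 0.4)
print(optimistic(air_defense))   # ('C1', 0.75)
```

With these numbers the pessimist retains C2, an important but poorly rated capability, while the optimist retains C1. The two viewpoints therefore forget different relationships, as described for Stage 2.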
[Figure: surviving relational chain, with the intermediate decision made from two levels of aggregation]

Fig. 4. Stage 4: storing of surviving fuzzy relationships with associated intermediate decisions (D_i).

DATA BASE ARCHITECTURE

The data base is relational in organizational character. Its task is both crisp and fuzzy retrieval of information for use in fuzzy aggregation. The code was originally programmed directly in BASIC for stand-alone computation on IBM-compatible personal computers. In its final form the data base has been reprogrammed using the PACE software for Wang computers, interfaced with an aggregation program to be written in PASCAL. The user interface is menu-driven externally and table-driven internally. Commands are entered by function key. A data dictionary contains the necessary parent-child relationships, which are currently nonfuzzy. On the surface this appears to contradict earlier statements about the plasticity of the relationships. There is no contradiction, because the data base keeps two sets of relational books. It can always make available the atomic relations which were originally input. However, in so far as the user is concerned, the relations of interest to his aggregation problem only emerge during interactive sessions. An example is the set of computations done by one of our colleagues to uncover classes of “dominant” linguistic variable combinations, which drive the national assessment. During program execution the user works with the fuzzy variables until the aggregation to the next higher level seems either appropriate or irreconcilable with his a priori estimate. In the process he will be asked specifically whether he is a pessimist or an optimist, in case of ties. Since decisions are stored back for later use, the linking between certain data elements improves with use. However, no claim is made that the program learns in any sense. Zadeh put forth four types of expression in his PRUF language [9], and all four are present either in our data base or in user-data base interactions:
Expression                   Example
• Modified proposition       Land Combat is MORE important than . . .
• Composed proposition       Capability 1 is Green And/Or Capability 2 is Yellow
• Quantified proposition     The mission is ESSENTIALLY Yellow
• Qualified proposition      That Capability 2 is Red is NOT VERY PROBABLE
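Several of the PRUF forms above can be evaluated with Zadeh's standard operators: VERY as concentration (squaring), NOT as complement, AND as min and OR as max. The Python sketch below assumes those operators. All membership grades are hypothetical, including the grade of the fuzzy probability PROBABLE. The comparative (MORE) and quantified (ESSENTIALLY) forms are not modelled.

```python
# Illustrative Zadeh-style operators (assumed); membership grades are hypothetical.

def very(mu):
    """Modifier VERY as concentration: mu ** 2."""
    return mu ** 2


def not_(mu):
    """Negation as complement: 1 - mu."""
    return 1.0 - mu


def and_(a, b):
    """Composition with AND: min."""
    return min(a, b)


def or_(a, b):
    """Composition with OR: max."""
    return max(a, b)


# Composed proposition: "Capability 1 is Green And/Or Capability 2 is Yellow"
cap1_is_green = 0.8   # hypothetical grade
cap2_is_yellow = 0.5  # hypothetical grade
print(and_(cap1_is_green, cap2_is_yellow))  # 0.5
print(or_(cap1_is_green, cap2_is_yellow))   # 0.8

# Qualified proposition: "That Capability 2 is Red is NOT VERY PROBABLE",
# evaluated at a hypothetical grade 0.6 of the fuzzy probability PROBABLE.
probable = 0.6
print(not_(very(probable)))  # approximately 0.64
```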
In PRUF the concept of a possibility distribution replaces that of truth as a foundation for meaning representation. That we have empirically found it useful to include all four forms is an encouraging sign that we indeed have a fuzzy relational data base to which fuzzy logic may be applied. We currently store and use possibility distributions directly as tables. For intermediate levels of aggregation, however, possibility values are computed. Continuing work with the code also requires us to compute and store two related items, necessity and certitude. These have been defined by Dubois and Prade, and are extensively discussed in Klir and Folger [10, 11]. We sought operational definitions, since we had available sets of possibility distributions and membership functions. What we employed is a form derived from Zemankova and Kandel [4]:

$$\mathrm{poss}(X \text{ is } A) = \sup_{u \in U} \min[\Pi_X(u), \mu_A(u)],$$

$$\mathrm{nec}(X \text{ is } A) = \inf_{u \in U} \max[1 - \Pi_X(u), \mu_A(u)],$$

$$\mathrm{cert}(X \text{ is } A) = \max\left(0, \inf_{u \in U} [\mu_A(u) * \Pi_X(u) > 0]\right),$$
where:
$\Pi_X(u)$ = possibility distribution,
$\mu_A(u)$ = membership function,
$U$ = universe of discourse, with $u \in U$,
$*$ = multiplication, and
$> 0$ excludes any individual product which is equal to zero from consideration as the minimum.

Table 1. Relation between colors, indicating the possibility that the row color could have been the column color

Relation        SR    Red   YR    Yel   YG    GR    SG
Super-red       1.0   0.2   0.0   0.0   0.0   0.0   0.0
Red             0.5   1.0   0.5   0.0   0.0   0.0   0.0
Yellow/red      0.0   0.6   1.0   0.5   0.2   0.0   0.0
Yellow          0.0   0.2   0.7   1.0   0.7   0.1   0.0
Yellow/green    0.0   0.0   0.2   0.7   1.0   0.6   0.0
Green           0.0   0.0   0.0   0.2   0.5   1.0   0.2
Super-green     0.0   0.0   0.0   0.0   0.0   0.5   1.0

(SR = super-red, YR = yellow/red, Yel = yellow, YG = yellow/green, GR = green, SG = super-green.)
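The three measures and Zemankova's unnormalized possibility can be checked on the discrete color universe, using the Table 1 interpolation worked in this section. The following is a minimal Python sketch. It assumes the colors form a finite universe and that the fuzzy set A, "could have been yellow/red", is read off the Yellow/red column of Table 1. That reading of the table's orientation is our assumption, chosen because it reproduces the 0.5 and 0.4 quoted in the text. Function and variable names are hypothetical; this is not the authors' BASIC or PACE code. The final lines build a subnormal distribution for which necessity exceeds possibility, the anomaly noted in the text.

```python
# Finite universe of performance colors, in Table 1 order.
COLORS = ["SR", "Red", "YR", "Yel", "YG", "GR", "SG"]

# Table 1 (assumed orientation): TABLE1[row][j] = possibility that the
# row color could have been the color COLORS[j].
TABLE1 = {
    "SR":  [1.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],
    "Red": [0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    "YR":  [0.0, 0.6, 1.0, 0.5, 0.2, 0.0, 0.0],
    "Yel": [0.0, 0.2, 0.7, 1.0, 0.7, 0.1, 0.0],
    "YG":  [0.0, 0.0, 0.2, 0.7, 1.0, 0.6, 0.0],
    "GR":  [0.0, 0.0, 0.0, 0.2, 0.5, 1.0, 0.2],
    "SG":  [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0],
}


def could_have_been(color):
    """Hypothetical helper: mu_A for A = 'could have been <color>' (a Table 1 column)."""
    j = COLORS.index(color)
    return {u: TABLE1[u][j] for u in COLORS}


def poss(pi, mu):
    """sup_u min[Pi_X(u), mu_A(u)]."""
    return max(min(pi.get(u, 0.0), mu.get(u, 0.0)) for u in COLORS)


def poss_product(pi, mu):
    """Unnormalized variant: sup_u [mu_A(u) * Pi_X(u)]."""
    return max(mu.get(u, 0.0) * pi.get(u, 0.0) for u in COLORS)


def nec(pi, mu):
    """inf_u max[1 - Pi_X(u), mu_A(u)]."""
    return min(max(1.0 - pi.get(u, 0.0), mu.get(u, 0.0)) for u in COLORS)


def cert(pi, mu):
    """max(0, smallest non-zero product mu_A(u) * Pi_X(u)).

    Assumption: if every product is zero the formula is undefined, and
    0.0 is returned here.
    """
    products = [mu.get(u, 0.0) * pi.get(u, 0.0) for u in COLORS]
    nonzero = [p for p in products if p > 0.0]
    return max(0.0, min(nonzero)) if nonzero else 0.0


# Worked example: Pi = 0.3/yellow + 0.8/red, A = "could have been yellow/red".
pi = {"Yel": 0.3, "Red": 0.8}
a = could_have_been("YR")
print(poss(pi, a))          # 0.5
print(poss_product(pi, a))  # 0.4
print(nec(pi, a))           # 0.5
print(cert(pi, a))          # approximately 0.21

# A subnormal distribution (no value equals 1) can give necessity > possibility.
pi_sub = {"Yel": 0.2, "YG": 0.2}
a_sr = could_have_been("SR")
print(poss(pi_sub, a_sr), nec(pi_sub, a_sr))  # 0.0, then approximately 0.8
```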
Computation of the foregoing quantities permits the user to judge the degree of fuzziness, and hence the degree of acceptability, of his intermediate decisions, which in turn become data for further aggregation. Questions of consistency among the three definitions exist. Prade's definitions of possibility and necessity require normalization: at least one of the possibilities must be unity, a condition which can only rarely be guaranteed in a real-world data base. If this condition does not hold, examples can be constructed for which the necessity is greater than the possibility! Zemankova offers a substitute, unnormalized definition for possibility computations, as follows [4]:

$$\mathrm{Poss}(X \text{ is } A) = \sup_{u \in U} [\mu_A(u) * \Pi_X(u)].$$

We have experimented with this expression with mixed results. Use of the product relationship, or indeed of any of the various other operations, introduces a kind of hierarchy into the definition of possibility. To what end, we cannot say at this time without further experiment. A very recent note by Dubois and Prade is instructive on the general subject of both certitude and possibility [12]. We have been more concerned with a consistent interpretation of the set A. It would appear that A is most conveniently associated with sets of requirements. Then we may ask what the possibility is that a given capability with a certain color meets the stated requirement. The necessity would then express the degree of belief in that proposition. The certitude is a kind of normalized necessity which will always be less than the possibility. For the cases in which a similarity matrix can be defined between the colors representing the performance state of the capabilities, one can use Zemankova's definitions of possibility and certitude to compute the possibility that the performance could be some other color. This can include a color not in the original possibility distribution. We are, in a sense, doing a kind of fuzzy interpolation. Thus, from

$$\Pi_{\mathrm{Performance}(\mathrm{capability}_i)} = 0.3/\mathrm{yellow} + 0.8/\mathrm{red},$$

we may infer the possibility that capability i is yellow/red, provided we can construct the relational table between yellow and yellow/red and between red and yellow/red. Table 1 is an example of such a table. The matrix in Table 1 is not symmetric, nor should we in general expect real relations to be symmetric; it is therefore not a similarity relation. Using the table and the sup-min expression for possibility, we find the possibility that the performance of capability i is yellow/red to be

$$\max[\min(0.3, 0.7), \min(0.8, 0.5)] = 0.5.$$

If the matrix had been symmetric, the answer would have been 0.6. Use of the suggested unnormalized version of possibility gives 0.4. A general difficulty with the operational use of relations has led one of us to investigate the subject further [13].

FORCING RESULTS
Should the user still not find the aggregated assessment to his liking, he has available a variety of measures with which to experiment with the final aggregation results. Still, there are limits to what the program will permit. When these are reached, the code asks what the user would like to see. It then audits the results and displays the fuzzy linguistic variable combinations which are driving the results. If the user does not feel free to alter the basic input, the program enters a different mode: it retreats to the capabilities level and suggests the changes necessary to alter a capability assessment. Since the capabilities are tied to C3 programs, this creates a list of suggested programmatic changes for user consideration. These computations use the complement of the membership functions originally used as input. Changes may be required in several programs in order to effect any substantive change in the final assessment.
OBSERVATIONS

We observed the following about building a data base centered on fuzzy linguistic variables.
• We found it useful to prune, i.e. reduce the scope of relationships, rather than expand links, as this translates into reduced complexity and fuzziness.
• The system is set up to (temporarily) forget links to source data as decisions about aggregation are made.
• Reasoning chains are very short.
• Input data is “consumed” in the process of aggregation. By this we mean that the results of computing with the original source data become so “washed out”, i.e. fuzzy, that additional data, which depend upon intermediate decisions, must enter the data base.
• The relevance of any particular data item is context dependent.
• The fuzzy aggregation process determines the content of new data sets created from the basic data.
• Flexible use of fuzzy relations is necessary for experimentation with the data.
• Definite limits exist to the variety of conclusions that can be drawn, even when the fuzzy data is as “elastic” as we have indicated.

REFERENCES
[1] G. van Velthoven. Application of fuzzy sets theory to criminal investigation. First European Congr. on Operations Research (EURO I), Brussels (1975).
[2] R. Vandenberghe, A. van Schooten, R. de Caluwe and E. E. Kerre. A practical application of fuzzy database techniques to criminal investigation. Preprints of the Second World Congress on Fuzzy Sets, Vol. 2, pp. 661-664. Int. Fuzzy Systems Association: Japan Chapter through the Japanese Society of Instrument and Control Engineers (1987).
[3] B. P. Buckles. Fuzzy relational databases: a foundational framework and information theoretic characterization. Internal Memorandum, General Research Corporation, Huntsville, Alabama (1979).
[4] M. Zemankova and A. Kandel. Fuzzy Relational Data Bases: a Key to Expert Systems. TÜV Rheinland Verlag (1984).
[5] H. Prade and C. V. Negoita (Eds). Fuzzy Logic in Knowledge Engineering. TÜV Rheinland Verlag (1986).
[6] J. Dockery and E. Murray. A fuzzy approach to rolling up assessments. Int. J. Approximate Reasoning 1, 251-272 (1987).
[7] J. Dockery. Fuzzy design of military information systems. Int. J. Man-Machine Studies 16, 1-38 (1982).
[8] D. Dubois and H. Prade. Criteria aggregation and ranking of alternatives in the framework of fuzzy set theory. In Fuzzy Sets and Decision Analysis (H. J. Zimmermann et al., Eds), pp. 209-240. North-Holland, Amsterdam (1984).
[9] L. Zadeh. PRUF: a meaning representation language for natural languages. Int. J. Man-Machine Studies 10, 395-460 (1978).
[10] D. Dubois and H. Prade. Fuzzy Sets and Systems, pp. 131-144. Academic Press, New York (1980).
[11] G. Klir and T. Folger. Fuzzy Sets, Uncertainty, and Information. Prentice-Hall, Englewood Cliffs, New Jersey (1988).
[12] D. Dubois and H. Prade. An alternative approach to the handling of subnormal possibility distributions. Fuzzy Sets and Systems 24, 123-126 (1987).
[13] J. Dockery and M. L. McAllister. Similarity relations. Proceedings of the Annual Meeting of the North American Fuzzy Information Processing Society, San Francisco (1988).