Fuzzy Sets and Systems 38 (1990) 81-90 North-Holland
EFFICIENT COMPUTATION OF TRANSITIVE CLOSURES
H. Legind LARSEN* Space Division, CRI A/S, Bregnerødvej 144, DK-3460 Birkerød, Denmark
R.R. YAGER Machine Intelligence Institute, Iona College, New Rochelle, NY 10801, U.S.A.
Received November 1988
Revised April 1989
Abstract: We describe an efficient, time-space balanced algorithm for computation of the transitive max-min closure of a proximity relation, i.e. of a fuzzy relation that is reflexive and symmetric. The algorithm creates a binary tree representation of the transitive closure in $O(m \log_2 m)$ time and $O(m)$ space, where $m$ is the number of edges in the proximity graph. A central idea in the algorithm is to order the edges by decreasing strength and then to process them in that order. For any pair of vertices, their similarity, i.e. their membership value in the transitive closure, is looked up in the tree in $O(\log_2 n)$ time, where $n$ is the number of vertices in the proximity graph. We compare the performance of the algorithm with the performances of two other algorithms, also presented here, that compute transitive closures of broader classes of fuzzy relations.
Keywords: Transitive closure; similarity relation; proximity relation; fuzzy relation; fuzzy graph; partition tree; fuzzy graph algorithm; all pairs strongest-path; all pairs bottleneck-path; term similarity; fuzzy matching; similarity based matching.
*Presently at Computer Science Department, Roskilde University, Postbox 260, DK-4000 Roskilde, Denmark.
0165-0114/90/$03.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)
1. Introduction
A common subtask in heuristic classification tasks is feature value based matching of a pair of objects characterizing respectively the problem and a possible solution. In many applications, such as information retrieval, the matching is based on the term proximities initially given by the domain expert(s). The terms are themselves objects which represent domain concepts and may be used as feature values. The term proximities form a proximity relation, i.e. a reflexive symmetric fuzzy relation, which is conveniently viewed as a reflexive undirected graph.
Matching of two objects is typically done as follows. First, the similarities (the effective proximities) between the feature values (of the two objects to be matched) are inferred. Then these similarities are aggregated into a single value expressing the degree of match. Depending on the problem type context, the degree of match may be interpreted as: the degree of class membership (when the one object is a prototype characterizing a class or a concept), the degree of satisfaction (of an information object to a query object), or the degree of belief (that one predicate or state of affairs is satisfied, given another is). We note that feature value objects may themselves have features, which requires the matching method to be applied recursively.
In [4] we described such a matching method and its application in the construction of end user views in information retrieval systems. We presented an algorithm that used matrix multiplication under max-$*_T$ composition, where $*_T$ may be any T-norm operator [1], to compute the similarity relation as the transitive closure of a proximity relation. (In [4], 'similarity' and 'proximity' were called, respectively, 'effective similarity' and 'initial similarity'.) The algorithm computes the transitive max-$*_T$ closure in $O(n^3 \log_2 n)$ time and $O(n^2)$ space, where $n$ is the number of terms (vertices in the proximity graph). The similarity between two terms is looked up in the resulting matrix in $O(1)$ time. When $n$ is large, the creation and representation of the transitive closure is rather inefficient, although the look-up is fast.
In this paper we describe a new algorithm that creates a binary tree representation of the transitive max-min closure in $O(m \log_2 m)$ time and $O(m)$ space, where $m$ is the number of edges in the proximity graph. The similarity between any pair of terms is looked up in the tree in $O(\log_2 n)$ time. The modest increase in look-up time, from $O(1)$ to $O(\log_2 n)$, allows the algorithm to store the similarity relation in $O(m)$ space instead of $O(n^2)$ space. This is a useful property for many applications, such as user views in information retrieval systems [4], where $n$ typically is large while the membership matrix is sparse. Thus, besides computing the transitive max-min closure very efficiently, the algorithm provides a time-space balanced solution for look-ups. We refer to this algorithm as the new algorithm.
Although max-min is the typical composition of proximity relations used in fuzzy reasoning, the algorithm is of interest for all applications using max-$*_T$ composition, such as [4]. Since min is the least restrictive T-norm operator ($a *_T b \le \min(a, b)$ for every T-norm $*_T$), the transitive max-min closure contains the transitive max-$*_T$ closure of the same relation, for any $*_T$. Thus, the new algorithm can be used to obtain fast upper bounds for similarities under other $*_T$ than min.
In the following, we introduce the transitive max-$*_T$ closure computation and related concepts in Section 2; in this context, we present two more general algorithms (including the one from [4]) for computation of transitive closures. The new algorithm is then described in Section 3. In Section 4, the algorithm is illustrated by use of a simple example. Finally, in Section 5, we analyze the computational complexity of the new algorithm and compare it with the performances of the two algorithms presented in Section 2.
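The upper-bound property can be illustrated on a single path. The following minimal Python sketch compares min with two other common T-norms (the product and the Łukasiewicz T-norm); the function names and the edge strengths are illustrative assumptions and are not taken from [4].

```python
from functools import reduce

# Three common T-norms on [0, 1]; the names are illustrative.
def t_min(a, b):
    return min(a, b)

def t_product(a, b):
    return a * b

def t_lukasiewicz(a, b):
    return max(0.0, a + b - 1.0)

def path_strength(strengths, t_norm):
    """Strength of a path: the T-norm folded over its edge strengths."""
    return reduce(t_norm, strengths)

# Hypothetical edge strengths along one path.
path = [0.9, 0.7, 0.8]
s_min = path_strength(path, t_min)
s_prod = path_strength(path, t_product)
s_luk = path_strength(path, t_lukasiewicz)
```

Because every individual path is at least as strong under min as under any other T-norm, the maximum over all paths between two terms is as well, which is why the max-min similarity bounds the others from above.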
2. Transitive max-$*_T$ closure computation
A (max-$*_T$) similarity relation is a fuzzy relation $R(X, X)$ that is reflexive, symmetric, and max-$*_T$ transitive, i.e. $\mu_R(x, x) = 1$, $\mu_R(x, y) = \mu_R(y, x)$, and $\mu_R(x, z) \ge \mu_R(x, y) *_T \mu_R(y, z)$ for all $x, y, z \in X$, where $\mu_R : X \times X \to [0, 1]$ is the membership function associated with $R$. The term pair proximities given by a domain expert do not in general form a similarity relation. While the first two requirements are satisfied by definition, the relation may be incoherent in the sense that the third requirement is not satisfied. By deleting the third requirement we obtain the definition of a proximity relation, the kind of relation formed by the proximities given by the expert. The transitive closure $T_R$ of a proximity relation $R$ satisfies the third requirement and is thus a similarity relation; in fact, $T_R$ is the smallest relation satisfying all three requirements and containing $R$ [3]. $T_R$ represents the similarity (or effective proximity) between any pair of terms.
A proximity relation is diagrammed by a proximity graph, i.e. a reflexive undirected fuzzy graph. In general, the transitive max-$*_T$ closure $T_R$ of a fuzzy relation $R(X, X)$ represents the strength of the strongest path between any pair of terms (vertices in the fuzzy graph). Thus, for any $x, y \in X$, $\mu_{T_R}(x, y)$ is the strength of the strongest path between $x$ and $y$; the strength of a path $[x_1, x_2, \ldots, x_k]$ is $\mu_R(x_1, x_2) *_T \cdots *_T \mu_R(x_{k-1}, x_k)$. The problem of computing $T_R$ is modelled by the graph problem known as the all-pairs shortest-path (length) problem, where 'shortest-path length' here may be read as 'strongest-path strength'. When $*_T$ is the min operator, the graph problem is known as the all-pairs bottleneck-path problem.
In fuzzy relational terms, $R \circ R(X, X)$ is defined as follows under the max-$*_T$ composition '$\circ$':
$$\mu_{R \circ R}(x, z) = \max_{y \in X} \left( \mu_R(x, y) *_T \mu_R(y, z) \right).$$
Since $\circ$ is associative we can define powers of $R$ by $R^j = R \circ \cdots \circ R$ ($j$ times $R$). The transitive closure of $R$ is then [2]
$$T_R = R \cup R^2 \cup R^3 \cup \cdots \cup R^{n-1},$$
where $n$ is the order of $X$ ($n = |X|$). As shown in [4], if $R$ is a proximity relation, then $T_R = R^k$, where $k$ is $\min\{j \mid R^j = R^{j+1}\}$; $R^k = R^{k+i}$ for $i = 1, 2, \ldots$. Since no path can be longer than $n - 1$, we have $k \le n - 1$. $R^k$ represents the situation where no paths can be improved by using another path as subpath. Thus the max-$*_T$ transitivity is satisfied, and $R^k$ is a similarity relation.
Let $M_R$ be the matrix representation of $R$, i.e. $M_R = \{a_{ij}\}_{n \times n}$ where $a_{ij} = \mu_R(x_i, x_j)$, $(x_i, x_j) \in X \times X$, $X = [x_1, x_2, \ldots, x_n]$. Then $R^p$, $p = 2, 3, \ldots$, is represented by $(M_R)^p$, where
$$(a_{ij})^p = \max_{u \in \{1, \ldots, n\}} \left[ (a_{iu})^{p-1} *_T a_{uj} \right], \quad i = 1, \ldots, n, \; j = 1, \ldots, n.$$
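The composition and the fixpoint characterization $T_R = R^k$ can be sketched in a few lines of Python. This is a minimal illustration, assuming a reflexive relation stored as a list of lists; the function names and the 3-term relation are made up for the example.

```python
def compose(r, s, t_norm=min):
    """Max-*T composition of two square membership matrices (lists of lists)."""
    n = len(r)
    return [[max(t_norm(r[i][u], s[u][j]) for u in range(n))
             for j in range(n)]
            for i in range(n)]

def closure_by_powers(m, t_norm=min):
    """Multiply by M_R until R^k = R^(k+1); returns (closure, k).

    Assumes a reflexive relation, for which the powers are non-decreasing
    and the loop is guaranteed to stop.
    """
    power, k = m, 1
    while True:
        nxt = compose(power, m, t_norm)
        if nxt == power:
            return power, k
        power, k = nxt, k + 1

# Illustrative 3-term proximity relation (reflexive, symmetric).
M = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
closure, k = closure_by_powers(M)
```

Here the only missing similarity, between terms 1 and 3, is filled in by the path through term 2, and the fixpoint is reached at $k = 2 = n - 1$.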
The properties of $R^k$ allow $M_{T_R} = (M_R)^k$ to be computed by squaring $M_R$ $\lceil \log_2 n \rceil$ times; thus, by matrix multiplication, the transitive max-$*_T$ closure of $R$ is computed in $O(n^3 \log_2 n)$ time and $O(n^2)$ space. We refer to this method as the matrix method. In fact, the matrix method computes the transitive max-$*_T$ closure for any reflexive fuzzy relation, i.e. also for asymmetric relations. This is a useful property for heuristic matching using relations which are not symmetric, such as in domains where the degree to which $x$ satisfies $y$ is not necessarily equal to the degree to which $y$ satisfies $x$.
We may even compute the transitive closure of any fuzzy relation in only $O(n^3)$ time and $O(n^2)$ space. The algorithm with this performance is shown as Algorithm 2.1. It is essentially Floyd's algorithm for solving the all-pairs shortest-path problem [5]; here the algorithm is modified to work under max-$*_T$ composition. The algorithm works on a membership matrix $\{a_{ij}\}_{n \times n}$, which initially represents the fuzzy relation and, after execution of the algorithm, the transitive closure of the relation. The test $a(j, i) > 0$ avoids unnecessary executions of the third for-loop. We refer to this algorithm as Floyd's algorithm.
Algorithm 2.1. Floyd's algorithm modified to compute the transitive max-$*_T$ closure of a fuzzy relation.
    for i := 1 to n do
        for j := 1 to n do
            if a(j, i) > 0 then
                for k := 1 to n do
                    a(j, k) := max{a(j, k), a(j, i) *T a(i, k)}
In the following we describe the new, time-space balanced, algorithm for computation of the transitive max-$*_T$ closure when $*_T$ is the min operator.
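Both general methods of this section can be sketched in Python as follows. The sketch assumes the relation is given as a list of lists and defaults to min as the T-norm; the function names and the small asymmetric relation are illustrative assumptions.

```python
import math

def floyd_closure(m, t_norm=min):
    """Transitive max-*T closure, following the loop structure of Algorithm 2.1."""
    n = len(m)
    a = [row[:] for row in m]          # work on a copy of the matrix
    for i in range(n):                 # i is the intermediate vertex
        for j in range(n):
            if a[j][i] > 0:            # skip rows that cannot improve via i
                for k in range(n):
                    a[j][k] = max(a[j][k], t_norm(a[j][i], a[i][k]))
    return a

def squaring_closure(m, t_norm=min):
    """Matrix method for a reflexive relation: square ceil(log2 n) times."""
    n = len(m)
    a = [row[:] for row in m]
    for _ in range(math.ceil(math.log2(n)) if n > 1 else 0):
        a = [[max(t_norm(a[i][u], a[u][j]) for u in range(n))
              for j in range(n)]
             for i in range(n)]
    return a

# Illustrative reflexive but asymmetric relation.
M = [[1.0, 0.6, 0.0],
     [0.0, 1.0, 0.9],
     [0.3, 0.0, 1.0]]
by_floyd = floyd_closure(M)
by_squaring = squaring_closure(M)
```

For this relation both functions return the same closure, in which, for example, the degree from term 1 to term 3 becomes min(0.6, 0.9) = 0.6 via term 2.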
3. The new algorithm
Let $R(X, X)$ be a proximity relation represented by its proximity graph $G_R(X, E_R)$. We assume that $X$ is a list $[x_1, \ldots, x_n]$, allowing $x_j$ to be represented by its index, $j$. We note that the mapping $X \to \{1, \ldots, n\}$ is easily realized by a hash table using the term name as a key, see e.g. [5]; the hash table allows $j$ to be looked up for any term $x_j$ in $O(1)$ time and $O(n)$ space.
The new algorithm for computation of the transitive max-min closure $T_R(X, X)$ consists, in fact, of two main algorithms: Create, for creation of a binary tree representation of the transitive closure, and Lookup, for subsequent look-ups of $\mu_{T_R}(x, y)$.
The creation algorithm implements the following method. First, the set of edges $\{((i, j), a_{ij}) \in E_R \mid a_{ij} = \mu_R(x_i, x_j) > 0,\ i < j\}$ is ordered by decreasing value of $a_{ij}$, such that $((i, j), a_{ij})$ is before $((i', j'), a_{i'j'})$ if $a_{ij} \ge a_{i'j'}$. The edges are then processed in that order. Since the first edge is now the strongest edge in $E_R$, i.e. the edge with the highest value of $\mu_R$, that edge must be the strongest path between its ends. The ends are then bridged by their a-value to a 'set vertex'. We continue with each of the following edges, i.e. in decreasing a-value order; unless both ends are already in the same set vertex (in which case the strength of the strongest path has already been found), the edge determines a bridge of two (set) vertices into a new set vertex. The procedure continues until all edges are processed. The strength of the strongest path between any two (set) vertices is then the bridge (its a-value) used to connect the vertices into a single set vertex. Thus, the bridges represent the branching nodes in the partition tree ('similarity tree') of $T_R$.
The natural data structure for the algorithm is a binary tree (or rather a forest of binary trees, since $G_R$ is not necessarily connected). The tree has $n$ external nodes, representing the vertices in $G_R$; the internal nodes represent the bridges. For any pair of vertices $(x_i, x_j)$, the membership value in $T_R$, $\mu_{T_R}(x_i, x_j)$, is then the a-value of the bridge represented by the nearest common ancestor of the two vertices in the tree. If no such ancestor exists, then the vertices belong to disjoint trees, and $\mu_{T_R}(x_i, x_j) = 0$. If $i = j$, then $\mu_{T_R}(x_i, x_j) = 1$. The two algorithms, Create and Lookup, are presented as Algorithms 3.1 and 3.2.
Algorithm 3.1. Create. The proximity graph $G_R(X, E_R)$ is represented by:
$n = |X|$ is the number of vertices in $G_R$.
$X = [x_1, \ldots, x_n]$ is the list of the vertices; actually $X$ is not represented in the algorithm, where we use the index $j$ as the reference to $x_j$, $j = 1, \ldots, n$.
$m = |E_R|$ is the number of edges in $G_R$.
$E = E_R$ using index number representation of vertices, i.e. the $j$-th edge is represented by $((v_j, w_j), a_j) \in (\{1, \ldots, n\} \times \{1, \ldots, n\}) \times [0, 1]$, where $v_j$ and $w_j$ are vertex indices, and $a_j = \mu_R(x_{v_j}, x_{w_j})$. $E$ is stored as three lists ($m$-tuples), v, w, and a, which, to simplify the algorithm, are indexed by $j = n + 1, \ldots, n + m$.
The resulting binary tree (or, in general, a forest of such trees) is represented by an $(n + m)$-tuple T.
Special symbols used in the Algol-like notation are:
... is followed by commentary text until the line end.
[ ] begin-end brackets, enclosing a list of program statements separated by ";".
<> the not-equal symbol.
With this notation, the actual algorithm is:
(1a) ... initialize:
    n1 := n + 1; nm := n + m;
    for i := 1 to nm do T(i) := 0;
(1b) E := Sort(E);
    ... Sorts the set of edges after decreasing a-value,
    ... yielding a list of edges, ((vj, wj), aj), j = n1, ..., nm,
    ... satisfying aj >= aj+1 for j = n1, ..., nm - 1.
(1c) for i := n1 to nm do
    [ ... for each vertex in (vi, wi) find the root
      ... of its subtree (representing the largest
      ... set containing the vertex):
      Pv := v(i); while T(Pv) > 0 do Pv := T(Pv);
      Pw := w(i); while T(Pw) > 0 do Pw := T(Pw);
      ... if the vertices are not in the same set,
      ... then bridge the two sets:
      if Pv <> Pw then T(Pv) := T(Pw) := i
    ]
Algorithm 3.2. Lookup. Given a pair of vertex indices, (i, j), this algorithm retrieves $a = \mu_{T_R}(x_i, x_j)$ from T (created by Create) and the list a of edge strengths.
(2a) ... initialize:
    Pv := i; Pw := j;
(2b) if Pv = Pw then a := 1 ... identical vertices
    else
    [ ... search for the nearest common ancestor:
      while Pv > 0 & Pw > 0 & Pv <> Pw do
      [ while Pv > 0 & Pv < Pw do Pv := T(Pv);
        while Pw > 0 & Pw < Pv do Pw := T(Pw)
      ];
      if Pv = 0 or Pw = 0 then a := 0 ... disjoint trees
      else a := a(Pv) ... bridge found: the a-value of bridging edge Pv
    ]
4. An example illustrating the new algorithms
Assume that the proximity relation $R$, as given by the domain expert, is the one diagrammed by the graph $G_R$ in Figure 1 which, for illustration, describes $R$ in two other forms, namely by the list of edges (List) and by the membership matrix ($M_R$). The relevant storage contents at the start of 1c in Algorithm 3.1 are illustrated in Figure 2.

G_R (edges with strengths): 1-2: 0.7, 1-4: 0.9, 2-3: 0.9, 2-4: 0.6

List:
size: n = 4, m = 4
edges (sorted): ((1, 4), 0.9), ((2, 3), 0.9), ((1, 2), 0.7), ((2, 4), 0.6)

M_R = {a_ij}, a_ij = μ_R(x_i, x_j):
        1     2     3     4
  1     1     0.7   0     0.9
  2     0.7   1     0.9   0.6
  3     0     0.9   1     0
  4     0.9   0.6   0     1
Fig. 1. Characterizations of the proximity relation.

  i:    1     2     3     4     5     6     7     8
                          n     n1                nm
  T =   0     0     0     0     0     0     0     0
  v =                           1     2     1     2
  w =                           4     3     2     4
  a =                           0.9   0.9   0.7   0.6
Fig. 2. Storage contents at the start of 1c.

Figure 3 shows the values of Pv and Pw obtained during the iterations of the for-loop of 1c and the resulting contents of T. Figure 4 illustrates the tree obtained. Finally, Figure 5 shows the transitive closure, $T_R$, represented by the tree; $T_R$ is characterized by both the similarity graph $G_{T_R}$ and the membership matrix $M_{T_R}$.
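The run of Create on this example can be reproduced with a short Python sketch. It is a minimal transcription of Algorithm 3.1 under a few assumptions of our own: the function name `create`, the use of Python lists with an unused index 0 so that indices match the 1-based notation, and the built-in stable sort in place of Sort.

```python
def create(n, edges):
    """Build the parent-pointer tree T of Algorithm 3.1.

    n      -- number of vertices, indexed 1..n
    edges  -- iterable of ((v, w), a) with a > 0
    Returns (T, a) as lists indexed 1..n+m; index 0 is unused padding.
    """
    ordered = sorted(edges, key=lambda e: e[1], reverse=True)   # step (1b)
    m = len(ordered)
    T = [0] * (n + m + 1)                                       # step (1a)
    a = [None] * (n + 1) + [e[1] for e in ordered]
    for i, ((v, w), _) in enumerate(ordered, start=n + 1):      # step (1c)
        pv = v
        while T[pv] > 0:        # climb to the root of v's subtree
            pv = T[pv]
        pw = w
        while T[pw] > 0:        # climb to the root of w's subtree
            pw = T[pw]
        if pv != pw:            # different sets: bridge them with edge i
            T[pv] = T[pw] = i
    return T, a

# The proximity relation of Figure 1, given in arbitrary order.
edges = [((1, 2), 0.7), ((1, 4), 0.9), ((2, 3), 0.9), ((2, 4), 0.6)]
T, a = create(4, edges)
```

With these inputs, `T[1:]` holds the parent pointers 5, 6, 6, 5, 7, 7, 0, 0, and edge 8 creates no bridge because both of its ends already lie under bridge 7.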

  i     Pv          Pw
  5     1           4
  6     2           3
  7     1, 5        2, 6
  8     2, 6, 7     4, 5, 7

  i:    1   2   3   4   5   6   7   8
  T = [ 5   6   6   5   7   7   0   0 ]
Fig. 3. Running values of Pv and Pw, and the final T.

  Vertex:             1     4       2     3
                       \   /         \   /
  Bridging edges:     5(0.9)        6(0.9)
                           \        /
                            7(0.7)
Fig. 4. The tree obtained.

G_TR (edges with strengths): 1-4: 0.9, 2-3: 0.9, all other pairs: 0.7

M_TR = {a_ij}, a_ij = μ_TR(x_i, x_j):
        1     2     3     4
  1     1     0.7   0.7   0.9
  2     0.7   1     0.9   0.7
  3     0.7   0.9   1     0.7
  4     0.9   0.7   0.7   1
Fig. 5. The transitive closure represented by the tree.

To illustrate look-ups in the created tree, assume that we want to know the similarity between the vertices 3 and 4, or, equivalently, the strength of the strongest path in $G_R$ between the two vertices. Figure 6 shows the values of Pv and Pw obtained during the run of Algorithm 3.2. At the end of the run both variables point to the same bridging edge, whose a-value is the answer to the query. Finally, we obtain $\mu_{T_R}(x_3, x_4)$ as the a-value of bridging edge 7, i.e. $a(7) = 0.7$.

                            Pv      Pw
  (2a) Initial              3       4
  (2b) while iterations     6       5, 7
                            7
Fig. 6. Running values of Pv and Pw in lookup for (3, 4).
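The look-up for (3, 4) can be reproduced with the following Python sketch of Algorithm 3.2. It assumes the final contents of T from Figure 3 and the a-values from Figure 2, stored in lists with an unused index 0; the function name `lookup` is our own. Since T holds only parent pointers, the sketch reads the answer from the list a at the index of the bridging edge.

```python
def lookup(T, a, i, j):
    """Similarity of vertices i and j, read from the tree built by Create."""
    if i == j:
        return 1.0                      # identical vertices
    pv, pw = i, j
    # Parents always have larger indices than their children, so advancing
    # the smaller pointer converges on the nearest common ancestor.
    while pv > 0 and pw > 0 and pv != pw:
        while 0 < pv < pw:
            pv = T[pv]
        while 0 < pw < pv:
            pw = T[pw]
    if pv == 0 or pw == 0:
        return 0.0                      # disjoint trees
    return a[pv]                        # a-value of the bridging edge

# Final T from Figure 3 and a-values from Figure 2 (index 0 is padding).
T = [0, 5, 6, 6, 5, 7, 7, 0, 0]
a = [None, None, None, None, None, 0.9, 0.9, 0.7, 0.6]
similarity_3_4 = lookup(T, a, 3, 4)
```

The call follows the pointer values of Figure 6 (Pv: 3, 6, 7; Pw: 4, 5, 7) and returns 0.7.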
5. Computational complexities
In Section 5.1, we analyze the computational complexities of the new algorithm (Create and Lookup) for the average problem case. In Section 5.2, we give an overview of the performances of the new algorithm and the two others presented in Section 2, namely the matrix method from [4] and Floyd's algorithm. We assume $O(n) \le O(m) \le O(n^2)$, since $n - 1 \le m \le \tfrac{1}{2} n(n - 1)$ holds for a connected graph (the left inequality may not hold if the graph is not connected, and the right inequality may not hold for a directed graph).
5.1. The new algorithm
(Create, 1b) The time and space complexity of sorting a list of length $m$ are known to be $O(m \log_2 m)$ and $O(m)$, respectively. A sorting algorithm with this average-case complexity is Quicksort, see e.g. [5].
(Create, 1c) The loop accounts for $m$ iterations. The while-statements iterate over the depth of the subtree, which in the average case is $O(\log_2 n)$. In this case the mean number of iterations of the two while-statements during the processing of an edge is 2 + 2 = 4, and the total number of iterations for processing $m$ edges is $4m$. Thus, the sorting is the determining factor for the time complexity of Create, namely $O(m \log_2 m)$. The storage requirement of Create is $4m + n$, as Figure 2 illustrates. Hence, the space complexity is $O(m)$.
(Lookup, 2b) The main while-statement finds the nearest common ancestor of the two external nodes representing the two vertices. Since this is done by search in a binary tree of depth $O(\log_2 n)$, the time complexity of Lookup is $O(\log_2 n)$. The storage requirement is unchanged, i.e. $O(m)$.
5.2. The performances of the three main algorithms compared
In Table 1 we give an overview of the performances of the three main algorithms discussed in this paper, namely the two algorithms described in Section 2, (1) Floyd's algorithm and (2) the matrix method, and (3) the new algorithm. The algorithms compute the transitive closure of a fuzzy relation $R$, under the restrictions given in the 'problem type span' columns. The computed transitive closure $T_R$ is a similarity relation only if $R$ is a proximity relation, i.e. when $R$ is a fuzzy relation that is reflexive and symmetric. We note that Floyd's algorithm is in general more efficient than the matrix method, since the 'problem type span' of the latter is included in that of the former, which has a lower computational complexity. However, one advantage of the matrix method is that it provides a simple mathematical expression of the solution.

Table 1. The performances of the three main algorithms

                  Problem type span               Complexities
                  *T in      Fuzzy R:             Space    Time
  Algorithm       max-*T     Refl.    Symm.                Create        Lookup
  (1) Floyd's     any        +/-      +/-         n^2      n^3           1
  (2) Matrix      any        +        +/-         n^2      n^3 log2 n    1
  (3) New         min        +        +           m        m log2 m      log2 n

Key: '+/-': may or may not be satisfied, '+': must be satisfied.
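As a sanity check on the comparison, the following Python sketch builds a random proximity relation and confirms that the tree-based algorithm and Floyd's algorithm (with min as the T-norm) yield the same similarities. The generator `random_proximity`, the problem size, and the seed are assumptions made for this illustration only; `create` and `lookup` are compact Python renderings of Algorithms 3.1 and 3.2, with the look-up result read from the list of a-values.

```python
import random

def create(n, edges):
    """Algorithm 3.1: parent pointers T and a-values a, both 1-indexed."""
    ordered = sorted(edges, key=lambda e: e[1], reverse=True)
    T = [0] * (n + len(ordered) + 1)
    a = [None] * (n + 1) + [s for _, s in ordered]
    for i, ((v, w), _) in enumerate(ordered, start=n + 1):
        while T[v] > 0:
            v = T[v]
        while T[w] > 0:
            w = T[w]
        if v != w:
            T[v] = T[w] = i
    return T, a

def lookup(T, a, i, j):
    """Algorithm 3.2: a-value of the nearest common ancestor, or 0 if none."""
    if i == j:
        return 1.0
    while i > 0 and j > 0 and i != j:
        if i < j:
            i = T[i]
        else:
            j = T[j]
    return a[i] if i > 0 and j > 0 else 0.0

def floyd_max_min(m):
    """Algorithm 2.1 with min as the T-norm."""
    n = len(m)
    c = [row[:] for row in m]
    for i in range(n):
        for j in range(n):
            if c[j][i] > 0:
                for k in range(n):
                    c[j][k] = max(c[j][k], min(c[j][i], c[i][k]))
    return c

def random_proximity(n, m, rng):
    """Hypothetical test-data generator: m random weighted edges on n vertices."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
    return [((i, j), round(rng.uniform(0.05, 1.0), 2))
            for i, j in rng.sample(pairs, m)]

rng = random.Random(0)
n, m = 12, 15
edges = random_proximity(n, m, rng)
matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for (i, j), s in edges:
    matrix[i - 1][j - 1] = matrix[j - 1][i - 1] = s
T, a = create(n, edges)
expected = floyd_max_min(matrix)
agree = all(lookup(T, a, i, j) == expected[i - 1][j - 1]
            for i in range(1, n + 1) for j in range(1, n + 1))
```

The comparison can be exact because max and min only select among the given strengths and never produce new floating-point values.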
6. Conclusion
We have described a new, time-space balanced, algorithm for computation of the transitive max-min closure of a proximity relation and analyzed its computational complexity. We have, in this context, presented two more general algorithms for computation of transitive closures, namely the matrix algorithm introduced in [4] and Floyd's all-pairs shortest-path algorithm. The new algorithm provides a good time-space balanced solution, considering the large but sparse proximity matrices in many applications. Compared to the matrix algorithm and Floyd's algorithm, the new algorithm reduces the time for creation of the transitive closure considerably, which is especially useful when updates of the proximity relation are frequent. We have, finally, compared the scope and computational complexities of the three algorithms. The new algorithm is of particular interest because max-min is the typical kind of composition used in reasoning based on fuzzy relations. Further, the transitive max-min closure provides fast upper bounds for similarities under $*_T$ other than min.
Acknowledgement
H. Legind Larsen was supported in part by the Danish Academy of the Technical Sciences (ATV grant No. TS-9). R.R. Yager was supported in part by the Air Force Office of Scientific Research.
References
[1] P.P. Bonissone, Summarizing and propagating uncertain information with triangular norms, Internat. J. Approximate Reasoning 1 (1987) 71-101.
[2] A. Kaufmann, Introduction to the Theory of Fuzzy Subsets (Academic Press, New York, 1975).
[3] G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty, and Information (Prentice-Hall, Englewood Cliffs, NJ, 1988).
[4] H.L. Larsen and R.R. Yager, An approach to customized end user views to information retrieval systems, to appear.
[5] R. Sedgewick, Algorithms, 2nd ed. (Addison-Wesley, Reading, MA, 1988).