JOURNAL OF MATHEMATICAL PSYCHOLOGY 10, 26-59 (1973)

Metrics on Spaces of Finite Trees

SCOTT A. BOORMAN AND DONALD C. OLIVIER*

Society of Fellows and Department of Psychology and Social Relations, Harvard University, Cambridge, Massachusetts 02138
With the increasing popularity of hierarchical clustering methods in behavioral science, there is a need for ways of quantitatively comparing different tree structures on the same set of items. We employ lattice-theoretic methods to construct a variety of metrics on spaces of trees and to analyze their properties. Certain of these metrics are applied to data from Fillenbaum and Rapoport (1971) on the semantic structure of common English kin terms. This application shows that tree metrics can be used to select a componential analysis which is maximally consistent with an empirically derived set of trees.
With the increasing popularity of hierarchical data analysis methods and tree construction schemes (see, e.g., Miller, 1967, 1969), behavioral scientists are frequently faced with the problem of comparing different tree structures on the same set of points. Such structures may arise in several ways: from proximity matrices derived from different populations of subjects; from different proximity measures applied to data generated by a single group of subjects; or from different procedures for hierarchical clustering of the same proximity matrix (see, e.g., Hartigan, 1971, for a recent survey of hierarchical clustering algorithms). In addition, the problem may arise of comparing an observed tree structure with one predicted on theoretical grounds, as from a componential analysis (see Section 6 below). These situations all motivate development of measures for structural comparison of hierarchical clusterings and other tree-like data structures. A standard kind of distance measure possessing desirable intuitive properties is, of course, a metric, which in the present case would be a function $\mu$ defined on pairs of trees and satisfying the following axioms:

Axiom 1: $\mu(T_1, T_2) \ge 0$, and $\mu(T_1, T_2) = 0$ if and only if $T_1 = T_2$;
Axiom 2: $\mu(T_1, T_2) = \mu(T_2, T_1)$;
Axiom 3: $\mu(T_1, T_3) \le \mu(T_1, T_2) + \mu(T_2, T_3)$. (Triangle inequality.)

* Order of authorship is alphabetical.

Copyright © 1973 by Academic Press, Inc. All rights of reproduction in any form reserved.
The objective of the present paper is to explore several such tree metrics and to analyze their behavior and characteristics. Comparable theoretical developments exist for similarity and dissimilarity measures on many kinds of structures other than trees: for example, measures of correlations between linear orderings (Kendall, 1962), measures of association for cross tabulations (Goodman and Kruskal, 1954), and metrics on subsets of a finite set (developed in a psychological context by Restle, 1959, 1961; Hays, unpublished). One probable reason for the absence of a well-developed literature on structural measures for trees is the comparative diversity of what can be meant by the term “tree.” In its most fundamental definition, a tree on a finite set of nodes N is a connected, cycle-free undirected graph on N, but many small variations on this structure are also called “trees” in the literature. The distinction between rooted and unrooted (topological) trees is classical in combinatorics (e.g., Riordan, 1958); unrooted trees appear to have little behavioral science interest (though see the first proposed method in Rapoport, 1967, which generates unrooted lexical trees with labeled nodes and edges). Rooted trees may conveniently be classified according to which nodes are labeled. Trees with all nodes unlabeled are not frequently met in the applied literature. Trees with all nodes labeled play an important role in structural sociology, where they typically represent dominance and organizational structures (see Friedell, 1967, for a critique); they also appear in transformational linguistics in connection with the grammatical description of sentences. Hierarchical clusterings in the sense of Ward (1963) and Johnson (1967) occupy an intermediate position: here the terminal nodes (i.e., nodes with only one edge) are labeled by elements of the set of items being clustered, but nodes at higher levels are unlabeled.
Trees appearing in the substantive literature often carry additional structure. The case where edges as well as nodes are labeled has already been mentioned; we might also note the linguistic applications where the edges emanating from a given node are ordered (see, e.g., Chomsky and Miller, 1963). In the hierarchical clustering schemes proposed by Johnson (1967), the nonterminal nodes are assigned real numbers which describe quantitatively the history of merging of subtrees. For the purposes of many substantive psychological applications, hierarchical clusterings and immediate generalizations are the most important kind of tree structure and the most frequently encountered in the literature. Specifically, we consider rooted trees where only the terminal nodes are labeled, and establish a uniform terminology for the three cases we will deal with. These are: bare trees, whose terminal nodes are labeled by elements of an item set S and which carry no additional structure; ranked trees, whose terminal nodes are labeled by S and whose nonterminal nodes are ranked on an ordinal scale; and valued trees, whose terminal nodes are labeled by S and whose nonterminal nodes are assigned real numbers with at least an interval-scale interpretation (see discussion of the metric s in Section 1 below). In a rooted tree there is a natural partial ordering of nodes, where node n is > node n' if there is a sequence of
edges from n' to n to the root. We draw our trees to reflect this ordering, following the common perverse practice of putting the root at the top (Fig. 1). The subtree determined by a node n is the tree obtained by detaching the nodes below n with n as the root. The set of labels of terminal nodes in the subtree is called the node set of the node n. We will always assume that the values or ranks in a tree are strictly increasing in the tree ordering; that is, if n is above n' then the value or rank of n is greater than the value or rank of n'. A tree is called binary if every nonterminal node has exactly two nodes immediately below it.

Despite the diversity of tree concepts, there seem to be two basic approaches to constructing tree metrics. The first is to formalize a concept of a transformation of a tree, and to define a metric on trees as the least number of moves necessary to transform one given tree into another. This approach to structural distance should not be novel to anyone familiar with the linguistic notion of transformational complexity (Miller and Chomsky, 1963), and is also closely related to the natural way of imposing a metric on the vertices of a graph (Hakimi and Yau, 1965). Distance measures of least-moves type have the advantage that they necessarily satisfy the triangle inequality, and their sensitivity properties are frequently quite transparent. The main drawback is that, unless the move concept is carefully selected, the measure of distance between trees is often very hard to compute.

A second approach to tree distance is to represent a tree in terms of simpler structures for which adequate metrics are available, e.g., sets, partitions, or incidence matrices. These simpler metrics are then used to induce a metric on a space of trees; from a computational point of view these metrics are generally quite tractable.
From the viewpoint of practicality and elegance it is fortunate that several tree metrics can be defined which combine these two approaches, and which possess the advantages of both. Section 1 introduces the topic by presenting several metrics which occur in first consideration of the problem of tree distance. Section 2 develops lattice-theoretic machinery in preparation for the main constructions. This machinery is then applied in Section 3 to construct a very useful class of tree metrics generalizing one of the ideas in Section 1 and containing another as a special case. These metrics are based upon the idea of representing a tree as a sequence of partitions, and employing a partition metric to induce a metric on trees. They uniformly admit a natural and informative least-moves interpretation. Section 4 develops sensitivity analysis for several of the suggested metrics and explores their numerical behavior, which Section 6 tests on alternative tree structures over the semantic domain of common English kinship terms. For computing one group of tree metrics (referred to as $m_\delta$, $r_\delta$, and $b_\delta$ below) one of the present authors has written a FORTRAN IV program.¹
¹ Copies of this program are available from the authors on request.
1. THREE EXAMPLES OF TREE METRICS
One strategy for constructing a metric on bare trees is to identify a tree with the collection of node sets of its nonterminal nodes. Consider the two trees in Fig. 1. $T_1$ has three nonterminal nodes with node sets {a, b}, {a, b, c}, and {a, b, c, d}; $T_2$ also has three nonterminal nodes, with node sets {a, b}, {c, d}, and {a, b, c, d}. Note that in both cases the structure of the tree can be completely recovered from the collection of node sets, as is true in general.

FIG. 1. Two bare trees, $T_1$ and $T_2$, on the items a, b, c, d.

Now define a distance $\beta(T_1, T_2)$ to be the minimum of the sum of the symmetric difference distances between node sets of $T_1$ and $T_2$, the minimum to be taken over all possible 1-1 mappings between the two collections of node sets (on symmetric difference metrics for sets see Halmos, 1950). The idea is that we “forget” the tree structure of the subtrees and match them up in the optimum way as sets; from a computational viewpoint this is an optimal assignment problem (Ford and Fulkerson, 1962). In the case under consideration we clearly have an optimum pairing which assigns {a, b} to {a, b}, {a, b, c} to {c, d}, and {a, b, c, d} to {a, b, c, d}, giving $\beta(T_1, T_2) = 0 + 3 + 0 = 3$.

In general, let S be a set of size n and let $T_1$ and $T_2$ be trees whose n terminal nodes are labeled in 1-1 correspondence with elements of S, and whose nonterminal nodes are unlabeled. Let a collection of sets $\{W_1, W_2, \ldots, W_{n-1}\}$ be defined by replicating each node set in a $T_1$ subtree e - 1 times, where e is the number of edges emanating from the parent node (if $T_1$ is binary, each node set will hence appear exactly once). That the resulting system of subsets will have n - 1 members is easily verified. Let $\{X_1, X_2, \ldots, X_{n-1}\}$ be the corresponding system of subsets determined by $T_2$. Then,

DEFINITION 1.1. $\beta(T_1, T_2) = \min_f \sum_{i=1}^{n-1} |W_i \,\Delta\, X_{f(i)}|$, where f is a permutation of the first n - 1 integers.

We now demonstrate that $\beta$ admits a natural least-moves representation:
DEFINITION 1.2. Let $\{F_j\}$ be an unordered system of subsets of S. Define a move transforming this system into another similar system to be the addition of a single element s of S to some $F_j$ or the subtraction of a single element s from some $F_j$. Then:

THEOREM 1.1. $\beta(T_1, T_2)$ is the least number of moves (in the sense of the above definition) needed to transform $\{W_i\}_{i=1}^{n-1}$ into $\{X_i\}_{i=1}^{n-1}$.
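As an illustration of Definition 1.1, the following minimal sketch computes $\beta$ by brute force over all pairings of node sets. It is not the authors' FORTRAN program; the function name `beta` and the encoding of node sets as Python frozensets are assumptions made for this example, and exhaustive search over permutations is only practical for very small trees (an optimal assignment routine would replace it in practice).

```python
from itertools import permutations

def beta(node_sets_1, node_sets_2):
    """Brute-force beta: minimum total symmetric-difference size over all pairings.

    Both arguments are equal-length lists of frozensets (the systems W and X).
    """
    assert len(node_sets_1) == len(node_sets_2)
    best = None
    for perm in permutations(node_sets_2):
        # len(w ^ x) is |W delta X|, i.e. the number of single-element
        # additions or subtractions needed to turn w into x.
        cost = sum(len(w ^ x) for w, x in zip(node_sets_1, perm))
        if best is None or cost < best:
            best = cost
    return best

# Node sets of the nonterminal nodes of the two bare trees of Fig. 1.
T1 = [frozenset("ab"), frozenset("abc"), frozenset("abcd")]
T2 = [frozenset("ab"), frozenset("cd"), frozenset("abcd")]
print(beta(T1, T2))  # 3
```

Each unit of cost corresponds to one move in the sense of Definition 1.2, which is the content of Theorem 1.1.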
Proof. Straightforward from the observation of Flament (1963) that the symmetric difference metric $|X \,\Delta\, Y|$ on subsets of S is representable as the least number of additions or subtractions of a single element needed to transform X into Y.

This establishes that $\beta$ has a least-moves interpretation, though to obtain it we must go to a class of structures (namely, finite systems of subsets of S) not all of which correspond to trees. The fact that $\beta$ is a metric follows immediately from Theorem 1.1, since any least-moves distance obviously satisfies the triangle inequality. One major advantage of this metric is that the same principle can be applied to a wide variety of tree comparison problems, including comparison of trees on different sets S and of trees with all nodes labeled. Computation of $\beta$ can be difficult when the number of items is large; as pointed out above, it reduces to an optimal assignment problem (Ford and Fulkerson, 1962).

If we restrict ourselves to hierarchical clusterings, however, it would be desirable to have a metric which takes account of the cluster values $\alpha$ which Johnson's procedure (1967) assigns to the nodes of the tree. Our first approach applies to the case of valued trees, where we take into account the numerical values of the $\alpha$ and assume these values comparable between trees. We begin by noting that the $\alpha$ determine a distance concept on S:

DEFINITION 1.3. Let T be a valued tree on a node set S. If a and b are members of S, define $u_T(a, b)$ to be the value of the lowest node in the tree whose node set includes both a and b (and = 0 if a = b).

As Johnson (1967) shows, $u_T$ satisfies the axioms for an ultrametric on S; that is, $u_T$ is a metric which satisfies the stronger form of the triangle inequality
$$u_T(a, c) \le \max[u_T(a, b),\, u_T(b, c)],$$
and the structure of the valued tree can be completely recovered from $u_T$. A metric s on valued trees may now be defined as follows:

DEFINITION 1.4. $s(T_1, T_2) = \frac{1}{2} \sum_{a,b} |u_{T_1}(a, b) - u_{T_2}(a, b)|$.

That s is a metric is direct, since (up to a positive constant) it is simply the city-block metric on the distance matrices considered as vectors in $R^{n^2}$. As an example
of the computation of s, consider the valued trees in Fig. 2. Table 1 shows the associated distance matrices u, and we can compute $s(T_1, T_2) = 5.25$.

FIG. 2. Two valued trees, $T_1$ and $T_2$.

TABLE 1
Distance Matrices Corresponding to $T_1$ and $T_2$ in Fig. 2

$u_{T_1}$    a      b      c      d
a            0     .50    .75    1.3
b           .50     0     .75    1.3
c           .75    .75     0     1.3
d           1.3    1.3    1.3     0

$u_{T_2}$    a      b      c      d
a            0     .80    2.0    2.0
b           .80     0     2.0    2.0
c           2.0    2.0     0     .25
d           2.0    2.0    .25     0

It should be noted that the metric s requires that tree values be comparable across trees, and that it extracts cardinal, not merely ordinal, information from the values, not being invariant under arbitrary monotone transformations of the input data. This lack of ordinal invariance is somewhat contrary to the nonmetric philosophy of Johnson's approach, which seeks to give clustering results invariant over ordinal transformations of an input proximity matrix. In many cases to which Johnson's clustering technique is applied, however, this input matrix can in fact be assumed to
have at least interval scale properties (so that the output tree is valued), and in this case the s metric is fully applicable. If we restrict ourselves to the ordinal information contained in the node values of a hierarchical clustering we have the case of a ranked tree, and here an approach which is superficially very different is possible. We can represent a ranked tree as a sequence of partitions $P_i$ of S, where the cells of a partition $P_i$ are determined by the node sets of subtrees ranked i or below; more precisely, items a and b belong to the same cell of partition $P_i$ if and only if they both belong to the node set of a subtree ranked i or below. Thus for the two clusterings shown in Fig. 2 we have the corresponding partition chains:

$T_1$: {{a, b}, {c}, {d}} < {{a, b, c}, {d}} < {{a, b, c, d}},
$T_2$: {{a}, {b}, {c, d}} < {{a, b}, {c, d}} < {{a, b, c, d}}.
It is clear that the structure of the ranked tree can be fully recovered from the partition sequences. Now a metric on ranked trees can be constructed from any metric on partitionings. Given any two increasing partition sequences $\{P_j\}_{j=1}^{k}$ and $\{Q_j\}_{j=1}^{m}$ corresponding to two ranked trees $T_1$ and $T_2$, we define

DEFINITION 1.5. $t_\delta(T_1, T_2) = \sum_{i=1}^{\infty} \delta(P_i, Q_i)$, where $\delta$ is any partition metric and the sequences $\{P_j\}_{j=1}^{k}$ and $\{Q_j\}_{j=1}^{m}$ are extended to infinity by adding the trivial partition $P_I$, which lumps all elements of S together, as the tail of the sequence.

For any partition metric $\delta$, the function $t_\delta$ is a metric on the principle that any linear combination of metrics with positive coefficients is again a metric. The infinite sum does not give convergence problems, since if j > max(k, m) we have $P_j = Q_j = P_I$ and hence $\delta(P_j, Q_j) = 0$. A wide variety of possible partition distance measures is discussed in Boorman (1970), Boorman and Arabie (1972), and Arabie and Boorman (1972), which give detailed references to the relevant literature. We detail here only the minimum information necessary to construct two of the most useful partition metrics.

DEFINITION 1.6. Given two partitions P and Q of S, the intersection $P \cap Q$ is defined as follows: a, b ∈ S are in the same cell of $P \cap Q$ if and only if they are in the same cell of P, and also in the same cell of Q.
EXAMPLE. If P = {{a, b, c}, {d}} and Q = {{a, b}, {c, d}}, then $P \cap Q$ = {{a, b}, {c}, {d}}.

DEFINITION 1.7. Given two partitions P and Q of S, the union $P \cup Q$ is defined as follows: a, b ∈ S are in the same cell of $P \cup Q$ if and only if there is a finite sequence $a = a_1, a_2, \ldots, a_m = b$ such that for each j, either $a_j, a_{j+1}$ are in the same cell of P, or $a_j, a_{j+1}$ are in the same cell of Q.

EXAMPLE. If P = {{a, b, c}, {d}} and Q = {{a, b}, {c, d}}, as before, then $P \cup Q$ = {{a, b, c, d}} = $P_I$.
Given these definitions, it is shown in the references just cited that the following functions are metrics on the space $\mathcal{P}(S)$ of partitions of S:

DEFINITION 1.8. $C(P, Q) = |P| + |Q| - 2|P \cup Q|$, where |P| is the number of cells in any partition P.
DEFINITION 1.9.
$$D(P, Q) = \sum_i \binom{|c_i|}{2} + \sum_j \binom{|d_j|}{2} - 2 \sum_{i,j} \binom{|e_{ij}|}{2},$$
where $P = \{c_i\}$, $Q = \{d_j\}$, $P \cap Q = \{e_{ij}\}$,
and $\binom{x}{2}$ is standard notation for x(x - 1)/2.

We compute the metrics $t_C$ and $t_D$ on the hierarchical clusterings depicted in Fig. 2 and now interpreted only as ranked trees. Using the above examples, we readily see that we have the C-distance between {{a, b}, {c}, {d}} and {{a}, {b}, {c, d}} equal to 2, while that between {{a, b, c}, {d}} and {{a, b}, {c, d}} is likewise 2 (by the second example above). The D-distance between the first pair is 2, and between the last pair is 3. Hence, combining to obtain the distance between the hierarchical clusterings as a whole, we obtain $t_C = 4$ and $t_D = 5$.

It is also possible to give the terms $\delta(P_i, Q_i)$ differential weightings in the sum occurring in the definition of $t_\delta$. For any particular choice of positive weights $\{W_i\}$, we obviously still have a metric. Thus in many hierarchical clustering applications we take more seriously the lower than the higher levels of the clusterings, and this bias can be formally captured by giving heavier weight to $\delta(P_i, Q_i)$ for small i, though choice of any particular weighting coefficient raises substantive and statistical problems.
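The partition operations of Definitions 1.6 and 1.7 and the metrics C, D, and $t_\delta$ can be sketched in a few lines. The representation of a partition as a Python set of frozensets and all function names (`meet`, `join`, `part`, `t`) are choices made for this illustration, not notation from the paper; the chains below are the ranked-tree readings of the two clusterings of Fig. 2 used in the worked example above.

```python
def part(*cells):
    """Build a partition from strings of one-letter items, e.g. part('ab', 'c')."""
    return {frozenset(c) for c in cells}

def meet(p, q):
    """Intersection of P and Q: the nonempty pairwise intersections of cells."""
    return {c & d for c in p for d in q if c & d}

def join(p, q):
    """Union of P and Q: repeatedly merge every cell that overlaps a cell of Q."""
    cells = [set(c) for c in p]
    for d in q:
        merged = set(d).union(*[c for c in cells if c & d])
        cells = [c for c in cells if not (c & d)] + [merged]
    return {frozenset(c) for c in cells}

def h(p):
    """Number of unordered pairs of items clustered together in P."""
    return sum(len(c) * (len(c) - 1) // 2 for c in p)

def C(p, q):
    return len(p) + len(q) - 2 * len(join(p, q))

def D(p, q):
    return h(p) + h(q) - 2 * h(meet(p, q))

def t(delta, chain1, chain2):
    """Sum of delta over levels; the shorter chain is padded with its last partition."""
    n = max(len(chain1), len(chain2))
    pad = lambda ch: ch + [ch[-1]] * (n - len(ch))
    return sum(delta(p, q) for p, q in zip(pad(chain1), pad(chain2)))

chain_T1 = [part("ab", "c", "d"), part("abc", "d"), part("abcd")]
chain_T2 = [part("a", "b", "cd"), part("ab", "cd"), part("abcd")]
print(t(C, chain_T1, chain_T2), t(D, chain_T1, chain_T2))  # 4 5
```

Padding with the last partition assumes each chain ends in the trivial partition that lumps all items together, as Definition 1.5 requires.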
2. LATTICE METRICS

The last two concrete metrics presented in the previous section, namely, $t_C$ and $t_D$, are closely connected in their definitions and properties with the natural lattice structure on the partition space $\mathcal{P}(S)$ (Szász, 1963; Birkhoff, 1967). The present section develops a general approach to defining metrics on a lattice. As we will see in Section 3, this viewpoint provides a natural unification of a variety of approaches to tree distance when we extend the lattice structure on $\mathcal{P}(S)$ to a space of hierarchical clusterings on S.

DEFINITION 2.1. A lattice L is a set together with a relation $\le$ which satisfies the following properties:
(i) $\le$ is a partial ordering, i.e., $x \le x$; $x \le y$ and $y \le x$ imply $x = y$; and $x \le y$ and $y \le z$ imply $x \le z$.

(ii) Every pair of elements x, y has a unique least upper bound, denoted by $x \cup y$ (the join of x and y).

(iii) Every pair of elements x, y has a unique greatest lower bound, denoted by $x \cap y$ (the meet of x and y).
We have already met one example of a lattice, namely, the lattice of partitions of S, with the ordering defined by $P \le Q$ if every cell of P is contained in some cell of Q. It is easy to verify that the intersection and the union of two partitions defined in the last section coincide with the above definitions of meet and join, respectively.

DEFINITION 2.2. A supervaluation v on L is a real-valued function v on L which satisfies
$$v(x) + v(y) \le v(x \cup y) + v(x \cap y). \tag{1}$$
The following basic theorem now shows how we can use supervaluations to construct metrics on lattices. A considerably more abstract proof of the theorem, valid for any lattice L (finite or infinite), can be modeled along the lines of the proof of Theorem 1 in Birkhoff (1967: Chapter 10), but the least-moves character of the metric is not brought out by this more general approach.

THEOREM 2.1 (Supervaluation theorem). Let L be a finite lattice and let v be a real-valued function on L which is strictly increasing in the lattice ordering, i.e., if x < y, then v(x) < v(y). Then the function defined by
$$\mu(x, y) = v(x) + v(y) - 2v(x \cap y) \tag{2}$$
is a semimetric (i.e., satisfies the first two metric axioms), and is a metric if and only if v is a supervaluation on L.
Proof. That $\mu$ is at least a semimetric is direct; separation of distinct points follows from the assumption that v is strictly increasing. As for the triangle inequality, we proceed as follows.

LEMMA. Any semimetric defined by (2) possesses the following additivity properties:
$$\mu(x, y) = \mu(x, x \cap y) + \mu(x \cap y, y), \tag{3}$$
$$\mu(x, z) = \mu(x, y) + \mu(y, z) \quad \text{whenever } x \le y \le z, \tag{4}$$
for all x, y, z ∈ L.

Proof of lemma: Computation.

Assume that v is a supervaluation. Define a move to connect x and y in L if and only if either x covers y or y covers x (i.e., x < y or y < x and there is no intermediate z strictly between them in the lattice ordering). Assign $|v(x) - v(y)|$ as the weight of this move. Define a metric $\bar{\mu}(x, y)$ as the least weighted sum of moves along a path connecting x and y, for any x, y ∈ L. Then, clearly,
$$\bar{\mu}(x, y) \le \mu(x, y), \tag{5}$$
since we can choose a path P from x to y which descends from x to the intersection $x \cap y$ and then back up to y, and by the above lemma the length of this path will be just $\mu(x, y)$. As for the converse inequality, suppose P' is any other path in the lattice connecting x and y. Assume P' has a spike of the form $x_1, x_2, x_3$, where $x_2$ covers both $x_1$ and $x_3$. Call this segment of the path S'. In this case we can choose an alternative path P'' where we replace S' by a segment S'' leading down from $x_1$ to $x_1 \cap x_3$ and back up to $x_3$. Since $x_2$ covers both $x_1$ and $x_3$, $x_2 = x_1 \cup x_3$ and we find
$$\text{weighted length of } S'' = v(x_1) + v(x_3) - 2v(x_1 \cap x_3) \le 2v(x_1 \cup x_3) - v(x_1) - v(x_3) = \text{weighted length of } S',$$
where the inequality follows because v is a supervaluation. By successive replacements of segments like S' with segments like S'' we see that a geodesic from x to y can be taken to have two linear segments, from x down to some minimum z, and then from z back up to y. By the way we have defined the weighting of moves, the length of such a path will be $v(x) + v(y) - 2v(z)$. Since $z \le x$ and $z \le y$, we have $z \le x \cap y$, and since v is monotonic,
$$\bar{\mu}(x, y) = v(x) + v(y) - 2v(z) \ge v(x) + v(y) - 2v(x \cap y) = \mu(x, y).$$
Thus $\bar{\mu} = \mu$ and the path from x down to $x \cap y$ and back up to y is a least-moves path. Since $\bar{\mu}$ is a metric, so is $\mu$.

Conversely, let $\mu$ be a metric. Then the triangle inequality
$$\mu(x, z) \le \mu(x, y) + \mu(y, z) \tag{6}$$
reduces to
$$v(x \cap y) + v(y \cap z) \le v(y) + v(x \cap z), \tag{7}$$
and taking $y = x \cup z$ this reduces to the definition of a supervaluation.

COROLLARY. Let v be a strictly decreasing function on L and define
$$\nu(x, y) = v(x) + v(y) - 2v(x \cup y). \tag{8}$$
Then $\nu$ is a semimetric and is a metric if and only if v is a supervaluation.

Proof. If v is a monotone decreasing supervaluation on L, then v is an increasing supervaluation on the dual lattice $\bar{L}$, where $x \le y$ in $\bar{L}$ if and only if $y \le x$ in L (Birkhoff, 1967). Since meets in L are joins in $\bar{L}$ and vice versa, the result follows by the supervaluation theorem just proved. The metric $\nu$ has a least-moves representation with minimal paths going from x up to the union $x \cup y$ in L and then back down to y.

For the partition lattice $\mathcal{P}(S)$ discussed above, the metric C of Definition 1.8 is an example of a metric of type (8) with the decreasing supervaluation being simply the number of cells of P. The weight of a move in this case is just unity. The metric D of Definition 1.9 is defined by the increasing supervaluation
$$h(P) = \sum_i \binom{|c_i|}{2} = \frac{1}{2} \sum_i |c_i|^2 - \frac{n}{2}, \tag{9}$$
where $P = \{c_1, c_2, \ldots, c_k\}$. Other possible increasing supervaluations, derived from information-theoretic considerations, are (10) [expression illegible in the source] and
$$\sum_i \log_2 |c_i|!. \tag{11}$$
Each induces a metric on $\mathcal{P}(S)$ by the supervaluation theorem; we label the metrics induced by (10) and (11) E and M, respectively. (See Boorman and Arabie, 1972; Arabie and Boorman, 1972, for details of the behavior of these metrics and examples in the literature; see also Boulton and Wallace, 1969, for a comparison of (10) and (11).)
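The supervaluation inequality (1) can be checked exhaustively on a small partition lattice. The sketch below enumerates all 15 partitions of a four-element set and tests the two functions named in this section: the pair count h(P) and the number of cells |P|. The helper names (`partitions`, `is_supervaluation`, and so on) are assumptions made for this example, and a finite check on one small lattice is of course an illustration, not a proof.

```python
def partitions(items):
    """Yield every partition of the list `items` as a frozenset of frozensets."""
    if not items:
        yield frozenset()
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        yield p | {frozenset([first])}              # `first` in a cell of its own
        for cell in p:
            yield (p - {cell}) | {cell | {first}}   # `first` added to an existing cell

def meet(p, q):
    return {c & d for c in p for d in q if c & d}

def join(p, q):
    cells = [set(c) for c in p]
    for d in q:
        merged = set(d).union(*[c for c in cells if c & d])
        cells = [c for c in cells if not (c & d)] + [merged]
    return {frozenset(c) for c in cells}

def h(p):
    """Increasing function: number of unordered pairs clustered together."""
    return sum(len(c) * (len(c) - 1) // 2 for c in p)

def n_cells(p):
    """Decreasing function: number of cells."""
    return len(p)

def is_supervaluation(v, lattice):
    """Check v(x) + v(y) <= v(x join y) + v(x meet y) for every pair."""
    return all(v(x) + v(y) <= v(join(x, y)) + v(meet(x, y))
               for x in lattice for y in lattice)

lattice = list(partitions(list("abcd")))
print(len(lattice))                                       # 15
print(is_supervaluation(h, lattice))                      # True
print(is_supervaluation(n_cells, lattice))                # True
print(is_supervaluation(lambda p: -len(p), lattice))      # False
```

The last line shows that monotonicity alone is not enough: minus the number of cells is strictly increasing but fails (1), for instance on {{a, b}, {c, d}} and {{a, c}, {b, d}}.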
3. A LATTICE-THEORETIC APPROACH TO TREE METRICS
By contrast to sets or partitions, hierarchical clusterings do not have a self-evident lattice structure. However, as we now show, it is possible to endow them with such a structure in a way that generates useful tree metrics. The central idea is a generalization of the partition-chain representation of Section 1. We will proceed in terms of three cases, corresponding to the three kinds of trees delineated in the Introduction. Metrics on valued trees turn out to be basic, and metrics on ranked and bare trees follow as special cases.
Case 1. Valued trees. We may view such trees as increasing mappings from the non-negative real numbers into the partition lattice $\mathcal{P}(S)$, where S is the set being clustered. Formally,

DEFINITION 3.1. For any real $\alpha \in [0, \infty)$, let $P(\alpha)$ be that partition which clusters elements of S which have distance $\alpha$ or less in the ultrametric u associated with the tree. That is, a and b are in the same cell of $P(\alpha)$ if and only if $u(a, b) \le \alpha$.

We now define a distance $m_\delta$ on valued trees induced by an arbitrary partition metric $\delta$:

DEFINITION 3.2. $m_\delta(T_1, T_2) = \int_0^\infty \delta[P_1(\alpha), P_2(\alpha)]\, d\alpha$, where the $P_i(\alpha)$ are the partitions associated with each $T_i$ and $\alpha$ as in Definition 3.1.

This definition should be flagged as the central technical principle on which this paper is based (see also a closely related use of this principle in Jardine and Sibson (1971, p. 108)); it carries partition metrics over into metrics on valued trees, in two essential steps. First, each valued tree is associated with an increasing mapping $P_i(\alpha)$ from the nonnegative reals into the partition lattice. Then the partition distance $\delta[P_1(x), P_2(x)]$ is integrated over the positive real line. As in the case of the metrics $t_\delta$ defined on ranked trees, $m_\delta$ is obviously a metric since it is a (continuous) linear combination of metrics. The integral is well-defined because $P_1(x)$ and $P_2(x)$, and hence $\delta[P_1(x), P_2(x)]$, are piecewise constant functions of x. It is finite because for all x greater than the largest of the values in $T_1$ and $T_2$ we have $P_1(x) = P_2(x) = P_I$, so that the integrand is zero. Before discussing the properties of the metrics thus defined, we first show the following theorem.
THEOREM 3.1. If $\delta = D$, then $m_\delta$ given in Definition 3.2 coincides with s in Definition 1.4.

Proof. It can be shown (see Johnson, unpublished; Boorman and Arabie, 1972; Mirkin and Chorny, 1970) that D may be expressed as follows. Define the incidence matrix of a partition P of S to be an n × n (0, 1)-matrix on S, where the (i, j)-th entry is
$$\{P\}_{ij} = \begin{cases} 0, & i \text{ and } j \text{ not clustered together in } P, \\ 1, & i \text{ and } j \text{ clustered together in } P. \end{cases} \tag{12}$$
Then, we have
$$D(P, Q) = \frac{1}{2} \sum_{i,j} |\{P\}_{ij} - \{Q\}_{ij}|, \tag{13}$$
i.e., up to the factor 1/2, D is simply the city block metric on incidence matrices. Then we compute
$$m_D(T_1, T_2) = \int_0^\infty D[P_1(r), P_2(r)]\, dr = \frac{1}{2} \sum_{i,j} \int_0^\infty |\{P_1(r)\}_{ij} - \{P_2(r)\}_{ij}|\, dr = \frac{1}{2} \sum_{i,j} |u_{T_1}(i, j) - u_{T_2}(i, j)| = s(T_1, T_2), \tag{14}$$
since the last integrand in the above is just 1 where $\{P_1(r)\}_{ij}$ and $\{P_2(r)\}_{ij}$ differ, and zero elsewhere, and the measure of this interval is just the difference between the corresponding ultrametric values in $T_1$ and $T_2$. This shows that s defined earlier is a special case of the present general approach.

Additional insight about $m_\delta$, where $\delta$ is any metric of the kind discussed in Section 2, may be obtained by endowing the space of all increasing functions from $[0, \infty)$ to $\mathcal{P}(S)$ with a lattice structure.

DEFINITION 3.3. Let $\mathcal{F}$ be the space of all functions $F : [0, \infty) \to \mathcal{P}(S)$ with the following properties:

(i) F is nonstrictly monotone increasing: $r_1 \le r_2 \Rightarrow F(r_1) \le F(r_2)$;

(ii) F is continuous from above, which in the present case means the following: $r_i \downarrow r$ implies that, for some $i_0$, $F(r) = F(r_i)$ for all $i \ge i_0$;

(iii) $F(0) = P_0$, the trivial partition in which every cell is a single element of S;

(iv) $F(r) = P_I$, for some r.

Define a partial ordering on $\mathcal{F}$ by setting $F \le G$ if and only if $F(r) \le G(r)$ for all r. Clearly all $F \in \mathcal{F}$ correspond to valued trees on S.
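Definition 3.2 and Theorem 3.1 can be checked numerically on the valued trees of Table 1. The sketch below is an illustration under assumptions, not the authors' program: an ultrametric is stored as a Python dict keyed by unordered pairs, the function names are invented for the example, and the integral is evaluated exactly as a finite sum because the integrand is piecewise constant between consecutive node values.

```python
ITEMS = "abcd"

def ultra(ab, ac, ad, bc, bd, cd):
    """Ultrametric as a dict keyed by unordered pairs of items."""
    keys = ["ab", "ac", "ad", "bc", "bd", "cd"]
    return {frozenset(k): v for k, v in zip(keys, [ab, ac, ad, bc, bd, cd])}

u1 = ultra(0.50, 0.75, 1.3, 0.75, 1.3, 1.3)   # T1 of Table 1
u2 = ultra(0.80, 2.0, 2.0, 2.0, 2.0, 0.25)    # T2 of Table 1

def partition_at(u, level):
    """P(level): a and b share a cell iff u(a, b) <= level (Definition 3.1)."""
    cells = []
    for x in ITEMS:
        for cell in cells:
            rep = next(iter(cell))  # any representative works for an ultrametric
            if u[frozenset((x, rep))] <= level:
                cell.add(x)
                break
        else:
            cells.append({x})
    return {frozenset(c) for c in cells}

def meet(p, q):
    return {c & d for c in p for d in q if c & d}

def h(p):
    return sum(len(c) * (len(c) - 1) // 2 for c in p)

def D(p, q):
    return h(p) + h(q) - 2 * h(meet(p, q))

def m(delta, ua, ub):
    """Integral of delta[P_a(x), P_b(x)] over x >= 0, as a sum over constant pieces."""
    levels = sorted({0.0} | set(ua.values()) | set(ub.values()))
    return sum((hi - lo) * delta(partition_at(ua, lo), partition_at(ub, lo))
               for lo, hi in zip(levels, levels[1:]))

def s(ua, ub):
    """Definition 1.4: half the sum over ordered pairs = sum over unordered pairs."""
    return sum(abs(ua[k] - ub[k]) for k in ua)

print(round(s(u1, u2), 6), round(m(D, u1, u2), 6))  # 5.25 5.25
```

Beyond the largest node value both trees give the lumped partition, so the integrand vanishes there and the sum over finitely many intervals is the whole integral.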
THEOREM 3.2. With the ordering of Definition 3.3, $\mathcal{F}$ is a lattice, with the join $F \vee G$ defined by $(F \vee G)(r) = F(r) \cup G(r)$ and the meet $F \wedge G$ defined by $(F \wedge G)(r) = F(r) \cap G(r)$.
Proof. Direct by verifying that the properties (i)-(iv) are inherited under the operations of join and meet as defined.

Theorem 3.2 amounts to the formation of a sublattice of a direct product lattice (see Birkhoff, 1967, Chapter 1, Theorem 7). We can use the same kind of consideration as in the proof of the supervaluation theorem to obtain information on the behavior of $m_\delta$. Let us call a transformation from F to F' in $\mathcal{F}$ a move if the following conditions are satisfied:

(i) F(x) = F'(x) outside an interval [a, b);

(ii) F(x) and F'(x) have the constant values P and P' on [a, b);

(iii) P and P' are connected by a single move in $\mathcal{P}(S)$.
The length of the tree move is $m_\delta(F, F') = (b - a)\,\delta(P, P')$. If $\delta$ is a partition metric derived from an increasing supervaluation v on $\mathcal{P}(S)$, so that it is a least-moves metric where the shortest path from P to Q goes down from P to $P \cap Q$ and back up to Q, we can show that $m_\delta$ is a least-moves metric on $\mathcal{F}$ and the shortest path from F to G goes from F down to the intersection $F \wedge G$ and back up to G. Since $m_\delta$ is a metric and since $m_\delta(F, G) = m_\delta(F, F \wedge G) + m_\delta(F \wedge G, G)$ (this follows from the analogous relation for $\delta$), it will be enough to show that for any F and F' with $F' \le F$ there is a sequence of moves leading down from F to F' whose lengths sum to $m_\delta(F, F')$. Such a sequence is easy to construct. Since F and F' are piecewise constant functions we have $0 = x_0 < x_1 < \cdots < x_{k-1} < x_k$ such that F and F' have the constant values $P_i$, $P_i'$ on each interval $[x_{i-1}, x_i)$ and $F(x) = F'(x) = P_I$ on $[x_k, \infty)$. First make a sequence of moves on $[x_0, x_1)$ which take $P_1$ down to $P_1'$; follow this by a sequence of moves on $[x_1, x_2)$ which take $P_2$ down to $P_2'$; and so forth. For each i, the total length of the moves made on $[x_{i-1}, x_i)$ is $(x_i - x_{i-1})\,\delta(P_i, P_i')$. But, by definition,
$$m_\delta(F, F') = \int_0^\infty \delta(F(x), F'(x))\, dx = \sum_{i=1}^{k} (x_i - x_{i-1})\, \delta(P_i, P_i'). \tag{15}$$
If $\delta$ is derived from a decreasing supervaluation, an analogous argument shows that $m_\delta$ is a least-moves metric where the shortest path from F to G goes up to $F \vee G$ and then back down. As an illustration, consider the metric $m_C$ derived from the partition metric C. C is derived from a decreasing supervaluation and gives each partition move a length of 1. Hence the shortest path from F to G relative to $m_C$ will go through $F \vee G$, and the length of each move on a segment [a, b) will be (b - a). Figure 3 illustrates the shortest path from $F_1$ to $F_4$; the reader can verify that $m_C(F_1, F_4) = 2\alpha - \beta - \gamma$ and that $F_3 = F_1 \vee F_4$.
FIG. 3. Least-moves path from $F_1$ to $F_4$ relative to the metric $m_C$, passing through $F_2$ and $F_3$; the moves are labeled with the lengths $\alpha - \beta$, $\beta - \gamma$, and $\alpha - \beta$.

Case 2. Ranked trees. Cases of this kind typically occur when we do in fact have real numbers associated with the nonterminal nodes, but we believe them only up to an order-preserving transformation. Let us assume first that we have a binary tree with no tied ranks. If we assign to each node an integer corresponding to its rank we have a valued tree, and we can apply the metrics developed for Case 1; in fact, these are just the metrics $t_\delta$ of Section 1. In other words, the metrics $t_\delta$ result from a way of embedding the space of ranked trees in the space $\mathcal{F}$ of valued trees. Note, however, that in a strict sense the metrics $t_\delta$ are not least-moves metrics on ranked trees, since a move in $\mathcal{F}$ may take a tree identified with a ranked tree into one which is not identified with any ranked tree. It is the availability of the least-moves interpretation which makes valued trees basic.

We now briefly raise a point of technical interest. In the more general case, where we are not restricted to binary trees and tied ranks can occur, we suggest a somewhat different approach from the naive one used in defining $t_\delta$. Specifically, we proceed by analogy to the theory of rank order correlations by assigning mean ranks when ties are present (compare Kendall, 1962, Chapter 3). For example, assume that we have a tree corresponding to the following partition sequence:
$$P_1 = \{\{a, b\}, \{c\}, \{d\}, \{e\}\}, \quad P_2 = \{\{a, b, c, d\}, \{e\}\}, \quad P_3 = \{\{a, b, c, d, e\}\}.$$

In the transition between $P_1$ and $P_2$, two separate binary mergings occur, and we assign $P_1$ the rank 1, $P_2$ the rank 2.5, and $P_3$ the rank 4. The associated function F will be defined by

$$F(r) = \begin{cases} \{\{a\}, \{b\}, \{c\}, \{d\}, \{e\}\}, & 0 \le r < 1, \\ P_1, & 1 \le r < 2.5, \\ P_2, & 2.5 \le r < 4, \\ P_3, & r \ge 4. \end{cases} \qquad (16)$$

In general, given any ascending sequence of partitions $P_1 < P_2 < \cdots < P_k < P_{k+1}$ in the partition lattice corresponding to a ranked tree, running from the partition $P_1$ into singletons up to the one-cell partition $P_{k+1}$, define $\alpha_0 = 0$,

$$\alpha_i = n - \tfrac{1}{2}\bigl(|P_i| + |P_{i+1}| - 1\bigr) \quad \text{for } i = 1, \ldots, k, \qquad (17)$$

and define a function F in $\mathcal{F}$ by $F(r) = P_i$ for $\alpha_{i-1} \le r < \alpha_i$ and $F(r) = P_{k+1}$ for $r \ge \alpha_k$. A metric corresponding to this way of handling ranked trees we call $r_\delta$, to distinguish it from the analogous metric for valued trees $m_\delta$. In the case of a binary tree without tied ranks $|P_i| = n - i + 1$ and thus $\alpha_i = i$, so that this procedure generalizes the way of identifying ranked trees with valued trees given above. Its advantage is that when the ranks are derived from real numbers attached to subtrees the distance function $r_\delta$ will be more stable under small perturbation of these numbers than the original $t_\delta$.
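The mean-rank rule of Eq. (17) can be checked numerically. The following is a minimal sketch, not part of the original computations: the function name `mean_rank_values` and the convention of passing only the cell counts of the chain (starting from the all-singletons partition) are assumptions made for illustration.

```python
from typing import List


def mean_rank_values(cell_counts: List[int]) -> List[float]:
    """Return [alpha_1, ..., alpha_k] from Eq. (17).

    cell_counts[i] is the number of cells of the (i+1)-th partition in an
    ascending chain that starts at the all-singletons partition (so the
    first entry equals n) and ends at the one-cell partition.
    """
    n = cell_counts[0]  # the first partition has one cell per item
    return [n - 0.5 * (a + b - 1) for a, b in zip(cell_counts, cell_counts[1:])]


# Five items: singletons (5 cells), then 4 cells, then 2 cells (a tied double
# merging), then 1 cell. The tied mergings share the mean rank 2.5.
print(mean_rank_values([5, 4, 2, 1]))  # [1.0, 2.5, 4.0]

# Binary tree without ties on four items: alpha_i = i.
print(mean_rank_values([4, 3, 2, 1]))  # [1.0, 2.0, 3.0]
```

The first call reproduces the ranks 1, 2.5, and 4 of the five-item example; the second illustrates the remark that the rule reduces to $\alpha_i = i$ when no ties occur.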
Case 3. Bare trees. Continuing along similar lines we may make bare trees into valued trees by assigning to each nonterminal node the size of its node set. The metrics $m_\delta$ can then be employed as before, and we may call them $b_\delta$ in applications to bare trees. Here again there is no least-moves interpretation for bare trees because a move in $\mathcal{F}$ may take a valued tree which is identified with a bare tree into one which is not. As an example, if the trees $F_1$ and $F_4$ in Fig. 3 are treated in this way then $\alpha = 3$ and $\beta = \gamma = 2$, so that $b_C(F_1, F_4) = 2\alpha - \beta - \gamma = 2$. The valued trees $F_2$ and $F_3$ on the least-moves path do not correspond to bare trees. In contrast to the bare-tree metric $\beta$ defined in Section 1, computation of the metrics $b_\delta$ does not require an optimal assignment algorithm, so that they will generally be much less laborious to compute for trees of substantial size.
4. DECOMPOSITION AND SENSITIVITY PROPERTIES OF METRICS
The present section discusses the way in which small changes in the structure of a tree are handled by the various metrics we have been considering. Such analysis attempts to provide better intuition for the meaning of the metrics in practical situations. For convenience, we employ letters G, H, ... to refer to subtrees and introduce the notation $\bar{G}$ for the node set of G and $\alpha(G)$ for the value of its root. The following decomposition property of a metric is worth discussion:

DEFINITION 4.1. A tree metric $\mu$ is additive if it has the following property: If $T_1$ and $T_2$ are two trees on a set of items such that $T_1$ has disjoint subtrees $G_1, G_2, \ldots, G_k$ and $T_2$ has subtrees $H_1, H_2, \ldots, H_k$ with $\bar{G}_j = \bar{H}_j$ and $\alpha(G_j) = \alpha(H_j)$ for $j = 1, 2, \ldots, k$, and if $T_1$ and $T_2$ are identical except within these subtrees, then $\mu(T_1, T_2) = \sum_{j=1}^{k} \mu(G_j, H_j)$.

Note that implicitly Definition 4.1 refers not just to a metric $\mu$ but to a family of metric rules, one for each possible item set S. We might call a metric "local" if the effect of a small "local" change in a tree depends only on the neighboring structure of the tree. Additivity is a very strong form of "localness" property.
THEOREM 4.1. The metric $\beta$ of Section 1 is additive.

Proof. The following lemma is the core of the proof.

LEMMA. Let S be any finite set and let $\{s_i\}_{i=1}^{m}$ and $\{t_i\}_{i=1}^{m}$ be two collections of subsets of S. Let $\phi$ be a permutation of the integers 1, 2, ..., m which minimizes $\sum_j |s_j \,\Delta\, t_{\phi(j)}|$. If for some i and j, $s_i = t_j$, then $\phi'$ minimizing the same sum can be chosen for which $\phi'(i) = j$.

Proof. Suppose $\phi(i) = p \ne j$, and $\phi(q) = j$. Define a new permutation $\phi'$ by setting

$$\phi'(k) = \begin{cases} j, & k = i, \\ p, & k = q, \\ \phi(k), & k \ne i, q. \end{cases} \qquad (18)$$

Then we have

$$\sum_k |s_k \,\Delta\, t_{\phi(k)}| - \sum_k |s_k \,\Delta\, t_{\phi'(k)}| = |s_i \,\Delta\, t_p| + |s_q \,\Delta\, t_j| - |s_i \,\Delta\, t_j| - |s_q \,\Delta\, t_p| = |s_i \,\Delta\, t_p| + |s_q \,\Delta\, s_i| - |s_q \,\Delta\, t_p| \ge 0,$$

where the second equality uses $s_i = t_j$ and the last inequality follows by the triangle inequality for the symmetric difference metric on sets. Hence $\phi'$ must also minimize the sum $\sum_{j=1}^{m} |s_j \,\Delta\, t_{\phi(j)}|$, and the lemma is proved.

Now suppose $T_1$ and $T_2$ are as in the definition of additivity, and consider the optimal assignment of the node sets of $T_1$ to the node sets of $T_2$. The proof of the lemma shows that all node sets of nodes not in the subtrees $G_i$, $H_j$ can be matched exactly and hence contribute nothing to the sum, while clearly nodes in $G_i$ are matched to nodes in $H_i$ for $i = 1, 2, \ldots, k$; the result follows.
The additivity property also holds for the lattice-based metrics on valued trees defined in Section 3, provided that the supervaluation is of the additive form specified by the following equation:

$$\nu(P) = \sum_{c \in P} f(|c|). \qquad (19)$$

For such supervaluations, if P is a partition and c is a cell of P, and P' is the finer partition which results from splitting c into smaller cells $c_1, c_2, \ldots, c_h$, then the distance between P and P' in the metric is

$$|\nu(P) - \nu(P')| = \Bigl|\, f(|c|) - \sum_{i=1}^{h} f(|c_i|) \,\Bigr|. \qquad (20)$$
Thus the weight of a move which consists of splitting the cell c depends only on the sizes of the mother and daughter cells, and does not depend at all upon the rest of the partitioning. All supervaluations considered in Section 3 are of this type.

Rather than give a formal proof of additivity for the lattice-based metrics in Case 1 (valued trees), we can exploit their characterization as least-moves metrics to give the following informal argument. Let $T_1$ and $T_2$ be as in the definition of additivity, and consider the least-moves path from $T_1$ to $T_2$. Clearly this path consists of moves which take $G_1$ into $H_1$, $G_2$ into $H_2$, and so forth. Since the cost of a move in $G_i$ is independent of the structure of the tree outside of $G_i$, the same moves with the same costs would constitute the least-moves path from $G_i$ to $H_i$ if they were independent trees, rather than subtrees, and this fact suffices for additivity.

It is easy to see that the metrics on bare trees derived from the lattice-based metrics on valued trees in Section 3 inherit the additivity property, but the metrics on ranked trees do not. The difference lies in the way values are assigned to nodes in the two cases. For bare trees the values assigned to nodes in a subtree depend only on the subtree, and are independent of the tree in which it is embedded, while for ranked trees the values of nodes within a subtree depend on how they are ranked relative to other nodes outside the subtree.

The additivity property says that if two trees differ only on a subtree, then their distance depends only on that subtree. There is a second, in a sense dual, decomposition property:

DEFINITION 4.2. A metric $\mu$ is subtree-opaque if, whenever trees $T_1$ and $T_2$ share a subtree G, then $\mu(T_1, T_2)$ depends only on $|\bar{G}|$, and not at all upon the internal structure of G.
To show that the metric $\beta$ is subtree-opaque, we note that in the optimal assignment of node sets of $T_1$ to those of $T_2$ all node sets of nodes in G can be matched exactly, and so contribute nothing to the sum of the symmetric differences. For the lattice metrics derived from additive functions as in (19), observe that in the sequence of moves which takes $T_1$ into $T_2$ none of the nodes in G need to be moved. The lengths of moves of nodes which are not above G are independent of G, while the lengths of moves above G depend only on $|\bar{G}|$.

The combined force of the additivity and subtree-opaqueness properties is to make it possible to discuss the effect of changes in a tree involving a few neighboring nodes, without having to take into account the structure of the tree above or below these nodes. We now proceed to explore a class of such changes, with the aim of clarifying some of the sensitivity properties of the metrics. Specifically, consider the trees shown in Fig. 4, where X, Y, and Z represent subtrees. For brevity we write x, y, and z for the sizes of their respective node sets; and we note that $T_{12}$ is $T_1 \wedge T_2$.
FIG. 4. Two trees and their intersection tree. [Figure: three panels, left to right $T_1$, $T_{12}$, $T_2$, each built on the subtrees X, Y, Z.]
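In the case of the metric $\beta$ the relevant nodes of $T_1$ are represented by the sets $\bar{X} \cup \bar{Y}$ and $\bar{X} \cup \bar{Y} \cup \bar{Z}$; those of $T_2$ by $\bar{Y} \cup \bar{Z}$ and $\bar{X} \cup \bar{Y} \cup \bar{Z}$; and those of $T_{12}$ by $\bar{X} \cup \bar{Y} \cup \bar{Z}$ and $\bar{X} \cup \bar{Y} \cup \bar{Z}$ (recall the convention that we replicate the node set of a nonbinary node a number of times equal to the number of edges emanating from the node, minus one). The lemma involved in the proof of Theorem 4.1 assures us that the optimal assignment matches nodes in the shared subtrees X, Y, and Z exactly, so that we have

$$\beta(T_1, T_2) = |(\bar{X} \cup \bar{Y}) \,\Delta\, (\bar{Y} \cup \bar{Z})| = x + z, \qquad (21)$$

$$\beta(T_1, T_{12}) = |(\bar{X} \cup \bar{Y}) \,\Delta\, (\bar{X} \cup \bar{Y} \cup \bar{Z})| = z, \qquad (22)$$

$$\beta(T_2, T_{12}) = |(\bar{Y} \cup \bar{Z}) \,\Delta\, (\bar{X} \cup \bar{Y} \cup \bar{Z})| = x. \qquad (23)$$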
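Equations (21)-(23) can be illustrated with a small computation. This is a sketch under stated assumptions rather than the procedure used for the hand computations reported later: it finds the optimal matching by brute force over all permutations (a substitute for a proper optimal assignment algorithm, practical only for a handful of nodes), it includes only the two nodes per tree that are not inside the shared subtrees, and the function name `beta` and the item labels are hypothetical.

```python
from itertools import permutations


def beta(nodes_a, nodes_b):
    """Minimum total symmetric difference over all one-to-one matchings
    of two equal-length lists of node sets (brute force)."""
    assert len(nodes_a) == len(nodes_b)
    return min(
        sum(len(a ^ nodes_b[j]) for a, j in zip(nodes_a, perm))
        for perm in permutations(range(len(nodes_b)))
    )


# Node sets of the three subtrees, with sizes x = 3, y = 1, z = 2.
X = frozenset({"x1", "x2", "x3"})
Y = frozenset({"y1"})
Z = frozenset({"z1", "z2"})

# Nodes not inside the shared subtrees X, Y, Z (those match exactly).
t1 = [X | Y, X | Y | Z]
t2 = [Y | Z, X | Y | Z]
t12 = [X | Y | Z, X | Y | Z]  # nonbinary root, replicated once

print(beta(t1, t2), beta(t1, t12), beta(t2, t12))  # 5 2 3
```

With these sizes the three values are x + z = 5, z = 2, and x = 3, and the first is the sum of the other two.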
Note that $T_{12}$ is between $T_1$ and $T_2$ in the sense that $\beta(T_1, T_{12}) + \beta(T_{12}, T_2) = \beta(T_1, T_2)$. The replication convention for non-binary nodes was chosen so as to maintain this property. We see that the distance in the $\beta$ metric from $T_1$ to $T_2$ is insensitive to the size of the subtree Y and varies linearly with the sizes of the subtrees X and Z. If X and Z are large we can think of Y as having moved across a major constituent boundary; the metric reflects this in a larger distance value.

For the $m_\delta$ metrics derived in Section 3 the change from $T_1$ to $T_{12}$ means moving the $\bar{X} \cup \bar{Y}$ node up from $\beta$ to $\alpha$. Referring to Eq. (20) we see that the length of the move will be
$$(\alpha - \beta)\,|f(x + y) - f(x) - f(y)|,$$

where f is the function on cell sizes which induces the supervaluation on the partition lattice. For the C metric $f(x) = 1$, and

$$m_C(T_1, T_{12}) = \alpha - \beta. \qquad (24)$$

For the metric D (Definition 1.9) $f(x) = \binom{x}{2}$ and the distance will be

$$m_D(T_1, T_{12}) = (\alpha - \beta)\,xy. \qquad (25)$$

Corresponding to the other two supervaluations mentioned earlier, we have

$$m_E(T_1, T_{12}) = (\alpha - \beta)\,[(x + y)\log_2(x + y) - x \log_2 x - y \log_2 y], \qquad (26)$$

$$m_M(T_1, T_{12}) = (\alpha - \beta)\,\log_2 \binom{x + y}{x}. \qquad (27)$$
The distances from $T_{12}$ to $T_2$ may be computed in the same way, with $\alpha - \gamma$ replacing $\alpha - \beta$ and z replacing x. The relation $m_\delta(T_1, T_{12}) + m_\delta(T_{12}, T_2) = m_\delta(T_1, T_2)$ holds in each case: this follows by computation for $m_C$ and by the nature of the least-moves path for the other three metrics. We note that if we use the $b_\delta$ metric on bare trees as suggested in Section 3, assigning each node the cardinality of its node set, then we have $\alpha = x + y + z$, $\beta = x + y$, and $\gamma = y + z$, and it follows that $b_C(T_1, T_{12}) = z$, $b_C(T_{12}, T_2) = x$, and $b_C(T_1, T_2) = x + z$. Hence in this case the distances coincide with those given by the metric $\beta$. The two metrics are not identical, in general, however.

The distance from $T_1$ to $T_2$ in $m_C$ is insensitive to the sizes of the subtrees. For the other metrics the distance is an increasing function of the sizes. We compute two special cases as examples. First, if $y = 1$, $x = z = h$, and h is large relative to unity, we have
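$$m_D(T_1, T_2) = (2\alpha - \beta - \gamma)\,h, \qquad (28)$$

$$m_E(T_1, T_2) = (2\alpha - \beta - \gamma)\,[(h + 1)\log_2(h + 1) - h \log_2 h] \approx (2\alpha - \beta - \gamma)\,[\log_2(h + 1) + \log_2 e], \qquad (29)$$

$$m_M(T_1, T_2) = (2\alpha - \beta - \gamma)\,\log_2(h + 1). \qquad (30)$$

The distance varies linearly with the size of X and Z for $m_D$ and logarithmically for $m_M$ and $m_E$. Note that the behavior of $m_E$ and $m_M$ is approximately identical except for a linear transformation. Second, suppose $x = y = z = h$ and h is large relative to unity. Then

$$m_D(T_1, T_2) = (2\alpha - \beta - \gamma)\,h^2, \qquad (31)$$

$$m_E(T_1, T_2) = (2\alpha - \beta - \gamma)\,(2h), \qquad (32)$$

$$m_M(T_1, T_2) = (2\alpha - \beta - \gamma)\,\log_2\!\left[\frac{(2h)!}{(h!)^2}\right] \approx (2\alpha - \beta - \gamma)\,[2h - \tfrac{1}{2}\log_2(\pi h)]. \qquad (33)$$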
Roughly speaking, we can summarize these results by saying that $m_D$ is much more sensitive to changes near the top of a tree than changes near the bottom, $m_C$ is equally sensitive to both, and $m_M$ and $m_E$ behave in an intermediate fashion.
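The special cases in Eqs. (28)-(33) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the cell-size functions are the ones implied by the formulas of this section, namely a constant for C, $\binom{x}{2}$ for D, $x \log_2 x$ for E, and $\log_2 x!$ for M. Each printed weight is the factor that multiplies $(2\alpha - \beta - \gamma)$ when $x = z$.

```python
import math


def f_C(x: int) -> float:
    return 1.0  # every cell counts once


def f_D(x: int) -> float:
    return float(math.comb(x, 2))  # number of pairs within a cell


def f_E(x: int) -> float:
    return x * math.log2(x) if x > 0 else 0.0


def f_M(x: int) -> float:
    return math.lgamma(x + 1) / math.log(2)  # log2(x!) without huge integers


def merge_weight(f, x: int, y: int) -> float:
    """|f(x + y) - f(x) - f(y)|: weight of merging cells of sizes x and y."""
    return abs(f(x + y) - f(x) - f(y))


h = 100
for name, f in [("C", f_C), ("D", f_D), ("E", f_E), ("M", f_M)]:
    # First column: y = 1, x = z = h. Second column: x = y = z = h.
    print(name, merge_weight(f, h, 1), merge_weight(f, h, h))
```

For h = 100 the D weights are h and h squared, the second E weight is 2h, and the M weights are close to $\log_2(h + 1)$ and $2h - \tfrac{1}{2}\log_2(\pi h)$, in line with the linear, logarithmic, and intermediate growth described above.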
5. REPRISE

The story thus far has been long and tangled, and it may be helpful to the reader if we summarize the main points. First, we introduced a lattice-theoretic orientation to the problem of defining metrics, and in particular the Supervaluation Theorem 2.1, which links supervaluations and least-moves metrics on lattices. A number of metrics on spaces of partitionings have been shown to be based on supervaluations on the partition lattice, and hence to be least-moves metrics (Boorman, 1970; Boorman and Arabie, 1972). Definition 3.2 provides a way of carrying these partition metrics over into metrics on valued trees. Theorem 3.2 shows that the lattice structure of partitions can also be carried over to the domain of valued trees, and the derived tree metrics are least-moves metrics on this lattice. These metrics on valued trees are extended in reasonably natural ways to ranked and bare trees, but they do not inherit the least-moves characterization. Section 4 then shows that the metrics for valued and bare trees defined in Section 3, as well as the $\beta$ metric of Section 1, possess some convenient properties: roughly speaking, if two trees differ only within subtrees, their distance depends only on the distances between subtrees; while if two trees differ only outside a subtree, their distance does not depend on the internal structure of the subtree. These properties make it possible to discuss the way in which small "local" changes in a tree are reflected by the different metrics, and the metrics are shown to differ in the relative weight they give to different kinds of local changes.

6. EXAMPLES

We illustrate the use of some of the metrics we have proposed with a set of five trees on the set S of fifteen common English kinship terms: Grandfather, Grandmother, Grandson, Granddaughter, Father, Mother, Son, Daughter, Brother, Sister, Uncle, Aunt, Nephew, Niece, Cousin. The data on which this section is based were collected by Rapoport and Fillenbaum (1972; see also Fillenbaum and Rapoport, 1971). Using two different methods, they obtained distance estimates from groups of subjects for elements of the above semantic domain. The first method required the construction of labeled, unrooted trees analogous to those discussed in Rapoport (1967), and the second method required exhaustive ranking of all the interitem distances. Groups KTM and KTF, males and females, respectively, used the tree-construction method with instructions to construct the trees on the basis of "similarity of meaning." Groups KLTM and KLTF used the same method, but were asked to construe similarity in terms of mutual affection. Group KC used the ranking method. Clusterings were obtained from the five distance matrices using Johnson's maximum method (1967); they are shown in tree form in Figs. 5-9.
FIG. 5. Valued tree for group KTM. [Figures 5-9 are redrawn from Fillenbaum and Rapoport (1971), Chapter 4, Figs. 3-7.]

FIG. 6. Valued tree for group KTF.

FIG. 7. Valued tree for group KLTM.

FIG. 8. Valued tree for group KLTF.

FIG. 9. Valued tree for group KC.

[In Figs. 5-9 the terminal nodes are labeled Gf, Gm, Gs, Gd, F, M, So, D, B, Si, U, A, Ne, Ni, C.]

Rapoport and Fillenbaum (1971) discuss in detail the differences among the trees obtained for the given domain. Their only attempt at quantification, however, is to employ a normalized form of the D-metric on partitions to measure the distances among the partitions $P_{10}$ defined by the five (ranked) trees as in the construction of $t_\delta$ above. The results are shown in Table 2.

Tables 3-8 show the distances among the five clusterings for a selection of the metrics defined in this paper. Specifically, we calculated $m_C$ and $m_D$ for the clusterings considered as valued trees; the analogous metrics $r_C$ and $r_D$ for the clusterings considered as ranked trees; and the extension $b_C$ of $m_C$ to bare trees, together with the set metric $\beta$ on bare trees. Since the input proximity matrix for the Johnson program was only ordinally comparable between group KC on the one hand and the other four groups on the other, the group KC is not included in the valued-tree metric computations. Computations were performed by hand for $\beta$, and by the computer program referred to in the Introduction for $m_C$, $r_C$, $m_D$, $r_D$, and $b_C$.

Tables 2-8 also display the rank ordering of the interpoint distances for each case considered; for the purposes of comparing metrics these are more informative than the actual metric values. Kendall's rank order correlation coefficient $\tau$ (Kendall, 1962) was computed for the six metrics on the basis of their respective ordering of distances
TABLE 2
Distances Among Trees (from Fillenbaum and Rapoport, 1971)

Distances
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       .067    -
KLTM      .295    .229    -
KLTF      .067    .038    .267    -
KC        .076    .057    .286    .095    -

Ranks
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       3.5     -
KLTM      10      7       -
KLTF      3.5     1       8       -
KC        5       2       9       6       -

TABLE 3
Distances Among Valued Trees ($m_C$ Metric)

Distances
          KTM     KTF     KLTM    KLTF
KTM       -
KTF       42.1    -
KLTM      103.7   96.4    -
KLTF      90.7    80.4    111.2   -

Ranks
          KTM     KTF     KLTM    KLTF
KTM       -
KTF       1       -
KLTM      5       4       -
KLTF      3       2       6       -

TABLE 4
Distances Among Valued Trees ($m_D$ Metric)

Distances
          KTM     KTF     KLTM    KLTF
KTM       -
KTF       279.7   -
KLTM      1003.6  861.4   -
KLTF      435.1   513.3   1127.7  -

Ranks
          KTM     KTF     KLTM    KLTF
KTM       -
KTF       1       -
KLTM      5       4       -
KLTF      2       3       6       -

TABLE 5
Distances Among Ranked Trees ($r_C$ Metric)

Distances
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       22      -
KLTM      35      33      -
KLTF      28      26      28      -
KC        24      32      41      36      -

Ranks
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       1       -
KLTM      8       7       -
KLTF      4.5     3       4.5     -
KC        2       6       10      9       -

TABLE 6
Distances Among Ranked Trees ($r_D$ Metric)

Distances
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       35      -
KLTM      158     161     -
KLTF      66      58      134     -
KC        87      94      159     90      -

Ranks
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       1       -
KLTM      8       10      -
KLTF      3       2       7       -
KC        4       6       9       5       -

TABLE 7
Distances Among Bare Trees ($b_C$ Metric)

Distances
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       4       -
KLTM      22      26      -
KLTF      9       11      31      -
KC        9       11      31      12      -

Ranks
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       1       -
KLTM      7       8       -
KLTF      2.5     4.5     9.5     -
KC        2.5     4.5     9.5     6       -

TABLE 8
Distances Among Bare Trees ($\beta$ Metric)

Distances
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       4       -
KLTM      12      16      -
KLTF      9       9       21      -
KC        9       9       21      12      -

Ranks
          KTM     KTF     KLTM    KLTF    KC
KTM       -
KTF       1       -
KLTM      6.5     8       -
KLTF      3.5     3.5     9.5     -
KC        3.5     3.5     9.5     6.5     -
among the four trees KTM, KTF, KLTM, KLTF (see Table 9). The alternative metrics for each of the three cases (valued, ranked, and bare) gave good agreement in the ordering.

TABLE 9
Rank Correlation (Kendall's $\tau$) of Distances Among Trees KTM, KTF, KLTM, and KLTF

          m_C     m_D     r_C     r_D     b_C     β       R&F
m_C       -       .867    .690    .600    .733    .828    .690
m_D       .867    -       .552    .467    .867    .828    .552
r_C       .690    .552    -       .828    .414    .500    .643
r_D       .600    .467    .828    -       .600    .690    .552
b_C       .733    .867    .414    .600    -       .966    .414
β         .828    .828    .500    .690    .966    -       .500
R&F       .690    .552    .643    .552    .414    .500    -

For domains as highly structured as kinship terms, one often can formulate one or more a priori hypotheses about the tree structure, based on theoretical considerations outside the body of data at hand. In the case of the English kinship terms under consideration, two distinctly different componential analyses have been offered in the anthropological literature. Wallace and Atkins (1960) propose an analysis with three dimensions: sex, generation (five levels, +2 to -2), and lineality (lineal, colineal, including brother and sister, uncle and aunt, niece and nephew, and ablineal, including only cousin). Romney and D'Andrade (1964; see also Wexler and Romney, in press) propose an alternative analysis with four dimensions: sex, generation (3 levels: $\pm 2$, $\pm 1$, 0), reciprocity (ascending vs. descending generation), and lineality (2 levels, direct and collateral). These componential analyses are discussed in detail in Fillenbaum and Rapoport (1971), and are displayed in Tables 10 and 11.

TABLE 10
Componential Analysis from Wallace and Atkins (1960). Adapted from Fillenbaum and Rapoport (1972)

Generation   Lineal                        Colineal
   +2        GRANDFATHER  GRANDMOTHER
   +1        FATHER       MOTHER           UNCLE     AUNT
    0        (EGO)                         BROTHER   SISTER
   -1        SON          DAUGHTER         NEPHEW    NIECE
   -2        GRANDSON     GRANDDAUGHTER

Ablineal: COUSIN

TABLE 11
Componential Analysis from Romney and D'Andrade (1964). Adapted from Fillenbaum and Rapoport (1971)

Generation   Reciprocity   Direct                        Collateral
   ±2        ascending     GRANDFATHER  GRANDMOTHER
   ±2        descending    GRANDSON     GRANDDAUGHTER
   ±1        ascending     FATHER       MOTHER           UNCLE     AUNT
   ±1        descending    SON          DAUGHTER         NEPHEW    NIECE
    0                      BROTHER      SISTER           COUSIN
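Several entries of Table 9 can be reproduced from the distance tables. The sketch below is not the original computation: it assumes the tie-corrected form of Kendall's coefficient (often called tau-b), which is consistent with the tabled value of .966 for $b_C$ against $\beta$, and the function and variable names are hypothetical. The six distances per metric are listed in the order KTM-KTF, KTM-KLTM, KTM-KLTF, KTF-KLTM, KTF-KLTF, KLTM-KLTF, as read from Tables 3, 4, 7, and 8.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(u, v):
    """Kendall rank correlation with the usual correction for ties."""
    concordant = discordant = ties_u = ties_v = 0
    for (u1, v1), (u2, v2) in combinations(zip(u, v), 2):
        du, dv = u1 - u2, v1 - v2
        if du == 0:
            ties_u += 1
        if dv == 0:
            ties_v += 1
        if du * dv > 0:
            concordant += 1
        elif du * dv < 0:
            discordant += 1
    n_pairs = len(u) * (len(u) - 1) // 2
    return (concordant - discordant) / sqrt((n_pairs - ties_u) * (n_pairs - ties_v))


m_C = [42.1, 103.7, 90.7, 96.4, 80.4, 111.2]
m_D = [279.7, 1003.6, 435.1, 861.4, 513.3, 1127.7]
b_C = [4, 22, 9, 26, 11, 31]
beta_metric = [4, 12, 9, 16, 9, 21]

print(round(kendall_tau_b(m_C, m_D), 3))          # 0.867
print(round(kendall_tau_b(m_C, b_C), 3))          # 0.733
print(round(kendall_tau_b(b_C, beta_metric), 3))  # 0.966
```

The three printed values agree with the corresponding cells of Table 9.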
A componential analysis together with a specified ordering of the dimensions determines a bare tree, and with metrics on bare trees it becomes possible to compute the distances from these trees to the empirically derived ones. Figures 10-14 show two trees derived in this way from the Wallace and Atkins analysis and three from the Romney and D'Andrade analysis. Tree WA1, derived from the Wallace and Atkins analysis, corresponds to a model which divides the terms according to lineality, then subdivides by generation, and finally subdivides by sex. The division by lineality and generation are reversed in WA2. The three orderings considered for the Romney and D'Andrade analysis are: lineality, generation, reciprocity, sex (RD1); generation, lineality, reciprocity, sex (RD2); and generation, reciprocity, lineality, sex (RD3). These five trees are the only orderings of the componential analyses consistent with the model of Wallace and Atkins, which postulates that sex is a weaker distinction than either lineality or generation; and of Romney and D'Andrade, which postulates in addition that reciprocity is a weaker distinction than generation (see discussion in Fillenbaum and Rapoport, 1971).

FIG. 10. Tree derived from Wallace and Atkins analysis (WA1).

FIG. 11. Tree derived from Wallace and Atkins analysis (WA2).

FIG. 12. Tree derived from Romney and D'Andrade analysis (RD1).

FIG. 13. Tree derived from Romney and D'Andrade analysis (RD2).

FIG. 14. Tree derived from Romney and D'Andrade analysis (RD3).

Table 12 shows the distance matrix for these five trees together with the five empirical ones, using the metric $b_C$ for bare trees, and Fig. 15 shows the result of a Shepard-Kruskal nonmetric multidimensional scaling of this matrix in two-dimensional Euclidean space (Kruskal, 1964). The results indicate that RD1 is by far the best-fitting of the theoretically derived trees, which is in good accord with a casual impression. Neither of the Wallace and Atkins trees provides as good a fit to the Rapoport-Fillenbaum data.

TABLE 12
Distances Among Empirical and Theoretical Trees ($b_C$ Metric)

        KTM   KTF   KLTM  KLTF  KC    WA1   WA2   RD1   RD2   RD3
KTM     -     4     22    9     9     43    54    7     32    36
KTF     4     -     26    11    11    47    58    11    36    40
KLTM    22    26    -     31    31    21    56    21    30    38
KLTF    9     11    31    -     12    50    61    14    39    43
KC      9     11    31    12    -     44    57    10    35    39
WA1     43    47    21    50    44    -     59    36    45    49
WA2     54    58    56    61    57    59    -     55    26    18
RD1     7     11    21    14    10    36    55    -     29    37
RD2     32    36    30    39    35    45    26    29    -     8
RD3     36    40    38    43    39    49    18    37    8     -
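The fit of each theoretical tree can also be summarized directly from Table 12. The sketch below uses the mean $b_C$ distance from each theoretical tree to the five empirical trees; this summary statistic is an assumption chosen for illustration, not a step taken in the analysis above, which relies on the scaling solution of Fig. 15. The dictionary name is hypothetical, and each list gives the distances to KTM, KTF, KLTM, KLTF, and KC in that order.

```python
# b_C distances from each theoretical tree to the five empirical trees,
# copied from Table 12 (order: KTM, KTF, KLTM, KLTF, KC).
distances_to_empirical = {
    "WA1": [43, 47, 21, 50, 44],
    "WA2": [54, 58, 56, 61, 57],
    "RD1": [7, 11, 21, 14, 10],
    "RD2": [32, 36, 30, 39, 35],
    "RD3": [36, 40, 38, 43, 39],
}

mean_distance = {
    tree: sum(values) / len(values)
    for tree, values in distances_to_empirical.items()
}

# List the theoretical trees from best to worst mean fit.
for tree in sorted(mean_distance, key=mean_distance.get):
    print(tree, mean_distance[tree])

best_tree = min(mean_distance, key=mean_distance.get)
print("closest on average:", best_tree)  # RD1
```

On this crude summary RD1 is closest on average and both Wallace and Atkins trees are farthest, which agrees with the conclusion drawn from the scaling.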
FIG. 15. Shepard-Kruskal scaling from distances among clusterings of kinship terms (stress = .035).

7. DISCUSSION
For the metrics we have discussed, several important problems remain. As the sample calculations show, the scale of values of the different metrics varies widely, and it would be convenient to have normalizations which made the values more comparable. It is not clear how best to effect this normalization. The values of the metrics $m_\delta$ depend on the values on the trees and can be arbitrarily large or small. For the $r_\delta$ and $b_\delta$ and for $\beta$, the range of values depends only on the size of the set of items and the choice of the partition metric $\delta$, but the computation of these ranges is a combinatorial problem of some difficulty.

On another line of development, the lattice-theoretic considerations in Section 3 can be generalized in various directions; for example, by using some measure other than Lebesgue measure on the real line in defining the integral in Definition 3.2. It would also be possible to consider ultrametrics which assume values in more general structures than the real numbers, and to define distances between such generalized ultrametrics by various streamlined extensions of Definition 3.2 (compare Melter, 1968). Such extensions of the method are, however, likely to be of purely theoretical interest.
For practical applications a distribution theory for the values of the metrics would be useful. There seem to be two possible approaches. One could take the "shape" of two trees and the node values, if any, as given, and try to derive a distribution for a particular metric over random assignment of the item labels; Barton and David (1966) have used this approach in studying the intersection of random graphs, and in fact their results give the distribution of the partition metric D under certain conditions. A second strategy would be to assume the use of a particular hierarchical clustering method and investigate the distribution of distances between trees derived by it from pairs of random proximity matrices; Ling (1971) has made a start in this direction.
ACKNOWLEDGMENTS

We are indebted to Phipps Arabie, Paul Holland, Paul Levitt, Francois Lorrain, and Roger Shepard for concrete comments which have substantially improved this paper. We must also thank the referees for detailed and useful criticisms. Support for the present investigation was obtained in part from Roger Brown of the Harvard Psychology Department under PHS Grant HD-02908 from the National Institute of Child Health and Development and also from Harrison White of the Harvard Sociology Department under NSF Grant GS-2689.
REFERENCES

ARABIE, P., AND BOORMAN, S. A. Multidimensional scaling of measures of distance between partitions. Technical Report, The Stanford Institute for Mathematical Studies in the Social Sciences, July, 1972; Journal of Mathematical Psychology, in press.
BARTON, D. E., AND DAVID, F. N. The random intersection of two graphs. In F. N. David (Ed.), Research papers in statistics. New York: Wiley, 1966.
BIRKHOFF, G. Lattice theory. Revised edition. Providence, R. I.: American Mathematical Society, 1967.
BOORMAN, S. A. Metric spaces of complex objects. Unpublished senior honors thesis, Harvard College, Division of Engineering and Applied Physics, 1970.
BOORMAN, S. A., AND ARABIE, P. Structural measures and the method of sorting. See Shepard et al. (Eds.), 1972.
BOULTON, S. M., AND WALLACE, C. S. The information content of a multistate distribution. Journal of Theoretical Biology, 1969, 23, 269-278.
CHOMSKY, N., AND MILLER, G. A. Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush, and E. Galanter (Eds.), Handbook of mathematical psychology, Vol. 2. New York: Wiley, 1963.
FILLENBAUM, S., AND RAPOPORT, A. Semantic structures: An experimental approach. New York: Academic Press, 1971.
FLAMENT, C. Applications of graph theory to group structure. Englewood Cliffs, NJ: Prentice-Hall, 1963.
FORD, L. R., AND FULKERSON, D. R. Flows in networks. Princeton, NJ: Princeton University Press, 1962.
FRIEDELL, M. F. Organizations as semilattices. American Sociological Review, 1967, 32, 46-54.
GOODMAN, L. A., AND KRUSKAL, W. H. Measures of associations for cross classifications. Journal of the American Statistical Association, 1954, 49, 732-764.
HAKIMI, S. L., AND YAU, S. S. Distance matrix of a graph and its realizability. Quarterly of Applied Mathematics, 1965, 22, 305-317.
HALMOS, P. R. Measure theory. Princeton, NJ: van Nostrand, 1950.
HARTIGAN, J. A. Clustering a data matrix. Paper read at the Second Annual Meeting of the Classification Society, North American Branch, The University of Western Ontario, London, Ontario, May 6-7, 1971.
HAYS, W. L. Lattice models in psychological scaling. Unpublished, University of Michigan.
JARDINE, N., AND SIBSON, R. Mathematical taxonomy. New York: Wiley, 1971.
JOHNSON, S. C. Hierarchical clustering schemes. Psychometrika, 1967, 32, 241-254.
JOHNSON, S. C. Metric clustering. Unpublished, Bell Laboratories, Inc.
KENDALL, M. G. Rank correlation methods. 3rd edition. London: Griffin, 1962.
KRUSKAL, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 1964, 29, 1-28.
LING, R. F. Probability theory of cluster analysis. Unpublished, University of Chicago, April, 1971.
MELTER, R. A. Autometrized unary algebras. Journal of Combinatorial Theory, 1968, 5, 21-29.
MILLER, G. A. Psycholinguistic approaches to the study of communication. In D. L. Arm (Ed.), Journeys in science: Small steps-great strides. Albuquerque, NM: University of New Mexico Press, 1967. Pp. 22-73.
MILLER, G. A. A psychological method to investigate verbal concepts. Journal of Mathematical Psychology, 1969, 6, 169-191.
MILLER, G. A., AND CHOMSKY, N. Finitary models of language users. In R. D. Luce, R. R. Bush, and E. Galanter (Eds.), Handbook of mathematical psychology, Vol. 2. New York: Wiley, 1963. Pp. 419-491.
MIRKIN, B. G., AND CHORNY, L. B. Ob izmerenii bliznosti mezhdu razlichnymi razbieniami konechnovo mnozhestva ob'ektov. (On measurement of proximity between various partitions of a finite set.) Avtomatika i Telemekhanika, 1970, No. 5, 120-127.
RAPOPORT, A. A comparison of two tree-construction methods for obtaining proximity measures among words. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 884-890.
RAPOPORT, A., AND FILLENBAUM, S. Experimental studies of semantic structures. See Romney et al. (Eds.), 1972.
RESTLE, F. A metric and an ordering on sets. Psychometrika, 1959, 24, 207-220.
RESTLE, F. Psychology of judgment and choice, a theoretical essay. New York: Wiley, 1961.
RIORDAN, J. An introduction to combinatorial analysis. New York: Wiley, 1958.
ROMNEY, A. K., AND D'ANDRADE, R. G. Cognitive aspects of English kin terms. In A. K. Romney and R. G. D'Andrade (Eds.), Transcultural studies in cognition. American Anthropologist, 1964, 66 (Special Publication).
ROMNEY, A. K., SHEPARD, R. N., AND NERLOVE, S. (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences, Vol. 2. (Applications.) New York: Seminar Press, 1972.
SHEPARD, R. N. Some principles and prospects for the spatial representation of behavioral science data. Paper presented at the Mathematical Social Science Board Advanced Research Seminar, Irvine, California, June 13-18, 1969.
SHEPARD, R. N. A taxonomy of some principal types of data and of methods for their analysis. In Shepard et al., 1972.
SHEPARD, R. N., ROMNEY, A. K., AND NERLOVE, S. (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences, Vol. 1. (Theory.) New York: Seminar Press, 1972.
SZÁSZ, G. Introduction to lattice theory. 3rd edition. New York: Academic Press, 1963.
WALLACE, A. F. C., AND ATKINS, J. The meaning of kinship terms. American Anthropologist, 1960, 62, 58-79.
WARD, J. H., JR. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 1963, 58, 236-244.
WEXLER, K., AND ROMNEY, A. K. Some cognitive implications derived from multidimensional scaling. See Romney et al., 1972.

RECEIVED: October 17, 1971