JOURNAL OF MATHEMATICAL PSYCHOLOGY 22, 157-175 (1980)

On the Reciprocity of Proximity Relations
GIDEON SCHWARZ
Hebrew University

AND

AMOS TVERSKY¹
Stanford University

¹ To whom reprint requests should be sent.
The degree of reciprocity of a proximity order is the proportion, P(1), of elements for which the closest neighbor relation is symmetric, and the R value of each element is its rank in the proximity order from its closest neighbor. Assuming a random sampling of points, we show that Euclidean n-spaces produce a very high degree of reciprocity, P(1) > 1/2, and correspondingly low R values, E(R) < 2, for all n. The same bounds also apply to homogeneous graphs, in which the same number of edges meet at every node. Much less reciprocity and higher R values, however, can be attained in finite tree models and in the contrast model, in which the "distance" between objects is a linear function of the numbers of their common and distinctive features.
Relations of proximity play an important role in the behavioral and social sciences. The proximity between pairs of colors, phonemes, emotions, animals, languages and countries has been investigated using a variety of empirical methods including association, rating, sorting, substitution and covariation. The data obtained by these methods can be treated as actual measures of distance or proximity between all pairs of objects under study, or they can be treated as providing only the ranking of these proximities. Furthermore, in many situations it seems desirable to relax the assumption of comparability and to assume that the data provide only conditional orderings, namely, rankings of all the objects by their proximity to a given object. Let T = {X, Y, Z,...} be a set of objects or points for which a conditional proximity order is defined. That is, for any given point X, all other members of T are ordered with respect to their proximity to X. Such orders can be obtained directly by asking subjects to rank all objects by their similarity to X, or they can be inferred from the frequency with which each object is associated, confused or classified with X. For X, Y in T, let X(Y) be the rank of Y in the proximity order defined by X, and let X' denote the closest neighbor of X in T, that is, X(X') = 1. Note that each point has one closest neighbor (assuming the proximity order has no ties), but a point may be the closest neighbor of several different points. Next, let R(X) = X'(X), that is, R(X) is the rank
of X in the proximity order defined by its closest neighbor. For example, if proximity is interpreted as the rankings of friends in a given class, then R(X) is the position of X in the rankings of her best friend X'. The degree of reciprocity of a proximity relation may be defined as the proportion of elements X ∈ T for which R(X) = 1, or equivalently as the probability that the closest neighbor relation is reciprocal, denoted P(1) = P(R(X) = 1). More generally, let P(i), i = 1, 2,..., be the probability that R(X) = i, X ∈ T. Some proximity relations, such as geographical distance between cities, are likely to exhibit high reciprocity, or low R values. Other relations, such as sociometric ratings, or the frequency with which one psychology journal cites other journals, are likely to yield higher R values. The present paper investigates the distribution of R under several models, both continuous and discrete, which have been used to represent proximity data.

In Section I we show that if the points in T can be viewed as a random sample from some "smooth" distribution in n-dimensional Euclidean space, then R has, approximately, a geometric distribution with an expectation that cannot exceed 2. Hence, the Euclidean model generates an extremely high degree of reciprocity: more than 50% of the points must be the closest neighbors of their closest neighbors, and the expected value of R ranges from 1.5, when n = 1, to 2, when n tends to infinity. The approximation is exact if the underlying distribution of the points is uniform. This remarkable property of R, that it is largely insensitive to the dimension of the space and to the shape of the underlying distribution, could perhaps be used as a diagnostic test for the Euclidean model. One of the major difficulties in testing this model is the improvement in fit caused by increasing the dimensionality of the space. The relative stability of R under changes of the dimension alleviates this difficulty. We also investigate in this section a (positive) random walk on the line, and we show that arbitrarily high average R values can be attained in Euclidean space for appropriately constructed (nonrandom) configurations of points.

Section II investigates the distribution of R under several discrete models: homogeneous graphs, finite trees and the contrast model, assuming small continuous perturbations that break all ties. In a homogeneous graph, the same number of edges meet at every node, and the distance between points is the length of the shortest path that joins them. This model includes as special cases n-dimensional Cartesian grids, and the vertices of an n-dimensional cube. It also includes finite complete graphs, in which every pair of points is linked by an edge. Since all rankings of the distances are equally likely in this case, it can be regarded as purely random. Although a homogeneous graph differs greatly from the Euclidean model, the distributions of R in the two models are very similar. Reciprocity occurs with probability greater than 1/2, and the expectation of R is less than 2.
Entirely different distributions of R, however, are obtained in finite trees, where objects are represented as (terminal or nonterminal) nodes and the distance between nodes is the length of the path that connects them, and in the contrast model (Tversky, 1977) where objects are characterized as collections of features and the “distance” between objects decreases linearly with the number of their common features and increases linearly with the number of their distinctive features. The distributions of R in these models are highly sensitive, respectively, to the structure of the tree and to the
weight of the common features. These models include cases where the probability of reciprocity approaches zero and the expected value of R is no longer bounded. The interpretation and the implications of these results are discussed in Section III.
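For concreteness, the R statistic is easy to compute from data. The following minimal Python sketch (ours, for illustration; the rank-matrix representation and function name are not from the paper) returns the R value of every element, together with P(1) and the mean R, given a matrix of conditional proximity ranks:

```python
import numpy as np

def reciprocity_stats(rank):
    """rank[x, y] = rank of y in the proximity order defined by x
    (rank 1 = closest neighbor); the diagonal is ignored."""
    rank = np.asarray(rank, dtype=float)
    np.fill_diagonal(rank, np.inf)
    nn = rank.argmin(axis=1)                 # closest neighbor X' of each X
    R = np.array([rank[nn[x], x] for x in range(len(rank))])
    return R, (R == 1).mean(), R.mean()

# Toy example: 0 and 1 are each other's closest neighbors; 2's closest
# neighbor is 1, but 2 is only second in 1's proximity order, so R(2) = 2.
rank = [[0, 1, 2],
        [1, 0, 2],
        [2, 1, 0]]
R, p1, mean_R = reciprocity_stats(rank)
print(R, p1, mean_R)                         # [1. 1. 2.]  P(1) = 2/3,  mean R = 4/3
```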
I. EUCLIDEAN n-SPACES
We first evaluate the distribution of R for a Poisson point process of density λ in n-dimensional Euclidean space. Such a process is the limit of the processes obtained by choosing k points uniformly and independently in a cube of volume k/λ, and letting k go to infinity, for fixed λ. Alternatively, it is the only process for which the numbers of points in disjoint regions are independent, and have distributions depending only on the volumes of the regions. From the latter description it follows easily that the smallest sphere with a given center that contains a point of the process has a volume that is exponentially distributed with parameter λ. The same is true for the conditional distribution, given that there is a process-point X at its center, of the volume of the smallest sphere S_1 that contains at least one more point Y. Since R(X) − 1 is the number of points that are closer to Y than X is, it is the number of points in the interior of S_2 − S_1, where S_2 is the sphere with Y at its center, and X on its surface (see Fig. 1). Given that Vol(S_1) = Vol(S_2) equals, say, v, the distribution of R − 1 is therefore Poisson with parameter λv(1 − α_n), where α_n is the proportion of S_2 that lies in S_1. (Clearly α_n depends only on the dimension; we leave its evaluation to a separate section below.) Hence,

    P(R(X) = r | V = v) = e^{-λv(1-α_n)} (λv(1 - α_n))^{r-1} / (r - 1)!,

and since the density function of λV is e^{-t}, the unconditional probability P(R(X) = r) is

    ((1 - α_n)^{r-1} / (r - 1)!) ∫_0^∞ t^{r-1} e^{-t(2-α_n)} dt,

which, since the integral is (2 - α_n)^{-r} Γ(r), reduces simply to (1 - α_n)^{r-1}/(2 - α_n)^r. In other words, R has a geometric distribution with parameter q_n = (1 - α_n)/(2 - α_n). In particular, E(R) = 2 - α_n, and P(1) = P(R(X) = 1) = 1/(2 - α_n).
FIGURE 1. [Diagram: the point X, its closest neighbor Y, and the spheres S_1 (centered at X, with Y on its surface) and S_2 (centered at Y, with X on its surface).]
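As a quick numerical sanity check (ours, not part of the original text), one can confirm that the pmf above sums to one and has mean 2 − α_n; the value α_2 ≈ 0.3911 is taken from Table 1 below:

```python
# p_r = (1 - a)**(r - 1) / (2 - a)**r, r = 1, 2, ...: a geometric pmf
a = 0.3911                                      # alpha_2, from Table 1
p = [(1 - a) ** (r - 1) / (2 - a) ** r for r in range(1, 400)]
print(sum(p))                                   # -> ~1.0
print(sum(r * q for r, q in enumerate(p, 1)))   # -> ~1.6089 = 2 - a
```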
FIGURE 2. [Diagram: cross-section of the lens S_1 ∩ S_2 by a hyperplane at distance t from Y, used in the evaluation of α_n.]
The Evaluation of α_n

For the evaluation of α_n, we may assume that the distance between X and Y is 1, and hence S_1 and S_2 are unit spheres in R^n. Denote their volumes by W_n, and the volume of S_1 ∩ S_2 by K_n. For 1/2 ≤ t ≤ 1 the intersection of S_1 ∩ S_2 with a hyperplane orthogonal to the segment (X, Y) at distance t from Y is an (n − 1)-dimensional ball of radius (1 − t²)^{1/2}, and hence of (n − 1)-volume (1 − t²)^{(n-1)/2} W_{n-1} (see Fig. 2). By symmetry, t values less than 1/2 need not be considered, and we have

    K_n = 2 W_{n-1} ∫_{1/2}^{1} (1 − t²)^{(n-1)/2} dt,

and therefore

    α_n = K_n / W_n = (2 W_{n-1} / W_n) ∫_{1/2}^{1} (1 − t²)^{(n-1)/2} dt.
Now, for n ≥ 3 we shall express α_n in terms of α_{n-2}. First, integrating by parts, we obtain for the integral I_n in the expression for α_n the recursion formula

    I_n = ((n − 1)/n) I_{n-2} − n^{-1} 2^{-n} 3^{(n-1)/2}.

Second, the well-known formula

    W_n = π^{n/2} / Γ(1 + n/2),

together with x Γ(x) = Γ(x + 1), yields another recursion formula,

    2 W_{n-1} / W_n = (n/(n − 1)) · 2 W_{n-3} / W_{n-2}.

Multiplying the two recursion formulas, we obtain

    α_n = α_{n-2} − n^{-1} 2^{-(n-1)} 3^{(n-1)/2} W_{n-1} / W_n,

which is a simple recursion formula. To start off the recursion, the values α_1 = 1/2 and α_2 = 2/3 − √3/(2π) are obtained by elementary geometry.
In closed form, we now obtain the following expressions for α_n:

    α_{2m+1} = 1/2 − Σ_{k=1}^{m} (2k + 1)^{-1} 2^{-2k} 3^k W_{2k} / W_{2k+1}

and

    α_{2m} = 2/3 − √3/(2π) − Σ_{k=2}^{m} (2k)^{-1} 2^{-(2k-1)} 3^{k-1/2} W_{2k-1} / W_{2k};

the second formula simplifies to

    α_{2m} = 2/3 − √3 Σ_{k=1}^{m} k^{-1} 2^{-2k} 3^{k-1} W_{2k-1} / W_{2k}.
We could express the W-ratios in terms of gamma functions, or binomial coefficients, but that would hardly make the formulas more explicit. For the purpose of calculation, a more useful result is obtained by combining the recursion formulas of the α_n and of the W-ratios. Defining

    d_n = n^{-1} 2^{-(n-1)} 3^{(n-1)/2} W_{n-1} / W_n,

we obtain from the latter recursion formula

    d_n = (3(n − 2) / (4(n − 1))) d_{n-2},

and the former simply becomes

    α_n = α_{n-2} − d_n,

with initial values d_2 = √3/(2π), α_0 = 2/3 for even dimensions, and d_1 = 1/2, α_1 = 1/2 for odd dimensions. The values of E(R) and P(1) for 1 ≤ n ≤ 20 are displayed in Table 1. As n increases, α_n decreases, and tends to zero geometrically. Here is an outline of the proof: From the first formula for α_n, we have

    α_{n+1} / α_n = (I_{n+1} / I_n) · (W_n² / (W_{n-1} W_{n+1})).

The first factor can be regarded as a weighted average of (1 − t²)^{1/2} with weight function (1 − t²)^{(n-1)/2} on [1/2, 1]. It is therefore bounded by max (1 − t²)^{1/2} = √3/2. Since the weight gets concentrated at t = 1/2 as n tends to ∞, the above bound is also the limit. The second factor can be seen to decrease to 1, by using known properties of the gamma function. For n = 3 it is 32/(9π), which is already less than 2/√3. Combining these facts, we see that for n ≥ 3, α_n decreases geometrically to zero. In fact, it decreases for all n, and α_{n+1}/α_n approaches √3/2.
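The recursion is immediate to implement; the following sketch (ours, written in Python for illustration) reproduces Table 1 from the initial values above:

```python
import math

def alpha_table(n_max=20):
    """alpha_n via d_n = 3(n-2)/(4(n-1)) d_{n-2} and alpha_n = alpha_{n-2} - d_n,
    with E(R) = 2 - alpha_n and P(1) = 1/(2 - alpha_n)."""
    alpha = {0: 2 / 3, 1: 0.5}                      # alpha_0 (formal), alpha_1
    d = {1: 0.5, 2: math.sqrt(3) / (2 * math.pi)}   # d_1, d_2
    for n in range(2, n_max + 1):
        if n not in d:
            d[n] = 3 * (n - 2) / (4 * (n - 1)) * d[n - 2]
        alpha[n] = alpha[n - 2] - d[n]
    for n in range(1, n_max + 1):
        a = alpha[n]
        print(f"n = {n:2d}   alpha = {a:.4f}   E(R) = {2 - a:.4f}   P(1) = {1 / (2 - a):.4f}")

alpha_table()
```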
TABLE 1

Values of α_n, E(R), and P(1) = P(R(X) = 1) in Euclidean n-Space for n = 1,..., 20

     n      α_n       E(R)      P(1)
     1     0.5000    1.5000    0.6667
     2     0.3911    1.6089    0.6215
     3     0.3125    1.6875    0.5926
     4     0.2532    1.7468    0.5725
     5     0.2070    1.7930    0.5577
     6     0.1705    1.8295    0.5465
     7     0.1411    1.8589    0.5379
     8     0.1173    1.8827    0.5311
     9     0.0979    1.9021    0.5257
    10     0.0819    1.9181    0.5213
    11     0.0687    1.9313    0.5177
    12     0.0577    1.9423    0.5148
    13     0.0486    1.9514    0.5124
    14     0.0410    1.9590    0.5104
    15     0.0346    1.9654    0.5088
    16     0.0292    1.9708    0.5074
    17     0.0248    1.9752    0.5062
    18     0.0210    1.9790    0.5053
    19     0.0178    1.9822    0.5045
    20     0.0151    1.9849    0.5038
Sampling from Smooth Densities

Although the distribution of R was obtained for the Poisson process, the fact that it does not depend on the density constant λ of the process implies that it is valid for a more general model. That model is the process that results if first λ is chosen at random according to some distribution, and then, conditional on that choice, the points are chosen to form a Poisson process with that density. According to a straightforward adaptation of de Finetti's (1937) characterization of exchangeable processes, such "mixtures" of Poisson processes are exactly those point processes for which the numbers N_i of points in any finite collection of disjoint regions of equal area have joint distributions that remain the same if the N_i are permuted (see Freedman, 1963; Davidson, 1974). Independence can thus be relaxed. On the other hand, retaining independence, the assumption of a uniform density can be relaxed. If the points are not coming from a Poisson process, but are chosen independently from some n-dimensional distribution F, the results above are still approximately valid, provided there are enough points, and F is sufficiently smooth. Indeed, the diameter of S_1 ∪ S_2 is three times the distance from X to its nearest neighbor Y, and if f, the density of F, is almost constant on regions of that diameter, the results above are almost exact.
If the total number of points is k, the number of points per unit volume is locally kf, and hence the diameter is mostly of order of magnitude (kf W_n)^{-1/n}, or, by Stirling's formula, roughly √n (kf)^{-1/n}. Hence, for fixed n, if changes in f over distances of order √n (kf)^{-1/n} can be neglected compared to f itself, the derived expressions are valid approximations. Equivalently, if g = |grad log f|, then the approximations are good whenever g^n n^{n/2} is negligible compared to the local point density λ = kf in most of the space. For an improper F, and k = ∞, such as in the Poisson case, kf is not defined, but λ is still meaningful.

The values of R for different points are not independent in general. For example, if R(X) = 1, then X is the point closest to the point closest to it, and R = 1 for that point as well. However, given that the spheres S_1, S_2 corresponding to one point are disjoint from the spheres for another point, the R values for these points are independent. Consequently, it can be shown that for two distinct points drawn at random from a sample of k, the correlation coefficient ρ_k of the R-values tends to zero as k approaches infinity. Since the average R-value, denoted R̄, satisfies

    Var(R̄) = V(R)(1 + (k − 1)ρ_k)/k,

R̄ is a consistent estimator of its expectation 2 − α_n. By a similar argument, the empirical distribution function of the R-values almost surely approaches the above geometric distribution when k approaches infinity.

Robustness: Computational Results
The robustness and the rate of convergence of the preceding result are investigated in several simulations. Each simulation consisted of a set of 20 samples of k points, generated according to the following processes. (a) Uniform distribution over the unit square, k = 20. (b) Uniform distribution over the unit cube, k = 30. (c) Uniform distribution over a ring-shaped region of the plane, defined as the set of points (X_1, X_2) satisfying 0.7 ≤ X_1² + X_2² ≤ 1, k = 20. (d) Standard bivariate normal, k = 20. (e) Standard trivariate normal, k = 30. The preceding five cases all consist of independent random samples from uniform or normal distributions in two or three dimensions. To explore the effects of non-independence, a 5 × 5 grid containing 25 squares was constructed, and the following two methods for selecting points were employed. (f) One point was selected from each of the 25 squares according to a uniform distribution. (g) Six squares were selected at random from the grid, and four points were selected within each of these squares according to a uniform distribution. Note that (g) produces clusters or positive dependence among points, while (f) prevents clusters, thereby producing negative dependence among the points. The average values of R̄ and P(1) across samples, along with their standard deviations, are presented in Table 2. The results indicate that the normal samples yield less reciprocity than either the uniform or the non-independent samples. The values of R̄ and P(1), however, are not very sensitive to variations in the nature of the underlying distribution, or to the presence of (positive or negative) dependence among the points.
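Simulations of this kind are easy to replicate. The sketch below (our reconstruction, not the authors' original program) computes the R values of a sample from its pairwise distances, and averages R̄ and P(1) over 20 samples of process (a):

```python
import numpy as np

def r_values(points):
    """R(X) for every point in the sample: the rank of X in the
    proximity order of X's own closest neighbor (rank 1 = closest)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                          # closest neighbor of each point
    return np.array([(d[j] < d[j, i]).sum() + 1 for i, j in enumerate(nn)])

rng = np.random.default_rng(0)
samples = [r_values(rng.random((20, 2))) for _ in range(20)]        # process (a)
print("mean R-bar :", np.mean([s.mean() for s in samples]))         # ~1.7
print("mean P(1)  :", np.mean([(s == 1).mean() for s in samples]))  # ~0.65
```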
TABLE 2

Means (m) and Standard Deviations (sd) of the Distribution of R̄ and of P(1), Obtained in a Simulation

                                                       R̄              P(1)
 Process                                    k      m      sd      m      sd
 a. Uniform, unit square                   20    1.73   0.25    0.65   0.11
 b. Uniform, unit cube                     30    1.75   0.25    0.62   0.08
 c. Uniform, ring 0.7 ≤ x_1² + x_2² ≤ 1    20    1.58   0.30    0.65   0.16
 d. Standard bivariate normal              20    2.00   0.36    0.58   0.12
 e. Standard trivariate normal             30    1.96   0.40    0.59   0.12
 f. Anticlustering, square                 25    1.47   0.14    0.72   0.10
 g. Clustering, square                     24    1.58   0.21    0.63   0.12
Positive Random Walks on the Line

In the one-dimensional case, another generalization of the Poisson model is the (one-sided) random walk model. Here the points are no longer an independent sample from an underlying distribution F, but the distances between neighbors are a sample from a distribution F_1 on the positive half line. The Poisson process is included in this model, and corresponds to exponential F_1. To avoid ties, only continuous distributions are considered. Let (X_n) be independent positive random variables with common continuous distribution F_1 for n = ±1, ±2,...; the point process to be considered consists of the point zero, the points Σ_{i=1}^{m} X_i, and the points −Σ_{i=1}^{m} X_{−i}, for m = 1, 2,.... Obviously, R has the same distribution for all points, so we restrict our attention to R(0). Also, by symmetry, R is independent of the event {X_1 < X_{−1}}, and the distribution of R remains unchanged if we condition on this event. The distribution of X_1, however, changes to that of the minimum of two independent steps. The event {R > n} is clearly the same as {X_1 > X_2 + ⋯ + X_{n+1}}, and therefore

    P(R > n) = P(Y_1 + ⋯ + Y_n < Min(Y_{n+1}, Y_{n+2})),

where Y_1, Y_2,... are independent and have distribution F_1. By conditioning on the value of the sum, we obtain

    P(R > n) = ∫ (1 − F_1(t))² dF_1^{(n)}(t),

where F_1^{(n)} denotes the n-fold convolution of F_1. Explicit evaluation is not possible except in special cases. There are, however, bounds available for the distribution of R. First note that P(R > 1) = P(Y_1 < Min(Y_2, Y_3)) = 1/3 for all F_1, or equivalently, P(R = 1) = 2/3. Next, if F_1
takes its values in an interval [a, b] with a < b < 2a, then Y_1 + Y_2 will certainly exceed Min(Y_3, Y_4), and R cannot exceed 2. So in this case R is 1 or 2 with probability 2/3 and 1/3, respectively. This stochastic lower bound for R yields E(R) = 4/3. For an upper bound, the following combinatorial argument was suggested by Shimon Friedman: Obviously the C(n+2, 2) events obtained from

    A_n = {Y_1 + ⋯ + Y_n < Min(Y_{n+1}, Y_{n+2})}

by permuting the indices are equiprobable. Since A_n ⊂ {Y_{n+1} and Y_{n+2} are the two smallest among Y_1,..., Y_{n+2}}, the A_n are disjoint. Hence, P(R > n) = P(A_n) ≤ C(n+2, 2)^{-1}. Thus a random variable that takes on the value n with probability

    C(n+1, 2)^{-1} − C(n+2, 2)^{-1} = 2/(n(n + 1)) − 2/((n + 1)(n + 2)) = 4/(n(n + 1)(n + 2))

is a stochastic upper bound for R. Its expected value is 2, and it has infinite variance. Since the inclusion of events above is always proper, equality to the bound is never attained. We shall see, however, that the bound is sharp, since it is approached as F_1 ranges over all distributions of the form F_λ(t) = t^λ for 0 ≤ t ≤ 1, λ > 0. For these distributions one shows that F_λ^{(n)}(t) = (Γ(λ + 1)^n / Γ(nλ + 1)) t^{nλ} for 0 ≤ t ≤ 1, by induction on n and a standard integration. From here, one easily obtains

    P_λ(R > n) = (Γ(λ + 1)^n / Γ(nλ + 1)) C(n+2, 2)^{-1}.

These probabilities are decreasing in λ, since

    (d/dλ)(n log Γ(λ + 1) − log Γ(nλ + 1)) = n((log Γ)′(λ + 1) − (log Γ)′(nλ + 1)),

which is negative since log Γ is convex. For λ → 0, the stochastic upper bound is approached. For λ → ∞, λ can be restricted to integer values, and for such values P_λ(R > 2) = (1/6) C(2λ, λ)^{-1}, which approaches zero as λ → ∞. Since P_λ(R > 1) is always 1/3, the lower stochastic bound P(R = 1) = 2/3, P(R = 2) = 1/3 is thus approached by R. We saw earlier that for many F_1 this bound is actually attained.

If we compare the distributions of R in Poisson processes of different dimensions with those in random walks with different step distributions, we see that the latter also have expectations bounded by 2, but their variances can be arbitrarily large, while in the multidimensional Poisson case the variances never exceed 2 either.
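The formula for P_λ(R > n) is easy to check by simulation. The sketch below (ours) draws steps from F_λ(t) = t^λ by inverse transform (U^{1/λ} for uniform U) and compares the empirical frequency of {Y_1 + ⋯ + Y_n < Min(Y_{n+1}, Y_{n+2})} with the exact expression:

```python
import numpy as np
from math import gamma, comb

def p_exact(lam, n):
    """P_lambda(R > n) = Gamma(lam+1)**n / Gamma(n*lam + 1) / C(n+2, 2)."""
    return gamma(lam + 1) ** n / gamma(n * lam + 1) / comb(n + 2, 2)

def p_sim(lam, n, trials=200_000, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.random((trials, n + 2)) ** (1 / lam)   # steps with cdf t**lam on [0, 1]
    return (y[:, :n].sum(axis=1) < y[:, n:].min(axis=1)).mean()

for lam in (0.5, 1.0, 3.0):
    for n in (1, 2, 3):
        print(f"lam={lam}  n={n}  exact={p_exact(lam, n):.4f}  sim={p_sim(lam, n):.4f}")
```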
Finally, it is important to note that all the results established in this section assume a random process of one type or another. Indeed, it is possible to construct nonrandom sequences of points on the line that attain any fixed R value throughout. Consider the sequence {X_n}, −∞ < n < ∞, where X_n − X_{n−1} = q^n for some fixed q. A straightforward calculation shows that if 1 < q < 2 and r − 1 < −log(2 − q)/log q < r for an integer r ≥ 2, then R(X_n) = r for all n. Since, for 1 < q < 2, the function −log(2 − q)/log q takes on all real values larger than 1, any finite r ≥ 2 is attained in this way. To complete the picture, note that for q ≥ 2, R(X_n) = ∞, and for {X_n} being the sequence of all integers not divisible by 3, R(X_n) = 1. Such sequences, however, are "hand-picked," and they are not likely to arise naturally in a random process. The interpretation of the assumption of a random process is discussed in Section III.
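Before moving on, the nonrandom construction can be checked directly (our sketch; the truncation length and probe index are arbitrary):

```python
import math

def R_at(x, i):
    """Rank of x[i] in the proximity order of x[i]'s closest neighbor."""
    dists = [(abs(xj - x[i]), j) for j, xj in enumerate(x) if j != i]
    jstar = min(dists)[1]                          # closest neighbor of x[i]
    d = abs(x[jstar] - x[i])
    return 1 + sum(1 for j, xj in enumerate(x)
                   if j not in (i, jstar) and abs(xj - x[jstar]) < d)

q = 1.8                                            # -log(2 - q)/log q ~ 2.74 -> R = 3
x, s = [0.0], 0.0
for n in range(1, 60):                             # gaps X_n - X_{n-1} = q**n
    s += q ** n
    x.append(s)
print(R_at(x, 30), -math.log(2 - q) / math.log(q))   # 3, ~2.74
```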
II. DISCRETE MODELS
In this section we investigate several discrete models: homogeneous graphs, finite trees and the contrast model. The following combinatorial result is used to calculate the distribution of R under these models. Let A and B be finite disjoint sets of N ≥ 0 and K ≥ 1 elements, respectively.

LEMMA. Rank the elements of A ∪ B randomly, so that all (N + K)! rankings are equally probable. Then, if R is the lowest among the ranks assigned to elements of B,

    P(R = r) = C(N + K − r, K − 1) / C(N + K, K),   r = 1,..., N + 1,

and E(R) = (N + K + 1)/(K + 1).
Proof. The set B has equal probabilities of being assigned any one of the C(N + K, K) subsets of cardinality K of the set {1, 2,..., N + K}. The event {R = r} occurs whenever this set contains r, and its other K − 1 elements are chosen from {r + 1,..., N + K}. This can be done in C(N + K − r, K − 1) ways. To evaluate the expectation of R, use the well-known property of Pascal's triangular array, C(i, j) + C(i, j + 1) = C(i + 1, j + 1), to establish

    Σ_{i=r}^{N+1} C(N + K − i, K − 1) = C(N + K − r + 1, K),

which, when applied twice, yields

    E(R) = Σ_{i=1}^{N+1} i P(R = i) = (N + K + 1)/(K + 1).   Q.E.D.
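A brute-force check of the lemma (ours, for illustration) is immediate: B's ranks are simply K values drawn without replacement from {1,..., N + K}. With N = m − 1 and K = m this also yields the homogeneous-graph distribution derived in the next subsection:

```python
import numpy as np
from math import comb

def lemma_pmf(N, K):
    """P(R = r) = C(N+K-r, K-1) / C(N+K, K) for r = 1, ..., N+1."""
    return [comb(N + K - r, K - 1) / comb(N + K, K) for r in range(1, N + 2)]

def lemma_sim(N, K, trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    ranks = np.tile(np.arange(1, N + K + 1), (trials, 1))
    ranks = rng.permuted(ranks, axis=1)        # one random ranking per trial
    return ranks[:, :K].min(axis=1)            # first K columns play the role of B

N, K = 3, 4                                    # e.g. homogeneous graph with m = 4
R = lemma_sim(N, K)
print("exact:", np.round(lemma_pmf(N, K), 4))
print("sim  :", np.round(np.bincount(R, minlength=N + 2)[1:] / len(R), 4))
print("E(R) =", (N + K + 1) / (K + 1), "vs sim", round(R.mean(), 4))
```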
Homogeneous Graphs

A graph is homogeneous if the same number m ≥ 2 of edges meet at every node. This family includes, as special cases, the vertices of the n-dimensional cube (m = n), the n-dimensional Cartesian grid and its finite analogues on the n-torus (m = 2n), and infinite trees in which each edge splits into m − 1 branches. The length of each edge equals 1 plus a small continuous random perturbation, and the perturbations of different edges are independent and identically distributed. The distance between points is defined as the length of the shortest path that connects them. The perturbations serve to break all ties while satisfying the condition that any single edge is shorter than the sum of two edges. Let X be any node, and X' its closest neighbor. If B is the set of edges meeting at X, and A the set of edges meeting at X', except for the edge (X, X'), then R(X) coincides with R as defined in the lemma, which applies with N = m − 1 and K = m, and yields
    P(R(X) = r) = C(2m − r − 1, m − 1) / C(2m − 1, m) = (m(m − 1) ⋯ (m − r + 1)) / ((2m − 1)(2m − 2) ⋯ (2m − r))
for r = 1, 2,..., m, and E(R) = 2m/(m + 1). For reciprocity, we have P(1) = P(R(X) = 1) = m/(2m − 1). Although R is bounded in this model, its distribution does not differ greatly from the geometric distribution derived for the Poisson model: reciprocity still occurs with probability at least 1/2, and E(R) is less than 2. In fact, the models share the same limiting distribution as n and m approach infinity, namely, the geometric distribution with q = 1/2 and E(R) = 2. This limiting distribution is also the distribution of R attained, when the number of points grows large, in the purely random model, where all differences between distances are due to error. Note that the purely random model with k points is a complete, and hence homogeneous, graph (m = k − 1).

Finite Trees

We study next the reciprocity relation in finite trees. As in the previous section we assume that the distance between any two points is the sum of the perturbed lengths of the edges that connect them. There are two ways in which trees are used to represent proximity data. In the more common terminal model only the terminal nodes of the tree are associated with the objects of study, while the full model associates an object with each node of the tree (see, e.g., Carroll, 1976).
In principle, the distribution of R in any tree model can be obtained by applying the combinatorial lemma separately to different classes of nodes. We first study the uniform full tree model, where each branch splits into the same number b ≥ 2 of branches until, after h such b-furcations, the terminal nodes are reached. The distribution of R in this model is highly sensitive to b, but it rapidly approaches a limiting distribution as h increases. We therefore evaluate E(R) and P(1) for this limiting distribution. For the terminal nodes, which constitute (b − 1)/b of all nodes, R is uniformly distributed on {1,..., b + 1}, and hence P(1) = 1/(b + 1) and E(R) = (b + 2)/2. The nodes one edge away from the terminal nodes constitute (b − 1)/b² of the total. With probability b/(b + 1) any one of these subterminal nodes has a terminal node as its nearest neighbor. In this case, R is identically one. With the remaining probability of 1/(b + 1), the distribution of R for the subterminal nodes is the same as for the remaining nodes, which constitute 1/b² of all nodes. Here the results for homogeneous graphs apply with b = m − 1, and yield P(1) = (b + 1)/(2b + 1) and E(R) = 2(b + 1)/(b + 2). Adding up the different values of E(R) and P(1) with the corresponding weights yields
    P(1) = 4b / ((2b + 1)(b + 1)),

    E(R) = (b³ + 4b² + 5b + 6) / (2(b + 1)(b + 2)).
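The weighted combination can be verified mechanically; the sketch below (ours) mixes the three node classes with exact rational arithmetic and asserts agreement with the closed forms above:

```python
from fractions import Fraction as F

def full_tree_limit(b):
    """Limiting P(1) and E(R) for the uniform full tree with branching factor b."""
    w_term, w_sub, w_rest = F(b - 1, b), F(b - 1, b * b), F(1, b * b)
    P1_term, ER_term = F(1, b + 1), F(b + 2, 2)          # R uniform on {1,...,b+1}
    P1_int, ER_int = F(b + 1, 2 * b + 1), F(2 * (b + 1), b + 2)  # graph with m = b+1
    P1_sub = F(b, b + 1) + F(1, b + 1) * P1_int          # terminal neighbor => R = 1
    ER_sub = F(b, b + 1) + F(1, b + 1) * ER_int
    P1 = w_term * P1_term + w_sub * P1_sub + w_rest * P1_int
    ER = w_term * ER_term + w_sub * ER_sub + w_rest * ER_int
    assert P1 == F(4 * b, (2 * b + 1) * (b + 1))                       # closed form
    assert ER == F(b**3 + 4 * b**2 + 5 * b + 6, 2 * (b + 1) * (b + 2))
    return float(P1), float(ER)

for b in range(2, 7):
    print(b, full_tree_limit(b))          # reproduces the full-model rows of Table 3
```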
As h increases, these limits are approached quite rapidly. For example, in the binary case b = 2, the limit of E(R) is 5/3, and the exact formula is E(R) = 5/3 + 2/(9(2^{h+1} − 1)) for h ≥ 3. This expression is obtained by treating separately the terminal nodes, the subterminal nodes, the sole "root" and its two neighbors. The limiting values of P(1) and E(R) for several values of b are displayed in Table 3. Note that, as the number of branches b increases, P(1) tends to zero and E(R) is no longer bounded, in sharp contrast to the previous models.

Next, we turn to the study of the terminal tree model. To simplify the analysis, we assume that the terminal nodes occur in terminal clusters of cardinality q ≥ 2; a terminal cluster consists of all terminal nodes that are directly linked to the same node. Since the present analysis assumes only that the closest neighbor of any terminal node is found in the same terminal cluster, the results apply also to any other tree model, such as the hierarchical clustering scheme (Johnson, 1967) and the additive similarity tree (Sattath & Tversky, 1977), whenever this assumption is satisfied. The distribution of R inside a terminal cluster depends only on its cardinality q, and is given by

    P(R = r | q) = 2/q   for r = 1,
                 = 1/q   for 1 < r ≤ q − 1,
                 = 0     for r ≥ q,

and E(R | q) = (q − 1)/2 + 1/q.
The distribution over all terminal nodes is easily expressed in terms of the distribution of cluster size. Thus, if π(q) is the fraction of clusters of size q among all clusters, and hence qπ(q)/Σ_q qπ(q) is the fraction of terminal nodes in such clusters, we find

    P(1) = Σ_q qπ(q) P(R = 1 | q) / Σ_q qπ(q) = 2 Σ_q π(q) / Σ_q qπ(q) = 2/Q,

where Q is the size of the average cluster. For r ≥ 2, we obtain

    P(R = r) = Σ_{q > r} π(q) / Q.
These formulas can also be used to obtain the π(q) when the distribution of R is known, yielding π(2) = (P(1) − 2P(2))/P(1), and, for q ≥ 3, π(q) = 2(P(q − 1) − P(q))/P(1). Consequently, the distributions of R attainable by the terminal tree model are exactly those which satisfy P(1)/2 ≥ P(2) ≥ P(3) ≥ ⋯. The expectation of R is given by

    E(R) = (E(q) − 1)/2 + 1/Q,

where E(q) is the expected size of the cluster of a randomly chosen terminal node, not to be confused with the size Q of the average cluster.
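These formulas translate directly into code; the sketch below (ours) computes P(1), E(R), and the tail probabilities P(R = r) from any cluster-size distribution π:

```python
import numpy as np

def terminal_tree_stats(pi):
    """pi = {q: fraction of clusters of size q}; returns P(1), E(R), {r: P(R = r)}."""
    qs = np.array(sorted(pi))
    p = np.array([pi[q] for q in qs], dtype=float)
    p /= p.sum()
    Q = (qs * p).sum()                      # size of the average cluster
    Eq = (qs**2 * p).sum() / Q              # cluster size of a random terminal node
    P1 = 2 / Q
    ER = (Eq - 1) / 2 + 1 / Q
    Pr = {r: p[qs > r].sum() / Q for r in range(2, int(qs.max()))}
    return P1, ER, Pr

# all clusters of size q = 4: P(1) = 0.50, E(R) = 1.75, matching Table 3
print(terminal_tree_stats({4: 1.0}))
```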
Some values of P(1) and E(R), for the conditional distributions of R given the cluster size q, are given in Table 3. These values can also be interpreted as the unconditional P(1) and E(R) for terminal trees in which all terminal clusters have the same number q of nodes. In a tree with average cluster size Q = q, P(1) is the same as in a model with all clusters of size q, while E(q), and hence E(R), increases, for fixed Q, with the variance of the cluster size. Note that in the terminal tree model, as in the full tree model, P(1) can vary from near zero to unity depending on the structure of the tree. Maximal reciprocity is achieved in a binary terminal tree, where P(1) = E(R) = 1. For a fixed number k of terminal nodes, reciprocity is minimal when the tree consists of a single terminal cluster. In this case, P(1) = 2/k, P(i) = 1/k for i = 2,..., k − 1, and E(R) = (k − 1)/2 + 1/k. This distribution is actually (as Micha A. Perles convinced us) a stochastic upper bound for R in a finite metric space of k elements: there are in every such model, for each 1 ≤ j ≤ k − 1, at least j + 1 points with R ≤ j.

Proof. List all k(k − 1)/2 pairs of points in increasing order of their distances. Then label the points like dramatis personae: let X(i) be the ith point to make its first appearance in the listing of the pairs; when two points make their entrance in the same pair, choose one arbitrarily to precede the other one. Each point X(i) appears in the list first in a pair with its closest neighbor, say Y(i). If this is also the first appearance
of Y(i), then R(X(i)) = 1. Otherwise, Y(i) = X(m) for some 1 ≤ m ≤ i − 1, and Y(i) has appeared earlier, with partners among X(1),..., X(i − 1), excluding X(m) itself. Since there are i − 2 such potential partners, R(X(i)) ≤ i − 1 in that case. Hence, R(X(i)) ≤ Max(1, i − 1), and X(1),..., X(j + 1) all have R ≤ j. Q.E.D.

The Contrast Model

Finally, we investigate the contrast model, in which the objects of T are represented as subsets of a finite set F_n of n features, and the "distance" between objects X and Y is given by d(X, Y) = αD(X, Y) − θC(X, Y), where D(X, Y) and C(X, Y) denote, respectively, the numbers of distinctive and common features of X and Y, and α > 0, θ ≥ 0. Note that this is not a metric. The contrast model is usually expressed in terms of the weights associated with the features rather than in terms of their number. This difference, however, is not essential, since it is possible to replace a highly salient feature by an appropriate number of equally weighted "subfeatures." Throughout this section we assume that all distances between objects are perturbed at random so that all ties are broken, but no strict inequalities are reversed. These perturbations can be interpreted as small independent errors in the measurement of distance. We analyze here two cases: (i) T consists of all C(n, k) subsets of cardinality k of F_n, 1 ≤ k ≤ n − 1; (ii) T consists of all 2^n subsets of F_n. Note that in case (i), 2C + D = 2k, hence we can ignore the common features and express the distance between objects only in terms of their distinctive features. Furthermore, we can represent the objects as the nodes of a homogeneous graph, where each object is connected just to the k(n − k) objects that are obtained from it by a substitution of one feature. The distribution of R in this case is the same as for a homogeneous graph with m = k(n − k), hence
    P(1) = k(n − k) / (2k(n − k) − 1)   and   E(R) = 2k(n − k) / (k(n − k) + 1).
In case (ii), where T consists of all subsets of F_n, the distribution of R depends on the parameters α and θ. For θ = 0 the contrast model reduces to an n-cube, which is a homogeneous graph with m = n, hence P(1) = n/(2n − 1) and E(R) = 2n/(n + 1). For θ > 0, the closest neighbor of a subset is one of its extensions by a single element, since other subsets have fewer features in common and/or more distinctive features. Consequently, R(X) = 1 only for F_n and for one of its n reductions by a single element, and P(1) = 2/2^n. Unlike P(1), which does not depend on α and θ provided only that θ is positive, E(R) is a non-decreasing function of θ/α. We study the behavior of R under two conditions: 0 < θ/α < 1 and θ/α > n − 2. If 0 < θ/α < 1, any extension of a set by two or more elements is at a greater distance
from that set than any of its reductions by a single element. Let X be a subset of F_n with cardinality 0 ≤ k ≤ n − 1. Let B(X) denote the set of all extensions of X by a single element; hence the closest neighbor of X is some Y in B(X). Any extension of Y is closer to Y than X is, and there are n − k − 1 such extensions. The only other sets that can be closer to Y than X is are the reductions of Y by a single element, excluding X itself. Denote the set of these reductions by A(X). Let S(X) be the number of sets in A(X) that are indeed closer to Y than X is. Clearly R(X) = n − k + S(X). Now apply the lemma, with A = A(X), B = B(X), N = k and K = n − k. Since S(X) + 1 coincides with R of the lemma, we obtain

    P(R(X) = n − k + t) = C(n − t − 1, n − k − 1) / C(n, k).

Since k is binomial (n, 1/2), the joint distribution of R(X) and the cardinality of X, denoted c(X), is

    P(c(X) = k and R(X) = n − k + t) = 2^{-n} C(n − t − 1, n − k − 1).

Also, we have P(c(X) = n and R(X) = 1) = 2^{-n}. Substituting r for n − k + t, we have t = r − n + k, and hence, for 1 ≤ n − k ≤ r ≤ n,

    P(c(X) = k and R(X) = r) = 2^{-n} C(2n − k − r − 1, n − k − 1).

The distribution of R is therefore given by

    P(R = r) = 2^{-n} Σ_{k=n-r}^{n-1} C(2n − k − r − 1, n − k − 1) = 2^{-n} C(n, r − 1)

for 2 ≤ r ≤ n, while for r = 1, P(R = 1) = 2^{-n+1}. This distribution can be expressed via a binomial (n, 1/2) variable T by letting R = T + 1 when 0 ≤ T ≤ n − 1 and R = 1 when T = n. From this representation we obtain E(R) = n/2 − n/2^n + 1.

Next, we turn to the case θ/α > n − 2, where any subset of F_n is closer to all its extensions than to any of its reductions. The evaluation of E(R) is essentially the same, except that n − k − 1, the number of extensions of a set of k + 1 elements by a single element, is replaced by 2^{n-k-1} − 1, which is the number of all its (proper) extensions. Thus, we have

    P(c(X) = k and R(X) = 2^{n-k-1} + t) = 2^{-n} C(n − t − 1, n − k − 1)

for 0 ≤ k ≤ n − 1, and P(c(X) = n and R(X) = 1) = 2^{-n}.
TABLE 3

Values of E(R) and P(1) for Different Discrete Models

 Model                             E(R)                                     P(1)

 Homogeneous graph
   n-Dimensional cube              2 − 2/(n + 1)                            1/2 + 1/(4n − 2)
     n = 1                         1.00                                     1.00
     n = 2                         1.33                                     0.67
     n = 3                         1.50                                     0.60
     n = 4                         1.60                                     0.57
   n-Dimensional grid              2 − 2/(2n + 1)                           1/2 + 1/(8n − 2)
     n = 1                         1.33                                     0.67
     n = 2                         1.60                                     0.57
     n = 3                         1.71                                     0.55
     n = 4                         1.78                                     0.53
   Purely random: k points         2 − 2/k                                  1/2 + 1/(4k − 6)

 Finite trees
   Full model, limit (uniform)     (b³ + 4b² + 5b + 6)/(2(b + 1)(b + 2))    4b/((2b + 1)(b + 1))
     b = 2                         1.67                                     0.53
     b = 3                         2.10                                     0.43
     b = 4                         2.57                                     0.36
     b = 5                         3.05                                     0.30
     b = 6                         3.54                                     0.26
   Terminal model (constant q)     (q − 1)/2 + 1/q                          2/q
     q = 2                         1.00                                     1.00
     q = 3                         1.33                                     0.67
     q = 4                         1.75                                     0.50
     q = 5                         2.20                                     0.40
     q = 6                         2.67                                     0.33

 Contrast model
   All subsets of size k           2 − 2/(k(n − k) + 1)                     1/2 + 1/(4k(n − k) − 2)
   All 2^n subsets
     θ = 0                         2 − 2/(n + 1)                            1/2 + 1/(4n − 2)
     α > θ > 0                     n/2 − n/2^n + 1                          2^{1-n}
       n = 2                       1.50                                     0.50
       n = 3                       2.13                                     0.25
       n = 4                       2.75                                     0.13
       n = 5                       3.34                                     0.06
       n = 6                       3.91                                     0.03
     θ > α(n − 2)                  (3^n + 2^{n+1} − 2n − 1)/2^{n+1}         2^{1-n}
       n = 2                       1.50                                     0.50
       n = 3                       2.25                                     0.25
       n = 4                       3.25                                     0.13
       n = 5                       4.63                                     0.06
       n = 6                       6.59                                     0.03
In this case there is no convenient formula for the marginal distribution of R, but individual values can be computed from the joint distribution. To evaluate E(R), we first define R* by: R* = R if k ≤ n − 1, and R* = n + 1/2 if k = n. Then

    E(R* | c(X) = k) = 2^{n-k-1} + k/(n − k + 1),

for k ≤ n − 1 by the lemma, and for k = n by definition. Consequently,

    E(R*) = 2^{-n} Σ_{k=0}^{n} C(n, k) (2^{n-k-1} + k/(n − k + 1))
          = (1/2)(1 + 1/2)^n + 2^{-n}(2^n − 1)
          = 3^n 2^{-n-1} + 2^{-n}(2^n − 1).

Finally, since R = R* when k ≤ n − 1 and R = R* − n + 1/2 when k = n, we have

    E(R) = E(R*) + 2^{-n}(1/2 − n) = (3^n + 2^{n+1} − 2n − 1)/2^{n+1}.
For larger n the approximation E(R) ≈ 3^n 2^{-n-1} + 1 may be used. The preceding results show that the contrast model can generate widely different distributions of R, depending on the structure of T, the value of θ/α, and the cardinality of F_n. When the common features can be ignored, either by setting θ = 0 or by considering subsets with an equal number of features, P(1) > 1/2 and E(R) < 2. When θ > 0 and all subsets of F_n are considered, however, P(1) = 2/2^n, and E(R) is a non-decreasing function of θ/α which increases steadily with n. Table 3 presents an overview of the results obtained for the various discrete models, along with numerical examples.
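Both regimes are easy to check numerically. The sketch below (ours, for illustration; the parameter values are arbitrary) verifies the distribution derived for 0 < θ/α < 1 by brute force over all 2^n subsets with tie-breaking noise, and confirms the closed form for E(R) in the case θ/α > n − 2 against the lemma-based sum:

```python
import numpy as np
from itertools import product
from math import comb

def contrast_R_pmf(n, alpha=1.0, theta=0.5, trials=2000, seed=0):
    """Empirical R-distribution over all 2**n subsets of F_n under
    d = alpha*D - theta*C with small tie-breaking noise (0 < theta/alpha < 1)."""
    rng = np.random.default_rng(seed)
    sets = np.array(list(product([0, 1], repeat=n)))      # subsets as 0/1 vectors
    C = sets @ sets.T                                     # common features
    D = (sets[:, None, :] != sets[None, :, :]).sum(-1)    # distinctive features
    base = alpha * D - theta * C
    counts = np.zeros(2 ** n)
    for _ in range(trials):
        d = base + 1e-6 * rng.random(base.shape)          # breaks ties only
        np.fill_diagonal(d, np.inf)
        nn = d.argmin(axis=1)
        for i, j in enumerate(nn):
            counts[(d[j] < d[j, i]).sum()] += 1           # index r - 1 for R = r
    return counts / counts.sum()

n = 4
print("sim  :", np.round(contrast_R_pmf(n)[:n], 3))       # P(R = 1), ..., P(R = n)
print("exact:", [2 / 2**n] + [comb(n, r - 1) / 2**n for r in range(2, n + 1)])

# Case theta/alpha > n - 2: E(R) assembled from the lemma must match the
# closed form (3**n + 2**(n+1) - 2n - 1) / 2**(n+1).
for n in range(2, 7):
    er_star = sum(comb(n, k) * (2 ** (n - k - 1) + k / (n - k + 1))
                  for k in range(n + 1)) / 2**n
    er = er_star + (0.5 - n) / 2**n
    closed = (3**n + 2 ** (n + 1) - 2 * n - 1) / 2 ** (n + 1)
    print(n, round(er, 4), round(closed, 4))              # the two must agree
```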
III. DISCUSSION AND SUMMARY

The present paper investigates the reciprocity and the R-statistic associated with conditional proximity relations generated by several models. If the points can be viewed as a random sample from some "smooth" distribution in an n-dimensional Euclidean space, then the data must exhibit a very high degree of reciprocity. In particular, the probability of reciprocity exceeds 1/2, and R has a geometric distribution whose expectation is smaller than 2, irrespective of the dimensionality of the space and the shape of the underlying distribution. This result is precise if the distribution is uniform, and it is asymptotic for other smooth distributions. Simulation indicates that the above result
is not very sensitive to the nature of the underlying distribution and to the presence of positive or negative correlations between the density of points in adjacent regions.

Robustness aside, how shall we interpret the distributional assumption? Under what conditions is it reasonable to treat the points as a random sample from some multivariate distribution? There are several ways in which this assumption may be justified. First, sampling may refer to the selection of objects. An experimenter may select for study a random sample of color patches, random shapes or words, drawn from the respective populations of objects. Second, even if the points were not randomly selected by the experimenter, the entities under study may be viewed as a result of a random selection process. Consider, for example, the set of English words that describe affective states (e.g., anger, boredom, joy) and assume that these states can be represented as points in a Euclidean space (see, e.g., Schlosberg, 1954). The process of constructing a set of verbal labels to describe points in a continuous emotion space is likely to include a random component in the selection of the states as the prototypical carriers of "joy," "fear" or "rage." Indeed, different cultures and languages evolved slightly different labeling systems for what is presumably the same underlying affective space, and these differences contribute to the difficulty of translating emotion-terms from one language to another. The points in the space that correspond to emotion-terms in the English language, therefore, may be viewed as a sample from a population of possible states. Third, the random element may be introduced not in the selection of the objects but rather in the perception or the response produced by them. Consider, for example, the responses of an individual to the different Rorschach cards. Here the objects (i.e., the 10 cards) are fixed, but the response to any one card may vary depending on the mood of the respondent or on the order of the cards. The points in the response space, therefore, may be viewed as a random sample generated according to some process determined jointly by the respondent and the Rorschach cards. The plausibility of the distributional assumption should, of course, be examined separately in each situation. The present discussion merely suggests that it might be applicable even when no explicit process of object selection appears to take place.

Section II characterizes the distribution of R in several discrete models that differ markedly from the continuous Euclidean spaces of Section I. Furthermore, the assumption of sampling points from some underlying distribution is replaced by the introduction of small continuous perturbations which break all ties. Nevertheless, the distribution of R in a homogeneous graph (where the same number of edges meet at every node) is very similar to its distribution under the Euclidean model. In both cases, P(1) > 1/2 and E(R) < 2. Moreover, the models yield the same upper stochastic bound for R: the geometric distribution with q = 1/2. Unlike the Euclidean model and the homogeneous graph, which generate a high degree of reciprocity regardless of the number of points and the dimensionality of the space, the contrast model and the finite tree models give rise to widely different values of P(1) and E(R). In the binary terminal tree model, for example, R can be identically 1, while the expected R for a tree consisting of a single terminal cluster with q terminal nodes is approximately q/2.
Similarly, the expected reciprocity in the contrast model is highly sensitive to the structure of T and the weight of the common features. If the common
features are ignored, the contrast model reduces to a homogeneous graph, which yields high reciprocity; but if the common features are taken into account, then P(1) approaches 0 and E(R) increases indefinitely as the number of features becomes large. Although we have analyzed here only a limited number of models, the approach used in this paper can be readily applied to evaluate the distribution of R in other models of interest, or to obtain bounds for its expectation. The results of the present paper suggest that reciprocity may serve as a useful diagnostic property, because different proximity models produce drastically different distributions of R. In particular, very high R values may indicate (i) a specially patterned (nonrandom) configuration in Euclidean space, or (ii) a tree with large (terminal) clusters, or (iii) a contrast model in which the common features cannot be neglected. The application of these results to empirical data is discussed in a forthcoming paper.
ACKNOWLEDGMENT

We are grateful to Wes Hutchinson for many valuable comments.
REFERENCES

CARROLL, J. D. Spatial, non-spatial and hybrid models for scaling. Psychometrika, 1976, 41, 439-463.
DAVIDSON, R. Exchangeable point processes. In E. Harding & D. Kendall (Eds.), Stochastic geometry. New York: Wiley, 1974.
DE FINETTI, B. Foresight: Its logical laws, its subjective sources. Annales de l'Institut Henri Poincaré, 1937, 7. Also appears in H. E. Kyburg, Jr. & H. E. Smokler (Eds.), Studies in subjective probability. New York: Wiley, 1964.
FREEDMAN, D. Invariants under mixing which generalize de Finetti's theorem: Continuous time parameter. Annals of Mathematical Statistics, 1963, 34, 1194-1216.
JOHNSON, S. C. Hierarchical clustering schemes. Psychometrika, 1967, 32, 241-254.
SATTATH, S., & TVERSKY, A. Additive similarity trees. Psychometrika, 1977, 42, 319-345.
SCHLOSBERG, H. Three dimensions of emotions. Psychological Review, 1954, 61, 81-88.
TVERSKY, A. Features of similarity. Psychological Review, 1977, 84, 327-352.

RECEIVED:
January 25, 1980