November 1982
Statistics & Probability Letters 1 (1982) 61-67 North-Holland Publishing Company
Hierarchical Classification of Mathematical Structures
Christophe Perruchet
Centre National d'Etudes des Télécommunications, PAA/ATR/MTI, 92131 Issy-les-Moulineaux, France
Received July 1982; revised version received August 1982
Abstract. The range of applications of cluster analysis has grown considerably over the last fifteen years. Nevertheless, cluster analysis has always been applied to concrete data resulting from experiments, observations, or more simply, from simulations. This paper presents an application of hierarchical clustering to exact mathematical structures: on the one hand the set of formal series defined on a field, on the other hand the set of integers endowed with the p-adic distance. Beyond its purely technical aspect, the aim of this work is also to show how cluster analysis enables us to understand the structure of a strictly organized data set.
Keywords. Classification, mathematical structures, formal series, p-adic distance.
1. Introduction
The range of applications of cluster analysis has grown considerably over the last fifteen years and now covers zoology as well as telecommunications, medicine as well as geochemistry, social psychology as well as informatics. Over the same period a mathematical theory of cluster analysis has been developed (Benzecri, 1973; Diday, 1979; Jambu, 1978; Lerman, 1981; and others), and it is now used as a tool for research. This theory is still developing, both on its own and by borrowing from statistics, graph theory, combinatorics, etc. Nevertheless, cluster analysis has always been applied to concrete data resulting from experiments, observations, or more simply, from simulations. We present here an application of hierarchical clustering to exact mathematical structures: on the one hand the set of formal series defined on a field, on the other hand the set of integers endowed with the p-adic distance. Beyond its purely technical aspect, the aim of this work is also to show how cluster analysis enables us to understand the structure of a strictly organized data set.
2. Indexed hierarchy
We recall here sufficient theory to understand this paper. The reader will find a detailed treatment of the subject in Benzecri (1973) and Jambu (1978).
2.1. Hierarchy of subsets
Let $E$ be a set; we define a hierarchy of subsets of $E$ to be a set $\mathcal{H}$ of nonempty subsets of $E$ satisfying
- the intersection axiom,
$$\forall \{A, B\} \subset \mathcal{H}: \quad A \cap B \in \{A, B, \emptyset\},$$
- the union axiom,
$$\forall A \in \mathcal{H}: \quad \bigcup \{B \mid B \in \mathcal{H},\ B \subset A,\ B \neq A\} \in \{A, \emptyset\},$$
i.e., any element of $\mathcal{H}$, non-minimal for inclusion, is the union of elements distinct from it, included in it.
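The two axioms can be checked mechanically on a finite family of subsets. The following is a minimal sketch, not part of the original paper: the function name `is_hierarchy` and the sample families are illustrative assumptions.

```python
from itertools import combinations


def is_hierarchy(family):
    """Return True if `family` satisfies the intersection and union axioms.

    `family` is an iterable of non-empty finite sets (hypothetical helper,
    written only to illustrate the two axioms).
    """
    H = {frozenset(a) for a in family}
    if any(len(a) == 0 for a in H):
        return False  # a hierarchy contains only non-empty subsets
    # Intersection axiom: A ∩ B must be A, B, or the empty set.
    for a, b in combinations(H, 2):
        if (a & b) not in (a, b, frozenset()):
            return False
    # Union axiom: the union of the elements strictly included in A
    # must be A itself, or empty when A is minimal for inclusion.
    for a in H:
        proper = [b for b in H if b < a]
        union = frozenset().union(*proper)
        if union not in (a, frozenset()):
            return False
    return True


# Illustrative families on E = {1, 2, 3, 4, 5}.
good = [{1}, {2}, {3}, {4}, {5}, {1, 2, 3}, {4, 5}, {1, 2, 3, 4, 5}]
overlapping = [{1, 2}, {2, 3}]   # violates the intersection axiom
incomplete = [{1}, {1, 2}]       # violates the union axiom
results = [is_hierarchy(f) for f in (good, overlapping, incomplete)]
```

The check is quadratic in the size of the family, which is adequate for small illustrative examples.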
0167-7152/82/0000-0000/$02.75 © 1982 North-Holland
Volume 1, Number 2
STATISTICS & PROBABILITY LETTERS
November 1982
The hierarchy is total if $E \in \mathcal{H}$ and, $\forall x \in E$, $\{x\} \in \mathcal{H}$.
The set of the successors of $A$ ($A \in \mathcal{H}$) is defined as
$$\mathrm{Su}(A) = \{B \in \mathcal{H} \mid B \subset A,\ B \neq A\}.$$
The set of the immediate successors of $A$ ($A \in \mathcal{H}$) is defined as
$$\mathrm{Sui}(A) = \{B \in \mathrm{Su}(A) \mid \forall C \in \mathrm{Su}(A):\ B \cap C \in \{\emptyset, C\}\}.$$
Every hierarchy can be represented by a tree; for example, the hierarchy that is defined on $E = \{1, 2, 3, 4, 5\}$ by
$$\mathcal{H} = \{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{1, 2, 3\}, \{4, 5\}, E\}$$
is represented by the tree

[Figure: tree with root E, internal nodes {1, 2, 3} and {4, 5}, and leaves 1, 2, 3, 4, 5.]

This is a total hierarchy and we have
$$\mathrm{Su}(E) = \{\{1, 2, 3\}, \{4, 5\}, \{1\}, \{2\}, \{3\}, \{4\}, \{5\}\},$$
$$\mathrm{Sui}(E) = \{\{1, 2, 3\}, \{4, 5\}\},$$
$$\mathrm{Su}(\{4, 5\}) = \mathrm{Sui}(\{4, 5\}) = \{\{4\}, \{5\}\},$$
$$\forall x \in E: \quad \mathrm{Su}(\{x\}) = \mathrm{Sui}(\{x\}) = \emptyset.$$
Finally, we have to recall the following proposition which we will have to use: A set $\mathcal{H}$ of non-empty subsets of a set $E$ is a total hierarchy on $E$ iff $\mathcal{H}$ satisfies
(i) the intersection axiom, and
(ii) $E \in \mathcal{H}$; $\forall x \in E$: $\{x\} \in \mathcal{H}$.

2.2. Indexed hierarchy
A total hierarchy $\mathcal{H}$ on a set $E$, endowed with a function $\nu$ from $\mathcal{H}$ into $\mathbb{R}^+$ satisfying
$$\forall \{A, B\} \subset \mathcal{H}: \quad (A \subset B,\ A \neq B) \Rightarrow \nu(A) < \nu(B),$$
$$\forall x \in E: \quad \nu(\{x\}) = 0,$$
$$\nu(E) = 1,$$
is called an indexed hierarchy. $\nu$ is called the level index or diameter index of the hierarchy. Graphically the level index is represented as an axis associated with the tree. For example, if we take again the hierarchy illustrating Section 2.1, and if we define $\nu$ by $\nu(\{1, 2, 3\}) = \ldots$ and $\nu(\{4, 5\}) = \tfrac{1}{2}$, we obtain the following indexed tree:

[Figure: the same tree drawn against a level axis marked 1, 1/2, 1/4.]

3. Hierarchy of series

3.1. The ultrametric
Consider the set of formal series defined on a field $K$; we denote this set by $K[[X]]$. A member of $K[[X]]$ is $S$, defined by
$$\forall x \in K: \quad S(x) = \sum_{n \in \mathbb{N}} a_n x^n,$$
where $(a_n)$ is a sequence of members of $K$. We want to determine the hierarchy induced on $K[[X]]$ by a certain ultrametric distance. For this, we call 'order of the series $S$' the rank of its first nonzero term,
$$O(S) = \inf\{n \mid n \in \mathbb{N},\ a_n \neq 0\}.$$
Proposition 3.1. Let $d$ be defined by
$$\forall S \in K[[X]],\ \forall S' \in K[[X]] - \{S\}: \quad d(S, S') = \left(\tfrac{1}{2}\right)^{O(S - S')}$$
and
$$\forall S \in K[[X]]: \quad d(S, S) = 0.$$
Then $d$ is an ultrametric distance on $K[[X]]$.
Proof. For every triplet $(S, S', S'')$ in $K[[X]]$ we have
$$d(S, S) = 0$$
by definition; and
$$d(S, S') = d(S', S) \quad \text{for } O(S - S') = O(S' - S).$$
Finally, if $S$ and $T$ have as general term $a_n$ and $b_n$ respectively, $S + T$ has as general term $a_n + b_n$ in $K[[X]]$. Therefore, we have
$$O(S + T) \geq \min\{O(S), O(T)\},$$
whence, writing
$$S - S' = S - S'' + S'' - S',$$
we get
$$O(S - S') \geq \min\{O(S - S''), O(S' - S'')\},$$
which is equivalent to
$$d(S, S') \leq \max\{d(S, S''), d(S', S'')\}.$$
Thus, $d$ is an ultrametric distance on $K[[X]]$.

3.2. The indexed hierarchy

3.2.1. Definition
Since $d$ is ultrametric, any point of a disk in $K[[X]]$ can be chosen as center of that disk, the radius of a disk equals its diameter, and $(K[[X]], d)$ is separated. Denote by $B(S, r)$ the closed disk with center $S$ and radius $r$ in $(K[[X]], d)$; let
$$\mathrm{ray}(B) = \mathrm{diam}(B) = \sup\{d(S, S') \mid (S, S') \subset B\}$$
be the radius ('rayon' in French) or diameter of a disk $B$, and $\mathcal{B}$ the set of closed disks in $(K[[X]], d)$.

Proposition 3.2. $\mathcal{B}$ is an indexed hierarchy on $K[[X]]$ whose level index is the function ray.

Proof. The proof of this proposition, which can be made in a very general case (equivalence between indexed hierarchy and ultrametric distance, cf. Benzecri (1973)), is presented here for the case of a set of formal series endowed with a certain ultrametric distance. We have to prove that
(1) $\forall \{B, B'\} \subset \mathcal{B}$: $B \cap B' = \emptyset$ or $B \subset B'$ or $B' \subset B$,
(2) $K[[X]] \in \mathcal{B}$ and $\forall S \in K[[X]]$: $\{S\} \in \mathcal{B}$,
(3) ray is strictly increasing on $\mathcal{B}$.
Ad (1). Since $d$ is ultrametric, if $S \in B \cap B'$, then $S$ can be chosen as center of $B$ and $B'$. Now, in any metric space, in case of two concentric disks one is necessarily included in the other.
Ad (2). We have, for any $S$, that $\{S\} = B(S, 0)$, and $K[[X]]$ is a closed disk with radius one and any center, for
$$\mathrm{ray}(K[[X]]) = \sup\{d(S, S') \mid (S, S') \subset K[[X]]\} = \left(\tfrac{1}{2}\right)^{\inf O(S - S')} = \left(\tfrac{1}{2}\right)^{0} = 1.$$
Ad (3). We have
$$\forall \{B, B'\} \subset \mathcal{B}: \quad (B \neq B',\ B \subset B') \Rightarrow \mathrm{ray}(B) < \mathrm{ray}(B'),$$
for, if $S'' \in B' - B$, then, for every $S \in B$, $\mathrm{ray}(B') \geq d(S'', S) > \mathrm{ray}(B)$.
Finally we have to notice that the values of the level index of the hierarchy $\mathcal{B}$ are
$$1,\ \tfrac{1}{2},\ \ldots,\ \left(\tfrac{1}{2}\right)^{n},\ \ldots$$
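The order $O(S)$, the distance $d$ of Proposition 3.1 and the ultrametric inequality can be explored numerically on truncated series. The sketch below is an illustration only, under loud assumptions: series are cut to their first `N` coefficients, the coefficient field is taken to be the two-element field (arithmetic modulo 2), two series that agree on all `N` stored coefficients are treated as equal, and the names `order` and `dist` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def order(coeffs):
    """Rank of the first nonzero coefficient; None for the zero series."""
    for n, a in enumerate(coeffs):
        if a != 0:
            return n
    return None


def dist(s, t):
    """d(S, T) = (1/2)**O(S - T), and 0 when the truncations coincide.

    Coefficients are assumed to lie in the two-element field, so the
    difference is taken modulo 2.
    """
    diff = [(a - b) % 2 for a, b in zip(s, t)]
    o = order(diff)
    return Fraction(0) if o is None else Fraction(1, 2 ** o)


N = 4  # truncation length (an assumption of this sketch)
series = list(product((0, 1), repeat=N))

# Strong triangle inequality checked on every triplet of truncated series.
ultrametric = all(
    dist(s, t) <= max(dist(s, u), dist(t, u))
    for s in series for t in series for u in series
)

# Distinct values taken by d: the levels of the hierarchy, plus 0.
levels = sorted({dist(s, t) for s in series for t in series}, reverse=True)
```

With `N = 4` the nonzero values in `levels` are the first four powers of 1/2, in line with the list of level values given at the end of the proof.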
3.2.2. Description
The hierarchy defined in that manner can be described in an agglomerative or divisive fashion. In the divisive fashion, the set $\mathrm{Sui}(K[[X]])$ of the immediate successors of $K[[X]]$ is defined at level $\tfrac{1}{2}$ as the set of the clusters of series sharing the same rank-zero term. If the coefficients of the series have values in a field $K$, then the number of these clusters is $|K|$. We have
$$\forall B \in \mathrm{Sui}(K[[X]]): \quad \mathrm{diam}(B) = \sup\{d(S, S') \mid (S, S') \subset B\} = 2^{-\inf\{O(S - S') \mid (S, S') \subset B\}} = 2^{-1} = \tfrac{1}{2}.$$
Further,
$$\forall \{B, B'\} \subset \mathrm{Sui}(K[[X]]): \quad d(B, B') = \inf\{d(S, S') \mid S \in B,\ S' \in B'\} = 2^{-\sup\{O(S - S') \mid S \in B,\ S' \in B'\}} = 2^{0} = 1.$$
Because, for every $S$ and $S'$ members of $B$ and $B'$ respectively, $S$ and $S'$ differ already in their rank-zero term, whence
$$O(S - S') = 0 = \sup\{O(S - S') \mid S \in B,\ S' \in B'\}.$$
Each one of these clusters is then divided into $|K|$ clusters having level $\tfrac{1}{4}$. Each one of these clusters is a set of series equal among themselves up to their rank one term. In general, every cluster having level $(\tfrac{1}{2})^{n}$ is divided into $|K|$ clusters having level $(\tfrac{1}{2})^{n+1}$, which are the clusters of series equal among themselves up to their rank $n$ term. Let $B$ be such that $\mathrm{diam}(B) = (\tfrac{1}{2})^{n}$. Then, $\forall \{B', B''\} \subset \mathrm{Sui}(B)$:
$$\mathrm{diam}(B') = \sup\{d(S, S') \mid (S, S') \subset B'\} = 2^{-\inf\{O(S - S') \mid (S, S') \subset B'\}} = \left(\tfrac{1}{2}\right)^{n+1},$$
$$d(B', B'') = \inf\{d(S', S'') \mid S' \in B',\ S'' \in B''\} = 2^{-\sup\{O(S' - S'') \mid S' \in B',\ S'' \in B''\}} = \left(\tfrac{1}{2}\right)^{n} \quad \text{if } B' \neq B''.$$
[Fig. 1. Tree rooted at K[[X]]; level axis marked 1/2, 1/4, 1/8.]
For, if $S'$ and $S''$ are members of $B'$ and $B''$ respectively, both members of $\mathrm{Sui}(B)$, then $S'$ and $S''$ are members of $B$ and are equal up to their rank $n - 1$ term. In the agglomerative fashion it is obvious that, $K[[X]]$ being infinite, the beginning of the aggregations cannot be described. We could say that we first aggregate the distinct series that are "equal at infinity". In general, to go from the level $(\tfrac{1}{2})^{n}$ to the level $(\tfrac{1}{2})^{n-1}$ we have to aggregate the clusters having level $(\tfrac{1}{2})^{n}$ containing the series equal up to their rank $n - 2$ term.

3.3. Example
Let us choose the finite field K = {0, 1}, endowed with

    +  | 0  1          x  | 0  1
    ---+------         ---+------
    0  | 0  1          0  | 0  0
    1  | 1  0          1  | 0  1
If we denote by $a_n$ the rank $n$ term of the series $S$, the tree associated with the hierarchy is a binary tree (i.e., every node has two immediate successors) such that at each level several nodes or clusters, here $2^n$ at level $(\tfrac{1}{2})^{n}$, are defined. This tree can be represented as shown in Fig. 1.
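The binary structure of this example can be reproduced on truncated series. The sketch below is illustrative and rests on assumptions not made in the paper: series over K = {0, 1} are cut to their first `N` coefficients, and a cluster of level (1/2)^n is identified with the set of series sharing their coefficients of rank 0 to n - 1. The name `clusters_at_level` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def clusters_at_level(series, n):
    """Group series that agree on their coefficients of rank 0 .. n-1.

    Each group corresponds to one cluster of level (1/2)**n
    (n = 0 gives the whole set, i.e. the root of the tree).
    """
    groups = defaultdict(list)
    for s in series:
        groups[s[:n]].append(s)
    return dict(groups)


N = 5  # truncation length (an assumption of this sketch)
series = list(product((0, 1), repeat=N))

# Number of clusters at levels 1, 1/2, 1/4, ...
counts = [len(clusters_at_level(series, n)) for n in range(N)]

# Every cluster of level (1/2)**n splits into exactly two clusters
# of level (1/2)**(n+1): one per value of the rank-n coefficient.
binary = all(
    len({s[: n + 1] for s in members}) == 2
    for n in range(N - 1)
    for members in clusters_at_level(series, n).values()
)
```

Replacing `(0, 1)` by the elements of a larger finite field would give |K| immediate successors per node instead of two.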
4. Hierarchy of integers

4.1. The ultrametric
Let $p$ be a prime number ($p \in \mathbb{N}$), $n$ an integer ($n \in \mathbb{Z}$). We denote by $O_p(n)$ the exponent of $p$ in the factorisation in prime numbers of $n$. We have the following properties: $\forall (n, m) \subset \mathbb{Z}$,
(i) ...
(ii) $O_p(n) = O_p(-n)$,
(iii) $O_p(nm) = O_p(n) + O_p(m)$,
and at last, if we put
$$n = p^{O_p(n)} n' \quad \text{where } O_p(n') = 0,$$
$$m = p^{O_p(m)} m' \quad \text{where } O_p(m') = 0,$$
and if we suppose that
$$O_p(n) \leq O_p(m),$$
then
$$n + m = p^{O_p(n)} \left(n' + m' p^{O_p(m) - O_p(n)}\right),$$
whence, if $n + m \neq 0$:
$$O_p(n + m) \geq O_p(n) \quad \text{from (iii)},$$
i.e.,
(iv) $O_p(n + m) \geq \inf(O_p(n), O_p(m))$ if $n + m \neq 0$.
Then we can prove the following proposition.

Proposition 4.1. Let $d$, a mapping from $\mathbb{Z}^2$ into $\mathbb{R}^+$, be defined by
$$\forall (n, m) \subset \mathbb{Z}: \quad d(n, n) = 0, \qquad d(n, m) = \left(\tfrac{1}{2}\right)^{O_p(n - m)}, \quad n \neq m.$$
Then $d$ is an ultrametric distance on $\mathbb{Z}$, called p-adic distance.

Proof. We have $d(n, m) = d(m, n)$ from (ii), and, $\forall (n, m, s) \subset \mathbb{Z}$:
$$O_p(n - m) = O_p(n - s + s - m) \geq \inf\{O_p(n - s), O_p(s - m)\},$$
if $n - s + s - m \neq 0$, i.e., $n \neq m$, whence,
$$\max(-O_p(n - s), -O_p(s - m)) \geq -O_p(n - m)$$
$$\Leftrightarrow \max\left(2^{-O_p(n - s)}, 2^{-O_p(s - m)}\right) \geq d(n, m) \quad \text{(since } 2^x \text{ is increasing)}$$
$$\Leftrightarrow \max(d(n, s), d(s, m)) \geq d(n, m).$$
This last relation is obvious for $n = m$.

4.2. The indexed hierarchy
The equivalence between ultrametric structure and indexed hierarchy (for a proof, see Benzecri (1973)) allows us to write that $\mathcal{B}$, the set of disks in $(\mathbb{Z}, d)$ (closed or not since $d$ is ultrametric), is a hierarchy on $\mathbb{Z}$ whose level index is the function ray (see Section 3.2.1). Taking into account that the values of the level index ray are
$$1,\ \tfrac{1}{2},\ \ldots,\ \left(\tfrac{1}{2}\right)^{n},\ \ldots,$$
it may be seen that the clusters of the hierarchy having level $(\tfrac{1}{2})^{n}$ are the disks having radius $(\tfrac{1}{2})^{n}$:
$$\mathrm{ray}(B) = \left(\tfrac{1}{2}\right)^{n} \Leftrightarrow \exists y \in \mathbb{Z}: B = B\left(y, \left(\tfrac{1}{2}\right)^{n}\right)$$
$$\Leftrightarrow \exists y \in \mathbb{Z}: B = \{x = y + p^{\alpha} q \mid \alpha \geq n,\ O_p(q) = 0\}$$
(since $d(x, y) \leq (\tfrac{1}{2})^{n} \Leftrightarrow O_p(x - y) \geq n \Leftrightarrow x - y = p^{\alpha} q$ where $\alpha \geq n$, $O_p(q) = 0$). Thus $p$ and $q$ are coprime, since $p$ is a prime number.
Now we prove that we can choose as centers of the clusters that are immediate successors of a cluster $B$ having level $(\tfrac{1}{2})^{n-1}$, containing a number $x$, the $p$ numbers, elements of $B$,
$$x,\ x + p^{n-1},\ \ldots,\ x + k p^{n-1},\ \ldots,\ x + (p - 1) p^{n-1}.$$
Denoting by $[p]$ the set $\{0, \ldots, p\}$, by $]p]$ the set $\{1, \ldots, p\}$ and by $B(x)$ the cluster having level $(\tfrac{1}{2})^{n}$ containing $x$, we have
$$\forall k \in [p - 1],\ \forall k' \in [p - 1] - \{k\}: \quad B(x + k p^{n-1}) \neq B(x + k' p^{n-1}).$$
Otherwise, a $q \in \mathbb{Z}$ should exist such that
$$x + k p^{n-1} = x + k' p^{n-1} + p^{\alpha} q \Leftrightarrow (k - k') p^{n-1} = p^{\alpha} q \Leftrightarrow k - k' = p^{\beta} q \quad \text{where } \beta \geq 1.$$
Now we can always suppose that $k - k'$ is positive; since $k - k' \leq p - 1$, the last equality is impossible. Therefore, the $p$ clusters $\{B(x + k p^{n-1}) \mid k \in [p - 1]\}$ are disjoint since they have distinct centers in an ultrametric space. To prove that this set is $\mathrm{Sui}(B)$, we also have to prove that
$$\forall y \in B: \quad \exists k \in [p - 1];\ y \in B(x + k p^{n-1}).$$
Now
$$y \in B \Leftrightarrow \exists \lambda \geq n - 1,\ \exists q \in \mathbb{Z};\ O_p(q) = 0,\ y = x + p^{\lambda} q.$$
Then, seek $k \in [p - 1]$, such that
$$y = x + k p^{n-1} + p^{\alpha} q', \quad \alpha \geq n,\ q' \in \mathbb{Z};\ O_p(q') = 0.$$
This is equivalent to
$$p^{\lambda} q = k p^{n-1} + p^{\alpha} q'$$
$$\Leftrightarrow p^{n-1}\left(p^{\beta} q - k\right) = p^{\alpha} q' \quad \text{where } \beta = \lambda - n + 1 \geq 1$$
$$\Leftrightarrow p^{\beta} q = p^{\delta} q' + k \quad \text{where } \delta = \alpha - n + 1 \geq 1.$$
The last equation is the Euclidean division of $p^{\beta} q$ by $p$. This quotient can be written as $p^{\delta - 1} q'$, where $\delta - 1$ is positive and $q'$ prime with $p$ since $\beta \geq 1$. The remainder is $k$, $k \leq p - 1$.
Thus we proved that the set of the immediate successors of any cluster having level $(\tfrac{1}{2})^{n-1}$ is a set of $p$ clusters having level $(\tfrac{1}{2})^{n}$ for which we found a set of centers. Consequently the number of clusters having level $(\tfrac{1}{2})^{n}$ is $p^{n}$.
These results can be interpreted in the following fashion:
$$\forall x, y \in B: \quad \mathrm{ray}(B) = \left(\tfrac{1}{2}\right)^{n} \Leftrightarrow x = y + p^{\alpha} q,\ \alpha \geq n,\ O_p(q) = 0 \Leftrightarrow x \equiv y \ (p^{n}).$$
The clusters of the hierarchy $\mathcal{B}$ are then the classes of congruency modulo the successive powers of $p$. These clusters fit together because of the relation
$$n' > n \Rightarrow \left(x \equiv y \ (p^{n'}) \Rightarrow x \equiv y \ (p^{n})\right).$$
The congruency modulo $p^{n}$, being a relation of equivalence, induces a partition of $\mathbb{Z}$; the set of these partitions ($n$ varying in $\mathbb{N}$) forms the hierarchy $\mathcal{B}$. In particular, at the level $(\tfrac{1}{2})^{n}$, the possible remainders of the Euclidean division by $p^{n}$ are $0, 1, \ldots, p^{n} - 1$. The number of clusters having level $(\tfrac{1}{2})^{n}$ therefore equals $p^{n}$ as well. It follows that we can choose as set of centers of the clusters having level $(\tfrac{1}{2})^{n}$ any sequence of $p^{n}$ consecutive numbers, for example $]p^{n}]$, as we will see in the following example.

4.3. Example
Put $p = 2$. Then we have $|\mathrm{Sui}(\mathbb{Z})| = 2$, $\mathbb{Z} = B \cup B'$, where $B = \{n = 2k\}$ is the set of the even numbers and $B' = \{n = 2k + 1\}$ is the set of the odd numbers. We have
$$\mathrm{ray}(B) = \sup\{d(n, m) \mid (n, m) \subset B\} = 2^{-\inf O_2(n - m)}.$$
Now
$$O_2(n - m) = O_2(2(k - k')) \geq 1,$$
and the infimum is attained when $k - k'$ is odd, e.g., $n = 2k = 6$ and $m = 2k' = 16$; whence $\mathrm{ray}(B) = \tfrac{1}{2}$. Likewise, $\mathrm{ray}(B') = \tfrac{1}{2}$, since
$$n - m = 2k + 1 - 2k' - 1 = 2(k - k').$$
At last
$$d(B, B') = \inf\{d(n, n') \mid n \in B,\ n' \in B'\} = 2^{-\sup O_2(n - n')}.$$
Now for any $n$ and $n'$,
$$n - n' = 2(k - k') - 1 \text{ is odd.}$$
Whence
$$d(B, B') = 2^{-0} = 1 = \mathrm{ray}(\mathbb{Z}).$$
At the level $\tfrac{1}{4}$ the clusters are
B(1), B(3), immediate successors of B',
B(2), B(4), immediate successors of B.
We obtain the tree, depicted in Fig. 2, where each of the clusters is marked by one of its centers.

[Fig. 2. Tree rooted at Z; the level axis includes 1/8.]

References
Benzecri, J.P. (1973), L'Analyse des Données. Tome 1: La Taxinomie (Dunod, Paris).
Caillez, F. and J.P. Pages (1976), Introduction à l'Analyse des Données (Smash, Paris).
Diday, E. et al. (1979), Optimisation en Classification Automatique (INRIA, Le Chesnay).
Jambu, M. (1978), Classification Automatique pour l'Analyse des Données. 1-Méthodes et Algorithmes (Dunod, Paris).
Jambu, M. and M.O. Lebeaux (1978), 2-Logiciels (Dunod, Paris).
Lerman, I.C. (1981), Classification et Analyse Ordinale des Données (Dunod, Paris).
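The p = 2 example of Section 4.3 and the description of the clusters as congruence classes in Section 4.2 can be checked numerically. The following is a sketch under stated assumptions: the function names `valuation`, `padic_dist` and `cluster_key` are hypothetical, the sets of even and odd numbers are sampled on a finite range, and a cluster of level (1/2)^n is labelled by the residue of its members modulo p^n.

```python
from fractions import Fraction


def valuation(n, p):
    """Exponent O_p(n) of the prime p in the nonzero integer n."""
    if n == 0:
        raise ValueError("O_p(0) is not defined in this sketch")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def padic_dist(n, m, p):
    """d(n, m) = (1/2)**O_p(n - m), with d(n, n) = 0 (Proposition 4.1)."""
    return Fraction(0) if n == m else Fraction(1, 2 ** valuation(n - m, p))


def cluster_key(x, n, p):
    """Label of the level-(1/2)**n cluster containing x: x modulo p**n."""
    return x % (p ** n)


p = 2
evens = range(0, 40, 2)  # finite sample of B
odds = range(1, 40, 2)   # finite sample of B'

# Radius of the cluster of even numbers and distance between B and B'.
ray_even = max(padic_dist(n, m, p) for n in evens for m in evens)
between = min(padic_dist(n, m, p) for n in evens for m in odds)

# Clusters of level 1/4, computed on the integers 1..8.
level2 = {}
for x in range(1, 9):
    level2.setdefault(cluster_key(x, 2, p), []).append(x)

# d(x, y) <= (1/2)**n  holds exactly when  x = y modulo p**n (here p = 3, n = 2).
congruence_ok = all(
    (padic_dist(x, y, 3) <= Fraction(1, 4)) == ((x - y) % 9 == 0)
    for x in range(-20, 20) for y in range(-20, 20)
)
```

On these samples the four clusters of level 1/4 contain 1, 2, 3 and 4 respectively, so the centers 1, 2, 3, 4 used in the example each mark a distinct cluster.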