0031-3203/84 $3.00 + .00 Pergamon Press Ltd 1984 Pattern Recognition Society

Pattern Recognition Vol. 17, No. 5, pp. 479–483, 1984. Printed in Great Britain.

FITTING STRAIGHT LINES TO POINT PATTERNS

F. MURTAGH

Department of Computer Science, University College, Dublin, Republic of Ireland

and

A. E. RAFTERY*
Department of Statistics, Trinity College, Dublin 2, Republic of Ireland

(Received 29 July 1983; received for publication 1 December 1983)

Abstract--In many types of point patterns, linear features are of greatest interest. A very general algorithm is presented here which determines non-overlapping clusters of points which have large linearity. Given a set of points, the algorithm successively merges pairs of clusters or of points, encompassing in the merging criterion both contiguity and linearity. The algorithm is a generalization of the widely-used Ward's minimum variance hierarchical clustering method. The application of this algorithm is illustrated using examples from the literature in biometrics and in character recognition.

Keywords: Hierarchical clustering; Constrained clustering; Principal components analysis; Karhunen-Loève expansion; Variance; Unsupervised classification.

1. INTRODUCTION

Visual information can be presented as grey levels, colour images or dot patterns. Over the past decade many algorithms have been formulated which have attempted to distinguish groupings of points in a given point pattern. A pioneering and still central reference is Zahn,(1) and applications embrace all areas where an image is represented by a set of points and where salient subsets of these points are to be automatically selected. Many algorithms in this area use some connectedness algorithm (e.g. the minimal spanning tree used by Zahn(1)) and subsequently carry out a second phase of processing in order to obtain homogeneous groups, particular density-separating contours, and so on. Certain recent research has instead attempted to incorporate differing aims in the one clustering criterion. One example here is the classifying of geographic regions on the basis of socio-economic variables, but also taking into account geographic proximity.(2-6) In this study, we propose to determine groups which are cohesive but which are in addition constrained to be linear. A clustering criterion which takes these two features into account is described. The clustering criterion differs from the type used in the five last-mentioned references: rather than creating a composite criterion to be optimised, a given criterion is decomposed into parts which are related to the differing objectives.

*To whom correspondence should be addressed.

The algorithm is easy to apply but requires one control parameter--a degree of linearity--and the choice of this coefficient is discussed. A number of examples illustrate the algorithm and its usefulness. Finally, alternative strategies and suggestions for further work are outlined.

2. ALGORITHM

Agglomerative hierarchical clustering algorithms take the given set of N points and successively merge pairs of points and groups. Therefore, the N - 1 agglomerations each define a resulting partition of the points: the classes of the initial (zeroth) partition consist of the singleton points, while there is just one class--the entire point set--in the final, (N - 1)th partition. The minimum variance clustering method (error-sum-of-squares method or Ward's method) seeks two groups (possibly singletons) such that, if merged, the variance of the resulting set of group centres is maximum over all such merges. The variance of the resulting partition will necessarily decrease, so this criterion may be restated as minimizing the change in variance, given that two groups are to be merged.(7) Given the variance of a class of points, V_i, we may decompose this variance into the variance of points projected onto the principal axis of elongation through the class and the variance of projections on the second axis, orthogonal to the first. If p is any point, a member of class i, and g is the centre of gravity of i, we have


$$V_i = \sum_{p \in i} d^2(p, g)$$

$$V_i^1 = \sum_{p \in i} d^2(p^1, g^1), \qquad V_i^2 = \sum_{p \in i} d^2(p^2, g^2),$$

where p has coordinates (p^1, p^2) on axes 1 and 2 (and similarly for g). Evidently,

$$V_i = V_i^1 + V_i^2.$$

By scaling down the component related to the principal axis of elongation through the class,

$$V_i^* = \alpha V_i^1 + V_i^2, \qquad 0 < \alpha \le 1,$$

we increase the importance of cohesiveness of projections on the second axis; the redefined measure of variance therefore gives greater importance to the linearity of the class. Coefficient α = 0 would allow very non-contiguous singleton points to be merged, so the strict inequality is required. For α = 1, one obtains the usual variance as a special case.

The algorithm in summary form is as follows. Each of the N - 1 agglomerations involves four steps.

Step 1. For each pair of classes, i and j, calculate the variance, V_ij, of the union of classes i and j, i ∪ j.
Step 2. Determine the percentage variance explained by the principal and orthogonal axes and divide V_ij into the corresponding proportions: V_ij = V_ij^1 + V_ij^2.
Step 3. Form V*_ij = αV_ij^1 + V_ij^2 and find the minimum of this criterion over all i, j.
Step 4. Replace the classes i and j so obtained by the centre of gravity of i ∪ j and also store the membership list of class i ∪ j.

As an alternative to Steps 1 and 2, the projections of points onto the principal axes might be determined and hence V_ij^1 and V_ij^2 calculated directly. The storage of membership lists (Step 4) is necessary for the principal components analysis in Step 2. If a packed form of storage is used for the membership lists of all clusters, O(N) storage is required, in total, by the algorithm. It is difficult to give a theoretical bound on the complexity, principally because of the repeated calls to the eigen-extraction routine (Step 2). Empirical results found for the examples discussed below, on a DEC-20 machine, were:

Average CPU time:

N = 30     7 s
N = 37     11 s
N = 61     47 s
N = 110    263 s
N = 119    334 s
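To make the four steps concrete, the following is a minimal Python sketch, not the authors' original implementation: the use of numpy, the helper names split_variance and linear_ward, the default parameter values, and the choice to stop at a requested number of classes (rather than running all N - 1 agglomerations and selecting a partition afterwards, as in the paper) are all assumptions of this sketch.

```python
# Minimal sketch (illustrative, not the authors' code) of the modified
# Ward agglomeration of Steps 1-4, for 2-D points in an (N, 2) array.
import numpy as np

def split_variance(points):
    """Step 2: split the variance V of a class into (V1, V2), the parts
    explained by the principal axis of elongation and by the orthogonal
    axis. These are the eigenvalues of the 2x2 scatter matrix, since
    V = sum of squared distances to the centre of gravity = its trace."""
    centred = points - points.mean(axis=0)          # g = centre of gravity
    eigvals = np.linalg.eigvalsh(centred.T @ centred)  # ascending order
    return eigvals[1], eigvals[0]   # (V1 on principal axis, V2 orthogonal)

def linear_ward(points, alpha=0.05, n_classes=3):
    """Agglomerate singletons upward, at each stage merging the pair of
    classes i, j minimising V*_ij = alpha * V1_ij + V2_ij (Steps 1-4)."""
    clusters = [[i] for i in range(len(points))]    # membership lists (Step 4)
    while len(clusters) > n_classes:
        best = None
        for a in range(len(clusters)):              # Step 1: all pairs i, j
            for b in range(a + 1, len(clusters)):
                v1, v2 = split_variance(points[clusters[a] + clusters[b]])
                v_star = alpha * v1 + v2            # Step 3: the criterion
                if best is None or v_star < best[0]:
                    best = (v_star, a, b)
        _, a, b = best
        clusters[a] += clusters[b]                  # Step 4: merge i and j
        del clusters[b]
    return clusters
```

Caching each candidate union's variance split rather than recomputing it, in the spirit of the O(N²) storage remark below, would avoid the repeated eigen-extractions that dominate the timings above.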

However, O(N²) storage of the percentages of variance associated with the principal axes of classes would have obviated repeated calculation and thus reduced the above timings.

3. IMPLEMENTATION

Fig. 1. Letter A(1), determined by adjacency relationships among points.

Having constructed a sequence of partitions (the classes of which are partially ordered by set inclusion), the best partition is to be obtained. The preferred solution to this is to define another, external criterion: the weighted average percentage variance explained by the principal axes of the classes of the partition. If the kth class in the partition has $n_k$ members and percentage variance $p_k$ associated with its principal axis, then the weighted average percentage variance is given by

$$\mathrm{AVE} = \sum_k \frac{n_k}{n} \, p_k$$

where the summation is over all classes, k, of the partition, and where $n = \sum_k n_k$. This may be taken as a definition of the average linearity of classes in the partition. We are therefore looking for the partition with optimally linear classes (subject to the contiguity constraint inherent in the agglomeration criterion, without which the classes would be meaningless). The presence of a control parameter in the algorithm presented (i.e. coefficient α) is disadvantageous if the algorithm is to be unsupervised, so the following two-stage procedure is recommended:

-- determine the approximate value of α; in all the examples to be discussed, the interval 0 < α < 0.1 was found to be suitable;
-- determine the best partition for a number of values of α; the overall best partition may then be taken as the partition which is decomposed into classes of greatest linearity.

In practice, it is unlikely that partitions containing a great number of classes will be of interest, so in the examples to be discussed we have restricted attention to partitions with between 2 and 5 classes. It is possible that some untried value of α would give a more linear set of classes, but good suboptimal partitions will be satisfactory for most purposes.
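As a sketch of this selection stage (again illustrative rather than the authors' code, and continuing the hypothetical split_variance and linear_ward helpers from the sketch in Section 2), the AVE criterion and the recommended sweep over a few values of α might look as follows:

```python
def average_linearity(points, clusters):
    """AVE: percentage of variance explained by the principal axis of
    each class, weighted by class size n_k / n."""
    n = sum(len(members) for members in clusters)
    ave = 0.0
    for members in clusters:
        v1, v2 = split_variance(points[members])
        # Convention chosen here (an assumption of this sketch): a class
        # with zero variance, e.g. a singleton, counts as perfectly linear.
        p_k = 100.0 if v1 + v2 == 0 else 100.0 * v1 / (v1 + v2)
        ave += (len(members) / n) * p_k
    return ave

def best_partition(points, alphas=(0.025, 0.05, 0.075, 0.1)):
    """Two-stage procedure: sweep a few alpha values and 2-5 classes,
    keep the partition of greatest average linearity. (Rerunning the
    agglomeration per class count keeps the sketch short; a single
    hierarchical run in fact yields all partitions at once.)"""
    scored = []
    for alpha in alphas:
        for k in range(2, 6):
            clusters = linear_ward(points, alpha, k)
            scored.append((average_linearity(points, clusters),
                           alpha, k, clusters))
    return max(scored, key=lambda s: s[0])
```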


Fig. 2. Letter A(2), determined by adjacency relationships among points.

4. EXAMPLES

Five examples will be described. Figures 1-5 give these point patterns and the partitions obtained. Note that in no case would the single-linkage clustering method, which uses a minimal connectedness criterion, have achieved comparable results.

The first two examples (Figs. 1 and 2) are modelled on characters described in Lu(8) and are defined principally by adjacency relationships between the points. The linear clusters extracted here were found for a value of α (the linearity coefficient) of 0.025 (or also 0.05 in the case of Fig. 1).

The third example (Fig. 3) represents the geographic locations of redwood trees. The data is from Ripley.(9) The best partition obtained here (with three classes, for α = 0.025) was found to present very non-contiguous clusters. The second best partition (α = 0.05, with three classes) is shown in Fig. 3.

Fig. 3. Locations of trees.

This example illustrates that one type of suboptimal solution which can be obtained might be "too good": the clusters found might not be the contiguous, elongated clusters sought. In order to avoid this difficulty, it may be noted whether or not the average, weighted percentage of variance explained by the principal axes of classes is above some predefined threshold-ceiling or, alternatively, (visual) reference may be made to the point pattern. Ripley(9) analysed these data using techniques designed to detect local isotropic clustering and repulsion effects and failed to find an adequate model.

Fig. 4. Letter A(3), determined by cloud pattern of points.

Fig. 5. Letter A(4), determined by cloud pattern of points.


Table 1. Average percentages of variance explained by first axes of classes, weighted by class sizes, for partitions with 5, 4, 3 and 2 classes and for values of α (the linearity control parameter) of 0.025, 0.05, 0.075 and 0.1

No. of classes   α = 0.025   α = 0.05   α = 0.075   α = 0.1

Letter A(1) (cf. Fig. 1)
5                99.42       99.42      98.12       98.12
4                99.54       99.54      98.26       98.26
3                99.68*      99.68*     97.13       97.13
2                72.34       72.34      86.06       86.06

Letter A(2) (cf. Fig. 2)
5                98.64       98.61      98.52       98.52
4                99.27       99.13      99.13       99.13
3                99.71*      99.66      99.66       99.66
2                87.81       87.98      87.98       87.98

Location of trees (cf. Fig. 3)
5                89.94       91.96      91.96       88.46
4                90.64       89.44      89.44       90.21
3                96.08*      94.23†     94.23†      90.52
2                85.34       78.70      78.70       83.04

Letter A(3) (cf. Fig. 4)
5                97.83       93.36      97.99       95.66
4                98.46       98.60      97.15       93.61
3                98.56       99.24*     97.93       84.84
2                92.97       92.97      94.81       91.31

Letter A(4) (cf. Fig. 5)
5                94.85       93.88      96.43       96.53
4                95.46       91.96      97.77       95.13
3                95.00       92.36      98.48*      95.69
2                85.14       92.06      84.63       82.58

* Indicates maximum value. Corresponding partitions are shown in Figs. 1, 2, 4 and 5.
† Second highest value, shown in Fig. 3. See text for details.

The clusters shown in Fig. 3 are all in roughly the same direction, which may be due to seeds being blown along a prevailing wind direction or to some other anisotropic topographical feature. Thus the dominant effects in the pattern seem to be global and anisotropic rather than local and isotropic, so that techniques such as those of Ripley(9) would not be useful.

The final two examples (Figs. 4 and 5) relate to linear characters: the first is from Diday(2) and the second is modelled on it. Values of α of 0.05 and 0.075, respectively, were found to give the desired partitions, with three classes in each.

5. CONCLUSION

An alternative algorithm for a similar purpose has been proposed by Diday,(2) Chap. 8. Using V*_i as criterion, an arbitrary division of the point set into a specified number of classes is followed by iterative relocation of points in order to improve the criterion. The stepwise algorithm described in this paper is more flexible when the number of classes cannot be fixed before the analysis. The algorithm presented is easy to apply, and for small N (< 60) is not computationally expensive. An obvious extension will be to examine more general polynomial curve-fitting (cf. generalized principal components analysis, discussed in Gnanadesikan,(10) p. 53 et seq.). Such non-linear curves arise not only in alphanumeric characters but also in such areas as biometry and anthropology, where the point patterns represent, for example, human settlements along river banks. It would be useful to have available general, unsupervised curve-fitting algorithms for such cases.

SUMMARY

An algorithm is proposed for obtaining clusters of points which are simultaneously contiguous and elongated. A step-wise agglomerative algorithm is implemented. This algorithm is a generalisation of the widely-used minimum variance hierarchical clustering method (Ward's method or error-sum-of-squares method). The variance of each pair of clusters and/or singleton points, which are candidates for agglomeration at any stage, is broken into variance about the principal axis through the potential new cluster and variance about the axis orthogonal to this. By means of a control parameter, the former is scaled down, thus increasing the ellipticity of the redefined criterion. A number of examples from character recognition and an example from biometrics show that the algorithm works well.

In order to bypass the difficulty inherent in specifying a control parameter, it is suggested that the agglomerative clustering algorithm be implemented for, at most, four values of the control parameter and the partition chosen on the basis of the results obtained. Although each partition is characterised by the value of the clustering criterion which has brought about the last agglomeration, these criterion values do not allow an easy choice of partition, since in general they increase as the agglomerations proceed. Therefore, another measure of the "goodness" of a partition is proposed: the average percentage of variance explained by the principal axes of the classes of the partition, where this average is weighted to downgrade classes of few members. This aids considerably in the choice of a suitable partition.

REFERENCES

1. C. T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput. C-20, 68-86 (1971).
2. E. Diday, Optimisation en Classification Automatique. INRIA (1979).
3. M. M. Fischer, Regional taxonomy, Reg. Sci. Urban Econ. 10, 503-537 (1980).
4. A. Ferligoj and V. Batagelj, Clustering with relational constraint, Psychometrika 47, 413-426 (1982).
5. F. Murtagh, Fondements théoriques de la classification hiérarchique sous contrainte de contiguïté continue, Actes Regroupés des Journées de Classification de Toulouse (Mai 1980) et de Nancy (Juin 1981), I. C. Lerman, ed., pp. 177-191. IRISA, Rennes (1982).
6. C. Perruchet, Constrained agglomerative hierarchical classification, Pattern Recognition 16, 213-217 (1983).
7. D. Wishart, An algorithm for hierarchical classifications, Biometrics 22, 165-170 (1969).
8. S.-Y. Lu, A tree-to-tree distance and its application to cluster analysis, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1, 219-224 (1979).
9. B. D. Ripley, Modelling spatial patterns, J. R. Statist. Soc. B 39, 172-192 (1977).
10. R. Gnanadesikan, Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York (1977).

About the Author--FIONN MURTAGH received B.A. and B.A.I. degrees in Engineering Science and an M.Sc. degree in Computer Science, all from Trinity College, Dublin, in 1976 and 1978, respectively, and a Ph.D. (Doctorat de Troisième Cycle) in Mathematical Statistics from the Université de Paris VI, in conjunction with the Geological and Mineral Research Centre (B.R.G.M.), Orléans, in 1981. Since 1980, he has been on the staff of the Computer Science Department, University College, Dublin. Dr. Murtagh's research interests include all aspects of classification, multivariate data analysis, pattern recognition, information retrieval, algorithmics and computational statistics.

About the Author--Born in 1955, ADRIAN E. RAFTERY received a B.A. in Mathematics in 1976 and an M.Sc. in Statistics and Operations Research in 1977, both from Trinity College, Dublin, and a Ph.D. (Doctorat de Troisième Cycle) in Mathematical Statistics in 1980 from the Université de Paris VI. Since 1980 he has been a Lecturer in Statistics at Trinity College, Dublin. Dr. Raftery has published papers on solar energy and wind power, geometric probability, point processes and non-normal time series, the analysis of cell patterns in plants and social mobility.