171
trends in analytical chemistry, vol. 2, no. 8, 1983
Michael McCracken received a B.A. degree in Chemistry from Knox College and a Ph.D. in Analytical Chemistry from the University of Illinois in 1978. He is a staff chemist in the Special Products Group with the Procter and Gamble Company, Miami Valley Laboratories, P.O. Box 39175, Cincinnati, OH 45247, USA. His research interests are in the solution of chemical problems with new instrumentation.
Computer Corner
Clustering with a microcomputer
D. L. Massart
Microcomputers are now found in many laboratories, but because they have been introduced relatively recently, sophisticated scientific software is not readily available and the possibilities of microcomputers remain largely underestimated. Although contributions to Computer Corner will not be confined entirely to microcomputer applications, the main emphasis will be on these. Indeed, we are convinced that soon every analytical laboratory will have its own microcomputer and that providing microcomputer software will be a useful service to the whole analytical community.
In this article, we present a clustering program. Clustering is a technique for which it is generally assumed that a mainframe computer is necessary; this is indeed true when the number of objects exceeds a few hundred. However, some clustering techniques use computer time more economically, and MacNaughton-Smith's algorithm is one of these.
For non-specialists, clustering can be defined as a mathematical method used to classify objects characterized by many variables. For instance, the objects may be meteorites, for each of which the concentrations of 10 trace elements are given. The purpose of clustering is to gather meteorites with the same trace-element patterns into groups. The typical input of a clustering program is a matrix of the type shown in Fig. 1, and the output gives the relationships between the objects.
Many clustering techniques giving different outputs have been described in a recent book¹, the most widely used being the 'agglomerative hierarchical' techniques. These start with an n × n matrix in which each object is compared with every other object. The most similar pair is chosen and the matrix is then reduced to (n − 1) × (n − 1) dimensions. This goes on until a 1 × 1 matrix remains. It is the repeated manipulation of large matrices in the initial agglomeration steps that makes the technique cumbersome.
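To make the agglomerative procedure concrete, here is a minimal Python sketch. It assumes Euclidean distance between the rows (objects) of the n × m input matrix and single linkage (the distance between two clusters is that of their closest members); both are illustrative choices, not necessarily those of any program discussed here, and the function names are hypothetical. Each pass scans the current k × k matrix for the closest pair and rebuilds it as a (k − 1) × (k − 1) matrix, which is the repeated large-matrix manipulation described above.

```python
import math


def distance_matrix(data):
    """Build the n x n matrix of Euclidean distances between the rows
    (objects) of an n x m data matrix."""
    n = len(data)
    return [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]


def agglomerate(data):
    """Single-linkage agglomerative hierarchical clustering (illustrative).

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of object indices."""
    d = distance_matrix(data)
    clusters = [frozenset([i]) for i in range(len(data))]
    merges = []
    while len(clusters) > 1:
        # Scan the current k x k matrix for the most similar (closest) pair.
        k = len(clusters)
        best = None
        for a in range(k):
            for b in range(a + 1, k):
                if best is None or d[a][b] < best[0]:
                    best = (d[a][b], a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], dist))
        keep = [c for c in range(k) if c not in (a, b)]
        # Single linkage: distance from the merged cluster to c is the smaller one.
        new_row = [min(d[a][c], d[b][c]) for c in keep]
        # Reduce the matrix from k x k to (k - 1) x (k - 1).
        d = [[d[i][j] for j in keep] + [new_row[ii]] for ii, i in enumerate(keep)]
        d.append(new_row + [0.0])
        clusters = [clusters[c] for c in keep] + [clusters[a] | clusters[b]]
    return merges


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [5, 5], [5, 6]]
    for a, b, dist in agglomerate(data):
        print(sorted(a), sorted(b), round(dist, 3))
    # [0] [1] 1.0
    # [2] [3] 1.0
    # [0, 1] [2, 3] 6.403
```

Because every merge rescans the whole remaining matrix, the work grows roughly as n³, and the full n × n matrix must be held in memory from the first step.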
The MacNaughton-Smith algorithm is also hierarchical, but it is divisive. The set is first split into two subclusters, each of which is again divided in two, and so forth. In the ideal case, when the data set is split into two equal halves, the first step leads to two (n/2) × (n/2) matrices, which are much easier to manage than one n × n matrix.
This algorithm has been implemented on a 48K Apple II in Applesoft*. The biggest data set we have investigated consisted of 16 data points (XRF and X-ray diffraction results) on 53 coals. Input was from a file created on a floppy disk, and a single disk drive was required. Output is a printed listing of the clusters obtained. The coals were clustered in 90 min.
Taking into account that an interpreted language was used on an 8-bit machine, that the program could be written in a (faster) compiled language, and that microcomputers are moving towards 16-bit machines, one may conclude that clustering 100 or more objects on a microcomputer is a very real possibility. More generally, since clustering is one of the most space- and time-consuming computer calculations the analytical chemist is likely to encounter, the day is not far off when microcomputers will suffice to cover all the computational needs of the analytical chemist.
*More details about this program can be obtained from the author or from L. Kaufman or P. Rousseeuw, Department of Statistics, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium.
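The Applesoft program itself is not listed here, so the following is only a minimal Python sketch of the splinter-group procedure usually attributed to MacNaughton-Smith, under stated assumptions: Euclidean distances, and every cluster split recursively until single objects remain. To split a cluster, the object with the largest average dissimilarity to the others starts a "splinter group"; objects are then moved to it one at a time while they are, on average, closer to the splinter group than to the rest. The function names are hypothetical. For brevity the sketch indexes into one full n × n matrix; a memory-constrained version would build only the sub-matrix of the cluster being split, which is where the saving comes from (two halves need 2(n/2)² = n²/2 entries instead of n²).

```python
import math


def avg(d, i, group):
    """Average dissimilarity of object i to the members of group (excluding i)."""
    others = [j for j in group if j != i]
    return sum(d[i][j] for j in others) / len(others) if others else 0.0


def split(d, cluster):
    """Split one cluster in two with a MacNaughton-Smith-style splinter procedure."""
    remaining = list(cluster)
    # Seed the splinter group with the object most dissimilar, on average, to the rest.
    seed = max(remaining, key=lambda i: avg(d, i, remaining))
    remaining.remove(seed)
    splinter = [seed]
    while len(remaining) > 1:
        # Gain of moving i: how much closer it is to the splinter group than to the rest.
        gains = {i: avg(d, i, remaining) - avg(d, i, splinter) for i in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # no object prefers the splinter group: the split is final
        remaining.remove(best)
        splinter.append(best)
    return sorted(splinter), sorted(remaining)


def divisive(data):
    """Split every cluster with two or more objects; return the splits made."""
    n = len(data)
    # Full distance matrix kept for simplicity (see the note above).
    d = [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    result, stack = [], [list(range(n))]
    while stack:
        cluster = stack.pop()
        if len(cluster) < 2:
            continue
        a, b = split(d, cluster)
        result.append((a, b))
        stack.extend([a, b])
    return result


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [5, 5], [5, 6]]
    for a, b in divisive(data):
        print(a, b)
```

With this toy data set the first split separates objects 0 and 1 from objects 2 and 3, and each pair is then divided into single objects.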
Reference
1 Massart, D. L. and Kaufman, L. (1983) The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, Wiley, New York
D. L. Massart is at the Pharmaceutical Institute, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussels, Belgium.
Contributions of between 400 and 900 words are welcome in the categories described below and should be sent to the appropriate contributing editor.
Information on hardware, software, software tips, and general interfacing should be sent to TrAC Computer Corner, B. G. M. Vandeginste, P. F. A. van der Wiel, Department of Analytical Chemistry, University of Nijmegen, Toernooiveld, 6525 ED Nijmegen, The Netherlands.
Information on chemical applications software and mathematical tools for improving information content should be sent to TrAC Computer Corner, D. L. Massart, L. Kaufman, Vrije Universiteit Brussel, Faculteit der Geneeskunde en der Farmacie, Farmaceutisch Scheikunde, Laarbeeklaan 103, B-1090 Brussels, Belgium.
Fig. 1. Input matrix for a clustering program: n objects (rows) described by m variables (columns), giving n × m values.
0165-9936/83/$01.00 © 1983 Elsevier Science Publishers B.V.