
Engng Applic. Artif. Intell. Vol. 5, No. 4, pp. 299-307, 1992. Printed in Great Britain. All rights reserved.
0952-1976/92 $5.00+0.00. Copyright © 1992 Pergamon Press Ltd.

Contributed Paper

A Parallel Image-clustering Algorithm on the "HERMES" Multiprocessor Structure*

FOTIS BARLOS
SITE, GMU, Fairfax

NIKOLAOS BOURBAKIS
State University of New York at Binghamton

Clustering is a process which partitions input data into groups after a measure of pattern similarity has been adopted. It is used widely in various areas of picture processing, such as pattern analysis, image segmentation, object detection, syntactic pattern recognition, unsupervised learning, etc. A great variety of algorithms are currently used to implement the clustering process; even specific IC chips have been manufactured for it. In this paper, the implementation of a parallel clustering algorithm on a multiprocessor structure, called HERMES, is presented. The HERMES architecture efficiently handles the iterative nature of the clustering algorithm. Since the HERMES system architecture was not available, a hypercube was used to emulate the HERMES structure. Thus, the clustering algorithm was written on the hypercube and its results were evaluated based on different criteria.

Keywords: Multiprocessor structures, machine vision, parallel clustering.

INTRODUCTION

A great amount of research effort nowadays is targeted towards the area of image processing and artificial intelligence. The notion that a computer can see and recognize objects as a human does is a popular topic in computer technology today. Picture information, however, is immense, and the algorithms that process this information and extract recognized objects out of abstract data demand large computational power, tedious calculations and real-time response. This is an area where multiprocessor systems and parallel algorithms are used to relieve some of the problems that arise when uniprocessor power is not enough to produce results.

One of the pattern-recognition efforts is to group the input data into clustered domains after a measure of pattern similarity has been adopted.¹ This process is called "pattern clustering". The word "cluster" in ordinary speech usually denotes a collection of members that are all close together, and further, that the members in the middle are usually more densely

distributed than those on the periphery. Although the idea of clustering seems very simple and is easily comprehended by humans, it is a very difficult task to transform it into an algorithm that a computer can understand. However it is achieved, the concept of clustering requires, in some sense, the idea of closeness. In short, there is a need for a distance function between two points, denoted by the symbol d(A, B). This function must obey the following criteria:

1. d(A, B) = d(B, A).
2. The variables must be scaled using only one distance measure.
3. d(A, B) ≤ d(A, C) + d(C, B), where C is a third element distinct from A and B.

The definition of a cluster is not straightforward. A set of points, or patterns, belongs to a specific cluster when some of them are close to some others, but not necessarily close to all of them. Consider, for example, the points that form a sphere. Each point is very close to its neighbors, but two points on opposite sides of the sphere's surface are very far apart. Nevertheless, all these points form only one cluster. On the other hand, the points on a line have almost the same properties as those on a sphere. Very seldom, though, do they belong to one cluster.
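As a small illustration (not from the paper), the ordinary Euclidean distance satisfies these criteria; the sketch below defines it and spot-checks symmetry and the triangle inequality on a few sample points:

```python
import math

def d(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

A, B, C = (0.0, 0.0), (3.0, 4.0), (6.0, 1.0)
assert d(A, B) == d(B, A)            # criterion 1: symmetry
assert d(A, B) <= d(A, C) + d(C, B)  # criterion 3: triangle inequality
print(d(A, B))                       # 5.0
```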

Correspondence should be sent to: N. Bourbakis, Department of Electrical Engineering, State University of New York at Binghamton, P.O. Box 6000, Binghamton, NY 13902-6000, U.S.A.
* This work is a part of the HERMES project, supported by an FR grant, 1992.



There is obviously room here for differences of opinion, and probably for differences of definition, according to the circumstances. In this paper, the mean deviation from the center of gravity, as opposed to the distance between different centers of gravity, is used as a measure of the closeness of clustering.

Many attempts have been made in recent years to develop parallel pattern-analysis algorithms that take advantage of the high performance of multiprocessor computer systems. It is often realistic to implement clustering algorithms in parallel, and even to build specialized hardware chips to perform clustering. There have been many efforts to develop special VLSI devices for such purposes. Some candidate algorithms that might be suitable for VLSI implementation are indicated in Ref. 2. A VLSI systolic architecture for pattern clustering has been designed and fabricated by Ni and Jain in Ref. 3.

Many algorithms are used for pattern clustering.⁴ Some of the most important are the squared-error, the K-means, and the Isodata algorithms. They all require a large amount of matrix or vector operations, and they all have an iterative nature that demands substantial CPU time, even for a modest number of patterns. This iterative nature makes them ideal for multiprocessor implementation. The algorithm developed in this paper is a parallel squared-error algorithm,⁵ implemented on the hypercube machine.

The architecture of the parallel system which was adopted on the hypercube is a 2-D pseudo-quadtree structure, called HERMES. The HERMES structure consists of (N/2^i) x (N/2^i) processor-nodes in a 2-D array configuration, where N x N is the size of the input image and i is a resolution parameter.⁶⁻⁹ The HERMES structure receives the image data directly from the environment, by using a 2-D photoarray, and processes them in a parallel-hierarchical (bottom-up and top-down) manner, where orders go down and abstracted image information goes up along the HERMES hierarchy.

This paper is organized into six sections. The second section briefly describes squared-error clustering. The third section provides a brief presentation of the HERMES structure. The fourth section discusses the parallel clustering algorithm. The fifth section presents the results of the clustering algorithm applied to a variety of inputs. The sixth section summarizes the overall presentation.

SQUARED-ERROR CLUSTERING⁵

Clustering techniques that minimize a squared-error criterion have been used extensively in clustering algorithms. The key idea is to choose a partition of the input patterns such that a specific distance function between the patterns is minimized. Many heuristic techniques are used to select this partition initially. The input to the algorithm is a pattern matrix P of

size N x M. Each of the N rows represents a pattern, and each of the M columns represents a feature or domain. In the simplest case of two-dimensional space, the pattern matrix has only two columns, one for the x and one for the y coordinate of the pattern. In more general cases, a feature of a pattern can be its color, its shape, etc. Usually N is in the range of hundreds, and M < 30. The ith row of P, P(i), represents pattern i. The (i, j)th entry of P, p(i, j), represents the jth feature of pattern i. Matrix P is shown below:

$$P = \begin{bmatrix} p(0,0) & \cdots & p(0,M-1) \\ p(1,0) & \cdots & p(1,M-1) \\ \vdots & & \vdots \\ p(N-1,0) & \cdots & p(N-1,M-1) \end{bmatrix} = \begin{bmatrix} P(0) \\ P(1) \\ \vdots \\ P(N-1) \end{bmatrix}.$$

The output of a partitional clustering is a partition of the pattern set, i.e. a labeling of all patterns. Let L(i) be the label of pattern i, 0 ≤ i ≤ N-1. Cluster k is S(k) = {i | L(i) = k}, where 0 ≤ k ≤ K-1 and K is the number of clusters. The center of cluster k is

$$c(k,j) = \frac{1}{|S(k)|} \sum_{i \in S(k)} p(i,j), \qquad 0 \le j \le M-1,$$

where |S(k)| denotes the cardinality of the set S(k). A K x M cluster matrix C can be defined for the cluster centers as below:

$$C = \begin{bmatrix} c(0,0) & \cdots & c(0,M-1) \\ c(1,0) & \cdots & c(1,M-1) \\ \vdots & & \vdots \\ c(K-1,0) & \cdots & c(K-1,M-1) \end{bmatrix} = \begin{bmatrix} C(0) \\ C(1) \\ \vdots \\ C(K-1) \end{bmatrix}.$$

The squared error of cluster k is then defined as the


sum of the squared Euclidean distances between the member patterns and the cluster center:

$$e^2(k) = \sum_{i \in S(k)} d^2(P(i), C(k)),$$

where

$$d^2(P(i), C(k)) = \sum_{j=0}^{M-1} \left[ p(i,j) - c(k,j) \right]^2$$

is the squared Euclidean distance between P(i) and C(k). Finally, the squared error of the entire clustering is

$$E^2(K) = \sum_{k=0}^{K-1} e^2(k).$$

Fig. 1. Organization of 16 HERMES nodes. Commands and abstract information flow along the HERMES hierarchy.

The objectives of squared-error clustering are to define, for a given K, a clustering that minimizes E²(K), and to find a suitable K ≪ N by repeatedly trying different values of K and obtaining the best partition.

The squared-error clustering algorithm is composed of two major tasks. One is called the label-reassignment process and the other cluster-center updating. During the first process a label is associated with each pattern: the cluster center that is closest to the specific pattern, i.e. the one that minimizes the Euclidean distance defined previously. For the ith pattern, its label will be

$$L(i) = k, \quad \text{where} \quad d(P(i), C(k)) = \min_{0 \le l \le K-1} d(P(i), C(l)).$$

During the second process each cluster center is recomputed as the mean of the patterns currently assigned to it, using the definition of c(k, j) given above.
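To make the two tasks concrete, the following minimal NumPy sketch (an illustration under the definitions above, not the authors' implementation; the function and array names are hypothetical) performs one label-reassignment pass and one center-update pass, and evaluates E²(K):

```python
import numpy as np

def squared_error_step(P, C):
    """One iteration of squared-error clustering.
    P: (N, M) pattern matrix; C: (K, M) cluster-center matrix."""
    # Label reassignment: assign each pattern to its nearest center.
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
    L = d2.argmin(axis=1)                                    # L(i) minimizes d2(P(i), C(k))
    # Cluster-center updating: each center becomes the mean of its members.
    K = C.shape[0]
    C_new = np.array([P[L == k].mean(axis=0) if (L == k).any() else C[k]
                      for k in range(K)])
    # Squared error of the entire clustering, E^2(K).
    E2 = d2[np.arange(len(P)), L].sum()
    return L, C_new, E2
```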
HERMES STRUCTURE

HERMES⁶⁻¹⁰ is a heterogeneous, multiprocessor, hierarchical vision architecture. The overall HERMES structure consists of processor-nodes in a 2-D array configuration. The HERMES system operates as a pseudo-quadtree structure, as illustrated in Fig. 1. The depth of the structure depends on the resolution parameter of the picture under examination. The distinct segments into which the picture is divided are called "regions." Each processor of the lowest level receives in parallel the image data from a region dedicated to it by using a photoarray, as shown in Fig. 2(a). The system processes all the information gathered by the nodes of the lowest level. The nodes of the system are combined into structures called "kernels." A kernel is defined as a structure of four adjacent processors, as shown in Fig. 2(b). In each kernel (see Fig. 2(b)), the upper left node is the master of the remaining nodes.

Fig. 2. The HERMES structure: (a) distribution of the picture information (regions assigned to nodes), (b) the pseudo-quadtree structure of the HERMES system (grand-master node at level 0, down to level 2).


Thus the processor at the upper left corner is the full master of the HERMES hierarchy. A kernel has the characteristic of self-similarity, since a kernel can consist of four adjacent nodes or of four other adjacent kernels.

The HERMES system has a pseudo-quadtree structure. This means that each father node of a kernel can execute operations of its son nodes. Similarly, the grandfather node can execute operations of its son and its grandson nodes. It thus operates on three consecutive functional levels, as shown in Fig. 2(b). Continuing in this fashion, the whole master node of the system can execute operations performed by the nodes of the lowest level. The quadtree organization is shown in Fig. 2(b). Nodes and buses drawn with thin lines indicate that the specific element does not really exist. The system, however, operates like a quadtree multiprocessor structure, since nodes of the upper level execute the operations associated with the left-most non-existing son node of a kernel.

The pseudo-quadtree organization gives the system a powerful capability to process and analyze picture information very efficiently. Since each node can operate on different levels, it can process abstract information corresponding to a small set of regions from the picture, as well as recognize objects. The system can therefore direct the processing of the picture towards the areas that have significant information. The pseudo-quadtree structure has another advantage over the standard one: for two systems with the same depth, the pseudo-quadtree structure requires fewer nodes than the conventional quadtree structure does.

Each node of the system is identified by its particular geographic location. A geographic location is defined as the identification that distinguishes one node from another. The nodes of the HERMES structure are arranged in 2^n x 2^n matrix form. The geographic location of a particular node is composed of distinct digits, a1 a2 a3 ... ak, where k equals the level of the HERMES structure, assuming that the level of the whole master node is zero. Each digit, starting from the left, indicates the section in which the node, or the structure with the node, exists with respect to its position within its immediate supersection.
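A digit string of this kind can be derived by repeatedly halving the array. The sketch below is only illustrative: the paper does not specify the digit values, so a 1-4 quadrant numbering with the upper-left (master) quadrant as 1 is assumed here.

```python
def geographic_location(row, col, k):
    """Digits a1..ak locating a node in a 2**k x 2**k HERMES array.
    Assumed numbering: 1 = upper-left (master), 2 = upper-right,
    3 = lower-left, 4 = lower-right within each supersection."""
    digits = []
    size = 2 ** k
    for _ in range(k):
        size //= 2                                   # halve the current section
        quad = 1 + (col >= size) + 2 * (row >= size) # pick the quadrant
        digits.append(str(quad))
        row %= size                                  # coordinates within the quadrant
        col %= size
    return "".join(digits)

print(geographic_location(3, 2, 2))  # e.g. "43" in a 4 x 4 array
```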

The HERMES system can operate as a neural-type network, since commands are distributed from the top of the structure to the bottom, while abstract information is composed from the bottom to the top. These two different operations are called top-down and bottom-up, respectively.¹⁰ During the top-down operation the grand-master node receives a command from the user and delegates the work to its three sons, as well as to itself, by transmitting the appropriate commands to them. These nodes execute the commands and distribute the workload to their sons. This process continues until the nodes at the lowest level of the structure receive commands from their fathers. Each node can communicate only with processors one level lower or higher in the hierarchy, except for the ones that can operate on different levels.

During the bottom-up operation the nodes at the lowest level of the structure start sending information to their fathers after they receive the appropriate commands. There this information is processed, it changes format and contents, and it is further transmitted to nodes at the immediately higher level. This operation continues until all the processed information reaches the grand-master node, which is the user interface.

The processor tasks at the lower levels of the HERMES structure are convolution, segmentation, normalization, recognition of lines, etc., while at the middle and higher levels the tasks become more complicated and Artificial Intelligence techniques can also be used. Through this processing, the abstract picture information produced by the bottom nodes of the system becomes a concept at the higher levels.

As mentioned previously, the HERMES architecture can easily be adapted onto the hypercube system. This can be done by associating one cube with one kernel. For example, a 4-node cube is associated with one basic HERMES kernel, while a 16-node cube is associated with a 16-kernel which consists of four 2 x 2 kernels. It is important to note that although the HERMES system is a pseudo-quadtree structure, this does not impose any problems for using cubes that are not multiples of four. The HERMES structure can execute the clustering algorithm even if its nodes have only two sons instead of four, meaning that it then has a pseudo-binary structure. However, other vision algorithms might utilize the quadtree structure of the HERMES system.

DEFINITION OF THE CLUSTERING ALGORITHM

This section presents the pseudo-code of the clustering algorithm implemented on the hypercube machine. The hypercube is a parallel system with 2^n processors and an n-dimensional communication network. Before the presentation of the algorithm some notational definitions are required.

--Instructions are usually associated with a label on the right-hand side of the line (e.g. (... instruction ...)3). The number in the label, enclosed within brackets, indicates the node that performs this particular operation. Instructions without labels are executed by all the nodes.
--Internode assignments are represented by an arrow and a flag next to each variable. The flag specifies the node(s) that participate(s) in the particular assignment. The format of internode assignments is struct1[n] <- struct2[m], where n and m can take either numeric values

or the values "a," "a-0" or "h." A numeric value indicates the particular node that participates in the execution of the operation; character "a" indicates that all nodes participate; "a-0" indicates all nodes except node zero; character "h" indicates the host. For example, the expression struct1[a] <- struct2[h] denotes that the data structure struct2 is sent from the host to all nodes and is stored into the local structure struct1.
--Assignment within a node is represented by the equal sign. For example, the expression struct1 = struct2 [0] indicates that the assignment struct1 = struct2 is performed only by the zeroth node.
--The array structures are denoted with parentheses that enclose the index(es).
--Variable CUBENODES indicates the number of hypercube nodes. Variable "s" is the node-number index.

The pattern matrix is evenly distributed among the hypercube nodes. Node "s" receives the rows s x N/CUBENODES through (s+1) x N/CUBENODES - 1 of the pattern matrix. The nodes also receive an initial estimate of the cluster centers from the host. The algorithm then enters an iterative loop. Within the loop each node calculates the cluster center that is closest to each pattern and makes this cluster the label of the particular pattern. It then sends to node zero the number of patterns within its range that belong to every cluster and the sum of their values. Node zero receives all this information and updates the cluster centers if necessary. If no convergence has been realized and the iteration index is greater than zero, it sends the new cluster centers to the nodes and the loop iterates one more time. Otherwise, it collects the results and sends them to the host. The host provides the results to the user. The pseudo-codes of the host and node processes are presented next.

Host process

Step 1: read(number of cluster centers used)
        read(the input pattern matrix)
        read(initial value of the cluster centers)
        /* The initial value of the cluster centers can be a random number or based on a priori knowledge */
Step 2: Load the processes that calculate the cluster centers on the nodes of the allocated cube.
Step 3: Provide the pattern matrix and the initial estimates of the cluster centers to the node processes.
Step 4: Wait for the results from node zero.
Step 5: Receive the calculated cluster centers, their variance and their relative distance from node zero.
Step 6: Provide the results to the user.

Node processes

Step 1: Receive the partial pattern matrix from the host:
        P(i, j)[a] <- P(i, j)[h],  s x N/CUBENODES <= i <= (s+1) x N/CUBENODES - 1,  0 <= j <= M-1.
Step 2: Receive the cluster center matrix from the host:
        C(i, j)[a] <- C(i, j)[h],  0 <= i <= K-1,  0 <= j <= M-1.
Step 3: Repeat steps 4-7 for MAXITER times, where MAXITER is a user-defined parameter.
Step 4: Label reassignment:
        L(i) = cluster center 'k' with the minimum distance from pattern i [a]
        dist(i, k) = distance between pattern i and cluster center 'k' [a]
        if L(i) different from the previous one [a]
            change = 1 [a]
Step 5: Computation of the new cluster centers:
        count(k) = |S(k)|,  0 <= k <= K-1 [a]
        sum(k, j) = SUM over i with L(i)=k of P(i, j),  0 <= k <= K-1,  0 <= j <= M-1 [a]
Step 6: Collection of the results by node 0:
        count(k)[0] <- SUM over s of count(k)[s]
        sum(k, j)[0] <- SUM over s of sum(k, j)[s]
        change[0] <- change[a]
Step 7: Convergence check and cluster-center updating:
        if change != 0 and iteration_num < MAXITER
            C(k, j) = sum(k, j)/count(k) [0]
            C(k, j)[0] -> C(k, j)[a-0]
        else
            C(k, j)[0] -> C(k, j)[h]
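For readers without a hypercube, the following self-contained Python sketch emulates the host/node decomposition above on a single machine. The inner loop over row slices stands in for the node processes, and the reductions stand in for the messages to node 0; all names are illustrative, not the authors' code.

```python
import numpy as np

def parallel_squared_error(P, C0, cubenodes=4, maxiter=50):
    """Emulation of the host/node pseudo-code: each 'node' labels its
    slice of the pattern matrix; 'node 0' reduces counts/sums and
    updates the centers, as in steps 4-7 above."""
    N, M = P.shape
    K = C0.shape[0]
    C = C0.astype(float).copy()
    slices = np.array_split(np.arange(N), cubenodes)  # rows held by each node
    labels = np.zeros(N, dtype=int)
    for _ in range(maxiter):
        counts = np.zeros(K)       # count(k), reduced at node 0
        sums = np.zeros((K, M))    # sum(k, j), reduced at node 0
        change = 0
        for rows in slices:        # each pass emulates one node process
            d2 = ((P[rows, None, :] - C[None, :, :]) ** 2).sum(axis=2)
            new_labels = d2.argmin(axis=1)            # step 4: label reassignment
            change += int((new_labels != labels[rows]).any())
            labels[rows] = new_labels
            for k in range(K):                        # step 5: partial count/sum
                members = rows[new_labels == k]
                counts[k] += len(members)
                sums[k] += P[members].sum(axis=0)
        if change == 0:            # step 7: convergence check at node 0
            break
        nonempty = counts > 0
        C[nonempty] = sums[nonempty] / counts[nonempty, None]
    return labels, C
```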


Table 1. Benchmark results (all results are expressed in seconds)

                          Clusters without noise              Clusters with noise
Number of   Few        Good       Bad        Two        Good       Bad        Two
nodes       elements   estimate   estimate   centers    estimate   estimate   centers
1           0.014      0.738      5.122      0.858      25.34      21.11      10.74
2           0.012      0.372      2.58       0.433      12.7       10.58      5.38
4           0.015      0.192      1.33       0.225      6.42       5.35       2.73
8           0.028      0.110      0.712      0.112      3.25       2.67       1.48
16          0.056      0.084      0.35       0.078      1.7        1.43       0.83

At each iteration loop the node process calculates the new values of the count and sum arrays. This process requires

$$O(\text{compute}) = O\!\left(K \times \frac{N}{s} \times M\right)$$

time. This information is later sent to node 0. The communication time requirement is O(communicate) = O(s x K x M). Node 0 updates the value of the cluster centers in step 6. The time requirement of this process is O(update) = O(K x M). The time complexity of one iteration loop is the sum of the above three time complexities:

$$O(\text{total}) = O\!\left(K \times \frac{N}{s} \times M\right) + O(s \times K \times M) + O(K \times M).$$

Usually the input pattern is relatively big, with N in the range of thousands, and s in the range of 1-128. In this case N/s ≫ s, and therefore

$$O(\text{total}) = O\!\left(K \times \frac{N}{s} \times M\right).$$

The parallel clustering algorithm has a speedup over the uniprocessor implementation proportional to the number of nodes. The numerical results presented in the next section verify this observation. In cases where the input pattern is small (small N values) the communication complexity becomes proportional to the processing complexity. Any increase of the operational nodes then results in an increase of the overall complexity of the algorithm.
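The iteration-time model above can be explored numerically. This short sketch evaluates T(s) = c·K·M·(N/s + s + 1) for the node counts in Table 1; the constants are illustrative, not measured values from the paper.

```python
def iteration_time(s, N=10000, K=3, M=2, c=1e-6):
    """Model of one iteration: compute O(K*(N/s)*M), communicate
    O(s*K*M), update O(K*M). c is an arbitrary per-operation cost."""
    return c * K * M * (N / s + s + 1)

for s in (1, 2, 4, 8, 16):
    t = iteration_time(s)
    print(f"s={s:2d}  T={t:.6f}s  speedup={iteration_time(1) / t:.2f}")
```

For large N the N/s term dominates and the modeled speedup is nearly linear in s, matching observation 3 below; for small N the s term takes over and adding nodes slows the run.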

Fig. 3. Image information with few input patterns. (Axes: x and y coordinates, 0-10. Legend: pattern points; initial estimates 3,4 and 3,10; computed cluster centers 1.25,1.125 and 7.66,7.33.)


Fig. 4. Image information without noise. (Three runs: initial estimates 7,16 38,57 65,30 giving cluster centers 9.5,9.5 29.5,69.5 66.5,22.5; initial estimates 23,34 45,67 15,89 giving cluster centers 26.7,50.52 29.6,50.52 29.6,76.4; initial estimates 27,34 45,75 giving cluster centers 29.5,69.5 61.7,21.4.)

NUMERICAL RESULTS

The numerical results presented in this section depend on the clock speed of the benchmark platform. This project was implemented on the SITE iPSC/2 hypercube of George Mason University. This system has 16 nodes, each with an Intel 80386 processor and an Intel 80387 numeric accelerator, running at 16 MHz. Three different input patterns were tested. The first one consists of only 20 elements. The second consists of 3 distinct clusters that form 3 rectangles. By adding normally distributed noise to the second pattern, the third input was obtained. Each of the second and third input patterns was tested with three different initial estimates of the cluster centers. Overall, seven cases were examined.

The cells of the input image can either contain a pixel or not. In a binary system, the presence of a pixel is represented by state '1', while the absence of a pixel is represented by state '0'. A hardware platform implements states '1' and '0' with 5 and 0 V respectively.

In a noisy environment two situations can cause the alteration of an input image:
• A pixel is '1' and the noise introduces an undershoot with amplitude ≥ 2.5 V, or 0.5 in the binary system.
• A pixel is '0' and the noise introduces an overshoot with amplitude ≥ 2.5 V, or 0.5 in the binary system.
The noise introduced to produce the third pattern is normally distributed with mean zero and variance one. Its random variable is X ~ N(0, 1). The probability of exhibiting either of the above two erroneous situations is

$$P[X < -0.5] = P[X > 0.5] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-0.5} e^{-x^2/2}\,dx \approx 0.31.$$
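As a quick check of this threshold, and as a sketch of the bit-flip insertion procedure described in the next paragraph, the following uses only the Python standard library (the function name is illustrative):

```python
import math
import random

# P[X < -0.5] for X ~ N(0, 1), via the error function.
p_flip = 0.5 * (1.0 + math.erf(-0.5 / math.sqrt(2.0)))
print(f"flip probability ~= {p_flip:.4f}")   # ~0.3085, i.e. ~0.31

def add_noise(pattern, p=0.31, seed=None):
    """Flip each binary pixel with probability p, as described below."""
    rng = random.Random(seed)
    return [1 - px if rng.random() < p else px for px in pattern]
```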

The way the noise is inserted is as follows: each time a pattern element is examined, a random number is generated. If this random number is less than 0.31 then the input pattern changes value: zero becomes one and one becomes zero. If the random number generated is greater than 0.31 then the input pattern value remains unchanged.

Fig. 5. Image information with noise. (Three runs: initial estimates 7,16 38,57 65,30 giving cluster centers 23.3,28.3 41.2,77.42 75.5,33.7; initial estimates 23,34 45,67 15,89 giving cluster centers 25,68 51,18.7 77.6,65.3; initial estimates 27,34 45,75 giving cluster centers 43.2,72.6 54.7,24.3.)

Table 1 presents the execution time of the algorithm for each case. Figures 3-5 present the various input images, and the edges of the shapes identify the locations of the cluster centers calculated by the algorithm for various initial estimates. The first column in the benchmark table identifies the number of nodes allocated to the algorithm. The remaining columns specify the case used and the processing time. Three important observations can be made:

1. The algorithm produced intuitively correct results in all cases; the centers that a human can identify are the same as the ones calculated by the algorithm.
2. When the input pattern matrix does not have any background error, the response of the algorithm is instantaneous. On the other hand, when error is introduced the throughput time of the algorithm increases significantly.

3. When the input information is small the communication and synchronization time overwhelms the processing savings of a multiprocessor system; therefore the response of the algorithm drops as the number of processors increases. Conversely, when the input information is large the algorithm exhibits linear speedup in the number of processors.

The algorithm presented in this paper does not put any restrictions on the number of processors that need to be allocated. Previously presented parallel clustering algorithms⁴,⁵ require N x M processors. This algorithm can adjust the range of the pattern matrix according to the size of the allocated cube. It subsequently distributes the appropriate ranges of the pattern matrix to the hypercube nodes.

The proposed algorithm can be modified easily to produce the variance of a cluster domain about its centers.


This information reveals the actual shape of the cluster. For example, a cluster with equal variance in both the x and y coordinates might be inferred to have the shape of a sphere; on the other hand, asymmetric variances might indicate that the cluster does not have a canonical shape. Another interpretation result is the distance between the cluster centers. This information can show whether one cluster center is isolated from the others and can thus be removed. There are, of course, numerous other quantitative measures of clustering properties. It is also useful, for example, to know the closest and the farthest points from the cluster center in each domain. The variance matrix of each cluster can also be of value, although it is difficult to interpret in high-dimensionality problems and can add computational difficulties to an iterative algorithm. Whatever measures of clustering performance are used, the information should always be presented in a way that lends itself to quick interpretation.
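A per-coordinate variance of each cluster about its center, as suggested above, takes only a few extra lines. This sketch reuses the hypothetical arrays from the earlier examples and is not part of the paper's implementation:

```python
import numpy as np

def cluster_variances(P, labels, C):
    """Per-coordinate variance of each cluster about its center.
    Roughly equal variances suggest a compact, sphere-like cluster;
    strongly unequal ones suggest an elongated or irregular shape."""
    K = C.shape[0]
    return np.array([((P[labels == k] - C[k]) ** 2).mean(axis=0)
                     if (labels == k).any() else np.zeros(C.shape[1])
                     for k in range(K)])
```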


CONCLUSION

The HERMES structure is able to handle the iterative nature of clustering algorithms in an efficient way. The results of the algorithm are intuitively correct. When the amount of input information is small, the uniprocessor system's response is faster than the multiprocessor's. Conversely, when the input information is large the multiprocessor system exhibits a speedup proportional to the number of processors.

REFERENCES

1. Watanabe S. Frontiers of Pattern Recognition. Academic Press, New York (1972).
2. Hwang K. and Fu K. S. Integrated computer architectures for image processing and data base management. Computer, pp. 51-60 (1983).
3. Ni L. M. and Jain A. K. A VLSI systolic architecture for pattern clustering. IEEE Trans. on PAMI, pp. 79-89 (1985).
4. Tou J. T. and Gonzalez R. C. Pattern Recognition Principles. Addison-Wesley, Reading, MA (1974).
5. Li X. and Fang Z. Parallel clustering algorithms. Parallel Comput., pp. 275-290 (1989).
6. Barlos F. Functional modeling, performance evaluation and failure recovery of the HERMES structure. Master's Thesis, GMU (1989).
7. Bourbakis N. Design of real-time supercomputing vision system architectures. IEEE Conf. on Supercomputing, Santa Clara, CA, Vol. 3, pp. 392-398 (1987).
8. Barlos F. and Bourbakis N. Performance evaluation of the HERMES multibit systolic array architecture. IEEE Conf. on Systolic Arrays, CA, pp. 113-124 (1988).
9. Bourbakis N. and Barlos F. Hardware design of the lower level nodes of the HERMES neuromorphic net. Engng Applic. Artif. Intell. 5, 23-31 (1992).
10. Bourbakis N. and Barlos F. Formal modeling of the M-M and B-B intercommunication schemes of the HERMES multiprocessor kernel. Int. J. on Comput. Simulation 2(3) (1992).