A cluster computer system for the analysis and classification of massively large biomedical image data

A cluster computer system for the analysis and classification of massively large biomedical image data

PERGAMON Computers in Biology and Medicine 28 (1998) 47±60 A cluster computer system for the analysis and classi®cation of massively large biomedica...

464KB Sizes 3 Downloads 85 Views

PERGAMON

Computers in Biology and Medicine 28 (1998) 47±60

A cluster computer system for the analysis and classi®cation of massively large biomedical image data T. Daggett *, I. R. Greenshields Medical Imaging Laboratory, U-31 Room 19, T. L. Booth Research Center, Department of Computer Science and Engineering, 233 Glenbrook Road, The University of Connecticut, Storrs, CT, U.S.A. Received 25 March 1997

Abstract The current trend in medical image acquisition is towards the generation of image datasets which are massively large, either because they exhibit ®ne x, y, or z resolution, are volumetric, are multispectral, or a combination of all of the preceding. Such images pose a signi®cant computational challenge in their analysis, not only in terms of data throughput, but also in terms of platform costs and simplicity. In this paper we describe the role of a cluster of workstations together with two quite di€erent application programming interfaces (APIs) in the quantitative analysis of anatomic image data from the visible human project using an MRF-Gibbs classi®cation algorithm. We describe the typical architecture of a cluster computer, two API options and the parallelization of the MRF-Gibbs procedure for the cluster. Finally, we show speedup results obtained on the cluster and sample classi®cations of visible human data. # 1998 Elsevier Science Ltd. All rights reserved. Keywords: Image processing; Cluster computing; Parallel computing; MRF-Gibbs; Classi®cation

1. Introduction Medical image datasets, like a wide variety of other image datasets, are increasingly characterizable as massive in terms of their spatial resolution and/or their spectral resolution. Since the lower bound on processing costs for an n  n image grows quadratically (O(n2)) with the on-side dimensions of an image (since most meaningful algorithms have to examine every pixel at least once), higher resolution images bring nonlinear growth in computational costs associated with their processing. On the positive side, many imaging algorithms are easily adapted to the single program multiple data model of parallelism, so that multiple copies of an * Corresponding author. 0010-4825/98/$19.00 # 1998 Elsevier Science Ltd. All rights reserved. PII: S 0 0 1 0 - 4 8 2 5 ( 9 7 ) 0 0 0 3 2 - 2

48

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

algorithm can execute in parallel acting on di€erent subdivisions of the image. Thus, discounting typical parallel overheads, many imaging algorithms are in principle amenable to the p-speedup when distributed over p processors. (Many image algorithms are, of course, not so amenable.) It is therefore often the case that the issue impacting the processing throughput of massive images lies not in complex algorithm design, but in platform availability. It will sometimes be true that a large parallel system's costs can be folded into the cost of the imager itself without apparent impact on total system cost (as might be the case with a modern MR imager), but the average biological or medical laboratory interested in analysing these images is unlikely to wish to acquire a large parallel system, both because of cost and system management problems. However, even the most modest laboratories will have access to inexpensive PC-level workstations, and it has now become a trend in parallel system design to exploit groups of these systems networked together to form what is known as a cluster computer. Cluster computers, unlike the more expensive symmetric multiprocessors which characterize the bulk of non-cluster parallel systems are inexpensive to assemble, easy to maintain, extremely fault-tolerant (in the sense that the failure of a cluster component Ða workstationÐ simply means that the cluster has less processing power than it had before), surprisingly powerful (a tribute to the modem microprocessor) and (with newer application programming interfaces) relatively easy to program. Clusters can be quite small yet still powerful; the architecture we describe below consists of 8 Pentium systems capable of a (theoretical) throughput SpecFP of about 245 [1]. The principal software component for application development on parallel systems is the underlying communication mechanism used for inter-process communications. There are a number of di€erent software communication paradigms that have received signi®cant attention. Of these, the message passing interface (MPI) has been much heralded as the future standard for message passing based communications [2, 3]. A very di€erent and less well known technique for inter-process communications is virtual shared memory. Virtual shared memory allows processes to communicate by directly sharing data as if it existed in a global shared memory space. Processes can access (read or write) information in the memory space without concern or knowledge of external processes, and can therefore be developed in a more sequential program fashion [4]. This conceptually simpler view supports the main advantage usually associated with virtual shared memory, which is that the application programming interface is usually quite simple and therefore the complexity of developing parallel applications is greatly reduced. Paradise (Scienti®c Computing, New Haven, CT) is a widely used virtual shared memory based communications package that o€ers a very simple API. The particular problem we address involves the derivation of quantitative anatomical parameters (such as tissue volume by type etc.) from the visible human dataset [5]. Our medical imaging goal is to segment (by automatic image classi®cation) the anatomical images of the visible human dataset so that quantitative anatomical issues and computational geometric models can be constructed from the dataset. In reference to earlier work from MR images [6, 7] we have elected to use the visible human dataset to extract out and model the bladder and urethra. Both because it is anatomically simpler and because the dataset is homogeneous in x, y, and z resolution (0.33 mm per pixel), we are working with the newer visible female dataset. Even in the small anatomical region of interest we are examining, the dataset is huge: each raw image (2048  1216  3) exceeds 7 MB. The pelvic imagery alone (from the base of the sacrum

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

49

to about the linea terminalis on the sacral promontory) occupies about 4200 MB. The algorithmic strategy involves image preprocessing (to remove background and non-anatomical image details) followed by an unsupervised context-dependent classi®cation of the RGB images using a Gibbs classi®er (described below). Automatic image classi®cation, though dicult, is not a new problem and numerous algorithms have been developed for other problem domains which may aid in the classi®cation process [8]. These algorithms can be grouped into two principal categories: context-dependent and context-independent. Context-independent algorithms perform an image classi®cation based directly on a pixel's intensity and the distribution of the intensity values of an image. In simpler terms they attempt to classify the pixel purely from its intensity with respect to the rest of the pixel intensi®es of the image. Context-independent algorithms classify the pixel values of a random image (one where pixels are randomly scattered) identically to the pixels of an image that contains well de®ned regions (typical digital image), as long as the images have identical intensity distributions. Well known examples of context-independent algorithms include the nearest mean, maximum likelihood estimation and K nearest neighbors methods [8]. In comparison to context-independent algorithms, context-dependent approaches attempt to utilize relationships, or contextual information, between pixels within an image to perform a classi®cation. An important and popular example from the context-dependent category is the MRF-Gibbs algorithm [9, 10]. The MRF-Gibbs classi®cation method utilizes contextual information derived from the relationship that a point (pixel) in an image has with the neighborhood (spatially close pixels) that encompasses it. This relationship is assumed to exhibit the characteristics of a Markov random ®eld (MRF), in which the neighboring pixels can be e€ectively used to classify the pixel. The Gibbs component of the MRF-Gibbs classi®cation method stems from the evaluation of this relationship in terms of an energy function. This energy function can be used to produce the ideal image classi®cation through the maximization of the energy function at each pixel of the image. However, the determination of the ideal classi®cation requires that every possible combination of pixel classi®cations of an image be examined. Therefore an n  n image with L possible classes contained within would require Ln  n combinations to be evaluated. This evaluation is considered computationally intractable and hence is considered not plausible for practical sized images. To address this processing limitation, the determination of the ideal classi®cation can be approximated through the employment of stochastic maximization techniques, such as simulated annealing, to reduce the computational burden of the MRF-Gibbs algorithm. Theoretically, the MRF-Gibbs classi®er iterates hundreds (or even thousands) of times over each image, repeatedly computing energy functions and probability distributions, so that the computational demands are enormous. Faced with this challenge, we elected to construct a cluster computer comprising 8 PC platforms (Pentium 166 MHz systems, 32 MB running Microsoft Windows NT). The machines are connected both by a 25 Mbs ATM switch and 10 Mbs Ethernet, and (as we describe below) support two distinct application programming interfaces (MPI and Paradise) for the implementations of parallel algorithms. In the sections that follow we describe the parallelization of the MRF-Gibbs algorithm in conjunction with simulated annealing, by ®rst identifying the basic operational concepts of the algorithm and then de®ning how it was

50

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

parallelized on the cluster. The speedup of the parallelized algorithm using both MPI and Paradise based communications over a serial version is presented.

2. MRF-Gibbs In the core of the MRF-Gibbs algorithm lies the Markov random ®eld (MRF) assumption. The MRF assumption states that the true interpretation of any pixel Xij given the true interpretation of all image pixels G depends only on the interpretation of its neighboring pixels in a neighborhood Nij (Eq. (1)). P…Xij ˆ o k jG † ˆ P…Xij ˆ o k jNij †

…1†

This interpretation of a pixel can be used for the assignment, or classi®cation of a pixel to a class ok, which represents a labeling of the pixel from a given set of L possible classes. These classes may correspond to tissue types believed to be contained in the image (muscle tissue, bone, etc.). A neighborhood Nij consists of those pixels that fall into a de®ned region surrounding the pixel at location (ij). Various shaped neighborhoods can be used, and their shapes (sizes) are usually determined from the underlying characteristics present in the regions, or structures of an image. A very commonly employed and simple neighborhood system may be de®ned as consisting of those pixels which are within a particular Euclidean distance from the pixel Xij. Typically the relationship between neighboring points diminishes greatly with increasing distance. It has been shown [9, 10] that a meaningful neighborhood may in fact be quite small, and can be de®ned as consisting of those neighbors, which are a Euclidean distance (d) of one from the pixel in question. Thus Nij ˆ f…r,s†:d……r,s†,…ij †† ˆ 1g

…2†

de®nes a four-element neighborhood, consisting of the east, west, north and south adjacent pixels about the site (ij), and it is this simple but e€ective structure that we use. With this concept of a four-pixel neighborhood, the assignment of pixel Xij to a class ok can be evaluated in terms of the posterior probability P(okvXij,Q), the conditional probability that the assignment of class ok is correct given observation Xij and prior information Q. The prior information Q consists of the pixel's neighborhood Nij, global information such as the class probability of the various classes occurring naturally in the image P(ok), and local information de®ning the likelihood of the contents of the neighborhood. In simple terms, the probability of class ok, being assigned to site (ij) depends only upon the observations contained in its neighborhood Nij, Eq. (3). Y P…o k jXxy ,Q† …3† P…o k jXij ,Q† ˆ …xy†2Nij

In Eq. (3), Xxy represents an observation contained in the neighborhood Nij. From a Bayesian perspective, the best classi®cation for Xij is the class ok that maximizes Eq. (4).

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

Y

P…o k †P…Xxy jo k †r

…xy†2Nij

Y

P…o h †P…Xxy jo h †, 8h 6ˆ k

51

…4†

…xy†2Nij

The decision rule, de®ned above, involves maximizing the probability of the classi®cation of an entire neighborhood. The MRF-Gibbs equivalence shows that Eq. (4) can be written in a Gibbsian form P(Xij,ok), P…Xij ,o k † ˆ

1 ÿU…Xij ,o k †=T e Z

U…Xij ,o k † ˆ ÿ

X

log P…Xxy jo k † ‡ U…o k †

…5† …6†

…x,y†2Nij

and that this Gibbsian equivalent can be evaluated using an energy function, U(Xij,ok) used here, a normalizing constant Z, and an arti®cial temperature T. Proof of the equivalence between the Gibbs and MRF representations is given in Refs. [9, 10]. The MRF-Gibbs classi®cation process therefore consists of assigning the pixels of an image those class values that produce a maximum posterior distribution when the energy is minimized. The description above has concentrated on the local expression of the MRF-Gibbs model for classi®cation. Evidently, energy minimization cannot proceed on a purely local basis; instead, we consider the maximum a posteriori (MAP) estimate of the image's class structure as follows. Suppose there are L classes in an n  n image. A con®guration z $ C is an enumeration of the classes attached to each point of the image. In the case described, cardinality card{C} = Ln  n. Let W be the (random) observation of a con®guration, and let X be the (random) observation of the image data; then the MAP estimate is the estimate [8, 9]. WMAP ˆ arg maxz2C fP…W ˆ zjX ˆ x†g

…7†

However, the assignment of the pixels cannot be performed pixel by pixel but instead represents a dynamic programming problem. One can employ simulated annealing to migrate towards an approximate solution to this maximization problem. Simulated annealing is an iterative optimization technique which attempts to minimize an energy function through random excitation [11]. It can be utilized to reduce the overall computation of locating an ``ideal'' classi®cation to a more reasonable amount. Here ``ideal'' now refers to an approximate solution to the classi®cation problem, which may or may not be the true classi®cation solution. Typically, the procedure terminates after a ®xed number of iterations of the annealing process. In the MRF-Gibbs classi®cation approach, simulated annealing functions by periodically reclassifying a pixel to a ``worst'' classi®cation, i.e. a classi®cation that actually produces a higher energy potential. It is this occasional acceptance of a ``worst'' classi®cation that allows the algorithm to jump out of a local minimum in search of a global minimum, consequently producing an improved classi®ed image. The decision rule for the acceptance of a ``worst'' classi®cation is governed by a temperature and a cooling schedule. Numerous cooling schedules have been developed for simulated annealing [11], however it appears that the best schedule is usually determined through ad hoc trial and error. In our

52

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

case, a cooling schedule T(t), (Eq. (8)) T…t† ˆ

T0 log…t ‡ 1†

…8†

where T0 is the initial temperature is employed [12]. The initial temperature T0 was determined from the ®rst iteration of the algorithm, which evaluated the image by summing the total energy encountered (increase in energy) during an attempted classi®cation of the pixels to a new classi®cation with higher energy. The gradual cooling of the temperature produced by the schedule ideally allows the algorithm to settle in the global minimum. With the employment of simulated annealing to ®nd the minimal energy produced by Eq. (6), the remaining complexity of the MRF-Gibbs approach involves determining a useful energy function. The derivation of an energy function typically involves understanding the nature of the image, identifying a meaningful neighborhood shape (and size), and describing the unique attributes of the image regions, or structures. Numerous models have been developed to describe these di€ering attributes in terms of relationships of the pixels of the neighborhoods, contained in the regions or structures. As result, many di€erent energy functions have been developed which attempt to evaluate the likelihood or quality of a neighborhood relative to a set of attribute quali®ers [10]. For our energy function, X X log P…Xxy jo k † ‡ log P…o …x,y† † U…Xij ,o k † ˆ ÿ ‡

X

…x,y†2Nij

…x,y†2Nij

log P…x,y†,…x‡1,y† …o …x,y† ,o …x‡1,y† † ‡

…x,y†2Nij

X

log P…x,y†,…x,y‡1† …o …x,y† ,o …x,y‡1† †

…9†

…x,y†2Nij

we utilized the conditional probability of each neighbor in Nij given the class ok, the probability of the classi®cations of neighbors, and the transitional probabilities associated with the spatial con®guration of the neighborhood [13]. This spatial likelihood encompasses both the horizontal and vertical transitions of the neighborhood. Since our neighborhood consisted of the four-pixel stencil mentioned earlier, the horizontal relationship consisted of evaluating the transitional probability of the west pixel to pixel Xij and Xij to the east pixel. Likewise the vertical relationship consisted of the vertical transitional probability of the north pixel to Xij and Xij to the south pixel. It should be noted that a transitional probability of ok to oj is the likelihood, or probability, that ok, is followed by oj in the direction of the transition type. Simulated annealing is employed in conjunction with Eq. (9), as de®ned by the following series of steps. 1. Randomly select a new class for the pixel xij. 2. Evaluate the new classi®cation in terms of the neighborhood relationships and the likelihood of the neighborhood via the energy function (Eq. (9)), DU. 3. If the re-classi®cation produces a lower energy then the current classi®cation then accept it. 4. Else accept it conditionally based upon an annealing schedule derived probability Ai>r. The probability Ai, Ai ˆ eÿDU=T…t†

…10†

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

53

is determined based upon the magnitude of the increase in energy DU from the current pixel classi®cation to the new one, and the annealing schedule T(t). The acceptance guide r is de®ned as a uniform random variable between 0 and 1. The series of steps de®ned in the re-classi®cation rule were applied to each pixel of the image in succession. Upon the completion, the temperature controlling the annealing schedule was updated and the next iteration of the algorithm was begun. The number of iterations required to produce the ``ideal'' classi®cation is not predictable. A classi®cation resulting from a context-independent classi®cation approach (K nearest neighbors) was used to estimate the transitional probabilities, the probabilities of the classes, the class statistics (means and covariances) and as an initial classi®cation of the image.

3. Implementation 3.1. Parallel design description The parallelization of the MRF-Gibbs algorithm was approached through domain decomposition in which both data and computation was partitioned. This led to the principal employment of a single program multiple data (SPMD) design, in which n child processes (n = 1 to 7) cooperatively work on di€erent portions of an image (W wide by H high) under the direction of a single parent process. The principal responsibilities of the parent process are the partitioning and distributing of the image data amongst the children, performing collective calculations, and communicating important global classi®cation information. Child processes are tasked with performing the required image classi®cation operations on their image portion. Information used by the child processes, obtained from the parent process, consists primarily of image data and class related information, while information obtained from neighboring child processes consists of the conditional probabilities of the image partition edges and the classi®ed values associated with the same edges. This edge information is required in order to generate the maximum number of complete neighborhoods utilized in the MRF-Gibbs classi®cation process. Evidently, the outcomes of the serial and parallel algorithms are identical. Each child process is given an approximately equal size portion of the image to work on. This image section consists of a rectangular segment of the image, which was W=n in width (the last process receives extra columns when required by the magnitude of n) with a height of H. Therefore the partitioning of the data is basically performed by mapping the columns of the image to particular child processes. Image data (both raw image data and initial classi®ed data) is sent by the parent to the children in row by row fashion. Hence, each image message consisted of a W=n image pixel values with a total of 2nH messages sent from the parent to the children. In addition, a total of nH number of messages is sent from the children to the parent at the end of a classi®cation, to return the classi®ed portions of the image to the parent for re-uni®cation. Each process, child and parent, were mapped to a separate processor of the cluster. This was chosen in attempt to both maximize processor utilization and load balance the overall system. In addition, the principal communications required between child processes consists of the

54

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

exchanging of classi®ed edge data at the beginning of each iteration. During a given MRFGibbs iteration, each child can operate simply on their image portion without interference by neighboring processes. Both the parent and child processes exhibit synchronous communications in that parent and children will wait for expected information to arrive. The MRF-Gibbs parallel application was designed and implemented with respect to scalability, future software modi®cation, and software reuse. Both the child and parent processes were created to function without concern for either the number of processors or processes employed. A high-level communications interface (C + + class) was designed and implemented to abstract the actual algorithm above the communications mechanism utilized, therefore supporting the rapid employment of other forms of underlying inter-process communications. This class also provided overloaded operations, which abstracted the development of the algorithm above the complexities of the communication of various data types, thus further simplifying code development.

3.2. Image dataset The MRF-Gibbs algorithm was customized to improve its performance due to the images used and the parallel environment employed. Sample images from the visible human project female dataset were used for all classi®cations. The raw images have dimensions of 2048  1216  3, but are reduced by background elimination to 1300  700  3.

3.3. Algorithm realization The implementation of the MRF-Gibbs algorithm was approached through a series of phases. The phases were data loading, transitional probability determination, class conditional probability construction, and MRF-Gibbs re-classi®cation iteration. The ®rst phase, data loading, consists of loading in the raw image data, the contextindependent classi®ed version of the image, and the statistics of the classes determined from the context-independent approach. The context-independent classi®ed image contained a single classi®ed value (0 to L ÿ 1 classes) for each pixel. The class statistics consists of a class mean g, covariance matrix Sk and probability Pk for each class. The next phase consists of determining the transitional probabilities of the classes de®ned in the context-independent classi®cation of the image. The class conditional probability construction phase followed the transitional probability phase. This phase consists of determining the class conditional probability of each class for each pixel of the image. The class conditional probability was modeled as a maximum likelihood estimation posterior probability, which is the posterior probability of a class ok given a pixel value X. Eq. (11) de®nes the class conditional probability P(okvXij) of class ok, of L total classes based upon the probability of Xij given class ok, which is identi®ed as the conditional probability p(Xijvok), and the overall probability of class ok, P(ok).

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

P…o k jXij † ˆ

P…o k †P…Xij jo k †

L X

55

…11†

P…o z †P…Xij jo z †

zˆ1

In Eq. (11), the conditional probability p(Xijvok) is modeled as a normal distribution using the class mean and covariance matrix de®ned in the class statistics. Overall this phase consists of calculating and storing W  H  L conditional probabilities, where W, H, L refers to the image width, image height and number of classes of the non-contextually classi®ed image, respectively. The ®nal phase, MRF-Gibbs iteration, involves the actual re-classifying of the pixels through a series of re-classi®cation iterations. During the ®rst iteration the initial temperature, T0 is set based upon the total energy encountered during the attempted re-classi®cation of all pixels. During this initial iteration no pixels are re-classi®ed which produced a ``worst'' classi®cation, but instead the increase in energy encountered during an attempted re-classi®cation of a pixel is accumulated, and used to set T0. During the second and subsequent iterations this phase attempts to re-classify each pixel to a random class value. The energy of each new classi®ed pixel value is evaluated and accepted based upon the guidelines de®ned in the re-classi®cation rules for MRF-Gibbs. This phase iterates through the MRF-Gibbs re-classi®cation algorithm for a ®xed number of times. At the end of each iteration, the temperature is adjusted in accordance with the annealing schedule de®ned earlier. Overall the implementation of the MRF-Gibbs algorithm can be examined as a series of steps. These steps can be separated and listed as the following. 1. 2. 3. 4. 5.

Load in the raw image, the class statistics, and a previously classi®ed image. Determine the transitional probabilities from the previously classi®ed image. Determine the class conditional probability for each class at each pixel. Evaluate the raw image and set the initial temperature. For each pixel in the image choose a new class value at random, and evaluate the change in energy resulting from the energy function (using class probabilities, conditional probabilities, and transitional probabilities). Accept the new pixel classi®cation if the resulting energy is lower, else accept it conditionally base upon an annealing probability. 6. Once all the pixels have been attempted to be re-classi®ed, then lower the temperature based upon the annealing schedule. 7. Go to step 5 until the desired number of iterations has been reached. In a typical image classi®cation the majority of the execution time of the program is spent in either the conditional probability construction phase (phase 3) or in the re-classi®cation phase (phase 5). The other phases do not constitute a signi®cant amount of computation. Fig. 1 represents a graphical view of the parallel version of the MRF-Gibbs algorithm. Steps are 1, 8, and 10 of the MRG-Gibbs parallel implementation are considered to be principally communication steps. Steps 1 and 10 occur only once during an image classi®cation, while the classi®ed edge exchanging step (step 8) will occur during each iteration of the MRF-Gibbs algorithm.

56

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

Fig. 1. Depiction of parallel version of the MRF-Gibbs algorithm.

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

57

Fig. 2. Original image after background removal.

Hence, step 8 is perhaps the true communications cost of the parallelized version of the algorithm, since it is the only communications that results from various numbers of iterations. The other steps (other then 1, 8, and 10) which require communications represent relatively insigni®cant communication amounts and also only occur once per image classi®cation. In Figs. 2 and 3, Ixy represents the image partition and Cxy represents the classi®ed partition of the same image segment. The computationally most intense step is the determination of the class conditional probabilities (step 5), though this step is only required to be performed once per image classi®cation. In step 5 the class conditionals are computed once and stored for re-use during the MRF-Gibbs classi®cation phase (steps 8±9).

Fig. 3. Classi®ed image produced by MRF-Gibbs algorithm.

58

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

Once the program entered the MRF-Gibbs classi®cation phase (steps 8±9), most of time is spent in the evaluation of random class values at each pixel in the image (step 9). This step requires a fairly large amount of computation due primarily to the evaluation of the energy function using both the current pixel classi®cation value and the randomly selected value. Note that the current energy at pixel Xij would not be accurate if the classi®cation of pixel Xij + 1 changes during the attempted re-classi®cation of pixel Xij + 1, hence the energy potentials of the neighborhoods change continuously as neighbors, are subsequently re-classi®ed. At the end of the required number of iterations each child process returns the classi®ed partitions of the image to the parent (step 10), who combines the image sections into a ®nal classi®ed version of the original image. This signals the completion of the classi®cation process.

4. Performance results Table 1 shows the performance of the algorithm for varying number of child processes in conjunction with a single parent process. The number of iterations was ®xed at 200 with 6 classes for all trials. Both MPI and Paradise, as indicated in the table, were utilized for all inter-process communications. The information exchanged consisted of the class statistics, the raw image data and classi®ed partitions of the images, along with the class conditional and classi®ed image edge values. The table lists speedup relative to a single sequential process and the timeline employed started with the distribution of the image data and ended with the returning to and combining of all classi®ed image sections by the parent.

5. Experimental framework All of the parallelization experiments were performed on the University of Connecticut CoW (Cluster of Workstations) facility located inside the Image Processing Laboratory (IPL), Booth Research Center. The machines contained in the cluster were identical and consisted of 32 MB

Table 1 Speedupa of MRF-Gibbs resulting from using MPI and Paradise Number of Child Processes 1 2 3 4 5 6 7 a

Speedup with MPI 0.14 1.32 2.39 2.37 2.43 2.65 2.56

Speedup generated from estimated single process performance.

Speedup with Paradise 0.21 1.86 2.79 3.67 4.47 5.15 6.02

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

59

166 MHz Intel Pentium based personal computers executing the Windows NT 3.5.1 operating system, connected together via an IBM ATM 25.5 MBps Switch. The ATM switch was con®gured to provide LAN Emulation support. All software was written in C + + using the Visual C + + software development environment. Version 0.9 of MPI was used.

6. Conclusion Automatic image classi®cation is an important and computationally challenging task. We have taken the MRF-Gibbs algorithm and have parallelized it using both message passing and virtual shared memory based communications. We demonstrated that the algorithm can be parallelized and implemented on a common cluster using standard inter-process communications. Our results show the following. 1. Classi®cation algorithms can be e€ectively parallelized by partitioning the image between cooperating child processes and then reconciling the local attributes of the image partitions. This coarse grain partitioning approach could perhaps be employed for other classi®cation algorithms, either context-dependent or context-independent. 2. Both MPI and Paradise generate an overhead but can still be e€ectively used if the algorithm computation burden is large enough to compensate for additional communications cost. For relatively computational simple algorithms, such as nearest mean, communications costs may greatly limit the speedup experienced. 3. A small cluster can be con®gured and utilized for the classi®cation process. However, the real-time acquisition and display requirements of future medical image visualization systems are currently far greater then can be realistically supported with current small scale systems. 4. In general, Paradise far out performed MPI for the MRF-Gibbs application developed under the hardware and software suite utilized. In summary, the parallelization of the MRF-Gibbs algorithm demonstrated that statistically based classi®cation algorithms can be parallelized using a coarse grain approach based upon partitioning the image data. Partitioning the data does require that statistics associated with the image partitions be reconciled. However, the amount of data required to be communicated is typically quite small and can be performed using standard communication mechanisms. It should be noted that both MPI and Paradise do generate a processing overhead. The impact of these overheads could potentially be reduced by increasing the processing capacity of the cluster, or perhaps by employing a lower level communication level then TCP/IP for MPI or Paradise. In addition, both MPI and Paradise provide an e€ective application programming interface for the classi®cation problem domain.

Acknowledgements This work was funded by State of Connecticut under its Critical Technologies Program. We also wish to gratefully acknowledge Scienti®c Computing Associates, New Haven, CT for the

60

T. Daggett, I.R. Greenshields / Computers in Biology and Medicine 28 (1998) 47±60

delineation of the Paradise Software for the duration of the project and Dr Michael J. Ackerman, Visible Human Project, National Library of Medicine for providing the Visible Human Female dataset. References [1] SPECfp_rate95, Spec Benchmarks, http://open.specbench.org/osg/cpu95/results/rfp95.html. [2] Snir, M., Otto, S., Huss-Lederman, S., Walker, D. and Dongarra, J., MPI The Complete Reference. MIT Press, Cambridge, MA, 1996. [3] Gropp, W., Lusk, E. and Skjellum, A., Using MPI Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge, MA, 1996. [4] Amza, C. and Cox, A., TreadMarks: shared memory computing on networks of workstations, IEEE Computer, February, 1996, 18±28. [5] Visible Human Dataset, National Library of Medicine, National Institutes of Health, Bethesda, MD. [6] Greenshields, I. R. and Chun, J., Simulation of ¯uid ¯ow through biological inhomogeneous elastic tubes, Proc. of IASTED Int. Conference on Modeling and Simulation, Pittsburgh, PA, 1993, pp. 66±68. [7] Greenshields, I. R., Peters, T. J. and Chun, J., Parallel strategies in the reconstruction of surfaces from contour data, Proc. of 4th ISMM Conf. on Parallel and Distributed Computing Systems, 1991, pp. 355±358. [8] Fukunaga, K., Statistical Pattern Recognition, 2nd edition. Academic Press, San Diego, CA, 1990. [9] Li, S. Z., Markov Random Field Modeling in Computer Vision. Springer, Berlin, 1995. [10] Winkler, G., Image Analysis, Random Fields and Dynamic Monte Carlo Methods. Springer, Berlin, 1991. [11] L. Ingber, Simulated annealing: practice versus theory, Mathematical and Computer Modeling 18 (11) (1993) 29±57. [12] Geman, S. and Gemen, D., Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. on Pattern Analysis and Machine Intelligence, 1984, PAMI-8(6), 721±741. [13] Zhu, J. and Greenshields, I., Classi®cation of multiecho magnetic resonance images of brain using MRG-Gibbs classi®er, Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, 1993.

Thomas A. Daggett is a Ph.D. candidate at the University of Connecticut. He received a BSE in computer engineering from the University of Connecticut in 1985 and his MS in computer science from Rennselaer Polytechnic Institute in 1989. He is currently serving as an operability and display engineer in the Advanced Display Research Facility (ADRF) at the Naval Undersea Warfare Center (NUWC) division Newport, Newport Rhode Island. His research interests lie in parallel/distributed processing, image processing, small-scale cluster computing, and advanced display technologies. Ian Greenshields is an Associate Professor of Computer Science and Engineering at the University of Connecticut. His research interests lie in the areas of biomedical computing, biomedical image processing and biodynamical modeling.