Neural Networks, Vol. 3, pp. 593-603, 1990. Printed in the USA. Copyright 1990 Pergamon Press plc.
ORIGINAL CONTRIBUTION
Image Processing and Pattern Recognition in Ultrasonograms by Backpropagation

RONALD H. SILVERMAN
Cornell Medical College

ANDREW S. NOETZEL
Polytechnic University
(Received 30 August 1989; revised and accepted 21 March 1990)
Abstract--Ultrasonic images of human eyes in which choroidal tumors were present were applied as input to a neural network utilizing the backpropagation algorithm. Paired with each image was a teaching image in which only elements corresponding to the tumor position were active. During training, the network was stimulated by successive local regions at the original image scale and by one-half and one-fourth scale image representations. The training set consisted of 10 image pairs. Following training, the network was capable of defining the position of tumors in both reference and nonreference scans. A second network was trained to classify tumors based on the height, base dimension, and acoustic backscatter spectra of a set of 138 cases of known pathology. Based on the output of the image analysis network, tumor dimensions and backscatter spectra were computed and then served as inputs to the classification network. The combined networks were capable of localizing and classifying tumors and were tolerant of differences in size, conformation, and orientation.
Keywords--Backpropagation, Neural networks, Ultrasonics, Discriminant analysis, Image processing, Classification.

INTRODUCTION
Image segmentation may be performed by a variety of methods. Edge or boundary enhancement operators include Laplacian, Sobel, or Kirsch transforms, among others (Ballard & Brown, 1982; Duda & Hart, 1973; Rosenfeld & Kak, 1982). Segmentation by texture operators has also been demonstrated (Weszka, Dyer, & Rosenfeld, 1976). These operators all process consecutive small regions of the overall image. Fourier methods are commonly used for selective enhancement by suppression of undesirable frequency components. The relatively simple transforms performed by the above operators, although useful alone or in combination for image segmentation, are generally not capable in and of themselves of classifying objects within complex images. Such higher-level classification falls into the domain of artificial intelligence (Mantas, 1987). Although much effort has gone into the attempt to develop artificial intelligence systems based on predicate logic to simulate human pattern recognition, entirely successful results have remained elusive.

In recent years, neural networks have been shown to be capable of higher-level pattern classification by self-organizing in response to examples. The backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986) provides a mechanism for the development of multilayered networks composed of units with sigmoidal activation functions approximating the behavior of linear threshold units. Such networks can simulate any logical operation, a capability not shared by linear or single-layer networks, and can be trained, unlike multilayered networks of threshold elements. In terms of architecture and processing, such hierarchical networks can be thought of as nonlinear filters. However, these networks are capable of recognizing more complex patterns than can be recognized through the classical image processing operators. Because hierarchical networks can be trained by examples, the solution to the pattern recognition problem need not be specified or even be apparent in advance.
The ultrasonic tissue characterization portion of this project was supported in part by NIH grant EY03183. Computations for this project were supported in part by the Cornell National Supercomputer Facility, a component of the Cornell Theory Center, which is supported by the National Science Foundation, New York State, the IBM Corporation, and members of the Corporate Research Institute. Requests for reprints should be sent to Dr. R. H. Silverman, Department of Ophthalmology, Cornell University Medical College, 1300 York Avenue, New York, NY 10021.
In this article, we describe a procedure for defining a network capable of recognizing tumors within ultrasonic images. This procedure uses the backpropagation algorithm to determine link weights in a network consisting of an input layer, two hidden layers, and an output unit. In addition, we describe the use of backpropagation networks as classifiers and compare their performance with linear discriminant analysis for the classification of tumors of different types, based on their geometry and spectral properties as obtained from ultrasonic scans. Combination of the two networks just described provides the capability of automatic localization and classification of ocular tumors.
PRINCIPLES OF ULTRASONOGRAPHY

Ultrasonography is one of the standard radiological procedures used for biomedical imaging (Kremkau, 1989). Although a detailed discussion of acoustics is beyond the scope of this article, some understanding of the underlying principles may be helpful to the reader. In pulse-echo ultrasonography, a voltage applied across a piezoelectric crystal results in the emission of a pulse of high-frequency compressional waves through the biological media. Acoustic reflection occurs at interfaces between media of differing acoustic impedances. The reflected wave induces a voltage across the piezoelectric crystal. The crystal is a transducer that acts as both transmitter and receiver. Objects with dimensions of one-half wavelength or more are referred to as "specular" (mirrorlike) reflectors. Smaller objects are referred to as "scatterers." It can be shown that scatterer size, concentration, and relative impedance will result in predictable shifts in the power spectrum of the pressure waves, called "backscatter," seen at the transducer (Lizzi, Greenbaum, Feleppa, Elbaum, & Coleman, 1983). Particles much smaller than the wavelength of an acoustic wave will scatter the higher-frequency components of the incident acoustic pulse most efficiently, just as fine particles suspended in the atmosphere scatter the higher frequencies of light, resulting in a blue sky. As scatterer size increases, backscatter amplitude increases, and the lower-frequency components become relatively more prominent in the backscatter spectrum. These properties are of significance for acoustic tissue characterization. The internal reflectivity of tumors, for example, is due primarily to backscatter, as opposed to specular reflection (Lizzi, Feleppa, & Coleman, 1986).

When reflected ultrasonic energy reaches the transducer surface, compression and rarefaction induce proportional voltages. These voltages are referred to as "radiofrequency" data. The envelope of the rectified radiofrequency data is referred to as the "video" signal. Transducers used for biomedical imaging typically have frequencies of 1 to 20 MHz. The particular
frequency used in a given application depends on the amount of intervening tissue and the required resolution; ultrasonic absorption and resolution both increase with frequency. Most ophthalmic ultrasonography is performed at frequencies of about 10 MHz.

An A-scan is a plot of the amplitude of returning echoes as a function of time. The time interval between the emitted pulse and returning echoes is interpreted as a distance (with the speed of sound assumed to be equal in all tissues). Ultrasonic B-scan images are produced by scanning the transducer across a tissue region. The time to each echo, converted to linear distance, and the simultaneous angular orientation of the transducer determine the coordinates of the display pixel, which is brightened in proportion to the echo amplitude. This process, when repeated over numerous angular positions over a sector, produces a two-dimensional B-scan image.

Acoustic spectrum analysis of backscattered signals can be accomplished by Fourier analysis of the signals from a gated region corresponding to the region of interest. After subtraction of the transducer and system characteristic spectrum from the backscatter spectrum, the resulting normalized power spectrum has the form of an approximately linear curve showing a characteristic slope (dB/MHz), intercept (dB), and statistical deviation (dB) for the tissue under interrogation. These parameters have been found to be useful in characterizing and differentiating different classes of intraocular tumors (Coleman, Lizzi, Silverman, Rondeau, Smith, & Torpey, 1982; Coleman & Lizzi, 1983). Figure 1 illustrates the normalization process, including generation of the slope and intercept of the linear best fit of the normalized power spectrum.
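As a concrete illustration of this normalization, the following short Python sketch (illustrative only; the function and variable names are assumptions, not the authors' implementation) subtracts a transducer/system calibration spectrum from a tissue backscatter spectrum, both in dB, and fits a line to obtain the slope, intercept, and statistical deviation:

    import numpy as np

    def normalized_spectrum_fit(tissue_db, calib_db, freqs_mhz):
        # Normalized power spectrum: tissue backscatter spectrum minus the
        # transducer/system characteristic spectrum (both already in dB).
        norm_db = np.asarray(tissue_db) - np.asarray(calib_db)
        # Linear best fit: slope in dB/MHz, intercept in dB at 0 MHz.
        slope, intercept = np.polyfit(freqs_mhz, norm_db, 1)
        residuals = norm_db - (slope * np.asarray(freqs_mhz) + intercept)
        std_dev = residuals.std()   # statistical deviation about the fit (dB)
        return slope, intercept, std_dev

These three numbers are the spectral features used later as inputs to the tumor classification network.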
Image Analysis

Human intraocular tumors were scanned using a broad-band, weakly focused transducer operating in pulse-echo mode, with a center frequency of 10 MHz. The instrumentation consisted of an analog scanner adapted for acquisition of digitized data. Digitization of radiofrequency data took place at a sample rate of 50 MHz, with 2048 8-bit samples per line, which covered an anterior to posterior length of about 30 mm. Each B-scan image consisted of 128 scan lines acquired over a sector angle of 22 degrees. The digitized data were converted to image format by rectification and averaging. The images produced for analysis consisted of 128 × 128 8-bit pixels. Pixel grey scales were then linearly recoded to range from 1 to -5. This recoding resulted in a mean pixel intensity of approximately zero.
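The rectify-and-average conversion from radiofrequency lines to a video image can be sketched as follows (a simplified reading of the procedure, with assumed array shapes; it is not the authors' code):

    import numpy as np

    def rf_to_image(rf_lines):
        # rf_lines: (128, 2048) array of digitized radiofrequency samples,
        # one row per scan line. Full-wave rectify, then block-average each
        # line down to 128 pixels (2048 / 128 = 16 samples per pixel),
        # yielding a 128 x 128 video image.
        rectified = np.abs(np.asarray(rf_lines, dtype=np.float64))
        return rectified.reshape(128, 128, 16).mean(axis=2)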
FIGURE 1. The difference between the acoustic power spectrum of transducer output and the tissue backscatter spectrum defines the tissue normalized power spectrum. (Upper panel: transducer and tissue spectra, amplitude (dB) versus frequency (MHz), with the tissue spectrum offset by +52 dB for display. Lower panel: normalized power spectrum with linear best fit dB = -62.404 + 0.53058 MHz.)
Ten images of eyes with intraocular tumors of various types and sizes (2.0 to 5.0 mm in height, 7.0 to 12.7 mm in diameter) were processed as described. These images constituted the training set for the tumor localization problem. For each image in the training set, a 128 × 128 teaching image was manually constructed in which positions corresponding to the location of the tumor were set active (0.8) and all other positions were set inactive (0.2). Figure 2 (upper right) provides an example of the manual demarcation of tumor position in a teaching image. Each pair of original training and teaching images was averaged by factors of 2 and 4 to produce 64 × 64 and 32 × 32 images.

A simulated neural network consisting of a 9 × 9 pixel input layer, two hidden layers of 10 units each, and a single output unit was trained for localization of tumors. The simulations were written in vectorized IBM VS-FORTRAN and were implemented on the IBM 3090-600E of the Cornell National Supercomputer Facility. At the outset of the training procedure, the link weights were initialized to random values ranging from -0.5 to +0.5. During processing, the 9 × 9 input layer of the network was stimulated by successive local 9 × 9 regions of the input image, whereas the network output was paired with the value of the element in the teaching image at the center of the corresponding local region. The 81 input units processed their individual inputs using the sigmoidal transfer function

y_i = 1/(1 + exp(-x_i))    (1)
FIGURE 2. Upper left: Ultrasonic image of eye with choroidal tumor (crosshair). Upper right: Teaching image, in which positions corresponding to tumor are active. Lower left: Output of neural network after training. Lower right: Output after thresholding.
where x_i is the input and y_i the output for unit i of the network input layer. An extra unit, which always had an output of unity, was included in the input layer to serve as a bias unit for the next layer of the network. One of the 10 units in each internal layer also acted as a bias unit. The input to units on the subsequent layers of the network was

x_j = Σ_i y_i w_ij    (2)

where the output of unit i on network level n is y_i, and the link connecting unit i to unit j on level n + 1 has weight w_ij. The output of each unit is determined by the transfer function given in equation (1). The output of the network was that of the single unit on the output layer. For each input/teaching pair, the error was defined as

E_xy = 0.5(y_xy - t_xy)^2    (3)
where y_xy was the (scalar) output of the network when centered over image coordinates (x, y) and t_xy was the value of the teaching image at position (x, y). This definition of error, which is to be minimized by the backpropagation algorithm, implies that the network is to operate as a filter, taking as input some region of a scan and providing as output a real, positive value that will be most active when an input region has characteristics consistent with the properties of tumors as demarcated in the training set. After accumulation of the error, link weights throughout the network were modified according to the backpropagation algorithm

∂E/∂x_i = (∂E/∂y_i) y_i (1 - y_i)    (4)

∂E/∂y_i = Σ_j (∂E/∂x_j) w_ij    (5)

∂E/∂w_ij = (∂E/∂x_j) y_i    (6)

where ∂E/∂y at the top layer of the network was defined as (y_xy - t_xy). Given ∂E/∂w_ij, as defined in equation (6), link weights were modified using

Δw_ij(n) = -k_1 ∂E/∂w_ij(n) + k_2 Δw_ij(n - 1)    (7)
where k_1 and k_2 are constants and n and n - 1 refer to the current and previous weight modification cycles, respectively. This is a form of gradient descent with a momentum term. A modification of equation (7), in which the value of k_1 was varied according to the sign of the error change between cycles, was used to achieve improved convergence time (Jacobs, 1988; Vogl, Mangis, Rigler, Zink, & Alkon, 1988). When E(n) was less than E(n - 1), the value of k_1 was increased additively. If E(n) was greater than E(n - 1), then k_1 was decreased multiplicatively.
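A minimal sketch of this training rule, assuming a small fully connected stack with biases omitted, is given below. It follows equations (4)-(7) together with the adaptive adjustment of k_1; the specific adaptation constants are assumptions, not values from the paper.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))                   # equation (1)

    def backprop_update(weights, x, target, k1, k2, prev_delta):
        # Forward pass through the layer stack (equations (1) and (2)).
        ys = [np.asarray(x, dtype=np.float64)]
        for W in weights:
            ys.append(sigmoid(ys[-1] @ W))
        # Backward pass: accumulate dE/dw for every layer.
        dE_dy = ys[-1] - np.asarray(target)               # top-layer error derivative
        grads = []
        for W, y_in, y_out in zip(reversed(weights), reversed(ys[:-1]), reversed(ys[1:])):
            dE_dx = dE_dy * y_out * (1.0 - y_out)         # equation (4)
            grads.append(np.outer(y_in, dE_dx))           # equation (6)
            dE_dy = dE_dx @ W.T                           # equation (5)
        grads.reverse()
        # Equation (7): gradient descent with a momentum term.
        delta = [-k1 * g + k2 * d for g, d in zip(grads, prev_delta)]
        return [W + d for W, d in zip(weights, delta)], delta

    def adapt_k1(k1, err, prev_err, add=0.001, shrink=0.7):
        # Raise k1 additively when the error falls, shrink it multiplicatively
        # when the error rises (constants here are illustrative).
        return k1 + add if err < prev_err else k1 * shrink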
For the initial 50 passes over the training set, the error was accumulated over each scan line prior to link weight modification. For the next 50 passes, the error was accumulated over each scan, and for the final 200 passes, error was accumulated over the entire set of scans. At each x, y coordinate within each scan, the network viewed a 9 × 9 region of the original 128 × 128 image (0.5% of the image area) and corresponding regions of the 64 × 64 and 32 × 32 representations.
Figure 2 (lower left) shows the output of the trained network for one of the training images; the lower right panel shows this output after thresholding at the inflection point of the sigmoidal activation function. In this image, almost all pixels are inactive, with the exception of those corresponding to the tumor.

The procedure of gradually increasing the number of accumulated input/output pairs prior to link weight modification has two advantages:

1. As training is initiated, modification of the naive network occurs rapidly (1280 modification cycles per pass), so that a crude solution is reached in a reasonable number of passes.

2. By increasing the accumulation of error first from each line to each scan (10 modification cycles per pass), and then from each scan to the entire scan set (1 modification cycle per pass), a gradual increase in constraints occurs, forcing the network toward a global solution.

This is particularly important in the tumor localization problem because features are not evenly distributed throughout the images. Because only some horizontal scan lines actually have active teaching elements, learning on a line-by-line basis results in modification of link weights to capture features present in the last scan line. As the network proceeds to succeeding lines, the constraints imposed on link weights by patterns present in previous lines must diminish. If, however, weights are only adjusted after the entire scan set, then link weights must adapt to conform to global constraints.

To illustrate the foregoing process, a single image/teaching pair from the tumor localization pattern set was used to train a neural network with architecture as just described. The network was allowed to adapt to this image over 150 passes using three different link modification schedules. In the first experiment, the network modified link weights after error accumulation over each scan line. In the second experiment, the weights were modified after error accumulation over each pass over the entire image. In the third experiment, weights were modified after each line for the first 75 passes, and then after the entire scan for the next 75 passes. The results of these experiments, shown in Figure 3, indicate that, as expected, link modification after each line results in a far more rapid decrease in error than modification after each pass. After the initial rapid decrease in error, however, the rates of decrease in error obtained by either method are similar. In the third experiment, when the network switched from modification by line to modification by scan, a further rapid decrease in error was realized. This result demonstrates the advantage of gradually increasing the number of input/output pairs prior to weight modification until weights are modified only after accumulation over the entire set.
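The scheduling idea can be made concrete with a short sketch. Assuming a helper that yields one (9 × 9 patch, teaching value) pair per pixel, the only thing the schedule changes is how many such pairs contribute to the accumulated gradient before a weight update is applied (names and structure below are illustrative, not the original code):

    import numpy as np

    def training_pairs(image, teaching, half=4):
        # One (flattened 9 x 9 patch, teaching value) pair per pixel whose
        # window fits inside the image.
        for y in range(half, image.shape[0] - half):
            for x in range(half, image.shape[1] - half):
                patch = image[y - half:y + half + 1, x - half:x + half + 1]
                yield patch.ravel(), teaching[y, x]

    def update_granularity(pass_number):
        # Accumulate error over each scan line for the first 50 passes
        # (128 lines x 10 scans = 1280 updates per pass), over each scan for
        # the next 50 passes (10 updates per pass), and over the entire scan
        # set for the final 200 passes (1 update per pass).
        if pass_number < 50:
            return "line"
        if pass_number < 100:
            return "scan"
        return "set"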
FIGURE 3. Effect of link modification schedule on error during training. (Error versus iteration for three schedules: modify after each line; modify after each pass; first 75 iterations by line, then by pass.)
Tumor location may be automatically determined by computing the mean x and y values over all positions, weighted by the corresponding thresholded pixel intensities. In fact, the location of the crosshair in the central tumor in the upper left of Figure 2 was drawn in this manner. Figure 4 is an example of the application of the trained network to an image not included in the training set. The original image is on the upper left, the network output on the lower left, and the thresholded output on the lower right.
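A sketch of this localization step, with an assumed threshold value, is:

    import numpy as np

    def locate_tumor(output_image, threshold=0.5):
        # Keep only super-threshold pixels of the network's output image and
        # return their intensity-weighted mean (x, y) coordinates.
        mask = output_image >= threshold
        if not mask.any():
            return None
        ys, xs = np.nonzero(mask)
        w = output_image[ys, xs]
        return (xs * w).sum() / w.sum(), (ys * w).sum() / w.sum()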
Tumor Classification
There are a number of different types of intraocular tumors. These include metastatic carcinomas from various primary sites, hemangiomas, and malignant melanomas. At times it may be difficult to differentiate between these tumor types. Accurate diagnosis is of critical importance because these lesions differ in terms of treatment and prognosis. Melanomas are primary ocular tumors that have significant metastatic potential. Treatment consists of either enucleation (removal of the globe) or local irradiation. Intraocular melanomas also fall into two broad histologic classes: those composed of spindle cells and those containing epithelioid cells. The latter group is more lethal. Metastatic carcinomas may occur in the eye after spread from some other primary site, usually the lung or the breast. Treatment, in this case, normally centers on eradication of the primary by surgery, radiation, and/or chemotherapy. Hemangiomas of the choroid are vascular tumors that are usually self-limiting in terms of growth and do not have metastatic potential. Diagnosis of intraocular tumors will nearly always include an ultrasonic evaluation. Most commonly, the degree of ultrasonic attenuation and level of internal backscatter are assessed qualitatively by Ascan evaluation. More recently, evaluation of normalized power spectra by Fourier analysis of tissue backscatter has been used for this purpose. It was
found that the three tumor types previously described had different spectral characteristics: they can be differentiated by application of discriminant analysis to the spectral slope, intercept, and standard error of the linear best fit of their normalized power spectra (Coleman & Lizzi, 1983; Coleman, Lizzi, Silverman, Rondeau, Smith, & Torpey, 1982). Spindle cell and epithelioid melanomas can also be (partially) differentiated by this technique.

In discriminant analysis (Cooley & Lohnes, 1971), one or more functions of the form

D_i = Σ_k v_ik f_k    (9)

are computed based on statistical considerations, where D_i is the discriminant score for function i, v_ik are the coefficients for function i and feature k, and f_k is the value of feature k. (v_i0 can be considered a bias term, where f_0 = 1.) Classification of cases is based on determination of χ² distances to the centroids of each group and a priori probabilities for group membership. It is interesting to note the similarity of the forms of equations (2) and (9). In essence, the underlying classification procedure of discriminant analysis is comparable to that of a linear neural network (Kohonen, 1984). A similar observation has been made in regard to multiple regression analysis (Stone, 1986), a related statistical technique. As such, discriminant analysis has limitations inherent in any linear system.
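The parallel between equations (2) and (9) can be seen directly in code: evaluating all discriminant functions at once is a single matrix-vector product, exactly what one layer of a linear network computes. The sketch below shows only the linear form; the χ² distances and prior probabilities used for the final group assignment are omitted.

    import numpy as np

    def discriminant_scores(features, coeffs):
        # coeffs: one row per discriminant function, bias v_i0 in column 0.
        # D_i = v_i0 + sum_k v_ik * f_k, i.e., a linear layer with f_0 = 1.
        f = np.concatenate(([1.0], np.asarray(features, dtype=np.float64)))
        return coeffs @ f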
It is therefore of interest to determine if a neural network utilizing backpropagation for self-organization might prove to be a more accurate classifier than discriminant analysis (see Huang & Lippman, 1988). For this purpose, we investigated the effectiveness of backpropagation to classify tumors based on a small set of features. A training set of digitized ultrasonic scans of 138 tumors was made available from the data library of the Department of Ophthalmology, Cornell University Medical College. The tumor in each case was characterized by five parameters: tumor height (mm) and base (mm), and, from the spectrum of the backscatter from the tumor interior, the spectral slope (dB/MHz), intercept (dB), and standard error (dB). These parameters were used as the input vector of a neural network for tumor classification. Associated with each input vector was a training vector of length 4, consisting of a single active element corresponding to the known diagnosis of the tumor. The diagnostic classes were spindle cell melanoma, mixed/epithelioid melanoma, metastatic carcinoma, or hemangioma.

The training set was applied iteratively to networks of various architectures. We considered networks with no hidden layers, a single hidden layer (having two modifiable layers of connections), and two hidden layers (with three modifiable layers of connections). Each network had four output units that corresponded to the four possible classifications. The network without hidden layers used the delta
rule with a momentum term, where Δw_ij(n) = k_1(t_j - y_j)y_i + k_2 Δw_ij(n - 1), whereas the networks with hidden layers used the backpropagation algorithm. The ∂E/∂w_ij's were accumulated over each iteration of the data set, and the constants k_1 = 0.025 and k_2 = 0.9 were used in equation (7). Link weights were initialized to random values between -0.5 and +0.5. Both procedures used the sigmoidal activation function defined in equation (1). Each teaching vector had one active element (t_j = 0.8) in the position corresponding to the actual case classification, and other teaching vector elements were inactive (t_i = 0.2, i ≠ j). Each element of the feature vector was normalized to range between -1 and +1. The values of these elements were then treated as the excitations of the individual units at the input level of the network. Each layer of the networks contained a single unit whose output was fixed at unity, thus providing a bias for units on the next higher layer. The sizes of the hidden layers were varied from 4 to 20 units (excluding the bias unit). Five thousand iterations of the training set were run for each configuration. The network performance was judged by the magnitude of the error, the percentage of correct classifications, and the convergence time, which was defined as the number of iterations required until the error was within 5% of the ultimate error.

In the first set of experiments, the entire set of 138 tumors was used in training the network, and retrospective classification accuracy was determined. Classification accuracy was measured both in terms of correct classifications into all four tumor classes (which includes subclassification of melanomas into spindle cell and mixed/epithelioid categories) and into the three major tumor types (malignant melanoma, metastatic carcinoma, hemangioma), without consideration of melanoma subclassification. In the following paragraphs and tables, percentage classification accuracies with and without melanoma subclassification are reported, with the latter enclosed in parentheses.

For a network with no hidden layers, the convergence time, T, was approximately 200 iterations. The ultimate accumulated error was 10.7 and the percentage of correct classifications was 73.9% (88.4%). Results for networks containing hidden layers are shown in Table 1. The result using linear discriminant analysis (with a priori probabilities for group membership set equal) was a classification accuracy of 82.1% (94.2%). The results indicate that a neural network with a single layer of hidden units will provide retrospective classification accuracy of about 87% (96%). The use of two hidden layers resulted in a significant improvement, with a maximal classification accuracy of 100%, both with and without melanoma subclassification. For the two hidden layer case, the larger the size of the hidden layers, the smaller the error after training. Convergence time, however, tends to increase as a function of hidden layer size. These trends were not notable in the case of the single hidden layer networks, where results did not seem to improve with additional hidden units. Thus, the performance of a neural net of one hidden layer is slightly superior to that of the traditional classifier (discriminant analysis), but performance of two hidden layer networks is markedly superior for retrospective classification.

The next set of experiments was designed to estimate how well backpropagation networks can recognize cases not included in the training set. First, 10 randomly selected cases were excluded from the training set. Two network configurations, one consisting of a network with two hidden layers of 20 units each and another consisting of one hidden layer of four units, were then trained for 5000 iterations, as previously described, to recognize the 128 remaining cases. This procedure was repeated 20 times, with a different set of 10 cases excluded each time. The results, aggregated over the 20 runs, are shown in Table 2. During training of the networks, the error decreased asymptotically and the percentage of correct classifications of cases included in the training set increased, reaching mean values of 86.0% (95.9%) and 95.0% (99.0%) for the one and two hidden layer networks, respectively, after 5000 iterations.
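A sketch of how one such classification network could be set up (layer sizes other than the fixed 5 inputs and 4 outputs, and all helper names, are illustrative assumptions) is:

    import numpy as np

    rng = np.random.default_rng(0)

    def make_classifier(hidden_sizes=(8,)):
        # 5 input features -> optional hidden layers -> 4 diagnostic classes,
        # with link weights initialized uniformly in [-0.5, +0.5]; biases are
        # omitted here for brevity.
        sizes = [5, *hidden_sizes, 4]
        return [rng.uniform(-0.5, 0.5, size=(a, b)) for a, b in zip(sizes, sizes[1:])]

    def make_teaching_vector(true_class, n_classes=4):
        # 0.8 at the known diagnosis, 0.2 everywhere else.
        t = np.full(n_classes, 0.2)
        t[true_class] = 0.8
        return t

    def normalize_features(raw, lo, hi):
        # Linearly rescale each feature (height, base, slope, intercept,
        # standard error) to [-1, +1] given per-feature bounds lo and hi.
        raw, lo, hi = map(np.asarray, (raw, lo, hi))
        return 2.0 * (raw - lo) / (hi - lo) - 1.0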
TABLE 1
Comparative Performance of Neural Networks for Retrospective Classification

                      One Hidden Layer               Two Hidden Layers
Hidden layer size     Error   %            T         Error   %            T
 4                    6.4     87.0 (96.4)  1800      8.0     71.0 (95.7)  1800
 8                    6.1     87.0 (96.4)  3500      3.1     96.2 (97.8)  4800
12                    6.0     83.3 (95.7)  3500      4.2     90.6 (98.6)  4000
16                    6.3     86.2 (95.7)  2800      1.6     98.5 (100)   >5000
20                    6.0     84.1 (95.7)  3400      0.7     100 (100)    >5000

% = percent correct classifications with and (without) melanoma subclassification. T = number of passes for convergence.
TABLE 2
Comparative Classification Accuracy for Cases Included and Excluded from the Training Set in One and Two Hidden Layer Networks

Percent Correctly Classified*

              One Hidden Layer               Two Hidden Layers
Iterations    T.S.          U.K.             T.S.          U.K.
    0         21.6 (32.9)   26.0 (31.5)      26.2 (40.5)   30.0 (45.0)
  500         73.4 (88.5)   71.1 (85.3)      78.1 (89.9)   72.6 (87.1)
 1000         80.2 (93.0)   76.0 (90.4)      84.1 (94.2)   72.7 (86.6)
 1500         83.3 (94.9)   76.4 (90.9)      84.7 (95.5)   73.1 (88.0)
 2000         85.3 (95.5)   78.3 (90.4)      86.3 (96.0)   72.5 (87.6)
 2500         85.7 (95.7)   76.9 (89.5)      87.5 (96.4)   72.2 (87.7)
 3000         85.8 (95.7)   76.0 (89.5)      89.3 (97.4)   73.1 (88.6)
 3500         85.8 (95.7)   76.0 (89.8)      90.8 (98.0)   72.9 (89.1)
 4000         85.9 (95.8)   76.5 (90.5)      92.2 (98.3)   71.2 (87.8)
 4500         85.9 (95.8)   76.6 (90.6)      93.5 (98.6)   71.7 (87.3)
 5000         86.0 (95.9)   76.9 (90.9)      95.0 (99.0)   71.1 (87.5)

*Percent correct classifications with and (without) melanoma subclassification.
T.S. = cases included in training set. U.K. = cases treated as unknowns.
In contrast, the percentage of correct classifications of cases excluded from the training set reached a maximum of 78.3% (90.9%) after about 2000 (1500) iterations with one hidden layer, and 73.1% (89.1%) after about 1500 (3500) iterations with two hidden layers. The above results show that the networks are highly proficient at learning to recognize individual cases in a training set of modest size (in relation to the number of network connections), but somewhat less successful at recognizing cases not previously encountered.

Discriminant analysis was then applied to the same sets as were used above. The discriminant analyses produced a mean of 79.7% (89.1%) correct classifications for cases included in the training set (as compared to 86% (96%) and 95% (99%) for neural networks with one and two hidden layers, respectively) and 72.0% (84.5%) correct classifications for excluded cases (as compared to 78% (91%) and 73% (89%) for neural networks with one and two hidden layers). These figures suggest that, at least under the conditions of this experiment, neural networks are definitively superior to discriminant analysis in retrospective accuracy. The networks, particularly the single hidden layer network, also appear to be somewhat superior to discriminant analysis in terms of recognition of unknown cases.

An interesting observation is that predictive accuracy seems to plateau or decrease after about 2000 passes, whereas retrospective accuracy increases indefinitely. Additionally, predictive accuracy, unlike retrospective accuracy, is higher in the single hidden layer network than in the two hidden layer network. These observations suggest that as retrospective accuracy increases, the network becomes more and more like a nearest neighbor classifier, in that the
individual cases in the retrospective data base are being "memorized." This specificity is obtained at the sacrifice of generality as the network becomes less successful in terms of correct classification of unknowns. These results are consistent with the observation that generalization by feed-forward neural networks decreases with excess hidden units (Anshelevich, Amirikian, Lukashin, & Frank-Kamenetskii, 1989).

Combined Localization and Classification

Determination of tumor power spectra requires the intervention of a well-trained technician who would manipulate a cursor on the displayed tumor image and define a region of interest by placing an analysis box in the central tumor. Tumor dimensions could be determined by cursor positioning at appropriate positions in the displayed image as well. The neural network's ability to localize a tumor in an ultrasonic image permitted automated determination of the tumor's boundaries, measurement of its dimensions, and, ultimately, characterization of its power spectrum. First, as previously described, the position of a tumor in a scan was determined by finding the average x and y coordinates in the image output by the network after thresholding and weighting by pixel intensity. In rare instances, the cursor initially fell just outside the tumor boundary due to the presence of superthreshold pixels at positions far from the tumor location. This condition was easily detected if the pixel intensity at the cursor location in the thresholded image was zero. In this case, the threshold was increased and the cursor repositioned. This technique always succeeded in finding the tumor.
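A sketch of this detect-and-retry rule (the threshold values and step size are assumptions, not taken from the paper) is:

    import numpy as np

    def locate_with_retry(output_image, threshold=0.5, step=0.05, max_threshold=0.95):
        # Compute the intensity-weighted centroid of super-threshold pixels;
        # if the resulting cursor does not land on an active pixel, raise the
        # threshold and try again.
        while threshold <= max_threshold:
            mask = output_image >= threshold
            if not mask.any():
                break
            ys, xs = np.nonzero(mask)
            w = output_image[ys, xs]
            x = int(round(float((xs * w).sum() / w.sum())))
            y = int(round(float((ys * w).sum() / w.sum())))
            if output_image[y, x] >= threshold:
                return x, y
            threshold += step
        return None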
FIGURE 5. Neural network output used for automatic selection of a region in the central tumor (highlighted area) for spectrum analysis. The normalized power spectrum, plotting amplitude (dB) as a function of frequency (MHz), is shown at right.
Following automatic localization of the tumor, the tumor's boundaries were defined by determining the boundaries of the region of active pixels surrounding the cursor. Based on geometric considerations, the tumor dimensions along both major and minor axes were then computed. Finally, the central half of the active region was selected for spectral analysis. (Tumor boundaries, which represent specular reflectors, should not be included when determining backscatter properties.) The boundaries defined for spectral analysis were used to access radiofrequency data from the equivalent scan region. Figure 5 shows an example of automatic selection of an analysis region in an ocular tumor and the normalized power spectrum of this region. The linear best fit of the normalized power spectrum provided values for spectral slope, intercept,
and statistical error. These parameters, in addition to the tumor base and height dimensions determined from the localization network's output, then acted as input to the single-layer classification network. The output unit of the classification network with the highest activity was then considered to correspond to the most probable tumor classification. Table 3 provides a statistical comparison of the tumor height and base dimensions and the spectral slope, intercept, and statistical error of the intercept as determined manually versus the output of the neural network. The statistic used for this comparison was the paired Student's t test. For tumor dimensions, no statistically significant differences were found between the manual and automatically determined measurements. The correlation coefficients between manual and automated measurements were .52 and .57 for height and base, respectively.
TABLE 3
Statistical Comparison of Manually and Automatically Determined Tumor Dimensions and Spectral Parameters (N = 10)

Parameter          Manual Mean   Network Mean   Δ       Std. Err. Δ   T       P      R
Height             4.12          3.86           .259    .140          1.86    .096   .520
Base               10.25         10.72          -.46    .700          -.67    .522   .569
Slope              .179          .183           -.004   .037          -.11    .916   .969
Intercept          -58.30        -59.28         .98     .631          1.55    .156   .978
Intercept Error    2.268         2.295          -.027   .029          -1.41   .193   .899
For the three spectral parameters, no statistically significant differences were found between manually determined and automated measurements. The correlation coefficients for all three parameters were .9 or more.

Eight out of the 10 cases in the tumor localization reference set were given the same classification by the network (using the automatically generated tumor spectra and dimensions) as was determined using discriminant analysis (using the manually determined tumor analysis regions). Differences in classification are due to (1) differences between the manually and automatically selected regions of interest, which are used to generate spectra, (2) differences in tumor measurements, and (3) the classification method (i.e., neural network versus discriminant analysis). Because none of the 10 cases in this set were histologically confirmed, the actual accuracy of either method for this set is not known.

In summary, the results of this analysis were:

1. Creation of an image with enhanced activity at positions corresponding to tumor location.
2. Localization of the tumor by thresholding of the network output.
3. Automated measurement of the tumor's base and height dimensions.
4. Automated selection of a central region in the tumor for spectral analysis.
5. Classification of the tumor based on the properties given in (3) and (4) by use of a classification network.

CONCLUSION

In this report, we have described the use of neural networks to localize and analyze ultrasonic images. It was demonstrated that a filter can be defined by application of the backpropagation algorithm, where input fields are subfields of the image (at various scales and locations) and the appropriate output is defined by the programmer. From this procedure emerged a network capable of recognizing tumors at various scales, orientations, and locations. In addition, one and two hidden layer networks were defined for classification of tumors based on their dimensions and ultrasonic backscatter spectra. Although two hidden layers provided nearly perfect retrospective classification, the predictive accuracy of the single hidden layer network was found to be superior to either the two hidden layer network or linear discriminant analysis. Combination of the tumor localization and classification networks provided a means for automatic localization and diagnosis of ocular tumors.

Although training of networks is highly computationally intensive (100 passes over the image training set consumed the equivalent of approximately 1
Cray X-MP CPU hour), once trained, only a single forward pass through the network is required to define output. For the case of image processing, this may still be fairly expensive because the number of image locations that must be individually processed is very large. In a 1024 × 1024 image, for instance, there are over one million positions for which the network must define output values. However, the link weights, once defined by supercomputer processing, could be implemented in hardware, which could then process image locations in parallel. Using chips designed for efficient, parallel neural cell computations, one can foresee real-time image recognition in a variety of fields.
REFERENCES

Anshelevich, V. V., Amirikian, B. R., Lukashin, A. V., & Frank-Kamenetskii, M. D. (1989). On the ability of neural networks to perform generalization by induction. Biological Cybernetics, 61, 125-128.

Ballard, D. H., & Brown, C. M. (1982). Computer vision. Englewood Cliffs, NJ: Prentice-Hall.

Carpenter, G. A., Grossberg, S., & Mehanian, C. (1989). Invariant recognition of cluttered scenes by a self-organizing ART architecture: CORT-X boundary segmentation. Neural Networks, 2, 169-181.

Coleman, D. J., & Lizzi, F. L. (1983). Computerized ultrasonic tissue characterization of ocular tumors. American Journal of Ophthalmology, 96, 165-175.

Coleman, D. J., Lizzi, F. L., Silverman, R. H., Rondeau, M. J., Smith, M. E., & Torpey, J. H. (1982). Acoustic biopsy as a means for characterization of intraocular tumors. In P. Henkind (Ed.), ACTA: XXIV International Congress of Ophthalmology. Philadelphia: J. B. Lippincott.

Cooley, W. W., & Lohnes, P. R. (1971). Multivariate data analysis. New York: Wiley.

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Huang, W. Y., & Lippman, R. P. (1988). Neural net and traditional classifiers. In D. Z. Anderson (Ed.), Neural information processing systems (pp. 387-396). New York: American Institute of Physics.

Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295-307.

Kohonen, T. (1984). Self-organization and associative memory (2nd ed.). Berlin: Springer-Verlag.

Kremkau, F. W. (1989). Diagnostic ultrasound: Principles, instruments and exercises. Philadelphia: Saunders.

Lizzi, F. L., Feleppa, E. J., & Coleman, D. J. (1986). Ultrasonic ocular tissue characterization. In J. F. Greenleaf (Ed.), Tissue characterization with ultrasound, Vol. II (pp. 41-60). Boca Raton, FL: CRC Press.

Lizzi, F. L., Greenbaum, M., Feleppa, E. J., Elbaum, M., & Coleman, D. J. (1983). Theoretical framework for spectrum analysis in ultrasonic tissue characterization. Journal of the Acoustical Society of America, 73, 1366-1373.

Mantas, J. (1987). Methodologies in pattern recognition and image analysis. Pattern Recognition, 20, 1-6.

Rosenfeld, A., & Kak, A. C. (1982). Digital picture processing (2nd ed.). New York: Academic Press.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

Stone, G. O. (1986). An analysis of the delta rule and the learning of statistical associations. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing (pp. 444-459). Cambridge, MA: MIT Press.

Vogl, T. P., Mangis, J. K., Rigler, A. K., Zink, W. T., & Alkon, D. L. (1988). Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59, 257-263.

Weszka, J. S., Dyer, C. R., & Rosenfeld, A. (1976). A comparative study of texture measures for terrain classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6, 269-285.
NOMENCLATURE

y_i     output of unit u_i
x_i     input to unit u_i
w_ij    link weight from u_i to u_j
E_xy    error at image coordinate x, y
y_xy    network output at image coordinate x, y
t_xy    teaching value at image coordinate x, y
k_1     learning rate constant
k_2     momentum constant
n       iteration
D_i     discriminant function value for function i
v_ik    discriminant coefficient for feature k, function i
f_k     value of feature k
t_j     teaching value for unit u_j
T       convergence time
p       statistical probability