Pattern Recognition, Vol. 23, No. 6, pp. 563-568, 1990
Printed in Great Britain
0031-3203/90 $3.00 + .00
Pergamon Press plc
© 1990 Pattern Recognition Society
A NEURAL NETWORK APPROACH TO ROBUST SHAPE CLASSIFICATION

LALIT GUPTA, MOHAMMAD R. SAYEH and RAVI TAMMANA

Department of Electrical Engineering, Southern Illinois University at Carbondale, Carbondale, IL 62901-6603, U.S.A.

(Received 23 March 1989; received for publication 21 June 1989)

Abstract--A neural network approach for the classification of closed planar shapes is described. The primary foci are the development of an effective representation for planar shapes which may be used in conjunction with neural nets, the selection of a suitable neural network structure, and the determination of training methods to increase the degree of robustness in classification. A three layer perceptron using backpropagation is initially trained with contour sequences of noisefree reference shapes and its generalization capability is demonstrated. The network is then gradually retrained with increasingly noisy data to improve the robustness of the classifier. The advantages and improvement in robustness using this extended training scheme are shown and typical classification results are presented.
Neural networks    Robust classification    Planar shapes    Representation    Invariant classification
1. INTRODUCTION

Many applications of shape recognition require an object to be classified in varying positions, orientations, and dimensions in the image plane. Additionally, the classifier is required to be robust (tolerant to random variations in the shape). Such robust classifiers prove to be useful in many military, biomedical, and industrial applications. Examples include aircraft classification, the characterization of biomedical images, and the identification of industrial parts by robots for product assembly. Although several robust classifiers have been developed, an improvement can be made in the degree of noise that the classifiers can tolerate. An improvement in robustness will have a dramatic impact in advancing the performance of the classifiers used for the applications described above. This paper focuses on the neural network approach for providing the desired improved performance.
2. NEURAL NETWORK APPROACH

Interest in neural networks is rapidly growing and, recently, several neural network models have been proposed for classification problems (1-9). These models are composed of highly parallel building blocks that are interconnected to construct highly complex systems. Unlike traditional classifiers, which tend to test competing hypotheses sequentially, neural net classifiers test the competing hypotheses in parallel, thus providing high computational rates. Additional advantages include robustness and the ability to adapt or learn. Deciding how an object is presented to a neural network is an important factor in the design of neural net shape classifiers. For any neural network, the input, and thus the representation, must be in a vector form. Clearly, this vector representation should carry essential shape information and must also be normalized with respect to basic shape transformations such as translation, rotation, and scaling. For closed planar shapes, contour sequence representations have the desired format and properties (8-10). A contour sequence is an ordered sequence that represents the Euclidean distance between the centroid and all the boundary pixels of the shape (10, 11). If the contour (boundary) of a closed planar shape in the discrete plane is described by

C = {c_i = (x_i, y_i) | i = 1, 2, ..., N},

where (x_i, y_i), i = 1, 2, ..., N, are the coordinates of the N contour pixels ordered in a clockwise sense, then

v_i = d(c_i, c_g), i = 1, 2, ..., N,

is the contour sequence obtained by computing the Euclidean distance d(.,.) between each of the N contour pixels and the centroid c_g = (x_c, y_c) of the shape. From the requirement of contour closure, v_i is a circular sequence. Therefore, shape rotation corresponds to a circular shift in the ordering of the contour sequence. Normalization with respect to rotation is achieved by selecting the pixel on the major principal axis as the start point on the shape contour. Contour sequences are translation invariant because the distance between the centroid and the boundary is invariant to shape translation in the image plane. Scaling a shape results in the scaling of the samples and the duration of the contour sequence.
Therefore, scale normalization involves both amplitude and duration normalization. Amplitudes are normalized by dividing the samples by the standard deviation of the contour sequences. An appropriate curve is fit to the contour sequence and a uniform resampling operation is performed to obtain the desired normalized duration. Details of these normalization procedures are described in references (8, 10).
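The extraction and normalization steps above map directly onto a few lines of code. The following is a minimal sketch in Python/NumPy, assuming the boundary has already been traced into an ordered array of pixel coordinates and that rotation normalization (selecting the start point on the major principal axis) is handled separately; the function names and the use of linear interpolation in place of the fitted curve are illustrative choices, not the authors' implementation.

```python
import numpy as np

def contour_sequence(contour):
    """Euclidean distance from the shape centroid to each boundary pixel.

    contour: (N, 2) array of (x, y) coordinates ordered clockwise.
    """
    centroid = contour.mean(axis=0)  # c_g = (x_c, y_c)
    return np.linalg.norm(contour - centroid, axis=1)

def normalize(v, target_len):
    """Amplitude- and duration-normalize a contour sequence.

    Amplitude is normalized by the standard deviation of the samples;
    duration is normalized by uniform resampling to target_len samples
    (linear interpolation stands in for the curve fit of the paper).
    """
    v = v / v.std()
    return np.interp(np.linspace(0.0, 1.0, target_len),
                     np.linspace(0.0, 1.0, len(v)), v)
```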
3. THE NEURAL NET CLASSIFIER

The samples of the contour sequence are continuous valued; hence, the input to the neural net is continuous valued. The classification problem being considered requires supervised training; that is, the classifier requires class labels to be specified during training. Among many different neural net classifiers (1), a multilayer perceptron satisfies the above requirements. In particular, a three layer backpropagation network with N neurons in the first hidden layer, H neurons in the second hidden layer, and M neurons in the output layer was selected (see Fig. 1). Here, N is the duration of the scale normalized contour sequences, M is the number of shape classes, and H is an appropriately selected number (the mean of the number of neurons in the first hidden layer and the output layer for the pyramidal structure used). The network is fully connected between adjacent layers. The operation of the backpropagation training algorithm for this network is briefly described below.

Input: normalized contour sequence of class r:

u_{i,r}, i = 1, 2, ..., N, r = 1, 2, ..., M.

Desired net output:

d_m = 1 for class m = r, and d_m = 0 for class m ≠ r, m = 1, 2, ..., M.
Actual node outputs (forward propagation):

First hidden layer: u_j^1 = f(Σ_i w_{i,j} u_{i,r}), 1 ≤ j ≤ N.

Second hidden layer: u_k^2 = f(Σ_j w_{j,k} u_j^1), 1 ≤ k ≤ H.

Output layer: y_m = f(Σ_k w_{k,m} u_k^2), 1 ≤ m ≤ M.

Here, f(a) = 1/(1 + e^(-a)), and w_{i,j}, w_{j,k}, and w_{k,m} are the weight connection matrices between the input and the first hidden layer, the first and second hidden layers, and the second hidden layer and the output layer, respectively.
Fig. 1. Neural network architecture.
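As a concrete illustration of the forward propagation equations, a minimal sketch is given below (Python/NumPy; this is not the authors' code). The weight matrices are shaped N x N, N x H, and H x M, matching the layer sizes given above.

```python
import numpy as np

def f(a):
    """Sigmoid nonlinearity f(a) = 1/(1 + e^{-a})."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(u, W1, W2, W3):
    """Forward propagation of one normalized contour sequence u (length N).

    Returns the first hidden layer outputs u1, the second hidden layer
    outputs u2, and the output layer values y (one node per class).
    """
    u1 = f(u @ W1)    # u_j^1 = f(sum_i w_{i,j} u_{i,r})
    u2 = f(u1 @ W2)   # u_k^2 = f(sum_j w_{j,k} u_j^1)
    y = f(u2 @ W3)    # y_m   = f(sum_k w_{k,m} u_k^2)
    return u1, u2, y
```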
Weight adaptation (backpropagation):

w_{k,m}(t+1) = w_{k,m}(t) + η δ_m u_k^2, with δ_m = y_m(1 - y_m)(d_m - y_m),

where η is a gain term and δ_m is the error term for node m in the output layer. Similarly,

w_{j,k}(t+1) = w_{j,k}(t) + η δ_k u_j^1, with δ_k = u_k^2(1 - u_k^2) Σ_m δ_m w_{k,m},

and

w_{i,j}(t+1) = w_{i,j}(t) + η δ_j u_{i,r}, with δ_j = u_j^1(1 - u_j^1) Σ_k δ_k w_{j,k},

where δ_k and δ_j are the error terms for nodes k and j in the second and first hidden layers respectively. All interconnection weights are initialized to small random values with zero mean, and the normalized contour sequences belonging to the M reference shapes are presented cyclically until the net converges to the desired outputs. In the testing stage, the normalized contour sequence of the test shape is presented to the network and the test shape is assigned to the class of the output node that yields the maximum value.
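A single weight adaptation step, combining the forward pass from the sketch above with the backpropagated error terms, might look as follows. The gain η = 0.25 is an illustrative value; the paper does not report one.

```python
import numpy as np

def train_step(u, d, W1, W2, W3, eta=0.25):
    """One backpropagation update for a single pattern.

    u: normalized contour sequence (length N); d: desired output (length M),
    with d_m = 1 for the true class and 0 otherwise.  Uses forward() from
    the previous sketch.
    """
    u1, u2, y = forward(u, W1, W2, W3)
    # error terms for the output, second hidden, and first hidden layers
    delta_m = y * (1.0 - y) * (d - y)
    delta_k = u2 * (1.0 - u2) * (W3 @ delta_m)
    delta_j = u1 * (1.0 - u1) * (W2 @ delta_k)
    # weight adaptation: w(t+1) = w(t) + eta * delta * input
    W3 += eta * np.outer(u2, delta_m)
    W2 += eta * np.outer(u1, delta_k)
    W1 += eta * np.outer(u, delta_j)
    return W1, W2, W3
```

In the testing stage, the class decision is then simply np.argmax(y) over the output nodes.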
4. EXTENDED TRAINING
In order to improve the performance of the above neural network, the network could be trained with noisy data. However, training methods that incorporate noisy data tend to have convergence problems due to the large amount of data that is used during training. That is, the connection weights have to be adapted to produce the desired network response for an input set that has a large variance in each shape class. Even if the network does converge, the time required for convergence may be prohibitive for practical purposes. To aid network convergence when noisy representative data is used, the network is initially trained with the noisefree reference shapes (as described in Section 3). The network is initialized with the weights obtained with noisefree training and then retrained with noisy shapes that are generated by adding a small degree of noise to the reference shapes. Because the network is already trained to respond correctly to the reference shapes, it converges easily with respect to the new training data, which contains only small variations. The degree of noise added to the reference shapes is gradually increased and the network is retrained starting with the weights obtained in the previous stage. This process is repeated until a reasonable degree of noise has been used for network training.
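The extended training schedule can be expressed as a simple outer loop over noise levels. The sketch below assumes train_step() from the previous sketch and a hypothetical make_noisy() routine implementing the contour noise model described in Section 5; the fixed epoch count is an illustrative stand-in for a convergence test.

```python
def extended_training(references, targets, W1, W2, W3,
                      noise_levels=(0.1, 0.2, 0.3, 0.4),
                      n_per_class=10, epochs=500):
    """Retrain the network at gradually increasing training noise levels.

    references: noisefree normalized contour sequences, one per class;
    targets: the corresponding desired output vectors.  The weights from
    each stage initialize the next, so the network only has to absorb a
    small additional variation at every step.
    """
    for q_t in noise_levels:
        # noisy training set at this level, built from the reference shapes
        train_set = [(make_noisy(ref, q_t), d)   # make_noisy: hypothetical
                     for ref, d in zip(references, targets)
                     for _ in range(n_per_class)]
        for _ in range(epochs):                  # cyclic presentation
            for u, d in train_set:
                W1, W2, W3 = train_step(u, d, W1, W2, W3)
    return W1, W2, W3
```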
5. SIMULATIONS

The performance of the proposed neural net classifier is demonstrated by considering the classification of shapes belonging to four classes. The reference shapes are shown in Fig. 2. These shapes are digitized in a 32 by 32 image plane. A series of tests was conducted to obtain classification results for both noisefree and extended training.

Fig. 2. Reference shapes.
Noisefree training

From each reference shape, the reference contour sequence is extracted while the boundary is tracked in a clockwise direction. The reference contour sequences are amplitude normalized and duration normalized to have 48 samples (half of the mean duration of the four reference contour sequences). The three layer perceptron selected had 48 neurons in the first hidden layer, 26 neurons in the second hidden layer, and 4 neurons in the output layer. Using these normalized contour sequences in conjunction with the noisefree training scheme described in Section 3, the neural net is trained to recognize the four reference shapes. From each reference shape, a noisy test set is generated by adding various degrees of random noise to all the contour pixels of the shape. Each contour pixel is assigned a probability p of retaining its original coordinates in the image plane and a probability q = (1 - p) of being randomly assigned the coordinates of one of its eight neighboring pixels. The degree of noise is increased by increasing the noise level q. Noise may be increased further by repeating the process several times. Introducing noise through this procedure distorts the contour and thus changes the amplitudes, duration, and overall shape of the contour sequence. Typical noisy shapes with q = 0.5 are shown in Fig. 3. For the low resolution image plane used, it can be seen that q = 0.5 introduces considerable distortion. Using the above noise model, 100 test shapes were generated from each reference shape to give a total of 400 noisy shapes to be tested at each noise level q.
Fig. 3. Typical noisy shapes with q = 0.5.
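Under the stated noise model, each contour pixel is displaced independently. A minimal sketch follows (Python/NumPy; the function name and the passes parameter are illustrative). Re-extraction and normalization of the contour sequence from the perturbed contour would then proceed as in Section 2.

```python
import numpy as np

def perturb_contour(contour, q, passes=1, rng=None):
    """Noise model: each pixel keeps its coordinates with probability
    p = 1 - q, and with probability q takes the coordinates of one of
    its eight neighboring pixels, chosen at random.  passes > 1 repeats
    the process to increase the distortion further.
    """
    rng = rng or np.random.default_rng()
    # the eight neighboring offsets in the image plane
    offsets = np.array([(dx, dy) for dx in (-1, 0, 1)
                        for dy in (-1, 0, 1) if (dx, dy) != (0, 0)])
    noisy = contour.copy()
    for _ in range(passes):
        move = rng.random(len(noisy)) < q            # which pixels move
        picks = rng.integers(0, 8, size=len(noisy))  # neighbor choice
        noisy[move] += offsets[picks[move]]
    return noisy
```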
The number of shapes misclassified divided by the total number of shapes tested gives the probability of misclassification, which is used as a measure of performance of the classifier. The probability of misclassification is specified as a function of the test noise level q. It should be noted that the results obtained from noisefree training demonstrate the generalization capability of the neural net classifier.
Extended training

The network is initialized with the weights obtained with noisefree training. Ten noisy training shapes per class were generated using a small noise level q_t = 0.1, where q_t denotes the noise level used during training and q denotes the degree of noise in the test shapes. The network is retrained with the 40 low noise level shapes and the performance is evaluated for test shapes with various values of q. The network is then initialized with the weights obtained with q_t = 0.1, retrained with 40 shapes generated with q_t = 0.2, and re-tested with different q values. This process of re-initialization, re-training, and re-testing is repeated for q_t = 0.3 and 0.4.
6. DISCUSSION

In Fig. 4, the probability of misclassification is plotted as a function of the test noise level for various training noise levels. Note that noisefree training is a special case of extended training with q_t = 0; therefore, the performance for noisefree training is given by the q_t = 0 curve. Figure 4 also clearly demonstrates the improvement in performance with extended training. The probability of misclassification decreases at each test noise level when the training noise level is increased from 0 to 0.3. However, the performance at training noise levels above q_t = 0.3 starts deteriorating, as indicated by the q_t = 0.4 curve. This is due to the small number of representative samples used during training; that is, the network weights have not adapted sufficiently to the wide degree of shape variations at higher noise levels. The performance at higher noise levels is improved by increasing the number of training samples N_t, as seen in Fig. 5. In Fig. 5, the performance is presented as a function of the test noise level for different values of N_t. For comparison, the q_t = 0.3 curve from Fig. 4 is also shown in Fig. 5.
Fig. 4. Probability of misclassification as a function of the test noise level q for various training noise levels q_t.
Fig. 5. Probability of misclassification as a function of the test noise level q for different numbers of training samples N_t (q_t = 0.4 with N_t = 10, 20, and 40; the q_t = 0.3, N_t = 10 curve is shown for comparison).
7. CONCLUSIONS

It is clear that the three layer perceptron using backpropagation training and contour sequences yields low misclassification even for the noisefree training case. Random tests indicate that there is great difficulty in visually discriminating between the four shapes when test noise levels above q = 0.4 are used in a 32 by 32 image plane. The degree of robustness is dramatically improved through the use of extended training. Network convergence is also enhanced by the systematic inclusion of noise in small increments in the training data. Although this paper focused on shape classification, the neural net approach described may also be used in problems of waveform and target signature discrimination. In general, the principle of extended training is applicable in classification problems where the noise model is known.
SUMMARY

This paper describes a neural network approach for classifying noisy shapes in varying positions, orientations, and dimensions in the image plane. The primary issues focused upon are the development of an effective representation for shapes which may be used in conjunction with neural networks, the selection of a suitable neural network structure, and the determination of training methods to increase the degree of robustness in classification. For planar shapes, it is shown that contour sequences have the desired properties and format required for neural network inputs. It is also shown that, among many different neural network models, a multilayer perceptron using backpropagation training is an appropriate choice for the classification problem considered. The backpropagation training algorithm is briefly described, and the generalization capability of the neural network is demonstrated by training the network with noisefree shapes. To improve the degree of robustness in classification, an extended training scheme is developed in which the network is retrained sequentially with shapes containing small increments of noise. The improvement in robustness through the use of this extended training is demonstrated and typical results of a four class classification problem are described.

REFERENCES
1. R. P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine (April 1987).
2. E. Gullichsen and E. Chang, Pattern classification by neural network: an experimental system for icon recognition, IEEE First Ann. Conf. Neural Networks, San Diego, California (June 21-24, 1987).
3. S. E. Troxel, S. K. Rogers and M. Kabrisky, The use of neural networks in PSRI target recognition, IEEE Int. Conf. Neural Networks, San Diego, California (July 24-27, 1988).
4. R. P. Gorman and T. J. Sejnowski, Analysis of hidden units in a layered network trained to classify sonar targets, Neural Networks 1, 75-89 (1988).
5. K. Fukushima, Neocognitron: a hierarchical neural network capable of visual pattern recognition, Neural Networks 1, 119-130 (1988).
6. M. R. Sayeh and J. Y. Han, Pattern recognition using a neural network, Advances in Intelligent Robotics Systems, SPIE's Cambridge Symp. on Optical and Optoelectronic Engineering, Cambridge, MA (1987).
7. J. Y. Han, M. R. Sayeh and J. Zhang, Convergence and limit points of neural network and its applications in pattern recognition, IEEE Trans. Syst. Man Cybern. 19, 1217-1222 (1989).
8. L. Gupta and M. R. Sayeh, Neural networks for planar shape classification, IEEE Int. Conf. Acoustics, Speech, and Signal Processing, New York (April 11-14, 1988).
9. L. Gupta, M. R. Sayeh and R. Tammana, Training neural nets for robust shape classification, 27th Ann. Allerton Conf. on Communication, Control and Computing, pp. 1060-1061 (September 27-29, 1989).
10. L. Gupta and M. D. Srinath, Invariant planar shape recognition using dynamic alignment, Pattern Recognition 21, 235-239 (1988).
11. L. Gupta and M. D. Srinath, Contour sequence moments for the classification of closed planar shapes, Pattern Recognition 20, 267-272 (1987).
About the Author--LALIT GUPTA received his B.E. (Hons) degree in electrical engineering from the Birla Institute of Technology and Science, Pilani, India (1976), his M.S. degree in digital systems from Brunel University, Middlesex, England (1981), and his Ph.D. degree in electrical engineering from Southern Methodist University, Dallas, Texas (1986). Since 1986, he has been with the Department of Electrical Engineering, Southern Illinois University at Carbondale, where he is currently an Assistant Professor. His research interests include computer vision, pattern recognition, neural networks and digital signal processing.

About the Author--MOHAMMAD R. SAYEH received his B.S., M.E., and Ph.D. from Oklahoma State University in 1981, 1982, and 1985, respectively. He has been with Southern Illinois University at Carbondale since 1986, where he is currently an Assistant Professor of electrical engineering. Presently, he is working on the design of associative memories and optical pattern recognition using neural networks. Dr Sayeh is a member of the IEEE Computer Society, the International Neural Network Society and SPIE.

About the Author--RAVI TAMMANA received his B.E. degree in electronics and communication engineering from Osmania University, Hyderabad, India, in June 1985, and his M.S. degree in electrical engineering from Southern Illinois University at Carbondale, in 1989. He is currently working on a Ph.D. degree in Engineering Sciences at Southern Illinois University at Carbondale. His research interests are in the areas of computer vision, pattern recognition, and neural networks.