Mathl. Comput. Modelling Vol. 25, No. 12, pp. 105-117, 1997
Copyright © 1997 Elsevier Science Ltd. Printed in Great Britain. All rights reserved.
0895-7177/97 $17.00 + 0.00
PII: S0895-7177(97)00098-8
Object Recognition Using a Neural Network with Optimal Feature Extraction

JIANN-DER LEE
Department of Electrical Engineering, Chang Gung College of Medicine and Technology
Tao-Yuan, Taiwan 333, R.O.C.
jdlee@cguaplo.cgu.edu.tw

(Received February 1995; revised and accepted March 1996)
Abstract—In this paper, a neural network using an optimal linear feature extraction scheme is proposed to recognize two-dimensional objects in an industrial environment. This approach consists of two stages. First, the procedures for determining the coefficients of the normalized rapid descriptor (NRD) of unknown 2-D objects from their boundaries are described. To speed up the learning process of the neural network, an optimal linear feature extraction technique is used to extract the principal components of these NRD coefficients. Then, these reduced components are utilized to train a feedforward neural network for object recognition. We compare recognition performance, network sizes, and training time for networks trained with both reduced and unreduced data. The experimental results show that a significant reduction in training time can be achieved without a sacrifice in classifier accuracy.

Keywords—Neural network, PCA, Feature extraction, Image processing.
1. INTRODUCTION

Computer vision has been widely applied to industry automation tasks such as parts inspection and object recognition. With advances in manufacturing technology, production has become more automated and faster than ever before. For real-time applications, inspection speed and accuracy are major concerns in object inspection. To meet this requirement, boundary analysis and classification in the field of computer vision attract a great deal of attention from researchers. The principal advantages of boundary analysis methods are their speed, economy, and precision (although not necessarily their accuracy). Generally, there are two broad classes of techniques in two-dimensional boundary analysis: global statistical approaches and syntactic approaches. Numerous applications in a variety of domains can be found in the literature, such as invariant moments [1], the Hough transformation [2,3], the normalized rapid descriptor [4], normalized Fourier descriptors [5,6], and circular autoregressive (CAR) models [7,8].

In general, machine vision systems have separated the object recognition task into two independent subtasks: feature extraction followed by recognition. The feature extraction task begins with an object and a procedure for extracting significant features. The features are chosen based on the consideration that they are to be invariant to the object's position, scale, and orientation. The output of this task, which is a vector of feature values, is then passed to the recognition subtask. The recognizer then determines which are the distinguishing features of each object class. Recently, some distortion-invariant object recognition systems have been developed which combine the two tasks into a single system. The advantage of this approach is that the two subtasks can share information and improve the recognizer's separating ability by extracting the useful features. On the other hand, the drawback is that the system requires a longer training period, since it has no prior information about the relationship between the training patterns. Object recognition systems based on neural networks are an example of this approach. In this case, the selection of features must be based on the criterion that they should be most useful in most situations, independent of what exactly the desired input-output relationship will be. The quality of a set of features can be determined by information-theoretic measures. Good features will reduce dimensionality with only a minimal loss of information [9,10]. Based on this requirement, principal component analysis (PCA), which has optimal properties among many linear feature extraction methods, is utilized to extract the dominant features from the NRD coefficients of each object shape. Next, these dominant features (i.e., compressed data) are used to train a feedforward neural network (multilayer perceptron, or MLP) for object recognition. To investigate the performance of the system, we compare recognition performance, network sizes, and training time for networks trained with both compressed and uncompressed data. The flowchart of the proposed pattern classification system is shown in Figure 1.

This research is supported by the National Science Council, R.O.C., under Grant NSC83-0117-C182-001E.
Figure 1. The flowchart of the proposed object recognition system.
The remainder of the paper is organized as follows. Section 2 illustrates how to determine the NRD coefficients of unknown 2-D objects from their boundaries. Section 3 describes the feature compression algorithm using principal component analysis (PCA). Section 4 presents the structure of the neural network classifier, a multilayer perceptron, used for object recognition. Section 5 shows the experimental results with a testing set of fifteen two-dimensional objects. The conclusions are given in the final section.
2. ALGORITHM TO DERIVE THE NRD COEFFICIENTS
For the purpose of recognition, many approaches have been proposed to extract the features of an unknown object shape [1-7]. Among them, the normalized rapid descriptor (NRD) [4] has the advantage of being scale-, rotation-, and translation-invariant, and has become a useful tool for representing the characteristics of an object shape. Generally, the implementation of the NRD needs a large memory space since it must use every boundary point of an object. This may slow down the
system's performance. Therefore, we propose a new sampling method using the scale information to derive the desired NRD coefficients for an input object. The details of this sampling method are described below.

Step 1. For each point $P_i(x_i, y_i)$ on the contour, define a distance function $d[i]$ as
$$d[1] = \left[ (x_1 - x_N)^2 + (y_1 - y_N)^2 \right]^{1/2}, \qquad (1)$$
$$d[i] = \left[ (x_i - x_{i-1})^2 + (y_i - y_{i-1})^2 \right]^{1/2}, \qquad (2)$$
where $i = 2, \ldots, N$ and $N$ is the total number of contour points.

Step 2. Find the perimeter $L$ of the extracted contour by
$$L = \sum_{i=1}^{N} d[i].$$
Step 3. Sample the object's contour with the scale factor $L/N_1$ as the sampling rate, where $N_1 = 2^m$ and $m$ is an integer. There are $N_1$ sampled data (in our case, $N_1 = 128$ and $m = 7$).

Step 4. For each sampled point $P_i$, find the distance $D[i]$ to the centroid $(x_0, y_0)$, where
$$x_0 = \frac{1}{N_1} \sum_{i=1}^{N_1} x_i, \qquad y_0 = \frac{1}{N_1} \sum_{i=1}^{N_1} y_i,$$
and
$$D[i] = \left[ (x_i - x_0)^2 + (y_i - y_0)^2 \right]^{1/2}, \qquad i = 1, \ldots, N_1.$$

Step 5. Use the $N_1$ sampled data to determine the corresponding NRD coefficients for each object shape. The NRD formula is given by
$$D_j[2i - 1] = D_{j-1}[i] + D_{j-1}\!\left[ i + \frac{N_1}{2} \right], \qquad
D_j[2i] = \mathrm{ABS}\!\left\{ D_{j-1}[i] - D_{j-1}\!\left[ i + \frac{N_1}{2} \right] \right\}, \qquad (3)$$
for $j = 1, \ldots, m$ and $i = 1, \ldots, N_1/2$.
Using the above procedure, we can obtain 128 NRD coefficients for each input object. These data are called the uncompressed features of each object. In the next section, we will illustrate how to compress these NRD features using principal component analysis (PCA) to obtain the dominant features which are the actual input to the neural network classifier.
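To make the procedure above concrete, the following Python sketch (our illustration, not the implementation used in the paper) resamples a closed contour to 128 points by arc length and applies the rapid-transform recursion of equation (3) to the centroid distances; the function and variable names are our own.

```python
import numpy as np

def nrd_coefficients(contour, n_samples=128):
    """Compute NRD-style coefficients for a closed contour (Steps 1-5).

    `contour` is an (N, 2) array of boundary points (x_i, y_i). This is a
    minimal sketch of the procedure described above, not the paper's code.
    """
    pts = np.asarray(contour, dtype=float)
    # Steps 1-2: distances between successive boundary points and the perimeter L.
    d = np.linalg.norm(pts - np.roll(pts, 1, axis=0), axis=1)  # d[0] closes the contour
    L = d.sum()
    # Step 3: resample the contour at N_1 equally spaced arc-length positions (rate L/N_1).
    arc = np.concatenate(([0.0], np.cumsum(d[1:]), [L]))       # arc length to each point, closed
    px = np.concatenate((pts[:, 0], pts[:1, 0]))
    py = np.concatenate((pts[:, 1], pts[:1, 1]))
    targets = np.arange(n_samples) * (L / n_samples)
    xs = np.interp(targets, arc, px)
    ys = np.interp(targets, arc, py)
    # Step 4: distance from each sampled point to the centroid.
    x0, y0 = xs.mean(), ys.mean()
    D = np.hypot(xs - x0, ys - y0)
    # Step 5: rapid-transform recursion of equation (3), applied m = log2(N_1) times.
    m = int(np.log2(n_samples))
    for _ in range(m):
        half = n_samples // 2
        a, b = D[:half], D[half:]
        nxt = np.empty_like(D)
        nxt[0::2] = a + b              # D_j[2i-1] = D_{j-1}[i] + D_{j-1}[i + N_1/2]
        nxt[1::2] = np.abs(a - b)      # D_j[2i]   = |D_{j-1}[i] - D_{j-1}[i + N_1/2]|
        D = nxt
    return D

if __name__ == "__main__":
    theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
    r = 1 + 0.3 * np.cos(4 * theta)                  # a square-like closed shape
    shape = np.c_[r * np.cos(theta), r * np.sin(theta)]
    print(np.round(nrd_coefficients(shape)[:8], 3))
```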
3. OPTIMAL FEATURE EXTRACTION USING PRINCIPAL COMPONENT ANALYSIS

As one knows, in machine vision, primitive features such as pixels, corner points, curvature, etc., are often used in the application of neural networks to problems in pattern recognition. However, the feature space corresponding to the raw data is of such large dimension that the memory required is large, learning speed is slow, hardware implementation is expensive, and the required number of training examples is too great. This problem can be solved by compressing the input data into a low-dimensional set of features. This compression can reduce the dimension of the input space and shorten training times. Here, we employ a neural network implementation of principal component analysis (PCA) to reduce the dimension of the NRD features computed in the previous section.
As described in [11], PCA is a linear, orthogonal transformation for extracting features from high-dimensional data distributions. It transforms the source distribution into a coordinate system in which the coordinates are uncorrelated and the maximal amount of variance of the original distribution is concentrated on only a small number of coordinates. In this transformed space, the number of variables can be reduced by taking only the coordinates with significant variance and leaving out the coordinates with small variance. The basis vectors of this new coordinate system are the eigenvectors of the covariance matrix, and the variances on these coordinates are the corresponding eigenvalues. Therefore, the optimal projection from $m$ to $n$ dimensions derived by PCA is the subspace spanned by the $n$ eigenvectors with the largest eigenvalues. Since the information content of a normally distributed variable is determined by its variance, maximization of the variance by PCA is equivalent to maximization of the amount of information retained in the $n$ variables.
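For readers who prefer a batch formulation, the following sketch (ours, separate from the neural implementation discussed next) computes this optimal projection directly from the eigendecomposition of the sample covariance matrix; the function name and data shapes are assumptions.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X (samples x features) onto the top principal components.

    A minimal illustration of PCA via eigendecomposition of the covariance
    matrix; it is not the neural implementation described in the paper.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                               # center the data
    cov = np.cov(Xc, rowvar=False)              # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # descending order of variance
    W = eigvecs[:, order[:n_components]]        # top-n eigenvectors span the subspace
    return Xc @ W, W, mean                      # reduced data, projection, mean

# Example: reduce hypothetical 128-dimensional NRD feature vectors to 20 components.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(50, 128))       # 50 shapes, 128 NRD coefficients each
    reduced, W, mean = pca_reduce(features, 20)
    print(reduced.shape)                        # (50, 20)
```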
According to Oja [12], a model consisting of a single linear unit with a local, Hebbian-type modification rule can be used to extract the largest principal component of a stationary input vector sequence. This principal component is the single eigenvector of the covariance matrix with the largest eigenvalue.

From Figure 2, the output $V$ of the processing unit (PU) is the sum of the inputs $I_i$ weighted by the connection strengths $w_i$,
$$V = \sum_{i=1}^{N} w_i I_i. \qquad (4)$$
The PU is trained on vectors from an input distribution, and the rule for the modification of the connections during each training step is
$$\Delta w_i = \alpha \left( I_i V - w_i V^2 \right), \qquad (5)$$
where $\alpha$ is the learning rate and $I_i V$ is the Hebbian term that makes the connection stronger when the input and the output are correlated, i.e., when they are active at the same time. The second term of equation (5), i.e., $-w_i V^2$, is used to stabilize the weights. This term makes $\sum_i w_i^2$ approach 1. After training, the unit maximizes the variance of its output subject to the constraint that $\sum_i w_i^2 = 1$. However, this is not a full principal component analysis, because the unit finds only one component, the one with the largest variance. If there are several units available for signaling and they follow the same rule and no noise is added to the outputs, then their output values will be identical in the ideal case. This is no more useful than the value of a single unit.
Therefore, the transmitted information will be less than what could be achieved by PCA.
Figure 2. The output of a processing unit trained on a stationary sequence of inputs.
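A minimal sketch of Oja's single-unit rule, equations (4) and (5), is given below; it is our illustration, with assumed parameter values, rather than the paper's code.

```python
import numpy as np

def oja_first_component(data, alpha=0.01, epochs=100, seed=0):
    """Estimate the first principal component with Oja's rule (equations (4)-(5)).

    A minimal sketch; `data` has shape (num_samples, num_features) and is
    assumed to be zero-mean. Parameter names and defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    w = rng.normal(scale=0.1, size=n_features)      # random initial weights
    for _ in range(epochs):
        for I in data:
            V = w @ I                               # equation (4): unit output
            w += alpha * (I * V - w * V * V)        # equation (5): Hebbian update
    return w                                        # tends toward the top eigenvector

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic zero-mean data whose largest variance lies along the first axis.
    samples = rng.normal(size=(500, 5)) * np.array([2.0, 0.5, 0.3, 0.2, 0.1])
    samples -= samples.mean(axis=0)
    w = oja_first_component(samples, alpha=0.005)
    print(np.round(w / np.linalg.norm(w), 3))       # close to +/- (1, 0, 0, 0, 0)
```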
To obtain more than one principal component of a distribution, many alternative methods [13,14] have been proposed to change the connection strengths in a linear connectionist network. Sanger [13] has developed an extension of Oja's algorithm which uses a projection-like process to extract as many principal components as desired. The network consists of a single layer of weights connecting the $N$ input channels to $M$ ($M < N$) output nodes. At the presentation of each input pattern, the weight matrix is updated according to
$$\Delta w = \beta \left( V I^{T} - u\!\left( V V^{T} \right) w \right), \qquad (6)$$
where $I$ is the input vector, $w$ is the $M \times N$ weight matrix (with rows equal to the weight vectors), $V$ is the vector of activity on the output layer, and $\beta$ is the learning rate. The operation $u$ sets the upper triangular part of its argument equal to zero. The diagonal terms of $V V^{T}$ are used to make the weight vectors converge to unit magnitude. The off-diagonal terms $(V_i V_j)$, $j < i$, effectively modify the input to the $i$th neuron by subtracting the components $(w_j \cdot I)\, w_j$ along the $j$th weight vector. For a local implementation, this projection-like procedure requires feedback links to the input layer and additional nodes to carry out the subtraction. In their formulation, all of the weight vectors are trained on each iteration. However, until the first weight vector is
close to convergence, the learning in the second and succeeding nodes is meaningless. To improve the efficiency of Sanger's algorithm, we use a sliding window [15] to minimize wasted computation time in the learning procedure. This sliding window maintains a window of input nodes to be trained and gradually moves forward when these nodes begin to converge. That is, initially, only the first three input nodes are in the window. As these nodes begin to converge, the leading edge of the sliding window is advanced to the next node in sequence. With this method, the computation time can be reduced, and the compressed features are prepared as the input of the neural network classifier described in the next section.
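To illustrate equation (6), the following sketch applies Sanger's update with the lower-triangular masking described above; it is our own simplified version, and the sliding-window refinement of [15] is not implemented.

```python
import numpy as np

def sanger_components(data, n_components, beta=0.001, epochs=50, seed=0):
    """Extract several principal components with Sanger's rule (equation (6)).

    A minimal sketch assuming zero-mean `data` of shape (num_samples, N).
    The sliding-window speedup described in the paper is omitted here.
    """
    rng = np.random.default_rng(seed)
    N = data.shape[1]
    w = rng.normal(scale=0.1, size=(n_components, N))   # M x N weight matrix
    for _ in range(epochs):
        for I in data:
            V = w @ I                                    # output activities (length M)
            # u(VV^T): zero the upper triangle, keep the diagonal and lower triangle.
            lower = np.tril(np.outer(V, V))
            w += beta * (np.outer(V, I) - lower @ w)     # equation (6)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 8)) * np.array([5, 3, 2, 1, 0.5, 0.5, 0.5, 0.5])
    X -= X.mean(axis=0)
    W = sanger_components(X, n_components=3)
    print(np.round(W @ W.T, 2))     # rows approach orthonormal principal directions
```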
4. NEURAL NETWORK CLASSIFIER FOR OBJECT RECOGNITION

Recently, neural network approaches to problems in the field of pattern recognition and signal processing have led to the development of various neural network classifiers using feedforward networks [16-19]. They consist of associative memory networks, such as bidirectional associative memory (BAM) and Hopfield memory, and pattern recognition networks, such as the multilayer perceptron (MLP), neocognitron, counterpropagation network (CPN), adaptive resonance theory (ART), etc. Here, we use pattern recognition networks to recognize the unknown objects. In general, three types of training methods can be used in the training phase for classification and clustering: supervised training, unsupervised training, and combined supervised/unsupervised training. Classifiers trained in the supervised manner require data with extra information that specifies the correct class during training. Clustering or vector quantization algorithms use unsupervised training and group unlabeled training data into internal clusters. Classifiers that use combined supervised/unsupervised training first use unsupervised training with unlabeled data to form internal clusters. Labels are then assigned to the clusters with supervised training.

Since multilayer perceptrons (MLP) are widely used in a number of applications such as speech recognition and image recognition [20-23], we use a three-layered MLP feedforward neural network to accomplish the recognition task. Figure 3 is an example of the MLP structure used. It consists of one input layer, one hidden layer, and one output layer and is trained in a supervised manner. The original algorithm [17] used for training a multilayer perceptron is the backpropagation (BP) algorithm, which is an iterative gradient algorithm designed to minimize the mean-squared error between the desired output and the actual output for a particular input to the network. However, the backpropagation algorithm suffers from two shortcomings. The first is that learning is generally very slow. The second is the existence of local minima. The reasons for these are the complex nonlinear shape of the cost function in the weight space and the lack of optimality in choosing the learning rates associated with the different weights. To improve on these problems, many modified algorithms have been proposed. One of them [24] may work well for small input dimensions and is applied to our case. The detailed algorithm is stated as follows.

From Figure 3, the input units are by-pass units which distribute the input signals to the hidden units. The equations which describe the signal flow are given by
$$x_{hj} = \sum_{i=1}^{N_i} w_{ji}\, x_i, \qquad (7)$$
$$y_{hj} = f(x_{hj}), \qquad (8)$$
Figure 3. (a) Three-layered feedforward structure used in the multilayer perceptron. (b) Sigmoid transfer function used in the units of the hidden/output layers.
$$x_{ok} = \sum_{j=1}^{N_j} w_{kj}\, y_{hj}, \qquad (9)$$
$$y_{ok} = f(x_{ok}), \qquad (10)$$
where $x_i$ is the input value of the $i$th input unit (and also its output value), $w_{ji}$ is the strength of the linking weight between the $j$th hidden unit and the $i$th input unit, $N_i$ is the total number of input units, $x_{hj}$ is the net input to the $j$th hidden unit, $f(\cdot)$ is a nonlinear transfer function (activation function), $y_{hj}$ is the output value of the $j$th hidden unit, $w_{kj}$ is the strength of the linking weight between the $k$th output unit and the $j$th hidden unit, $N_j$ is the total number of hidden units, $x_{ok}$ is the net input to the $k$th output unit, and $y_{ok}$ is the output value of the $k$th output unit. Typical transfer functions are the sigmoid characteristic functions, which come in two forms: the unipolar continuous function and the bipolar continuous function. In our case, the unipolar continuous function is used as the transfer function and is given by
$$f(\rho) = \frac{1}{1 + \exp(-\lambda \rho)}, \qquad (11)$$
where $\rho$ is a dummy variable representing the net summation input for a unit, and $\lambda$ is proportional to the neuron gain, determining the steepness of the continuous function $f(\rho)$ near $\mathrm{net} = 0$; $\lambda > 0$, and $\lambda$ is set to less than 1.
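As a concrete reading of equations (7)-(11), a forward pass of the three-layer MLP might be sketched as follows; the array shapes, names, and parameter values are our assumptions.

```python
import numpy as np

def sigmoid(rho, lam=0.5):
    """Unipolar continuous transfer function, equation (11); lambda < 1 as in the text."""
    return 1.0 / (1.0 + np.exp(-lam * rho))

def forward_pass(x, w_hidden, w_output, lam=0.5):
    """Forward pass of the three-layer MLP, equations (7)-(10).

    x:         input vector of length N_i
    w_hidden:  N_j x N_i weight matrix (w_ji)
    w_output:  N_k x N_j weight matrix (w_kj)
    """
    x_h = w_hidden @ x           # equation (7): net inputs to hidden units
    y_h = sigmoid(x_h, lam)      # equation (8): hidden-unit outputs
    x_o = w_output @ y_h         # equation (9): net inputs to output units
    y_o = sigmoid(x_o, lam)      # equation (10): output-unit values
    return y_h, y_o

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=20)                          # e.g., 20 compressed NRD features
    w_hidden = rng.normal(scale=0.1, size=(16, 20))
    w_output = rng.normal(scale=0.1, size=(15, 16))  # 15 object classes
    _, y_o = forward_pass(x, w_hidden, w_output)
    print(np.round(y_o, 3))
```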
In the standard backpropagation algorithm, the total error of the network, $E$, is defined as
$$E = \frac{1}{2} \sum_{c=1}^{N_c} \sum_{k=1}^{N_k} \left( y_{ok,c} - t_{k,c} \right)^{2}, \qquad (12)$$
where $y_{ok,c}$ is the actual value of the $k$th output unit for the $c$th input pattern, $t_{k,c}$ is the desired value of the $k$th output unit for the $c$th input pattern, $N_k$ is the total number of output units, and $N_c$ is the total number of input patterns. Hereafter, the subscript $c$ will be suppressed. To minimize $E$ over the weights by the gradient descent method [17], the weights are updated according to the following
rule:
$$\Delta w_{kj}(s + 1) = -\eta \sum_{c=1}^{N_c} \frac{\partial E}{\partial w_{kj}} + \beta\, \Delta w_{kj}(s), \qquad (13)$$
$$\Delta w_{ji}(s + 1) = -\eta \sum_{c=1}^{N_c} \frac{\partial E}{\partial w_{ji}} + \beta\, \Delta w_{ji}(s), \qquad (14)$$
where
$$\frac{\partial E}{\partial w_{kj}} = (y_{ok} - t_k)\, y_{ok} (1 - y_{ok})\, y_{hj}, \qquad (15)$$
$$\frac{\partial E}{\partial w_{ji}} = \left[ \sum_{k=1}^{N_k} (y_{ok} - t_k)\, y_{ok} (1 - y_{ok})\, w_{kj} \right] y_{hj} (1 - y_{hj})\, x_i, \qquad (16)$$
$s$ represents the sweep number (i.e., the number of times the network has been through the whole set of input patterns), the learning rate $\eta$ is a small positive value and is set constant here, and $\beta$ is the momentum factor. This method is not guaranteed to find a global minimum of $E$, since gradient descent may get stuck in local minima. When the actual value approaches either extreme value, the factor $y_{ok}(1 - y_{ok})$ in equations (15) and (16) makes the error signal very small. In other words, an output unit can be maximally wrong without producing a strong error signal with which the weights could be significantly adjusted. This retards the search for a minimum of the error, and thus the convergence tends to be extremely slow. According to [24], instead of using the error function defined in the standard backpropagation algorithm, we utilize the following error function:
$$E = -\sum_{c=1}^{N_c} \sum_{k=1}^{N_k} \left[ t_{k,c} \ln(y_{ok,c}) + (1 - t_{k,c}) \ln(1 - y_{ok,c}) \right], \qquad (17)$$
and the weights are then updated according to the above rules (equations (13)-(16)), with replacement of equations (15) and (16) by the following two:
$$\frac{\partial E}{\partial w_{kj}} = (y_{ok} - t_k)\, y_{hj}, \qquad (18)$$
$$\frac{\partial E}{\partial w_{ji}} = \left[ \sum_{k=1}^{N_k} (y_{ok} - t_k)\, w_{kj} \right] y_{hj} (1 - y_{hj})\, x_i, \qquad (19)$$
where $x_i$ is the input value of the $i$th input unit. Thus, the error signal propagating back from each output unit is now directly proportional to the difference between the target value and the actual value, so the true error can be measured.
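A minimal training-step sketch using the modified error function (17) and the gradients (18) and (19) is given below; it omits the momentum term of equations (13) and (14), folds the gain $\lambda$ into the learning rate, and uses names and shapes of our own choosing rather than the paper's code.

```python
import numpy as np

def train_step(x, t, w_hidden, w_output, eta=0.1, lam=0.5):
    """One training step with the cross-entropy error of equation (17).

    A minimal sketch (no momentum term); the gain lambda is absorbed into
    the effective learning rate, as permitted by the descent direction.
    """
    # Forward pass, equations (7)-(10).
    y_h = 1.0 / (1.0 + np.exp(-lam * (w_hidden @ x)))
    y_o = 1.0 / (1.0 + np.exp(-lam * (w_output @ y_h)))

    # Equation (18): the output-layer gradient is proportional to (y_o - t).
    delta_o = y_o - t
    grad_output = np.outer(delta_o, y_h)

    # Equation (19): back-propagate delta_o through w_kj to the hidden layer.
    delta_h = (w_output.T @ delta_o) * y_h * (1.0 - y_h)
    grad_hidden = np.outer(delta_h, x)

    # Gradient-descent update (equations (13)-(14) without the momentum term).
    w_output -= eta * grad_output
    w_hidden -= eta * grad_hidden

    # Cross-entropy error of equation (17), for monitoring convergence.
    eps = 1e-12
    return -np.sum(t * np.log(y_o + eps) + (1 - t) * np.log(1 - y_o + eps))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_h = rng.normal(scale=0.1, size=(16, 20))
    w_o = rng.normal(scale=0.1, size=(15, 16))
    x = rng.normal(size=20)
    t = np.zeros(15)
    t[3] = 1.0                          # one-hot target for class 3
    for _ in range(200):
        E = train_step(x, t, w_h, w_o)
    print(round(float(E), 4))           # error shrinks as the weights adapt
```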
One important issue with respect to the MLP is how to determine the number of hidden units required to perform classification. Many papers [25-27] are concerned with this problem, and some of them use a pyramid structure directly; that is, the number of hidden units is the mean of the numbers of input and output units. Reference [26] shows that for an arbitrary training set with M training patterns, a multilayer neural network with one hidden layer and M - 1 hidden-layer neurons can exactly implement the training set. Note that it is only sufficient, and not necessary, to use M - 1 hidden neurons to exactly implement the training set. Basically, how many neurons are required in the hidden layer is still an open question and varies with applications. In this paper, we investigate the performance of the neural network with a hidden layer of 8, 16, 32, and 64 nodes.
5. EXPERIMENTAL RESULTS AND DISCUSSION

The experimental system consists of an image preprocessing unit, a feature extraction unit (i.e., the feature compression unit), and an object recognition unit. Methods of feature extraction using PCA and object recognition using the MLP have been presented in the previous sections. Before discussing the experimental results, the image preprocessing procedures are described as follows.
In our approach, the image system is composed of a TV camera and an image frame grabber OC-F64 connected to an IBM PC486. The objects are placed on a black and opaque plate in order to obtain good binary images. The CRT screen has a resolution of 512 x 480 pixels. Fifteen object shapes are used in the experiment. The training data consists of 50 examples of each of these objects with different scale, translation, and rotation. The test data consists of 30 examples of each object under arbitrary orientations. Before extracting the contours of the objects, the following image preprocessing techniques are applied.

Step 1. Correct the geometrical distortion caused in the picture-taking process. The method uses a linear transformation function [28] to describe the mapping between the ideal picture and the distorted picture (i.e., the grabbed image). A bilinear interpolation technique is then applied to obtain the corrected picture.

Step 2. Perform edge detection and thinning on the corrected picture. The thinned pictures of six experimental objects are illustrated in Figure 4. The edge detection process uses the Sobel edge operator [29] to detect edge points. The thinning method uses the SPTA algorithm [30].

Step 3. Trace the thinned picture in the counterclockwise direction to derive the boundary curve of an unknown object.

Figure 4. The thinned pictures of six experimental objects.

After obtaining the required boundary contours, their NRD coefficients can be derived using the algorithm presented in Section 2. Figure 5 illustrates the NRD spectrum of six experimental objects. In the feature compression unit, we use various configurations for the compression network with 10 to 30 output nodes. For comparison, the same recognition task is performed feeding the full complement of 128 NRD coefficients directly into the pattern recognition unit. Table 1 shows the performance of each of the recognition networks on the test data. Each row gives the correctness rate of object recognition for an MLP network with a single hidden layer of 8, 16, 32, or 64 nodes. Columns indicate the number of input nodes feeding the object recognition network. Each of the input columns represents an average over 20 training runs. We note that a mere 20 principal components give recognition performance comparable to that of any other configuration, including using all 128 NRD coefficients. That is, a six-fold reduction in the dimension of the input to the recognizer is achieved without loss in classifier accuracy.

The other primary effect of this dimension reduction is a sharp drop in the time required to train the object recognition network. Since there are fewer input nodes, there are fewer weights; hence, the training process is faster. As the hidden layer in the pattern recognition network becomes larger, the speedup becomes more significant. Table 2 and Figure 6 show this behavior. In Figure 6, the ordinate gives the amount of CPU time (seconds) used to train the object recognition network, and the abscissa gives the number of inputs to the object recognition network. The four curves (from top to bottom) correspond to 64, 32, 16, and 8 hidden nodes. Each curve shows a roughly linear increase in training time with increasing dimension of the input space and increasing number of weights in the object recognition network. We note that the time required to train the compression network is generally much smaller than the time required to train the recognition network on uncompressed data.
From the experimental results, the effort devoted to computing the reduced representation is more than made up for by the acceleration in recognizer training. A smaller dimensional representation also reduces memory and storage requirements, and hardware implementation costs.
Figure 5. The NRD spectrum of six experimental objects.
Table 1. Recognition rate for the neural network with different configurations.

Number of            Number of Input Nodes
Hidden Nodes      10       15       20       30       50       80       128
     8          74.3%    84.8%    95.6%    99.3%    99.3%    99.5%    99.5%
    16          76.7%    85.7%    96.3%    97.5%    99.1%      --       --
    32          77.6%    86.1%    95.9%    97.6%    99.4%    99.5%    99.7%
    64          77.9%    86.4%    97.1%    98.1%    99.5%    100%     100%
Table 2. Total training time (seconds) for 8, 16, 32, and 64 hidden nodes.

Number of            Number of Input Nodes
Hidden Nodes      10         15         20         30         50         80         128
     8          2.20E+04   3.50E+04   5.50E+04   7.10E+04   7.60E+04   8.50E+04   1.10E+05
    16          4.80E+04   6.20E+04   7.80E+04   1.20E+05   1.80E+05   2.30E+05   2.70E+05
    32          8.20E+04   8.90E+04   1.10E+05   1.90E+05   3.50E+05   4.30E+05   5.10E+05
    64          9.10E+04   1.20E+05   1.90E+05   3.10E+05   4.30E+05   6.50E+05   1.30E+06
Figure 6. Total training time for 8, 16, 32, and 64 hidden nodes (abscissa: the number of input nodes).
Neural implementations of principal component analysis allow the reduced features to be calculated in parallel. This will be particularly important as neural computers become available.
6. CONCLUSIONS

In this paper, a neural network using an optimal linear feature extraction scheme, based on the concept of principal component analysis (PCA), was proposed to recognize two-dimensional objects in industrial environments. This approach consisted of two stages. First, we described the procedures for determining the NRD coefficients of unknown 2-D objects from their boundaries. To speed up the learning process of the neural network, an optimal linear feature extraction technique was used to extract the principal components of these NRD coefficients. Then, these reduced components were utilized to train a feedforward recognition network (MLP) for object recognition. We compared recognition performance, network sizes, and training time for networks trained with both reduced and unreduced data. The experimental results showed that a significant reduction in training time could be achieved without a sacrifice in classifier accuracy.
REFERENCES
1. M.K. Hu, Pattern recognition by moment invariants, Proc. IRE 49, 1428, (1961).
2. J. Sklansky, On the Hough technique for curve detection, IEEE Trans. on Computers 27, 923-926, (1978).
3. R.O. Duda and P.E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Communications of the ACM 15, 11-15, (1972).
4. J.M. Chen, C.K. Wu and X.R. Lu, A fast shape descriptor, CVGIP 34, 282-291, (1986).
5. T.P. Wallace and O.R. Mitchell, Analysis of three-dimensional movement using Fourier descriptors, IEEE Trans. on PAMI 2, 583-588, (1980).
6. T.P. Wallace and P.A. Wintz, An efficient 3-D aircraft recognition algorithm using normalized Fourier descriptors, Computer Graphics and Image Processing 13, 96-126, (1980).
7. P.F. Singer and R. Chellappa, Classification of boundaries on the plane using the stochastic model, In Proc. CVPR, Washington, DC, pp. 146-147, (June 1983).
8. S.R. Dubois and F.H. Glanz, An autoregressive model approach to 2-D shape classification, IEEE Trans. on PAMI 8, 55-66, (1986).
9. R. Linsker, Self-organization in a perceptual network, IEEE Computer 21, 105-117, (1988).
10. B.A. Pearlmutter and G.E. Hinton, G-maximization: An unsupervised learning procedure for discovering regularities, In Proceedings of the Conference on Neural Networks for Computing, American Institute of Physics, (1986).
11. P.A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, (1982).
12. E. Oja, A simplified neuron model as a principal component analyzer, J. Math. Biology 15, 267-273, (1982).
13. T. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks 2 (6), 459-473, (1989).
14. E. Oja and J. Karhunen, On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix, Journal of Mathematical Analysis and Applications 106, 69-84, (1985).
15. J.-D. Lee, Object recognition using neural network with optimal feature extraction, Technical Report NSC83-0117-C182-001E, (1994).
16. J.M. Zurada, Introduction to Artificial Neural Systems, West Publishing, (1992).
17. J.A. Freeman and D.M. Skapura, Neural Networks: Algorithms, Applications and Programming Techniques, Addison-Wesley, (1991).
18. R.P. Lippmann, Pattern classification using neural networks, IEEE Communications Magazine, (November 1989).
19. R.P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine 4, 4-22, (April 1987).
20. D.P. Morgan and C.L. Scofield, Neural Networks and Speech Processing, Kluwer Academic, (1991).
21. D.J. Burr, Experiments on neural net recognition of spoken and written text, IEEE Trans. on Acoustics, Speech, and Signal Processing 36 (7), (July 1988).
22. H. Bourlard and C.J. Wellekens, Multilayer perceptrons and automatic speech recognition, In Proc. IEEE Conf. on Neural Networks, (1990).
23. A. Bendiksen and K. Steiglitz, Neural networks for voiced/unvoiced speech classification, In Proc. IEEE ASSP, (1990).
24. A. Van Ooyen and B. Nienhuis, Improving the convergence of the backpropagation algorithm, Neural Networks 5, 465-471, (1992).
25. R.P. Gorman and T.J. Sejnowski, Analysis of hidden units in a layered network trained to classify sonar targets, Neural Networks 1, 75-89, (1988).
26. M.A. Sartori and P.J. Antsaklis, A simple method to derive bounds on the size and to train multilayer neural networks, IEEE Trans. on Neural Networks 2 (4), (July 1991).
27. G. Mirchandani and W. Cao, On hidden nodes for neural nets, IEEE Trans. on Circuits and Systems 36 (5), (May 1989).
28. A. Rosenfeld and A.C. Kak, Digital Picture Processing, 2nd edition, Academic Press, New York, (1982).
29. O.D. Faugeras, Fundamentals in Computer Vision, Cambridge University Press, New York, (1983).
30. N.J. Naccache and R. Shinghal, SPTA: A proposed algorithm for thinning binary patterns, IEEE Trans. on Systems, Man, and Cybernetics SMC-14, 409-418, (1984).