
Neural Networks, Vol. 6, pp. 275-283, 1993

Copyright © 1993 Pergamon Press Ltd. Printed in the USA. All rights reserved.

ORIGINAL CONTRIBUTION

Modeling and Classification of Shape Using a Kohonen Associative Memory With Selective Multiresolution

MICHAEL SABOURIN* AND AMAR MITICHE†

*Bell-Northern Research and †INRS-Télécommunications

(Received 13 December 1991; revised and accepted 13 July 1992)

Abstract--We propose the use of a Kohonen Associative Memory (KAM) with Selective Multiresolution (SMR) for the task of modeling and classification of shapes. We demonstrate this technique for Optical Character Recognition (OCR). Modeling is performed on characters selected from 200 fonts as well as documents containing characters with topological shape mutilations, fragmentations (broken characters), and fusions (touching characters). A total of 110,000 training samples are used. This large training set attempts to represent the variation of character shapes due to different font styles, document skew, noise, photometric effects, etc. An omnifont classifier produced using the SMR modeling procedure outperforms a state of the art OCR system. Comparisons to state of the art and benchmark model building procedures are provided.

Keywords--OCR, Model building, Selective multiresolution, Isodata, Kohonen Associative Memories.

1. MODELING AND CLASSIFICATION OF SHAPE

Methods for classifying visual shapes can be organized into three broad categories: statistical methods, structural methods, and neural networks (Mitiche, Laganiere, & Henderson, 1990).

Statistical Methods can be divided into two broad classes: Bayesian methods and clustering. Bayesian methods prescribe decision rules by which a sample is assigned to the class with the highest a posteriori probability. The relevant probabilities must either be known explicitly, or estimated using Parzen windows or other techniques. The conditional probability distribution functions can be seen as the shape models in this case. If the class-conditional probabilities have a known form and can be parameterized, models reduce to parameter vectors (learned using maximum likelihood estimation, Bayesian learning, etc.). Clustering is a procedure by which unlabeled data vectors are divided into groups which, ideally, form natural "clusters." Clustering involves defining a similarity measure between data vectors, and a criterion function which evaluates the quality of vector set partitions. The goal is then to seek optimal partitions. In clustering, models are exemplar feature vectors (cluster centers).

Structural Methods describe shape using constituent primitive parts and the relations between these parts. Models of classes can be grammars, where class membership is decided by parsing. Models can also be general relational structures, where class membership is decided by graph matching.

Neural Networks implement rather simple forms of category formation and associative memory. In category formation, the neural network learns a set of categories and classifies incoming inputs according to these. The function of an associative memory is to encode, store, and selectively recall relevant information: By extension, associative memory systems can therefore be used as classifiers when they store exemplar patterns (models). Modeling information is associated with the collective state of the network rather than the state of a particular neuron: Models are implicitly represented by the interconnection weights. Several neural network architectures have been proposed: Among the most popular are the multilayer perceptron (Lippman, 1987; Rumelhart & McClelland, 1986), the Hopfield network (Hopfield, 1984), the Carpenter-Grossberg network (Carpenter & Grossberg, 1987), and, of particular interest to our study, the Kohonen Associative Memory (KAM) (Kohonen, 1988).

Acknowledgement: We wish to thank Bell-Northern Research for its funding and use of computer resources. This work is supported in part by NSERC grant #OGP0004234. Requests for reprints should be sent to Michael Sabourin or Amar Mitiche, Bell-Northern Research, 16 Place du Commerce, Île des Soeurs, Verdun, Que., Canada H3E 1H6.


The KAM builds representatives by presenting vector patterns, determining the winning model (closest to the training pattern), and pulling this model and its neighbours towards the training data. Each model induces a partition of the space of pattern vectors (clustering). The model building procedure produces a minimum distance classifier which utilizes the same distance measure used during training. The classifier functions by determining which model is closest to the unknown input pattern.

A KAM is of interest since it has been used successfully in (a) image processing (coding of images [Le Bail & Mitiche, 1989; Nasrabadi & Feng, 1988]); (b) optimization (travelling salesman problem [Angéniol, Vaubois, & Le Texier, 1988; Luttrell, 1988]); and (c) storage and retrieval problems (images of human faces [Kohonen, 1988]). From this broad range of successful applications, it would appear that the KAM may have a significant impact on an important application such as Optical Character Recognition (OCR).

Here, KAM is used in building models for an OCR application. In general, KAM produces satisfactory models. However, we have detected a few shortcomings: (a) KAM requires a fixed number of nodes which needs to be specified prior to model building, causing a problem symptomatic of any memory of fixed size: The memory could be too large or too small; (b) training often produces overloaded nodes (data collisions) or empty nodes; and (c) the training method is inefficient because it focuses on the continuum of training data rather than on the portions of data where data collisions are likely to occur.

To alleviate the shortcomings of KAM, we propose a KAM with Selective Multiresolution (SMR). SMR is a hierarchical KAM which is dynamic: The number of models (size of memory) need not be specified a priori. Nodes in which memory collisions occur are expanded locally. This has the effect of concentrating the training effort on the portions of the training data where overload occurs. SMR overcomes the shortcomings of KAM while preserving the desired properties.

The remainder of this paper is organized as follows: In Section 2, the KAM is described. In Section 3, we present a novel approach to model building, namely SMR. Section 4 describes experiments in model building as applied to OCR. Section 5 presents concluding remarks and discussions.

2. KOHONEN ASSOCIATIVE MEMORY

The KAM is a neural network which associates to each input pattern a representative output pattern (Figure 1). This method of model building can be seen as performing vector quantization in that it seeks models which minimize the quantization error.¹ Models are adjusted incrementally as new data is presented. An interesting aspect of the KAM is that some ordering takes place: Adjacent models in pattern space are near each other in model space.

¹ However, few theoretical results are available (Kohonen, 1988; Komlós & Paturi, 1988; Ritter & Schulten, 1986).

2.1. Kohonen Algorithm

1. Select initial conditions for the model patterns, y_i, i ∈ [1 ... N], using a random number generator.
2. Present training samples, x_j, j ∈ [1 ... M].
   (2a) Determine y*, which minimizes d(x_j, y_i), d being a measure of the similarity of two patterns.
   (2b) Update the network:

        y_i(t + 1) = y_i(t) + γ_{i,i*}(t) δ(t) (x_j − y_i(t)),

   where i* is the index of model y*, i is in the neighbourhood of i*, t is the training iteration, γ is the neighbourhood gain function, and δ is a gain function. The gain functions γ(t) and δ(t) are decreasing in t, and γ decreases as |i − i*| increases.
3. Evaluate the stopping condition:

        d(y_i(t + 1), y_i(t)) < ε,

   i.e., the amount that model y_i has changed from iteration t to t + 1 is smaller than a specified ε. If the stopping condition is not satisfied for all models, y_i, go to 2.

A typical neighbourhood gain function, γ_{i,i*}, is a Gaussian window around i*, of the form

        γ_{i,i*}(t) = exp(−|i − i*|² / 2σ(t)²),

where

        σ(t) = σ_0 (σ_f / σ_0)^{t / t_max},

and where σ_0 is the initial variance of the neighbourhood function, σ_f is the final variance, t is the training iteration, and t_max is the number of training iterations. The Gaussian window, γ, relates to the neighbourhood of the model indexed by i*, in that the impact of the training sample on neighbouring models decreases as |i − i*| increases, according to a Gaussian distribution. Note that the variance of the Gaussian decreases in t, which means that the dependency of adjacent models decreases in time (although the ordering is preserved). Based on experience, the value of σ_f is chosen to be two orders of magnitude smaller than σ_0, and the value of σ_0 is selected to be a fraction of the number of models, i.e., σ_0 = αN, where we select α = 0.5. A typical gain function, δ(t), is an exponential decay in t, of the form

        δ(t) = δ_0 (δ_f / δ_0)^{t / t_max}.
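For concreteness, a minimal sketch of this training loop is given below, assuming a one-dimensional index space and Euclidean similarity; the function name train_kam, the schedule constants, and the fixed iteration budget (standing in for the stopping test of step 3) are our own choices, not the authors' implementation.

```python
import numpy as np

def train_kam(samples, n_models=10, t_max=1000, delta0=0.5, deltaf=0.005, alpha=0.5, seed=0):
    """Sketch of a 1-D Kohonen Associative Memory.

    samples: (M, D) array of training patterns.
    Returns an (n_models, D) array of ordered model patterns.
    """
    rng = np.random.default_rng(seed)
    models = rng.standard_normal((n_models, samples.shape[1]))   # step 1: random initial models
    sigma0 = alpha * n_models                                    # sigma_0 = alpha * N
    sigmaf = 0.01 * sigma0                                       # two orders of magnitude smaller
    idx = np.arange(n_models)

    for t in range(t_max):                                       # fixed budget replaces step 3
        x = samples[rng.integers(len(samples))]                  # step 2: present a sample
        winner = np.argmin(np.linalg.norm(models - x, axis=1))   # (2a) closest model i*
        sigma = sigma0 * (sigmaf / sigma0) ** (t / t_max)        # shrinking neighbourhood width
        delta = delta0 * (deltaf / delta0) ** (t / t_max)        # decaying gain delta(t)
        gamma = np.exp(-((idx - winner) ** 2) / (2.0 * sigma ** 2))
        models += (gamma * delta)[:, None] * (x - models)        # (2b) pull winner and neighbours
    return models
```

Adjacent rows of the returned array tend to represent similar patterns, which is the ordering property exploited later by SMR.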


[Figure 1 appears here.]

FIGURE 1. Kohonen Associative Memory (KAM). Training produces an ordered set of models: Similar models, y_i, in pattern space have similar index, i. (a) Training samples. (b) Ordered models.

Training the KAM produces an indexed set of models. The indexing is significant in that models which are near each other in index space are also near in pattern space. Each specific indexing value is sometimes called a node, and each node is associated with a model, forming a pair (i, y_i). With respect to OCR, the ordering of the models produced using KAM will place adjacent models (in index space) with similar shapes in pattern space. For instance, it has been noted that the KAM procedure produces sequences of models such as {... I T C G ...} or {... D O O Q R A ...}. The models, and the distance measure used to generate the models, induce a partition, {S_i}, of the pattern space, specified as

        S_i = { x_j : d(x_j, y_i) < d(x_j, y_k)  ∀ k ≠ i }.        (1)

Thus, to each node i we associate a partition S_i. We state that region S_i is homogeneous if

        Σ_{x_j ∈ S_i} d*(l(x_j), l(y_i)) < ε*,        (2)

where l(x_j) is the label of training pattern x_j, d* is a labelling distortion function, and ε* is either a constant or a fraction (percentage) of the number of samples in the region. Note that each x_j was manually labelled before model building. Also note that the labelling distortion function used here is the "delta" function, i.e., d*(a, b) = 0 if a = b and d*(a, b) = 1 if a ≠ b, so the sum in eqn (2) counts the samples in S_i whose label disagrees with that of the model.
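The homogeneity test of eqn (2) amounts to counting, within a node's region S_i, the samples whose label disagrees with the node's majority label. A small sketch of such a check (our own hypothetical helper, not part of the paper) might look as follows:

```python
from collections import Counter

def is_homogeneous(labels, eps_fraction=0.0):
    """Check eqn (2) for one node, given the labels of all samples that fell in S_i.

    eps_fraction = 0.0 demands 100% purity; 0.01 tolerates 1% label disagreement.
    """
    if not labels:
        return True
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    distortion = len(labels) - majority_count        # samples disagreeing with l(y_i)
    return distortion <= eps_fraction * len(labels)
```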

3. SELECTIVE MULTIRESOLUTION

In general, KAM produces satisfactory models. However, we have noted the following shortcomings: (a) it is necessary to perform several experiments in order to determine a satisfactory value of N (number of models); (b) model building using KAM can produce data collisions (overloaded nodes) or nodes with no entry; and (c) the training procedure requires a large amount of processing time, since the distance between input and model patterns needs to be computed for each model. Training a KAM attempts to partition the entire pattern space, not only the portions of space which contain colliding data.

We propose SMR, a multipass model building algorithm based on the KAM. SMR is dynamic: It is not necessary to specify a priori the number of models required. SMR is hierarchical: Training effort focuses on the portion of pattern space in which data collisions occur.² In addition, the training procedure is computationally efficient since fewer distortion computations are required.

The SMR algorithm is summarized as follows: The pattern space is divided into N partitions using the KAM algorithm described above. The homogeneity of each node is then evaluated using labelling information. All nonhomogeneous nodes are expanded to form a local associative memory, which has the effect of dividing the induced partition in pattern space to form subpartitions. Nodes in these local associative memories can themselves be expanded at a later stage (see Figure 2).

3.1. SMR Algorithm

1. Create a KAM of size N.
2. Check the node homogeneity, eqn (2). If node i, with associated model y_i, satisfies Σ_j d*(l(x_j), l(y_i)) > ε*, then create a new KAM of size N_i to expand this node.
3. Check the node homogeneity for the newly created models. Repeat step 2 recursively until the homogeneity condition is satisfied (a code sketch of this recursion follows below).

² Training concentrates on nonhomogeneous regions of the pattern space, e.g., decision boundaries.
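A minimal sketch of this recursion is shown below. It reuses the hypothetical train_kam and is_homogeneous helpers from the earlier sketches; the tree-of-tuples representation, parameter names, and depth cap are our own assumptions rather than the authors' implementation.

```python
import numpy as np

def train_smr(samples, labels, n_models=16, n_sub=4, eps_fraction=0.0, depth=0, max_depth=3):
    """Sketch of Selective Multiresolution: expand only nonhomogeneous nodes.

    Returns a list of (model_vector, majority_label, subtree) tuples, where
    subtree is None for homogeneous (leaf) nodes.
    """
    models = train_kam(samples, n_models=n_models)
    # Assign each sample to its nearest model (the induced partition S_i).
    assignments = np.argmin(
        np.linalg.norm(samples[:, None, :] - models[None, :, :], axis=2), axis=1)

    nodes = []
    for i, model in enumerate(models):
        member_idx = np.flatnonzero(assignments == i)
        member_labels = [labels[j] for j in member_idx]
        if not member_labels:
            continue                                     # empty node: drop it
        majority = max(set(member_labels), key=member_labels.count)
        if is_homogeneous(member_labels, eps_fraction) or depth >= max_depth:
            nodes.append((model, majority, None))        # homogeneous leaf
        else:
            # Expand only this node, using only the data that fell into it.
            subtree = train_smr(samples[member_idx], member_labels,
                                n_models=n_sub, n_sub=n_sub,
                                eps_fraction=eps_fraction,
                                depth=depth + 1, max_depth=max_depth)
            nodes.append((model, majority, subtree))
    return nodes
```

The depth cap of three mirrors the observation in Section 3.1 that two or three levels of recursion usually suffice.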


[Figure 2 appears here.]

FIGURE 2. Selective Multiresolution (SMR). A Kohonen Associative Memory (KAM) is trained using the patterns in (a) to produce models (b). The model S2 contains data collisions and it is expanded in (c) by utilizing the data represented by model S2. This procedure continues in (d) until each model is homogeneous. The final position of the models is shown in (e). Note that data in the shaded regions of (c) and (d) was not used in generating the subpartition models.


The node homogeneity is measured by summing the labelling distortion, Σ_j d*(l(x_j), l(y_i)), where d* is as defined in Section 2. Usually, this sum is simply the number of samples which do not agree with the majority label, l(y_i). If this distortion exceeds the threshold, ε*, then the node is nonhomogeneous. A value of ε* = 0 will force each cluster to be 100% homogeneous. A value of ε* = 0.01 n_i (where n_i is the number of samples in cluster i) will force each cluster to be at least 99% homogeneous, and so on. This latter value is sometimes useful, particularly when labelling errors occur in the data. The value of N_i (i.e., the size of the Kohonen subnetwork) is typically smaller than the number of nodes in the initial KAM partition. After only two or three levels of recursion, the desired partition is usually found, i.e., each node is homogeneous.

3.2. Discussion

SMR concentrates on the nonhomogeneous regions of the pattern space; it generally represents classes of distinct patterns with few models, while representing classes of similar patterns with many models. In the context of OCR, few models are needed to represent "w", "k", or "z", while many models are required to represent similarly shaped characters such as "t" and "f", or "D", "O", and "0". In contrast, KAM concentrates on the resolution of the entire grid and requires a large number of models to distinguish distinct or similar patterns. In other words, KAM focuses an equal amount of effort on partitioning the entire space.

Figure 2 illustrates the SMR procedure. In the first pass, KAM performs a coarse partitioning of the pattern space. Then, nonhomogeneous regions, S_i, are isolated. These regions are subpartitioned using a new KAM, to form regions S_ij. This procedure is repeated recursively until a suitable total distortion is reached. The final partition is shown in Figure 2(e).


As a classifier, the SMR procedure functions in a similar manner. The nearest partition is first determined (as in Figure 2(b)), and if no subpartition exists, the character is labelled. If a subpartition does exist, then the nearest subpartition is determined. This procedure repeats until a unique subpartition is isolated.
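In code, this descent is a short recursive lookup. The sketch below assumes the node tree returned by the hypothetical train_smr sketch above and defaults to a Euclidean distance; it is an illustration, not the authors' classifier.

```python
import numpy as np

def classify_smr(pattern, nodes, distance=None):
    """Descend the SMR hierarchy: pick the nearest model, recurse into its
    subpartition if one exists, otherwise return the node's label."""
    if distance is None:
        distance = lambda a, b: np.linalg.norm(a - b)    # Euclidean by default
    model, label, subtree = min(nodes, key=lambda node: distance(pattern, node[0]))
    if subtree is None:
        return label                                     # leaf: unique subpartition isolated
    return classify_smr(pattern, subtree, distance)
```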

4. OMNIFONT PRINTED TEXT RECOGNITION

In this section, the model building techniques described above are used to determine reference templates in an omnifont OCR application. These neural network based models are compared to models built using conventional methods. Note that all the model building procedures described below utilize a large sample of labelled patterns. These patterns are derived using conventional feature extraction software. An interesting OCR application of neural networks to both feature extraction and classification is given in Le Cun et al. (1990).

4.1. Optical Character Recognition

In optical character recognition, models are used as reference patterns. An unknown character is processed to produce a set of features which form a pattern. This unknown pattern is mapped onto the reference patterns and the classifier determines the most similar pattern. The unknown character is then labelled accordingly.

A typical OCR system (Sabourin & Mitiche, 1991) consists of scanning, preprocessing, feature extraction, classification, and context processing (Figure 3). A document is scanned via a facsimile machine (200 dots per inch). The document is preprocessed to isolate individual objects and to segment text and graphics regions. Feature extraction is the process of computing high-level information about individual characters. Individual characters are classified and assigned an ASCII label. Context processing is used to correct errors in classification (and earlier processing) by utilizing language statistics and spelling rules.

[Figure 3 appears here: block diagram with stages Pre-Processor, Feature Extractor, Neural Network Classifier, Context Processor, and Neural Network Substitution Rules.]

FIGURE 3. Optical Character Recognition (OCR). Connected objects are isolated and text is separated from graphics. Features are extracted from characters and the representative models are used in classifying these characters. Finally, the labelled text is corrected for language context using special handlers and substitution rules determined during model building.


Features are used in classification because, when properly selected, they tend to be less sensitive to character variations (due to font differences or scanning effects) than the original bit map of the character. In this system, the primary feature used in modeling and classification is the character shape, as represented by the tangent field of its contour. The tangent field is derived by smoothing the chain code description, then uniformly sampling the contour to 64 points, and determining the angles between adjacent samples. Smoothing reduces noise influences and uniform sampling makes the tangent field scale invariant. Other features, such as coreline classification (the relative position of a character on a line) and genus (the number of white spaces in a character), are used to prescreen the data in order to reduce computation time. The coreline feature is computed using algorithms which utilize character statistics. The genus is computed by comparing the subsuming rectangle coordinates of white and black objects. The use of screening features induces a hierarchical character classifier similar to that shown in Figure 4.
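As a rough illustration of the tangent-field feature, the sketch below smooths a closed contour, resamples it to 64 points, and takes the angle between adjacent samples. The paper works from a chain code description, whereas this sketch assumes raw (x, y) contour points; the smoothing window and helper name are our own choices.

```python
import numpy as np

def tangent_field(contour, n_samples=64, smooth_window=5):
    """contour: (P, 2) array of (x, y) points along a closed character outline.
    Returns n_samples tangent angles (radians), roughly scale invariant."""
    # Light smoothing of the outline to reduce quantization noise.
    kernel = np.ones(smooth_window) / smooth_window
    smoothed = np.column_stack([
        np.convolve(contour[:, 0], kernel, mode="same"),
        np.convolve(contour[:, 1], kernel, mode="same")])
    # Resample the outline uniformly by arc length to n_samples points.
    seg = np.linalg.norm(np.diff(smoothed, axis=0, append=smoothed[:1]), axis=1)
    arclen = np.concatenate([[0.0], np.cumsum(seg)[:-1]])
    targets = np.linspace(0.0, arclen[-1] + seg[-1], n_samples, endpoint=False)
    resampled = np.column_stack([
        np.interp(targets, arclen, smoothed[:, 0]),
        np.interp(targets, arclen, smoothed[:, 1])])
    # Tangent angle between adjacent (wrapped) samples.
    d = np.diff(resampled, axis=0, append=resampled[:1])
    return np.arctan2(d[:, 1], d[:, 0])
```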

4.2. Model Building in Optical Character Recognition

The main challenge in building models in OCR is the large variation in shapes within a class of characters. This variation exists for a myriad of reasons: font styles, document noise, photometric effects, document skew, etc. In addition, poor image quality can cause some characters to (a) fragment, such as an "h" splitting into an "l" and an "i"; (b) merge, such as an "f" merging with an "l" to become an "fl" ligature; or (c) mutilate, such as a "b" which opens up, causing the character's topology (contour) to change. The large variation in shapes makes it difficult to determine the number of models that are required prior to model building. This supports the use of a dynamic model building procedure.

Modeling was performed on characters selected from a large number of fonts, thereby approaching an omnifont environment. Samples were selected from 200 fonts (i.e., 12,000 training patterns), and 50 documents containing characters with shape mutilations, fragmentations, and fusions (i.e., 98,000 training patterns), for a total of 110,000 training patterns. This large training set attempts to represent the full variation of character shapes which exist.

In the following results, two distance measures were considered: Euclidean distance and Dynamic Contour Warping (DCW) distance. The Euclidean distance between two K-dimensional vectors, a and b, is defined as

        d_E(a, b) = Σ_{k=1}^{K} (a_k − b_k)².

The DCW distance, d_DCW, is computed using a dynamic programming method:

        d_DCW(a, b) = min_p Σ_p |a_i − b_j| W_p(i, j),

where p is a path through the warping grid, and W_p is a stretching penalty.³ For more information about DCW, see Duffle (1985) and Tappert (1982).

³ The "warping path" can be any sequence {(i, j)_l}, l = 0 ... M, where M is allowed to vary, such that (i, j)_0 = (0, 0), (i, j)_M = (K, K), where K is the dimension of the features, and the path is connected, i.e., (i_l − i_{l−1}) = 0 or 1 and (j_l − j_{l−1}) = 0 or 1. The warping penalty, W_p, is used to discourage variation from the diagonal (where the diagonal is defined as i_l − i_{l−1} = 1, j_l − j_{l−1} = 1).
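A minimal dynamic-programming sketch in the spirit of d_DCW is shown below; the absolute-difference local cost and the specific off-diagonal penalty value are our own assumptions rather than the paper's exact formulation.

```python
import numpy as np

def dcw_distance(a, b, off_diagonal_penalty=1.5):
    """Contour-warping distance between two tangent-field vectors a and b.

    D[i, j] holds the cost of the best warping path ending at (i, j); each step
    moves by 0 or 1 in each index, and non-diagonal (stretching) steps are penalized."""
    K = len(a)
    D = np.full((K, K), np.inf)
    D[0, 0] = abs(a[0] - b[0])
    for i in range(K):
        for j in range(K):
            if i == 0 and j == 0:
                continue
            cost = abs(a[i] - b[j])
            candidates = []
            if i > 0 and j > 0:
                candidates.append(D[i - 1, j - 1] + cost)                      # diagonal step
            if i > 0:
                candidates.append(D[i - 1, j] + off_diagonal_penalty * cost)   # stretch b
            if j > 0:
                candidates.append(D[i, j - 1] + off_diagonal_penalty * cost)   # stretch a
            D[i, j] = min(candidates)
    return D[K - 1, K - 1]
```

Each evaluation costs O(K²) operations, which is consistent with the later observation that the Euclidean (MSE) classifier is roughly two orders of magnitude faster than the DCW classifier.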

[Figure 4 appears here: a hierarchical character classifier with subclassifiers labelled DESCENDER ASCENDER, DESCENDER, NORMAL, ASCENDER, and PUNCTUATION.]

FIGURE 4. Hierarchical Character Classifier. The classifier is divided into several subclassifiers, each specialized to recognize a portion of the ASCII character set. The specialist classifier is selected based on reliable screening features: the coreline class (position of a character with respect to baselines) and genus (the number of white spaces in a character).


TABLE 1
Summary of Results for Several Model Building Procedures. Performance of the Kohonen Associative Memory, Selective Multiresolution, and ISODATA Model Building Procedures Is Compared to Both the Multilayer Perceptron (MLP) and the Existing Operational Models Developed Using an ad hoc Approach. Each Procedure Utilized the Same Training Data, Composed of the Tangent Fields of 110,000 Characters. All the Procedures Result in Explicit Models (i.e., Feature Vectors) Except for the MLP, Where Models Are Stored Implicitly in the Interconnection Weights. Two Distance Measures Are Considered: the Euclidean Distance and the Dynamic Contour Warping Distance.

Comparisons of Several Modeling Procedures

    Procedure                Distance Measure    # Models    Modeling Performance
    KAM                      Euclidean           1232        97.1%
    KAM                      Contour warping     1120        97.0%
    SMR                      Euclidean           1181        97.0%
    SMR                      Contour warping     1150        97.7%
    ISODATA                  Euclidean           1180        95.5%
    ISODATA                  Contour warping     1180        96.3%
    Current models           Contour warping     700         96.1%
    Multilayer perceptron    n/a                 special     97.0%

Modeling was performed on the training set using (a) KAM; (b) SMR with three layers of resolution; and (c) ISODATA (Duda & Hart, 1973; Linde, Buzo, & Gray, 1980). ISODATA is a benchmark conventional model building scheme. The modeling results are compared to models generated using multilayer perceptrons (Sabourin & Mitiche, 1991), and to the current models employed in our OCR system (derived using ad hoc methods).

4.3. Discussion

ISODATA model building had several difficulties arising from the selection of initial conditions. (a) The initial conditions were first set at random, which resulted in most nodes being empty. This is due to the nature of the training data (which has large regions of "emptiness"). (b) Alternately, the initial conditions were selected as the center of all training patterns (average value), perturbed by a small random value. This method eliminated most of the empty node difficulties, although several data collisions occurred. (c) Finally, a dynamic method suggested in Andrews (1972) was tried, where the training pattern furthest from every model was selected as the next model. In several cases, this method produced good results, although several singleton models were produced (under-representation of high-density regions of feature space). In the tables, the second method of selecting initial conditions was utilized. Note that none of the ISODATA methods produced models as good as those of KAM or SMR.

KAM usually produces satisfactory models, although several empty nodes may occur, as well as data collisions (depending on the number of nodes selected). The KAM invariably outperforms the ISODATA method in terms of training speed and quality of the models. Unlike KAM and ISODATA, the SMR model building algorithm is able to resolve data collisions by expanding the appropriate nodes. In addition, few empty nodes occur due to the coarseness of the initial partitioning.
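For illustration, the third (furthest-point) initialization strategy can be sketched as follows; the function name and the use of Euclidean distance are our own assumptions about an otherwise standard maximin seeding scheme.

```python
import numpy as np

def furthest_point_init(samples, n_models, seed=0):
    """Pick initial models one at a time, each time choosing the training
    pattern furthest from every model selected so far."""
    rng = np.random.default_rng(seed)
    models = [samples[rng.integers(len(samples))]]       # arbitrary first model
    for _ in range(n_models - 1):
        dists = np.min(
            np.linalg.norm(samples[:, None, :] - np.asarray(models)[None, :, :], axis=2),
            axis=1)                                       # distance to nearest chosen model
        models.append(samples[np.argmax(dists)])          # furthest pattern becomes next model
    return np.asarray(models)
```

As the text notes, this tends to over-sample outliers (singleton models) and under-represent dense regions of feature space.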

4.4. Results

In the appendix, Table A1 provides detailed results of model building using selective multiresolution with a DCW distance measure. Notice that a total of 26 possible screening feature combinations occur, which represents all the possible screening combinations observed in the training data. This model building procedure required 1,150 models to partition the feature space, with a modeling performance of 97.7%.

The overall training performance using KAM, SMR, and ISODATA for two distance measures is presented in Table 1.

TABLE 2
Test Results. The Test Set Consists of Five Typical Office Documents (a Total of 10,500 Characters). Models Built Using Selective Multiresolution and a Dynamic Contour Warping Distance Measure Had the Highest Recognition Performance.

Test Results for Various Modeling Procedures

    Procedure                Distance Measure    Performance
    KAM                      Euclidean           96.0%
    KAM                      Contour warping     96.5%
    SMR                      Euclidean           96.3%
    SMR                      Contour warping     97.5%
    ISODATA                  Euclidean           94.9%
    ISODATA                  Contour warping     95.5%
    Current models           Contour warping     95.9%
    Multilayer perceptron    Output supremum     96.7%

Each modeling procedure used screening features to simplify the modeling procedure. For a fair comparison, the number of models, N, is approximately the same for each method. The best modeling procedure tested is SMR using DCW, which has an overall modeling performance of 97.7% (see Table 2). In other words, 97.7% of the 110,000 training patterns were partitioned uniquely. This is excellent considering the similarity of several patterns such as (k K), (0 O), (1 l I), (' '), and so on. The second best modeling procedure was KAM using a Euclidean distance, with a 97.1% modeling success.⁴ For reference, note that previous model building for the OCR system had a performance of approximately 96.1%. Also note that the multilayer perceptron reported in Sabourin and Mitiche (1991) had a modeling performance of approximately 97.0%. This MLP used the same screening mechanisms described in Section 4.1, with the input consisting of a 64-dimensional feature vector (tangents) and each output corresponding to a specific category.

Another modeling procedure of interest is SMR using the Mean Squared Error (MSE) distance. Although this procedure yielded a lower modeling performance of 97.1%, it generates a much faster classifier. This is due to the large amount of computation needed to perform the dynamic programming necessary for DCW. Computationally, the MSE classifier is approximately 100 times faster than the DCW classifier. Note that for certain screening possibilities, some methods (not necessarily SMR-DCW) were able to outperform the others. By combining the results, we can choose a more "optimal" classifier by selecting the best model building method for each screening feature. This hybrid classifier would have a combined recognition rate exceeding 97.7%.

⁴ In a high performance OCR system, the difference between 97.1% and 97.7% is considered very significant.

4.5. Test Results

A test was performed on a set of five documents which were not used in developing the models. The results, given in Table 2, indicate that the better models produced using SMR result in a superior character classifier. Note that SMR using the MSE distance did not generalize as well as SMR using contour warping. This is due to the property of contour warping which produces a small distortion for tangent field misalignment, while the MSE distance may produce a large distortion.

5. CONCLUSIONS

We have presented a novel modeling and classification procedure based on a KAM using selective multiresolution. This procedure (a) is dynamic in the number of models it produces; (b) has a hierarchical structure; and (c) is computationally efficient. SMR was tested on an OCR problem, in which 200 fonts and 110,000 training samples were used to represent the inherent variability in character shape. Test results using the neural network based SMR approach outperformed the conventional ISODATA approach, the KAM, and the multilayer perceptron of Sabourin and Mitiche (1991).

REFERENCES

Andrews, H. C. (1972). Mathematical techniques in pattern recognition. New York: Wiley-Interscience.
Angéniol, B., Vaubois, G., & Le Texier, J. Y. (1988). Self-organizing feature maps and the travelling salesman. Neural Networks, 1, 289-294.
Carpenter, G., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics and Image Processing, 31, 54-115.
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: John Wiley & Sons.
Duffle, P. K. (1985). Contour elastic matching for omnifont character recognition. Master of Engineering Thesis, Department of Electrical Engineering, McGill University, Montreal, Canada.
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences USA, 81, 3088-3092.
Kohonen, T. (1988). Self-organization and associative memory (2nd ed.). Berlin: Springer-Verlag.
Komlós, J., & Paturi, R. (1988). Convergence results in an associative memory model. Neural Networks, 1, 239-250.
Le Bail, E., & Mitiche, A. (1989). Quantification vectorielle d'images par le réseau neuronal de Kohonen. Traitement du Signal, 6, 529-539.
Le Cun, Y., Matan, O., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., Jackel, L. D., & Baird, H. S. (1990). Handwritten zip code recognition with multilayer networks. IAPR Proceedings of the International Conference on Pattern Recognition, Atlantic City, NJ, 35-40.
Linde, Y., Buzo, A., & Gray, R. M. (1980). An algorithm for vector quantizer design. IEEE Transactions on Communications, 28, 84-95.
Lippman, R. P. (1987, April). An introduction to computing with neural networks. IEEE ASSP Magazine, 4-22.
Luttrell, S. P. (1988). Self-organizing multi-layer topographic mappings. IEEE International Conference on Neural Networks, 1, 93.
Mitiche, A., Laganiere, R., & Henderson, T. C. (1990). Multisensor information integration for object identification. In Multisensor Fusion for Computer Vision: Proceedings of the NATO Advanced Research Workshop on Multisensor Fusion for Computer Vision, Grenoble, France, June 1990. Berlin: Springer-Verlag.
Nasrabadi, N. M., & Feng, Y. (1988). Vector quantization of images based upon Kohonen self-organizing feature maps. IEEE International Conference on Neural Networks, 1, 101.
Ritter, H., & Schulten, K. (1986). On the stationary state of Kohonen's self-organizing sensory mapping. Biological Cybernetics, 54, 99-106.
Ritter, H., & Schulten, K. (1988). Kohonen's self-organizing maps: Exploring their computational capabilities. IEEE International Conference on Neural Networks, 1, 109.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing, Vols. 1 and 2. Cambridge, MA: MIT Press.
Sabourin, M., & Mitiche, A. (1991). Optical character recognition by a neural network. Neuro-Nîmes 91, Nîmes, France, 135-148.
Tappert, C. C. (1982). Cursive script recognition by elastic matching. IBM Journal of Research and Development, 10, 765-771.

APPENDIX: TRAINING RESULTS FOR SELECTIVE MULTIRESOLUTION

TABLE A1
Results of Selective Multiresolution. Characters in Each Set of Screening Features Were Trained Separately. The Screening Features Used Are Coreline, Genus, and Number of Components. A Genus of 2* Refers to a Character With Two Holes Aligned Horizontally. The Performance Is Measured by Determining the Purity of Each Sub-Network, i.e., the Fraction of Patterns Associated With Each Model With Consistent Labelling Information. Note That All Character Confusions (e.g., Between O and 0) Are Included in the Error Count, Even Though the Shapes Are Similar.

Selective Multiresolution Training Results

    Network    Coreline    Genus    # Components    # Patterns    # Models    # Confusions    Performance
    1          0           0        0               31467         118         433             98.6%
    2          0           1        0               26167         98          157             99.4%
    3          0           2        0               1253          24          18              98.6%
    4          0           3        0               16            6           3               81.3%
    5          0           4        0               13            4           4               69.2%
    6          0           2*       0               243           14          13              94.7%
    7          0           0        2               220           23          30              86.4%
    8          1           0        0               25900         275         1104            95.4%
    9          1           1        0               7797          111         316             95.9%
    10         1           2        0               673           20          17              97.4%
    11         1           3        0               35            10          11              68.6%
    12         1           2*       0               77            22          17              76.6%
    13         1           0        2               6383          83          82              98.7%
    14         2           0        0               1821          36          25              98.6%
    15         2           1        0               3171          20          56              98.2%
    16         2           2        0               1200          10          6               99.5%
    17         2           2*       0               60            11          4               93.3%
    18         2           0        2               24            6           1               95.8%
    19         3           0        0               370           39          19              94.9%
    20         3           1        0               154           13          6               96.1%
    21         3           2        0               26            5           2               92.3%
    22         3           0        2               236           3           3               98.7%
    23         4           0        0               2091          22          42              98.0%
    24         5           0        0               489           14          9               98.1%
    25         5           1        0               12            5           1               91.7%
    26         6           0        0               467           24          99              81.7%
    TOTAL                                           110403        1150        2623            97.7%