Int. J. Man-Machine Studies (1990) 33, 41-61
Font recognition by a neural network MING-CHIH LEE and WILLIAM J. B. OLDHAM
Computer Science Department, Texas Tech University, Lubbock, TX 79409, USA (Received 8 November 1988, and accepted in revised form 12 May 1989). Two neural network models, labelled Model H-H1 and Model H-H2 by Hogg and Huberman, have been successfully applied to recognize 26 English capital letters, each with six font representations. These two models are very similar, but Model H-H2 has the capability for modification of the basins of attraction during the training phase, whereas Model H-H1 does not. This appears to be a desirable feature for a neural network, and it is shown in this work that this is indeed true. In either model, it is difficult to find a single set of parameters for one network or memory that can distinguish all of the characters; therefore, a cascade of memories is utilized. Thus, in the training phase, a decision tree is built by cascading the memory matrices that represent the models. That is, successive layers of refinement in the selection of basins of attraction are used to generate output patterns unique to each input pattern. In the recognition phase, the subject characters are recognized by searching the tree. Model parameters such as memory array size, S_min, S_max, and M_min, M_max were varied to elucidate the models' behaviour. It is shown that there exist parameter values for both models that achieve a 100% recognition rate when all six fonts are used as both the training and the recognition set. Model H-H2 significantly outperformed Model H-H1 in terms of recognition rate, use of memory space, and learning speed when all six fonts were used as the training set.
1. Introduction

Machine recognition of characters continues to be a problem even when the number of characters is limited and the characters are restricted to machine-printed characters (Kahan, Pavlidis & Baird, 1987). Although much research has been directed at this problem, improved methods are still required. Much effort has been directed at machine recognition of handwritten characters; this body of literature has been reviewed in Mori, Yamamoto and Yasuda (1984), and in Suen, Berthod and Mori (1980). The purpose of this paper is to present results of the recognition capability of two neural networks on machine-printed fonts of the 26 English capital letters. Success on this set should encourage investigation into larger and more complicated sets of data. From among the various neural net models, two models that were developed by Hogg and Huberman (1984) and Hogg and Huberman (1985a) (Model H-H1 and Model H-H2) were chosen for this effort. These models are used here because of their simplicity, their flow-forward, synchronous properties, and their pattern recognition capabilities. These models are dissipative dynamic systems and they map many inputs to a few final stable states. The final states are called fixed-point attractors. The set of inputs that maps into a given output defines the basin of attraction for that output. Our purpose is to further understand the properties of those models and to investigate the usefulness of the ability to manipulate the basins of attraction of Model H-H2. The behaviour of Model H-H1 has been investigated by several studies.
© 1990 Academic Press Limited
These studies revealed that this model has a self-repairing capability (Huberman & Hogg, 1984) and a conditional learning capability (Hogg & Huberman, 1985b). Model H-H2 can dynamically modify the basins of attraction to include or exclude a particular set of inputs by using a coalescence process or a dissociation process. The capability of dynamically changing the basins of attraction opens a new research area. As Hogg and Huberman (1985a) suggested, the coalescence and dissociation processes provide a flexible way to transform desired groupings of inputs into specific outputs. These processes are particularly useful in font recognition. Each English letter has many font representations. Although fonts for a particular letter differ from one another, the overall images are recognized by humans as the same letter. In other words, letters in various fonts are treated as being in the same equivalence class. In this study, six fonts were selected for the 26 capital letters of the English alphabet. The fonts, which included (1) Courier, (2) New York, (3) Chicago, (4) Geneva, (5) Times and (6) Venice, were generated for each letter by a Macintosh computer. Figure 1 lists the 156 target characters. The Hogg and Huberman models, which are flow-forward and synchronous networks, were developed in 1984 (Model H-H1) and 1985 (Model H-H2). The architectures of Model H-H1 and Model H-H2 are the same, and each can be represented by a rectangular matrix (the memory matrix) that consists of M rows and N columns of identical processors, each of which is locally connected to its neighbours. Each element (i, j) has a value stored in it which allows it to adapt to its local environment. The overall input and output take place at the edges of the matrix, with the upper edge of the matrix (the first row) for input and the lower edge (the
FIGURE 1. The 156 target characters.
last row) for output. Each element receives two integer inputs from its neighbours along the memory's diagonals in the preceding row and produces an integer output which in turn becomes an input to the nodes along the diagonals in the following row, as shown in Figure 2. The hard limiter, of the form

    f(x) = x        if x_min <= x <= x_max
    f(x) = x_max    if x > x_max
    f(x) = x_min    if x < x_min

is used to constrain the output values to lie within a range, namely [S_min, S_max]. Likewise, the memory values are limited to within [M_min, M_max]. Those output values that are equal to the extremes of the range are said to be saturated. Let IL_ij(k) and IR_ij(k) be the inputs to the element in the ith row and jth column after the kth time step, and let O_ij(k) be the output value for the ith-row, jth-column element. The connections between elements are defined by these relations:

    IL_ij(k) = O_{i-1, j-1}(k),    IR_ij(k) = O_{i-1, j+1}(k)

where 1 <= i <= M and 1 <= j <= N.
FIGURE 2. The Hogg and Huberman model.
The boundaries of the matrix, for the top, bottom, and side edges, are specified respectively as

    O_{0,j}(k)   = S_j(k)
    O_{M,j}(k)   = R_j(k)
    O_{i,0}(k)   = O_{i,1}(k)
    O_{i,N+1}(k) = O_{i,N}(k)

where S(k) is the external input signal to the matrix at step k, and R(k) is the resulting output vector. The two models employ different output functions and learning rules, which are described next.

1.1. MODEL H-H1

The output from each element for the k + 1 step is computed as
    O_ij(k+1) = max{S_min, min(S_max, M_ij(k) [IL_ij(k) - IR_ij(k)])}

This rule enhances the differences of the inputs by multiplying them by the memory value. The limiting process keeps the values within the specified interval. The memory values are updated by the following rule:

    if O_ij(k) > O_{i,j-1}(k) and O_ij(k) > O_{i,j+1}(k) then
        M_ij(k) = max{M_min, min(M_max, M_ij(k-1) + 1)}
    else if O_ij(k) < O_{i,j-1}(k) and O_ij(k) < O_{i,j+1}(k) then
        M_ij(k) = max{M_min, min(M_max, M_ij(k-1) - 1)}
    else
        M_ij(k) = M_ij(k-1)

1.2. MODEL H-H2

The output function for the second model is:
    O_ij(k+1) = max{S_min, min(S_max, S(IL_ij(k), IR_ij(k)) * (|IR_ij(k)| + |IL_ij(k)|) + M_ij(k))}

where, for even rows, if IL_ij(k) is zero, S(IL_ij(k), IR_ij(k)) is the sign of IR_ij(k), otherwise the sign of IL_ij(k); for odd rows the roles of IL_ij(k) and IR_ij(k) are reversed. The learning rules for the second model are the coalescence rule and the dissociation rule. The coalescence rule is capable of dynamically associating the basins of two or more attractors to produce the same attractor. Figure 3 pictorially demonstrates the basins of two attractors before and after the process, according to the rule listed below:

    if at least one of O_ij(k-1) and O_ij(k) is not saturated
       and O_ij(k) * O_ij(k-1) < 0 then
        change M_ij by 1, with the sign of the change given by the sign of
        the output with the largest magnitude
    else
        M_ij is unchanged.

The dissociation rule is used to separate inputs which initially map into the same output. It operates opposite to the coalescence rule:

    if at least one of O_ij(k-1) and O_ij(k) is not saturated
       and O_ij(k) * O_ij(k-1) > 0 then
        change M_ij by 1, with the sign of the change opposite to that of the output
    else
        M_ij is unchanged.
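The element-level update rules above can be written compactly in code. The following is a minimal sketch, not the authors' implementation: the constants and function names are illustrative assumptions, and the H-H2 coalescence/dissociation rules are omitted for brevity.

```python
import numpy as np

# Illustrative parameter values for the ranges [S_min, S_max] and [M_min, M_max].
S_MIN, S_MAX = -15, 15
M_MIN, M_MAX = 1, 6

def clip(x, lo, hi):
    """Hard limiter: constrain x to the interval [lo, hi]."""
    return max(lo, min(hi, x))

def hh1_output(il, ir, m):
    """Model H-H1 output: memory value times the difference of the two inputs."""
    return clip(m * (il - ir), S_MIN, S_MAX)

def hh1_update_memory(m_prev, o, o_left, o_right):
    """Model H-H1 learning rule: increment (decrement) the memory value when
    the element's output is a local maximum (minimum) among its row neighbours."""
    if o > o_left and o > o_right:
        return clip(m_prev + 1, M_MIN, M_MAX)
    if o < o_left and o < o_right:
        return clip(m_prev - 1, M_MIN, M_MAX)
    return m_prev

def hh2_output(il, ir, m, even_row):
    """Model H-H2 output: signed sum of the input magnitudes plus the memory value."""
    # Even rows take the sign of IL unless IL is zero (then IR); odd rows reverse the roles.
    primary, secondary = (il, ir) if even_row else (ir, il)
    s = np.sign(primary) if primary != 0 else np.sign(secondary)
    return clip(s * (abs(il) + abs(ir)) + m, S_MIN, S_MAX)
```

Note that `hh1_output` saturates whenever the scaled input difference leaves [S_MIN, S_MAX], which is exactly the condition the coalescence and dissociation rules test for.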
FIGURE 3. Basins of attraction change after the coalescence process.
In the training phase, a set of training patterns is submitted to the models periodically, as a pipeline. The outputs and the memory values are then updated according to the rules above. The model is said to be stabilized if the output values for the input patterns do not change; this is evaluated over two full submissions of the pattern set. Once the model is trained, the memory states are fixed and locked. The official outputs are then obtained and stored by running the inputs through the model one more time. This fixes the output patterns after the memory is locked and guards against changes to the output patterns caused by one or more memory elements changing later in the training phase. In the recognition phase, a character is sent to the model for classification. By comparing the outputs with the official outputs, the model determines whether the input is one of the trained patterns. In this effort, exact matches of the output patterns are required for classification.
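The stabilization criterion (outputs unchanged over two consecutive full submissions) can be sketched as follows. `net_step` is a hypothetical callback, not part of the paper's models, that runs one pattern through the network with learning enabled and returns its output vector.

```python
def train_until_stable(net_step, patterns, max_iters=10000):
    """Pipeline the training patterns through the network repeatedly until
    the outputs are unchanged over two consecutive full submissions, then
    return the recorded ("official") output for each pattern."""
    prev = None
    for _ in range(max_iters):
        outs = tuple(tuple(net_step(p)) for p in patterns)
        if outs == prev:          # stable: two identical full passes
            break
        prev = outs
    # The memory is now considered locked; the last recorded pass serves
    # as the official output for each training pattern.
    return {i: out for i, out in enumerate(prev)}
```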
2. Approach

One principal difficulty encountered in pattern recognition problems is finding a way to preprocess or represent the patterns in a formalized manner for the network. In character recognition systems, a character is first normalized (e.g. aligned in position), then preprocessed for feature extraction, and then classified. This is our approach as well. The preprocessing here is very similar to that used in Fujii and Morita's (1971) study, except that we used the pseudoinverse rather than the algebraic formalism of that work. The preprocessing is described below.

2.1. NORMALIZING FONTS AND GENERATING CHARACTER MATRICES
One hundred and fifty-six capital English characters were translated into 18 x 18 character matrices. In the character matrices, 0's represent the background and 1's represent the character image. The thickness of the line of the character is one unit.

2.2. SELECTING CHARACTER PROPERTIES AND GENERATING PROPERTY MATRICES
The inputs to models H-H1 and H-H2 are one-dimensional vectors. Therefore, the character matrices (18 x 18) have to be transformed into vector representations. To
accomplish this, fourteen selected properties of Fujii and Morita (1971) that constitute the basic building blocks of a character were extracted from each character matrix. Each property is represented by a 3 x 3 matrix. The selected character properties and their corresponding property matrices are shown in Figure 4. The fourteen properties are extracted from each character to build a 14-tuple (the character vector C) in which c_i is associated with the ith property. The combined property matrix X is a 14 x 9 matrix that represents the 14 selected character properties. Each row of X is a 1 x 9 vector x_i that represents a character property (see Figure 5). The nine elements in x_i are obtained by chaining the three rows of the 3 x 3 ith property matrix into a 1 x 9 vector. For example:
    [1 0 0]
    [0 1 0]   =>   x_i = [1 0 0 0 1 0 0 0 1]
    [0 0 1]
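In code, this chaining is simply a row-major flatten. A small sketch using NumPy (the matrix is the diagonal example shown above):

```python
import numpy as np

# A 3 x 3 property matrix (the diagonal example above).
prop = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])
# Chain the three rows into a 1 x 9 vector: rows concatenated top to bottom.
x_i = prop.reshape(-1)
```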
To be specific, X is constructed by putting the vector corresponding to the first property (x_1) in the first row of X, the vector corresponding to the second property (x_2) in the second row of X, and so on. That is, X = [x_1; x_2; ... ; x_14].
FIGURE 4. The selected properties and their matrix representations.
FIGURE 5. The search matrix and the windows.
2.3. COMPUTING THE FILTER MATRIX
The property recognition matrix (Y) and the filter matrix (W) are developed below. The property recognition matrix (Y) is a 14 x 9 matrix that serves as a recognition matrix for the property matrices. Matrix Y is arbitrarily chosen and constructed as simply as possible. We used
    Y = [y_1; y_2; ... ; y_14]
The filter matrix (W) is a 9 x 9 matrix that maps X (the combined property matrix) to Y. Their relationship is

    Y = XW
Given X and Y, W would equal X^{-1}Y if X were a square, non-singular matrix. In this case, X is a 14 x 9 rectangular matrix (there are more equations than unknowns). Therefore, W is over-determined and no exact solution exists in general. The minimum squared error (MSE) technique was adopted to approximate W. The MSE procedure minimizes the squared error between Y and XW. Using this procedure, W = X+Y, where X+ is the pseudoinverse of X. After W is computed, Y is recomputed as the product of X and W.

2.4. EXTRACTING PROPERTIES AND CONSTRUCTING CHARACTER VECTORS
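The MSE solution W = X+Y can be computed directly with a pseudoinverse routine. In this sketch, X and Y are random stand-in data with the stated shapes, not the paper's actual property and recognition matrices:

```python
import numpy as np

# Stand-in data: 14 properties x 9 window elements (NOT the paper's matrices).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(14, 9)).astype(float)  # combined property matrix
Y = np.eye(14, 9)                                   # simple recognition matrix

# W = X+ Y : the 9 x 9 filter matrix minimizing the squared error ||XW - Y||.
W = np.linalg.pinv(X) @ Y

# Y is then recomputed as the product of X and W, as in the text.
Y_hat = X @ W
```

Any least-squares solution satisfies the normal equations X^T X W = X^T Y, which is a convenient sanity check on the computed W.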
The purpose of this step is to extract the selected properties from the character matrix and to construct the character vector C. To extract character properties, let A be the search area (character matrix) and let w be a 3 x 3 window matrix, as shown in Figure 5. The window starts moving from the upper left corner of the search area and slides across the pattern. Let z' be a 1 x 9 vector whose elements are obtained by chaining the three rows of elements in w. For example:

    w = [1 0 0]
        [0 1 0]   =>   z' = [1 0 0 0 1 0 0 0 1]
        [0 0 1]
Then z' is recognized as one of the selected properties if it maps to y' (y' = z'W), where y' is one of the rows in Y. If z' is recognized, say y' = y_k, then the kth element (c_k) of the count vector is incremented by a weighting factor. The weighting factor is a function of the property as well as the property's location. For those properties that occur rarely, such as the cross property, their locations in the character matrices were identified by assigning different weights. The values of the weights are arbitrarily assigned and have no meaning other than that they were found to be useful for distinguishing purposes. To determine the weighting factor, the character matrix was divided into nine 6 x 6 portions, with the upper left portion assigned score 1, 2 to its right, and so forth (see Figure 6). Table 1 lists the weighting factors. On the other hand, for those character properties that occur frequently, their locations were ignored: a score of one was assigned to every occurrence of the property no matter where it was. The character vector C is a 1 x 14 vector (C = [c_1, c_2, ..., c_14]) and each element of C is an accumulated score for its corresponding property. The character vector for each character matrix is extracted according to the following algorithm:

(1) Put the window at the upper left corner of the character matrix.
(2) Repeat the following processes until the third row of the window coincides with the last row of the character matrix.
(3) Repeat the following processes until the right margin of the window coincides with the right margin of the character matrix.
(4) Construct z' by chaining the three rows in the window to form a 1 x 9 vector.
(5) Compute y' = z'W.
(6) If y' exactly matches one of the rows in Y, say y' = y_k, then increment c_k by the weighting factor of the kth property.
FIGURE 6. Location assignment in the character matrix.

(7) Move the window one column to its right.
(8) Go to step 3.
(9) Move the window one row down and start from the left margin of the character matrix.
(10) Go to step 2.

The encoding scheme is affected by the relative position of the properties. Also, the thickness of the line of the character is restricted to be one unit. Only the sign distribution of the inputs affects the output of Model H-H2. To

TABLE 1
Assignment of weighting factor

Location:           1  2  3  4  5  6  7  8  9
Properties 1-4:     1  1  1  1  1  1  1  1  1
Properties 5-9:     1  2  3  4  5  6  7  8  9
Properties 10-14:   1  1  1  1  1  1  1  1  1
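The extraction algorithm (steps 1-10 above) amounts to a sliding-window scan with weighted counting. A sketch follows; `weight(k, row, col)` is a hypothetical helper returning the Table 1 weighting factor for property k at the current window position, and the exact-match test on y' is done row by row against Y:

```python
import numpy as np

def extract_character_vector(char_mat, Y, W, weight):
    """Slide a 3 x 3 window over a character matrix and accumulate weighted
    property counts into a 14-element character vector."""
    C = np.zeros(14)
    rows, cols = char_mat.shape
    for r in range(rows - 2):          # until the window's third row hits the last row
        for c in range(cols - 2):      # until the window hits the right margin
            z = char_mat[r:r + 3, c:c + 3].reshape(-1)   # chain window to 1 x 9
            y = z @ W
            for k in range(14):
                if np.allclose(y, Y[k]):   # y' exactly matches a row of Y
                    C[k] += weight(k, r, c)
                    break
    return C
```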
increase the variability of Model H-H2's output, the character vector (all of its elements are positive before the process) is modified according to the following rules:

    do i = 1 to 14 by 1
        if threshold(i) is zero and C(i) is also zero then assign -4 to C(i);
        if threshold(i) is zero but C(i) is not zero then C(i) is unchanged;
        otherwise, if threshold(i) is not zero, then C(i) = C(i) - threshold(i)
    end i

where threshold = (7 7 4 4 0 0 0 0 0 0 0 0 0 4). The values of the threshold vector were chosen according to the frequency of occurrence of the selected character properties over the 156 characters. The codes that result from the above processes are the inputs to the models. During the first part of the simulations, the training set was composed of the vector representations of 78 characters: three out of the six fonts for each letter of the alphabet were randomly selected. During the second part of the simulations, all 156 characters were used as the training set. For both simulations, all 156 characters were the candidates for the recognition set. For Model H-H1, the training set was fed to the model repeatedly during the training phase. The model computed the outputs and adjusted the memory matrix values according to its output function and the learning rule until there was convergence. After the model stabilized, the memory matrix values were fixed and the official outputs were generated by running the model one more time using the training set. The model recorded the outputs as well as their associated letters. Thus the inputs were divided into several categories. In general, several letters fell into the same category; this happened when two or more characters created the same output. Figure 7 shows an example of the decision tree. In this example, character E and character F produced the same output. The learning process was started again to create a child model using the vector representations of the E's and F's as inputs.
Another memory matrix was created to distinguish E's from F's. This process continued recursively to build the decision tree until either each output was associated with only one character or the depth of the tree reached 7. It was learned through experience that a tree depth of six was adequate most of the time; if further depth was required, it was unlikely that convergence would be obtained. Therefore, the depth of the decision tree was limited to 6. For Model H-H2, three characters (three fonts for the same letter) of the training set were submitted to the model at one time. The model computed the outputs and adjusted the memory matrix values based on the output function and the coalescence rule. After Model H-H2 converged, three vectors that represented another letter were sent to the model and the previous process was started over again. After all 78 characters were learned, the values of the memory matrix were fixed. The training set was submitted to the model to obtain the official outputs. The model recorded the outputs as well as their associated letters. Normally, the model divided the inputs into several categories, with those that produced the same output in one category. If two or more letters fell into the same category, the vector representations for those letters were again submitted to the model. A child model was created to discriminate the different letters that fell into one category by adopting the dissociation rule. These processes were run recursively to build the decision tree until either each output was associated with only one character or the depth of the tree reached 7.

FIGURE 7. An example of the decision tree.

When building the decision tree, the coalescence rule and the dissociation rule were used alternately, with the coalescence rule at odd depths and the dissociation rule at even depths. Again, the depth of the decision tree was limited to 6, as explained above. In the recognition process, all 156 codes were input to the models. The recognition set was submitted to the memory matrix at the root of the decision tree. If the output exactly matched one of the outputs previously recorded and the output was associated with only one letter, then the model indicated that the input was the recorded character. On the other hand, if the output matched one of the outputs but two or more letters were associated with it, the input code was sent to the child memory matrix for further processing. The search started from the root of the tree, and it stopped either when the memory matrix was a leaf node or when the output did not match any of the recorded outputs. The parameters, such as memory matrix size, S_min-S_max, and M_min-M_max, were manipulated to determine the models' performance. The memory matrix sizes were set to 4 x 14, 6 x 14, 8 x 14, and 10 x 14. S_min-S_max was assigned the values ±12, ±15, and ±18. For Model H-H1, M_max was set to 4, 6 and 8 with M_min fixed at 1; for Model H-H2, M_min-M_max was set to ±8, ±10, and ±12. Thus 36 observations were obtained for each model. These levels were determined by expanding the levels used by Hogg and Huberman (1984). Table 2 presents the levels of parameters.
TABLE 2
Levels of parameters

Model   Parameter            Level 1   Level 2   Level 3   Level 4
H-H1    Memory matrix size   4 x 14    6 x 14    8 x 14    10 x 14
        S_min-S_max          ±12       ±15       ±18
        M_min-M_max          1-4       1-6       1-8
H-H2    Memory matrix size   4 x 14    6 x 14    8 x 14    10 x 14
        S_min-S_max          ±12       ±15       ±18
        M_min-M_max          ±8        ±10       ±12
The performances of these models were determined using the following criteria: accuracy, required memory space, learning speed, and recognition speed. These four criteria are defined below: (1) The accuracy is scored as the fraction of correctly recognized characters with respect to the 156 characters. (2) The required memory space is defined by the number of memory matrices used to build the decision tree. (3) The learning speed is determined by the number of iterations required to build the decision tree. (4) The recognition speed is determined by the number of times the input data have to be submitted to the memory matrices in the decision tree in order to recognize 156 characters.
3. Simulation results

Tables 3-6 give the simulation results, measured as defined above. These results are discussed below. A total of 36 sets of computer output were generated from Model H-H1, using the model parameters of Table 2. Table 3 lists the resulting recognition rates, the number of memory matrices used, the learning speeds, and the recognition speeds when the model was trained on three of the six fonts. Across all variables, the mean recognition rate was 82.15% (S.D. = 2.31%); it ranged from 77.56 to 87.82%. On average, 12.86 memory matrices were needed in the learning phase; the number ranged from 2 to 23. The mean learning speed was 707.14 (S.D. = 215.79) iterations, with the highest speed at 258 iterations and the lowest at 1143 iterations. As for recognition speed, the mean was 216 (S.D. = 26.15) and the range was 161 to 263. Overall, the program averaged 5.385 seconds of CPU time on the VAX 8650. For comparison purposes, other computer simulations were conducted using all of the letters (156) as both the training and the recognition set. Table 4 presents the 36 observations of Model H-H1 for recognition rate, memory space used, learning speed, and recognition speed. The mean recognition rate (S.D. = 2.50%) was 96.07%, with a maximum of 100% and a minimum of 89.1%. For the number of memory matrices used, the mean was 37.17 (S.D. = 25.75) and the range was from 8 to 104. The mean learning speed was 2937 iterations (S.D. = 1794.03), with the highest speed at 852 iterations and the lowest at 8148 iterations. The mean recognition speed was 255.86 (S.D. = 40.11) times; the range was from 194 to 340 times.
TABLE 3
Model performance (Model H-H1; trained three fonts)

Memory size  S_min-S_max  M_max  Recognition rate (%)  Memory space  Learning speed  Recognition speed
4 x 14       ±12          4      82.05                 4             306             178
4 x 14       ±12          6      82.69                 9             525             214
4 x 14       ±12          8      83.97                 8             510             210
4 x 14       ±15          4      77.56                 3             369             166
4 x 14       ±15          6      82.05                 6             594             184
4 x 14       ±15          8      83.97                 6             522             185
4 x 14       ±18          4      81.41                 2             258             161
4 x 14       ±18          6      81.41                 7             543             195
4 x 14       ±18          8      80.13                 6             750             195
6 x 14       ±12          4      81.41                 7             401             194
6 x 14       ±12          6      84.61                 9             609             205
6 x 14       ±12          8      83.97                 14            739             241
6 x 14       ±15          4      78.85                 10            566             209
6 x 14       ±15          6      78.20                 9             693             204
6 x 14       ±15          8      83.97                 14            697             229
6 x 14       ±18          4      78.20                 14            568             222
6 x 14       ±18          6      80.77                 10            638             217
6 x 14       ±18          8      85.90                 11            733             214
8 x 14       ±12          4      80.13                 21            834             231
8 x 14       ±12          6      83.33                 17            857             213
8 x 14       ±12          8      83.97                 14            752             233
8 x 14       ±15          4      83.97                 6             492             186
8 x 14       ±15          6      81.41                 14            728             198
8 x 14       ±15          8      83.97                 23            965             235
8 x 14       ±18          4      87.82                 16            751             211
8 x 14       ±18          6      81.41                 23            1085            251
8 x 14       ±18          8      81.41                 23            1097            249
10 x 14      ±12          4      80.77                 21            852             220
10 x 14      ±12          6      84.61                 7             891             213
10 x 14      ±12          8      84.61                 14            816             233
10 x 14      ±15          4      80.13                 15            771             245
10 x 14      ±15          6      81.41                 12            672             209
10 x 14      ±15          8      82.69                 19            945             251
10 x 14      ±18          4      82.05                 16            798             263
10 x 14      ±18          6      78.85                 20            987             254
10 x 14      ±18          8      83.93                 23            1143            259
Thirty-six sets of computer output were generated using Model H-H2, with the parameter values of Table 2. For training on three fonts, the results are given in Table 5. Overall, the mean recognition rate was 83.67% (S.D. = 4.17%) of the recognition set; it ranged from 72.43 to 89.10%. On average, 13.17 memory matrices were needed in the learning phase; this ranged from 4 to 29. The mean learning speed was 654.81 (S.D. = 255.75) iterations, with the highest speed at 316 iterations and the lowest speed at 1418 iterations. For recognition speed, the mean was 283.67 times
TABLE 4
Model performance (Model H-H1; trained six fonts)

Memory size  S_min-S_max  M_max  Recognition rate (%)  Memory space  Learning speed  Recognition speed
4 x 14       ±12          4      98.72                 19            1461            205
4 x 14       ±12          6      97.43                 18            1410            224
4 x 14       ±12          8      98.72                 14            1458            201
4 x 14       ±15          4      99.36                 17            1299            208
4 x 14       ±15          6      97.73                 48            2940            243
4 x 14       ±15          8      98.72                 15            1587            205
4 x 14       ±18          4      100.00                8             852             194
4 x 14       ±18          6      98.08                 20            1680            228
4 x 14       ±18          8      98.72                 16            1494            214
6 x 14       ±12          4      96.15                 33            2163            272
6 x 14       ±12          6      95.51                 28            2042            280
6 x 14       ±12          8      97.43                 16            1736            241
6 x 14       ±15          4      97.43                 27            2157            252
6 x 14       ±15          6      95.51                 29            2089            276
6 x 14       ±15          8      97.43                 17            1831            245
6 x 14       ±18          4      98.72                 16            1424            223
6 x 14       ±18          6      96.79                 27            2103            269
6 x 14       ±18          8      98.72                 14            1594            217
8 x 14       ±12          4      98.74                 96            7158            338
8 x 14       ±12          6      92.95                 60            4392            302
8 x 14       ±12          8      94.23                 21            2085            260
8 x 14       ±15          4      90.38                 80            5630            332
8 x 14       ±15          6      92.95                 61            4411            310
8 x 14       ±15          8      94.87                 14            1760            239
8 x 14       ±18          4      89.10                 60            4350            340
8 x 14       ±18          6      94.23                 98            7304            284
8 x 14       ±18          8      96.79                 23            2513            240
10 x 14      ±12          4      91.67                 104           8148            305
10 x 14      ±12          6      94.87                 45            3303            243
10 x 14      ±12          8      96.79                 48            3312            239
10 x 14      ±15          4      94.23                 56            4008            301
10 x 14      ±15          6      94.87                 45            3651            256
10 x 14      ±15          8      96.79                 45            3513            236
10 x 14      ±18          4      93.59                 30            2490            302
10 x 14      ±18          6      95.51                 49            3825            253
10 x 14      ±18          8      94.87                 22            2574            234
(S.D. = 38.46); it ranged from 194 to 346. The average CPU time was 6.47 seconds for Model H-H2 to complete the recognition process on the VAX 8650. Table 6 presents the 36 observations of Model H-H2 for recognition rate, memory space used, learning speed, and recognition speed, when all 156 characters were submitted as both the training set and the recognition set. The mean recognition rate was 99.41% (S.D. = 0.72%), with the highest at 100% and the lowest at 96.15%. This suggests that learning was perfect only for some parameter combinations. The mean learning speed was 1285.78
TABLE 5
Model performance (Model H-H2; trained three fonts)

Memory size  S_min-S_max  M_min-M_max  Recognition rate (%)  Memory space  Learning speed  Recognition speed
4 x 14       ±12          ±8           76.92                 10            431             273
4 x 14       ±12          ±10          80.13                 6             327             238
4 x 14       ±12          ±12          75.00                 9             427             270
4 x 14       ±15          ±8           78.20                 5             392             194
4 x 14       ±15          ±10          76.92                 11            356             224
4 x 14       ±15          ±12          83.97                 5             316             204
4 x 14       ±18          ±8           78.85                 7             397             203
4 x 14       ±18          ±10          79.49                 7             492             273
4 x 14       ±18          ±12          81.41                 4             369             249
6 x 14       ±12          ±8           85.26                 14            541             302
6 x 14       ±12          ±10          83.97                 17            687             306
6 x 14       ±12          ±12          72.43                 4             365             287
6 x 14       ±15          ±8           85.26                 15            643             270
6 x 14       ±15          ±10          84.61                 6             552             289
6 x 14       ±15          ±12          86.54                 16            805             346
6 x 14       ±18          ±8           82.69                 13            624             280
6 x 14       ±18          ±10          82.05                 12            577             331
6 x 14       ±18          ±12          81.41                 11            621             303
8 x 14       ±12          ±8           87.82                 19            751             297
8 x 14       ±12          ±10          85.26                 27            964             325
8 x 14       ±12          ±12          85.90                 26            943             315
8 x 14       ±15          ±8           87.18                 23            785             327
8 x 14       ±15          ±10          89.10                 21            967             312
8 x 14       ±15          ±12          83.33                 6             504             295
8 x 14       ±18          ±8           86.54                 12            625             288
8 x 14       ±18          ±10          86.54                 18            805             299
8 x 14       ±18          ±12          88.46                 26            1227            324
10 x 14      ±12          ±8           88.46                 9             604             248
10 x 14      ±12          ±10          85.26                 13            782             319
10 x 14      ±12          ±12          83.97                 6             648             296
10 x 14      ±15          ±8           89.10                 9             525             240
10 x 14      ±15          ±10          86.54                 8             594             299
10 x 14      ±15          ±12          85.90                 22            1033            313
10 x 14      ±18          ±8           87.82                 11            600             250
10 x 14      ±18          ±10          87.18                 17            876             320
10 x 14      ±18          ±12          82.69                 29            1418            303
iterations (S.D. = 424.38), with the highest speed at 715 iterations and the lowest speed at 2090 iterations. The mean recognition speed was 309.11 (S.D. = 42.96) times and ranged from 178 to 382. Table 7 shows that the mean recognition rate was 82.16% for Model H-H1 and 83.67% for Model H-H2 when three fonts were used in the training set and six fonts were used in the recognition set. When six fonts were used as both the training set and the recognition set, the mean recognition rate was 96.07% for Model H-H1 and 99.41% for Model H-H2. The maximum recognition rate was 100% for both models
TABLE 6
Model performance (Model H-H2; trained six fonts)

Memory size  S_min-S_max  M_min-M_max  Recognition rate (%)  Memory space  Learning speed  Recognition speed
4 x 14       ±12          ±8           100.00                16            824             247
4 x 14       ±12          ±10          100.00                6             715             293
4 x 14       ±12          ±12          100.00                13            718             241
4 x 14       ±15          ±8           100.00                16            746             246
4 x 14       ±15          ±10          99.36                 6             794             283
4 x 14       ±15          ±12          100.00                11            820             300
4 x 14       ±18          ±8           100.00                5             746             178
4 x 14       ±18          ±10          100.00                9             732             261
4 x 14       ±18          ±12          100.00                8             717             284
6 x 14       ±12          ±8           100.00                9             865             292
6 x 14       ±12          ±10          99.36                 18            1341            325
6 x 14       ±12          ±12          99.36                 20            1311            282
6 x 14       ±15          ±8           99.36                 17            1235            275
6 x 14       ±15          ±10          99.36                 16            1112            309
6 x 14       ±15          ±12          96.15                 26            1612            354
6 x 14       ±18          ±8           100.00                20            1155            295
6 x 14       ±18          ±10          100.00                11            852             314
6 x 14       ±18          ±12          99.36                 15            1073            330
8 x 14       ±12          ±8           99.36                 26            1580            309
8 x 14       ±12          ±10          99.36                 30            1839            368
8 x 14       ±12          ±12          99.36                 34            2010            374
8 x 14       ±15          ±8           98.72                 18            1318            305
8 x 14       ±15          ±10          98.08                 38            2090            349
8 x 14       ±15          ±12          99.36                 25            1684            376
8 x 14       ±18          ±8           100.00                18            1351            308
8 x 14       ±18          ±10          99.36                 26            1603            341
8 x 14       ±18          ±12          99.36                 25            1634            336
10 x 14      ±12          ±8           99.36                 19            1464            321
10 x 14      ±12          ±10          99.36                 14            1379            309
10 x 14      ±12          ±12          99.36                 28            2057            361
10 x 14      ±15          ±8           98.36                 38            1927            340
10 x 14      ±15          ±10          99.36                 27            1691            382
10 x 14      ±15          ±12          98.72                 17            1228            315
10 x 14      ±18          ±8           100.00                24            1392            282
10 x 14      ±18          ±10          99.36                 24            1436            330
10 x 14      ±18          ±12          99.36                 15            1237            313
(e.g. 4 × 14 memory matrix size, Smin-Smax = ±18, Mmin = 1, and Mmax = 4 for Model H-H1; 4 × 14 memory matrix size, Smin-Smax = ±18, and Mmin-Mmax = ±8 for Model H-H2). To distinguish the performance differences between Model H-H1 and Model H-H2, a paired t test was conducted. Table 8 presents the test results when using three out of six fonts as the training set. The test results failed to establish statistically that there was any difference in recognition rate. On the other hand, the two models behaved significantly differently when six fonts were used as both the
FONT RECOGNITION	57

TABLE 7

                         Model H-H1                Model H-H2
                   Trained      Trained      Trained      Trained
                 three fonts   six fonts   three fonts   six fonts
Mean (%)             82.16       96.07        83.67        99.41
S.D. (%)              2.31        2.59         4.17         0.72
Minimum value (%)    77.56       89.10        72.43        96.15
Maximum value (%)    87.82      100.00        89.10       100.00
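The summary statistics of Table 7 can be checked directly against the per-combination rates in the detailed tables. The sketch below (a reconstruction for verification, not code from the paper) computes the mean and standard deviation of the Model H-H2, six-font column from the 36 recognition rates printed in Table 6:

```python
import statistics

# The 36 recognition rates (%) from Table 6 (Model H-H2; trained six fonts),
# one per parameter combination (four memory sizes x nine S/M settings).
rates = [
    100.00, 100.00, 100.00, 100.00, 99.36, 100.00, 100.00, 100.00, 100.00,  # 4 x 14
    100.00, 99.36, 99.36, 99.36, 99.36, 96.15, 100.00, 100.00, 99.36,       # 6 x 14
    99.36, 99.36, 99.36, 98.72, 98.08, 99.36, 100.00, 99.36, 99.36,         # 8 x 14
    99.36, 99.36, 99.36, 98.36, 99.36, 98.72, 100.00, 99.36, 99.36,         # 10 x 14
]

mean = statistics.mean(rates)   # ~99.40, consistent with Table 7's 99.41
sd = statistics.stdev(rates)    # ~0.74, consistent with Table 7's 0.72
print(f"mean = {mean:.2f}%, S.D. = {sd:.2f}%, "
      f"min = {min(rates):.2f}%, max = {max(rates):.2f}%")
```

The small differences from the published summary are expected, since Table 7 was presumably computed from the unrounded rates (each rate is a multiple of 1/156).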
training set and the recognition set. The test results showed that the recognition rate of Model H-H2 was significantly higher (by 3.34%) than that of Model H-H1.

Recognition rates in published papers are difficult to compare directly, because of differences in data, number of data sets and test patterns. For comparison with the results achieved here, we have chosen three earlier papers that are, in our opinion, the closest to our data sets. These are discussed immediately below. A wider range of results can be found through the review articles cited above.

The recognition rates of Models H-H1 and H-H2 were compared to the results of Cash and Hatamian's (1987) study based on the method of moments. Both H-H1 and H-H2 performed better, in terms of recognition rate, than optical character recognition using the method of moments. However, in their study, the training and recognition sets were both six machine-printed fonts of the 62 alphanumeric characters. The six fonts used were Courier, Elite, Pica, Helvetica, Memphis Medium and Times Bold Italic. In the training phase, the training characters were isolated by contour tracing, and the two-dimensional moments of each character were computed and stored in a library of feature vectors. In the recognition phase, the document to be recognized was scanned, and the two-dimensional moments of its characters were compared with those in the library for classification. They reported recognition rates between 98.5 and 99.7% for all fonts tested. The Hogg and Huberman models (Model H-H1 and Model H-H2) have

TABLE 8
Paired t test of performance of Model H-H1 and Model H-H2
Variable    Mean§      S.E.¶       t‖     PR > |t|††
Diff 1†    -1.5147    0.8412    -1.85     0.0727
Diff 2‡    -3.3408*   0.4243    -7.87     0.0001

† Diff 1: the difference in recognition rate of the two models when three fonts were used as the training set.
‡ Diff 2: the difference in recognition rate of the two models when six fonts were used as the training set.
§ Mean: average.
¶ S.E.: standard error of the mean.
‖ t: t-test score.
†† PR: the probability of a greater absolute value of t.
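The t scores in Table 8 are the mean of the per-combination differences divided by its standard error. A minimal sketch of the computation (with made-up example differences, for illustration only; the paper's actual per-combination rates are in the detailed tables):

```python
import math
import statistics

def paired_t(diffs):
    """Paired t test: mean difference divided by its standard error.

    Returns (mean difference, standard error, t score)."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # S.E. of the mean difference
    return mean, se, mean / se

# Hypothetical per-combination differences (Model H-H1 rate minus Model H-H2
# rate, in %), for illustration only -- not the paper's actual data.
diffs = [-1.28, -0.64, -3.21, -2.56, -1.92, -0.64, -1.28, -2.56]
mean, se, t = paired_t(diffs)
print(f"mean = {mean:.4f}, S.E. = {se:.4f}, t = {t:.2f}")
```

Significance is then read from the t distribution with n - 1 degrees of freedom; applied to the paper's six-font differences, this procedure yields the t = -7.87 entry of Table 8.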
100% recognition rate for some parameter combinations (e.g. 4 × 14 memory matrix size, Smin-Smax = ±18, Mmin = 1, and Mmax = 4 for Model H-H1; 4 × 14 memory matrix size, Smin-Smax = ±18, and Mmin-Mmax = ±8 for Model H-H2), and their performance is thus significantly better. This comparison was made when all of the recognition set was used as the training set, as was the case in Cash and Hatamian's (1987) study. However, our results are for a smaller pattern set, for which improved recognition should be expected; it is left to future work to see whether similar improvements hold when the 62-character set is used.

When the performance of Model H-H1 and Model H-H2 is compared with that of a recognition system using a similar encoding scheme, the Hogg and Huberman models outperform the system of Fujii and Morita (1971), who used a simulation of the visual nervous system as their recognition system. In their study, a cascade connection of the lateral inhibition structure and the Adaline learning system was applied to the handwritten-character recognition problem. Both the training and the recognition sets were handwritten numerals (0-9). Eleven properties were extracted from each of the handwritten characters, and the adaptive classifier was then trained with supervision. In the training phase, the training set (400 numerals) was submitted to the classifier and the connection weights in the classifier were adjusted accordingly. In the recognition phase, one hundred suitably selected characters from the training set were submitted to the classifier; Fujii and Morita (1971) showed that these 100 characters were identified 100% correctly. On the other hand, Models H-H1 and H-H2 achieved 100% accurate recognition for all of the training data. This indicates that the Hogg and Huberman models are more appropriate for the font recognition application than the system based on a simulation of the visual nervous system.
Kahan, Pavlidis and Baird (1987) address the problem of recognition of printed characters of different fonts and sizes for the Roman alphabet. The 70 characters ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789&( ) - $/[ ]* were used. The data set consisted of 196,000 character images, with 40 samples each of the 70 characters. A performance of better than 99.5% was achieved over a range of point sizes (12-18) for most single fonts; for mixtures of six fonts the recognition rate fell to 97%. This system combined several techniques in order to achieve its overall recognition rate. Thinning and shape extraction were performed, and the results were mapped, using a shape-clustering approach, into binary features which were fed to a Bayesian classifier. The overlap with the work here is that the Bayesian classifier would be replaced by a neural network; that is, the feature extractor used in their work could be substituted for the one we used.
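The modularity noted above — binary shape features feeding a replaceable classifier stage — can be sketched generically. The example below is an illustration, not Kahan et al.'s actual pipeline: characters are represented by hypothetical binary feature vectors and classified by nearest Hamming distance, and it is this classifier stage that a Bayesian classifier (in their system) or a Hogg and Huberman network (in ours) would occupy:

```python
def hamming(a, b):
    """Number of positions at which two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(library, features):
    """Classifier stage: nearest stored feature vector by Hamming distance.

    This is the swappable stage; a Bayesian classifier or a neural network
    could serve here instead, fed by the same feature extractor."""
    return min(library, key=lambda label: hamming(library[label], features))

# Hypothetical binary shape features (e.g. presence of strokes, holes,
# endpoints) -- invented for illustration.
library = {'A': (1, 0, 1, 1, 0, 1),
           'B': (1, 1, 0, 1, 1, 0),
           'C': (0, 1, 1, 0, 0, 1)}
print(classify(library, (1, 0, 1, 1, 0, 0)))  # one flipped bit: still 'A'
```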
4. Conclusions

Model H-H1 and Model H-H2 have been successfully applied to recognize the 26 capital English letters, each with six font representations. The conclusions of this
study are as follows:
(1) The Hogg and Huberman models (Model H-H1 and Model H-H2) are useful in font recognition. Both models achieved a 100% recognition rate for particular parameter combinations when all six fonts were submitted as both the training and the recognition set.
(2) Recognition with these models is better than with the two previous methods for this problem: (a) the recognition performance of the two models is considerably better than that of the character recognition method using moments (Cash & Hatamian, 1987); (b) Models H-H1 and H-H2 outperformed the recognition system of Fujii and Morita (1971).
(3) Model H-H2 significantly outperformed Model H-H1 in terms of recognition rate, use of memory space, and learning speed when the models were trained on all six fonts for the 26 English letters.
(4) The recognition rate for fonts that were not in the training set was also acceptable. When three out of six fonts were used for training, Model H-H1 achieved a maximum recognition rate of 87.82% and Model H-H2 a maximum of 89.10%. This indicates that the models were capable of recognizing many of the untrained characters, and shows that basins of attraction existed for the letters in most of the various font presentations.
(5) It is likely that the Hogg and Huberman models can be useful in other pattern recognition problems, since the fonts used in this study certainly represent a pattern class. Since Model H-H1 and Model H-H2 can achieve reliable font recognition, they should be able to exhibit highly accurate recognition behaviour in other pattern recognition problems.

This study applied the Hogg and Huberman models to the font recognition problem and suggests the need for further work as follows:
(1) This work investigated only line characters, that is, characters whose strokes are one unit wide. Most fonts have strokes wider than one unit in various parts of the letters.
The line representation of the capital letters was chosen as the pattern set for this work principally to test the Hogg and Huberman models on this kind of problem. To apply them to the more general problem, which would include lower-case letters, numerals and handwritten characters, either a preprocessor to produce skeleton characters would be needed, or a wider range of features would be necessary to account for stroke width. In either case, new features will probably be required to represent the additional characters. For machine-generated characters it can be anticipated that recognition would remain excellent for the characters in the training set. If a preprocessor is used to produce skeleton characters from wider character input data, the noise introduced by this processing would become a factor in the recognition rate. However, it is known from previous work by Hogg and Huberman (1984) and by Rogers and Oldham (1987) on widely different patterns that the tolerance of these models for noise is very high; that is, the basins of attraction are wide enough that "near-by" patterns are attracted to the same final state. The ability to cascade the networks presented here lends the capability of proper training even in the presence of noise. Indeed, more work needs to be done to see whether the excellent results achieved here extend to more realistic fonts. The lower-case English letters, numerals, and handwritten letters need to be investigated; we have not attempted recognition of handwritten characters at all.
(2) Further improvement of model performance for the untrained fonts should be
attempted by increasing the number of character properties and/or by selecting different character properties.
(3) Further research on how performance is affected by the introduction of degraded or corrupted letters should be carried out prior to committing to a system design.
(4) A hardware implementation of the models appears feasible, since reliable recognition performance (a 100% recognition rate) has been achieved. According to the study by McClain, Rogers and Oldham (1988), the inherent parallelism of the models gives them a definite speed advantage over conventional sequential computers. In their study, they showed that the Hogg and Huberman model (Model H-H1) can gain in excess of 400 times in speed using two GAPP chips versus the VAX 11/780. A true parallel processor optimized for the H-H models would perform much faster than the GAPP chips. However, even a short calculation shows that this approach is in the right direction for font recognition. The CPU time for running Model H-H1 to train 156 characters and to recognize 156 characters was 4.5 s, when the memory matrix size was 4 × 6, Smin-Smax = ±18, Mmin = 1, and Mmax = 4. Using the results of McClain et al., it should take only 11.2 ms to train and recognize 156 characters if Model H-H1 is implemented in hardware. Because the ratio of learning speed to recognition speed was 852 : 194, it should take less than 2.08 ms to recognize 156 characters. Assuming that an envelope has 100 characters on it (all English capital letters, an unrealistic assumption), it would take only 1.33 ms to recognize the characters on the envelope. Even though the assumption of all capital letters is unrealistic, as noted, the recognition time for the full character set probably would not increase by more than would be gained by using an optimized processor.
This suggests that a hardware implementation of the Hogg and Huberman models can provide satisfactory speed and has the potential to be used as a character recognition device.
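The projected timings above follow from simple scaling of the measured CPU time. A short check of the arithmetic, taking the ~400× GAPP-versus-VAX speedup from McClain et al. (1988) and the learning : recognition ratio quoted in the text as given:

```python
# Back-of-the-envelope check of the projected hardware timings.
cpu_seconds = 4.5        # VAX 11/780 time to train and recognize 156 characters
speedup = 400            # approximate GAPP-vs-VAX gain (McClain et al., 1988)
n_chars = 156
learn, recognize = 852, 194   # reported learning : recognition speed ratio

hw_ms = cpu_seconds / speedup * 1000                # ms to train + recognize 156 chars
recog_ms = hw_ms * recognize / (learn + recognize)  # recognition share of that time
per_envelope_ms = recog_ms * 100 / n_chars          # time for a 100-character envelope

print(f"{hw_ms:.2f} ms total, {recog_ms:.2f} ms recognition, "
      f"{per_envelope_ms:.2f} ms per envelope")
```

This reproduces the figures in the text (11.25 ms, ~2.09 ms, ~1.34 ms; the quoted 11.2, 2.08 and 1.33 ms differ only through intermediate rounding).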
References

CASH, G. L. & HATAMIAN, M. (1987). Optical character recognition by the method of moments. Computer Vision, Graphics, and Image Processing, 39, 291-310.
FUJII, K. & MORITA, T. (1971). Recognition system for handwritten letters simulating visual nervous system. In K. S. FU, Ed., Pattern Recognition and Machine Learning, pp. 56-69. New York: Plenum Press.
HOGG, T. & HUBERMAN, B. A. (1984). Understanding biological computation: reliable learning and recognition. Proceedings of the National Academy of Sciences U.S.A. (Neurobiology), 81, 6871-6875.
HOGG, T. & HUBERMAN, B. A. (1985a). Parallel computing structures capable of flexible associations and recognition of fuzzy inputs. Journal of Statistical Physics, 41, 115-123.
HOGG, T. & HUBERMAN, B. A. (1985b). Attractors on finite sets: the dissipative dynamics of computing structures. Physical Review A, 32, 2338-2346.
HUBERMAN, B. A. & HOGG, T. (1984). Adaptation and self-repair in parallel computing structures. Physical Review Letters, 52, 1048-1051.
KAHAN, S., PAVLIDIS, T. & BAIRD, H. S. (1987). On the recognition of printed characters of any font and size. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(2), 274-288.
McCLAIN, R. A., ROGERS, C. H. & OLDHAM, W. J. B. (1988). Hardware implementation of an artificial neural network. In Proceedings of the SPIE Symposium on Innovative Science and Technology [O-E/LASE 88], vol. 4, pp. 2-15. Los Angeles, California.
MORI, S., YAMAMOTO, K. & YASUDA, M. (1984). Research on machine recognition of handprinted characters. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(4), 386-405.
ROGERS, C. H. & OLDHAM, W. J. B. (1987). Isolated word recognition with an artificial neural network. In Proceedings of the IEEE First International Conference on Neural Networks, 4, 435-442.
SUEN, C. Y., BERTHOD, M. & MORI, S. (1980). Automatic recognition of handprinted characters--the state of the art. Proceedings of the IEEE, 68(4), 469-487.