Pattern Recognition, Vol. 26, No. 12, pp. 1757-1770, 1993
Printed in Great Britain
0031-3203/93 $6.00 + .00
Pergamon Press Ltd © 1993 Pattern Recognition Society
BENGALI ALPHA-NUMERIC CHARACTER RECOGNITION USING CURVATURE FEATURES

ABHIJIT DUTTA† and SANTANU CHAUDHURY‡

†Department of Computer Science, Texas A&M University, Texas, U.S.A.
‡Department of Electrical Engineering, Indian Institute of Technology, Delhi, India

(Received 1 July 1992; in revised form 15 June 1993; received for publication 24 June 1993)
Abstract--This paper is concerned with recognition of hand-written and/or printed multifont alpha-numeric Bengali characters. It is assumed that characters are present in an isolated fashion. In the present work characters have been represented in terms of primitives and the structural constraints between the primitives imposed by the junctions present in the characters. The primitives have been characterized on the basis of significant curvature events, such as curvature maxima, curvature minima and inflexion points, observed along their extent. Curvature properties have been extracted after thinning the smoothed character images and filtering the thinned images using a Gaussian kernel. The unknown samples are classified using a two-stage feed-forward neural net based recognition scheme. Experimental results have established the effectiveness of the technique.

Character recognition    Hand-written    Curvature feature    Gaussian filtering    Neural networks    Backpropagation
1. INTRODUCTION

There has been particular interest over the last decade in recognition of hand-written characters, both isolated and cursive.(1,2) Recognition of hand-written isolated characters and/or numerals is in itself a challenging problem because there are tremendous variations in the character shapes in the handwriting of different individuals. Even in the case of printed fonts, differences in the shapes of the characters and their ornamentations (e.g. calligraphic fonts) do not simplify the recognition problem. Also, script-dependent peculiarities have to be taken care of when designing a recognition scheme for characters of different languages. In this paper we propose a scheme for recognition of hand-written and printed Bengali alpha-numeric characters. For the present work, it is assumed that the characters occur in an isolated fashion.

Significant works on isolated character recognition in the recent past include the multifont character recognition scheme suggested by Kahan and Pavlidis.(3) Lam and Suen(4) proposed a scheme for the recognition of hand-written numerals which consisted of a fast structural classifier and a robust recognition algorithm. Kimura and Shridhar(5) developed a statistical classification technique that utilized the histogram of the direction vectors derived from the contours. They combined this technique with a high speed accurate recognition algorithm that utilized features derived from the left and right profiles of the character images. Chen and Lieh(6) proposed a two-layer random graph based scheme which used components and strokes as primitives. Through a relaxation labeling scheme the possible variations of the character shapes were encoded in the random graphs. Gader et al.(7) used template and model matching in a two-stage recognition
system. Other important works on recognition of Arabic(8) (see also reference (1)), Chinese and Japanese (cf. references (1, 2)) alpha-numeric characters have also been reported in the literature. These recognition schemes, in general, relied on a heuristically chosen set of features and lacked any mechanism for learning the characteristics of the fonts. Among the Indian scripts, notable work has been done on recognition of Devnagari characters by Sinha and Mahabala.(9,10) They also suggested contextual post-processing for Devnagari character recognition and text understanding. Some attempts have also been made for recognition of Tamil,(11) Telugu,(12) and Bengali(13,14) characters. Roy and Chatterjee(13) presented a nearest neighbor classifier for Bengali characters employing features extracted by a string connectivity criterion. Dutta(14) presented a formal approach for generation and analysis of Bengali and Hindi characters. These schemes were applicable to printed character recognition. No work on recognition of hand-written Bengali alpha-numeric characters has been reported so far in the literature.

The recognition scheme proposed in this paper can work with both hand-written and printed alpha-numeric Bengali characters. Features have been selected on the basis of a careful study of the nature of the shape of the Bengali characters. Curvature related characteristics have been used as features. These features are expected to remain invariant over different fonts and writing styles. Further, use of the backpropagation based learning scheme in the recognition strategy enables the system to learn from examples. The generalizing capability of this learning scheme has been harnessed to achieve writer and font invariant recognition of the characters. In the next section we present an overview of the
scheme. In Section 3 we discuss the techniques used for pre-processing of the character images. Section 4 is concerned with the feature extraction strategies. In Section 5 the basic recognition scheme is presented. Results and implementational details are presented in Section 6. Conclusions are contained in Section 7.
2. OVERVIEW OF THE SCHEME
The basic objective of the present scheme is to suggest a mechanism for recognition of Bengali alpha-numeric characters drawn from various fonts of print and different writing styles. In Fig. 1 a complete set of Bengali alphabetic characters is presented. The character images shown in Fig. 2 are a representative sample of the characters and numerals considered for the present work. Based on these samples we can make the following observations:

• In most of the characters there exists a meeting point of two or more branches. These junction points are invariably present in all kinds of samples, and the number of branches meeting at those points is the same for the different samples of the same character.

• In many cases, two branches meeting at a junction appear as an extension of one another because there does not exist any sharp change in curvature across the junction for such combinations (see the dotted portion of the character in Fig. 3). On observation of the writing pattern, it is found that these combinations are generated in one single movement of the pen. Other branches of the junction are generated separately. The approximate relative position of the junction in these branches and/or combinations of branches is more or less the same in different samples of the same character. Hence, the structural property of a character is encoded in the connectivity pattern of the branches at the junctions.

• The branches meeting at a junction show a distinct pattern of significant curvature events along their length. Curvature maxima, curvature minima and inflexion points have been considered as the significant curvature events. If we consider the loops present in the characters of the first three columns of Fig. 2(a), we find, in all cases, the presence of either three or four curvature maxima irrespective of the font used. A similar observation can be made about the hand-written characters shown in Fig. 2(b) which contain such loops. Loops in the Bengali character set can therefore be characterized by the presence of three or four curvature maxima. Also, the relative positions of these maxima remain approximately invariant. Similarly, for the starred numeral shown in Fig. 2(c), one curvature minimum, one maximum and an inflexion point are invariably present in all the samples obtained from the handwriting of different individuals. This observation can be generalized for all the alpha-numeric characters of Bengali (see the set shown in Fig. 1). Hence, these patterns of curvature events are the signatures for the different primitives which constitute the characters.
Fig. 1. Complete Bengali alphabet set.
Motivated by the above observation, in the present scheme, characters have been represented in terms of the structural constraints imposed by the junctions and the primitives meeting at the junctions. These primitives have been represented by the pattern of curvature events observed along their length. This approach makes the representational scheme relatively invariant to the changes in the character shapes in different handwriting or fonts. Permissible variations are assimilated in the recognition scheme through the learning mechanism. The recognition module has been designed around a feed forward neural net. The structural constraints imposed by the junctions have been encoded in the topology of the network itself. Consequently, advantages of a feature vector based classification scheme
using neural networks have been successfully combined with the flexibility and generality of the structural constraints based approach.

In the present system, character images have been obtained by optical scanning of the character impressions on plain paper. The character images are pre-processed for removal of noise, thinned, and filtered for elimination of extraneous curvature events. Junctions and primitives are extracted from these images and their feature vectors are generated. The recognition module is constructed through a process of knowledge acquisition. In the knowledge acquisition phase, characteristics of different primitives and junctions present
in the different characters are learned by the backpropagation learning algorithm and encoded in the weights of a network termed the classification net. The output of the classification net is fed to the recognition network. The recognition network is constructed using the knowledge about the type of primitives and the junctions involved in the individual characters. Next, this network is trained to learn the association of the combination of the junctions and the primitives with the identity of the characters. On completion of the process of knowledge acquisition, the complete recognition module can be used for categorizing unknown character samples.
Fig. 2. Sample characters: (a) sample printed alphabetic characters; (b) hand-written alphabetic characters; (c) hand-written numerals.
3. PRE-PROCESSING

The input data obtained by scanning of printed or hand-written text is almost always contaminated with noise and contains redundant information. The pre-processing stage concentrates on elimination of noise and removal of redundant information, as far as possible. After scanning, because of unwanted noise and/or slight smudging of the character images, arbitrary extrusions and intrusions may be found at the boundary of the character images. Noisy cavities in the character images are also common. These distortions detrimentally affect the shape of the characters, particularly after thinning. These aberrations are, to some extent, removed by replacing each pixel in the binary image by its majority function in an iterative fashion. The majority function of a binary pixel x is defined below:
majority(x) = 1, if one_cnt(x) > (9 − one_cnt(x));
majority(x) = 0, if one_cnt(x) < (9 − one_cnt(x));

where one_cnt(x) equals the number of 1s in the 3 × 3 neighborhood of the binary pixel x.

Fig. 3. A Bengali character.
Fig. 4. Majority filtering and thinning: (a) original image; (b) thinned original image; (c) majority filtered
image; (d) thinned majority filtered image.
The iterative process continues until there are no more replacements of pixel values in a complete pass. The effect of modification of a pixel value depending on its immediate neighbors is propagated to the adjoining pixels in an iterative fashion, making the outer contour smooth and filling up cavities. An example of noise removal by the majority function is shown in Fig. 4.

Since the thickness of the strokes of a character is of little practical importance, as compared to the topology of their configuration, as far as their identity is concerned, character images are thinned prior to feature extraction. A standard thinning algorithm, the "Safe Point Thinning Algorithm" or SPTA,(15) is used here to eliminate redundant width information while preserving the topological characteristics of the characters. Comparison of the thinned majority-filtered character image with that of the unfiltered image shown in Fig. 4 establishes the effectiveness of majority filtering.

Random shifts along the skeletal branches are still common in the thinned images because majority filtering cannot remove all possible unevenness of the character boundaries. Again, at the junction of two or more branches, due to the inherent characteristics of the thinning algorithm, the junction pixel is found to be shifted by one pixel. Random shifts of the pixels along the branches and at the junctions introduce sharp curvature changes. Such shifts along the object boundaries have been referred to as blips in reference (16). In the present context also the same terminology has been adopted. It has been shown in reference (16) that blip filtering removes unnecessary high frequency noise but preserves essential curvature changes. In this case the blip filtering proposed in reference (16) has been used in the following way: the filtering scheme is specified with respect to a general point p(x, y) on a segment (curve) S.

Case 1: Immediate neighbors of p are m with coordinates (x − 1, y + 1) (or (x − 1, y − 1)) and n with coordinates (x + 1, y + 1) (or (x + 1, y − 1)): set the y-coordinate of p to the y-coordinate of any one neighbor of p on S.
Case 2: Immediate neighbors of p are m with coordinates (x − 1, y − 1) (or (x + 1, y − 1)) and n with coordinates (x − 1, y + 1) (or (x + 1, y + 1)): set the x-coordinate of p to the x-coordinate of any one neighbor of p on S.

Case 3: The point p is a junction point on S: if the x-coordinates of two neighbors of p on S are equal to each other and differ from x by unity, and the y-coordinate of one neighbor of p is greater than y by unity while the y-coordinate of the other neighbor of p is less than y by unity, then introduce a new pixel p' with the x-coordinate of any one neighbor of p on S and the y-coordinate of p; else if the y-coordinates of both neighbors of p on S are equal to each other and differ from y by unity, and the x-coordinate of one neighbor of p is greater than x by unity while the x-coordinate of the other neighbor of p is less than x by unity, then introduce a point p' with the y-coordinate of any one neighbor of p on S and the x-coordinate of p.

All the above cases (other than Case 3) can be generalized for segments of fixed length instead of a single point, as in reference (16). It is found that blip filtering is most effective when it is applied to short segments. For the junctions, possible combinations of the segments meeting at the junction are considered. After applying the junction condition for blip filtering, the resulting segment is further blip filtered by checking the other two conditions. A junction which is a candidate for blip filtering is shown in Fig. 5.

Fig. 5. A candidate junction for blip filtering.
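The iterative majority filtering described above lends itself to a straightforward implementation. The following Python sketch is illustrative rather than the authors' code; it assumes synchronous (whole-pass) updating and zero padding at the image border, neither of which is stated explicitly in the paper.

```python
import numpy as np

def majority_filter(img):
    """Iteratively replace each pixel of a binary image by its 3x3 majority
    value until a complete pass makes no change (pre-processing step, Section 3)."""
    img = (img > 0).astype(np.uint8)
    changed = True
    while changed:
        changed = False
        padded = np.pad(img, 1, mode="constant")
        out = img.copy()
        for r in range(img.shape[0]):
            for c in range(img.shape[1]):
                one_cnt = padded[r:r + 3, c:c + 3].sum()  # 1s in the 3x3 window
                if one_cnt > 9 - one_cnt:
                    out[r, c] = 1
                elif one_cnt < 9 - one_cnt:
                    out[r, c] = 0
                # one_cnt can never equal 9 - one_cnt, so no tie case arises
        if not np.array_equal(out, img):
            img = out
            changed = True
    return img
```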
4. FEATURE EXTRACTION
It is clear from the previous discussion (Section 2) that a Bengali character can be adequately represented in terms of the strokes and junctions that constitute it. Therefore, from a thinned image of a character, junctions and individual strokes must be extracted and appropriately represented for the recognition task. With respect to the thinned character images the following definitions have been adopted for the purpose of feature extraction:
Fig. 6. Examples of segments, strokes and junctions in the thinned image of the character "kaw": DE, AC, BC are segments; ACB, DE are strokes; DEG is a loop; C and D are junctions.

Junction: A junction is a dark pixel that has at least three dark 8-neighbors.

Segment: A segment is a set of dark pixels such that for all except two of its members there are two dark neighbors from among the members of the set itself. Of the two exceptions, at least one has a junction as its neighbor. None of the other pixels of a segment is a junction pixel.

Stroke: A stroke is a set of dark pixels such that for all except two of its members there are two dark neighbors from among the members of the set itself.

Loop: A loop is a set of dark pixels such that for all of its members there are two dark neighbors from among the members of the set itself.

Some examples of junctions, strokes and loops are given in Fig. 6. The feature extraction process consists of the following steps:

(1) Identification of junctions.
(2) Tracking and identification of segments and loops.
(3) Removal of essentially local curvature changes in the strokes by Gaussian filtering.
(4) Identification of strokes from proper combination of segments.
(5) Generation of feature vectors for strokes and junctions.

4.1. Junction identification

To start with, all the junctions (i.e. junctions between any two segments in a character) present in the image are identified by scanning the image from top to bottom and left to right. The characteristic details (e.g. coordinates on the image plane) of all the junctions are preserved and added to the set of junctions J. The positions of occurrence of the junctions are also marked on the image plane by putting appropriate integer values as labels at the corresponding pixel locations. It is found that, very often, two or three junctions adjacent to each other appear in the thinned image. These twin or triplet junctions are merged into single junctions.
The middle junction in a triplet is retained as a junction while each of the other two is deleted from the junction set and considered a part of the segment closest to it. Twin junctions are very rare; in their case, only one is retained as a junction and the other is considered a part of the segment closest to it.

For characters and numerals that consist of a single stroke (e.g. Bengali "1"), the question of identifying junctions does not arise because there are none. But, in all cases, the end-pixels of the stroke are labeled as junctions and are included in the junction set. These elements of J are specially distinguished as terminators. This measure is adopted to ensure uniformity of processing at later stages.

4.2. Segment and loop identification

Starting from each true junction (i.e. non-terminator element of the set J), segments are identified by applying a simple curve following algorithm. After identifying a segment, its characteristic features are stored in the data structure of its parent junction. In cases where a segment terminates at another junction, the information is stored in the data structures of both the starting and terminating junctions of the segment. During any segment-following, the pixels lying on a segment are labeled by a unique "color", i.e. an integer. The segments detected in the thinned character image constitute the set S.

Considering each element of J as a node in a graph and the segments as branches emanating from the nodes, a character image can be represented by a graph C = (J, S). The problem of loop identification in the character image then reduces to the problem of finding all the circuits in the graph C. A loop can therefore have any number of junctions on it. These loops are logically deleted from the image plane so that they maintain their distinct identity and are not combined with the other segments. Loops are identified by following a simple modification of the circuit detection algorithm by Paton.(17) Since the number of junctions involved is very small (not more than ten in any case), the high computational complexity of the circuit finding algorithm is of little consequence for the present application.
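A minimal sketch of the junction test and of the graph-based loop detection is given below. It is not the authors' implementation: the data layout is illustrative, networkx's cycle_basis (which follows Paton's algorithm, the paper's reference (17)) is used as a stand-in for the modified circuit detector, and parallel segments between the same pair of junctions would need extra bookkeeping.

```python
import numpy as np
import networkx as nx

def find_junctions(skeleton):
    """Return (row, col) positions of junction pixels in a thinned binary image:
    dark pixels with at least three dark 8-neighbors."""
    img = (skeleton > 0).astype(np.uint8)
    padded = np.pad(img, 1)
    junctions = []
    for r, c in zip(*np.nonzero(img)):
        window = padded[r:r + 3, c:c + 3]
        if window.sum() - 1 >= 3:          # exclude the centre pixel itself
            junctions.append((r, c))
    return junctions

def find_loops(junctions, segments):
    """Represent the character as the graph C = (J, S) and return its circuits.
    `segments` is a list of (junction_a, junction_b) pairs."""
    g = nx.Graph()
    g.add_nodes_from(junctions)
    g.add_edges_from(segments)
    return nx.cycle_basis(g)   # fundamental cycles, in the spirit of Paton (ref. 17)
```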
4.3. Filtering of the segments

In the present recognition scheme, the strokes and segments comprising the characters are characterized by the basic nature of the curvature changes along them. Trends in the curvature change have been found to be a property which remains invariant over a large number of writing styles and printed fonts. But, for this purpose, intrinsic curvature changes must be identified by masking extraneous high frequency changes which may be caused by the nature of the writing surface, quality of the paper and pen, age of the writer and the noise introduced by the digitizer, among many other factors. In this section, we present a scheme for extracting the inherent curvature properties of the segments and the strokes.

Each segment can be represented in the parametric form C = (x(t), y(t)), where t is a linear function of the path length ranging over the closed interval [0, 1]. For a loop, x(t) and y(t) are periodic in nature. The curvature K of a planar curve at a point P on the curve can be shown to be given by

K = (x_1 y_2 - y_1 x_2) / (x_1^2 + y_1^2)^{3/2}

where

x_1 = dx/dt,  x_2 = d^2x/dt^2,  y_1 = dy/dt,  y_2 = d^2y/dt^2.

For computing curvatures at varying levels of detail, the functions x(t) and y(t) can be convolved with a one-dimensional Gaussian kernel, as in Mokhtarian and Mackworth.(18) The one-dimensional Gaussian kernel g(t, σ) of width σ is given by

g(t, σ) = (1/(σ√(2π))) exp(−t^2/(2σ^2)).

Then

x(t, σ) = x(t) ⊗ g(t, σ);
x_1(t, σ) = x(t) ⊗ ∂g(t, σ)/∂t;
x_2(t, σ) = x(t) ⊗ ∂^2 g(t, σ)/∂t^2

(and similarly for y(t, σ)). The convolution of the curve with the Gaussian kernel is essentially a process of smoothing the curve with a low pass filter. The pass band of the filter is controlled by the parameter σ. As σ increases, the higher order spatial frequencies are removed, suppressing essentially local, extraneous curvature changes. This has a two-fold advantage: in the first place, the effect of high frequency noise can be minimized for identification of the significant curvature changes and, secondly, by varying σ these changes can be identified at different levels of detail.

Now the question arises of how to choose σ. The choice of this parameter is very critical for feature extraction. If σ is too small, the presence of a large number of local curvature changes will overshadow the intrinsic characteristics of the segments. If it is too large, the overall shape may be distorted, providing wrong features for the recognition scheme. In Fig. 7, the effect of filtering a thinned image-curve with a Gaussian kernel of varying widths is shown.

Fig. 7. Effect of Gaussian filtering of the thinned image of Bengali 6 with different values of σ.

In the present scheme, strokes and segments are characterized by localizing curvature maxima, curvature minima and zero-crossings on the segments or strokes. With this perspective, the appropriate value of σ is chosen by identifying the range of σ for which the number of these curvature features on a segment remains constant. For each segment, these three features are counted after filtering the curve with the Gaussian kernels obtained by varying σ from 3.0 to 8.0 in steps of 0.5. From the results, the range of σ for which the feature counts are more or less constant (within a maximum permissible difference of one) is identified. The minimum value of σ in this "stable range" is selected as the appropriate value. On some occasions, there is more than one "stable range" for σ. In such cases, only the lower range is considered for fixing the final value of σ.

For the purpose of convolution, a spread of 6σ is found to produce good results. Very often it so happens, particularly at the ends of the segments, that the length of the sequence to be convolved is much shorter than the length of the kernel. Under this circumstance, the sequence has to be logically extended. Taking the end of the segment as the origin, each point on the segment is reflected (point-reflection transformation) about the origin and corresponding points are generated on the other side of the origin. This is achieved by simply taking the point reflection of the x and y coordinates of all the points in the segment. The operation is done for both ends of the segment.
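The Gaussian smoothing, curvature computation and "stable range" selection described in this section can be sketched as follows. This is an illustrative reconstruction, not the authors' code: scipy's gaussian_filter1d is used for the convolutions, its 'reflect' boundary mode and truncate=3.0 (roughly a 6σ spread) stand in for the point-reflection extension, and the stability test is simplified to adjacent σ values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def curvature_along_segment(x, y, sigma):
    """Curvature K at every point of a segment given by coordinate arrays x(t), y(t),
    computed from Gaussian-smoothed first and second derivatives."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x1 = gaussian_filter1d(x, sigma, order=1, mode="reflect", truncate=3.0)
    x2 = gaussian_filter1d(x, sigma, order=2, mode="reflect", truncate=3.0)
    y1 = gaussian_filter1d(y, sigma, order=1, mode="reflect", truncate=3.0)
    y2 = gaussian_filter1d(y, sigma, order=2, mode="reflect", truncate=3.0)
    return (x1 * y2 - y1 * x2) / np.power(x1**2 + y1**2, 1.5)

def stable_sigma(x, y, sigmas=np.arange(3.0, 8.5, 0.5)):
    """Pick the smallest sigma of the first range over which the count of curvature
    maxima, minima and zero-crossings changes by at most one."""
    counts = []
    for s in sigmas:
        k = curvature_along_segment(x, y, s)
        maxima = np.sum((k[1:-1] > k[:-2]) & (k[1:-1] > k[2:]))
        minima = np.sum((k[1:-1] < k[:-2]) & (k[1:-1] < k[2:]))
        zero_crossings = np.sum(np.diff(np.sign(k)) != 0)
        counts.append(int(maxima + minima + zero_crossings))
    for i in range(len(sigmas) - 1):
        if abs(counts[i] - counts[i + 1]) <= 1:   # "stable range" criterion
            return sigmas[i]
    return sigmas[0]
```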
4.4. Identification of strokes

The problem of stroke identification from segments may be stated as follows. There may be three or more segments meeting at a junction. In the most common case of three segments meeting at a junction, almost always two of them are part of a single stroke and the third one is another stroke. Our objective at this stage is to find the right combination of segments forming a stroke at each of the junctions in the image. Figure 8 gives some examples of strokes and loops identified from the segments.

Fig. 8. Extracted strokes from the characters: (a) printed character "kaw"; (b) hand-written character "kaw"; (c) printed numeral Bengali 4.

At any particular junction, if there are three segments emanating from the junction, then there are ³C₂ ways of combining two segments into a stroke and identifying the uncombined segment as another stroke. For the right combination, it is assumed that the curvature change across the junction is a minimum among all the possible combinations. This approach is motivated by the observation that strokes always have a smooth variation of curvature along their lengths. This is also expected from considerations of the natural movement of the pen during writing.

All possible combinations of the segments at each junction are considered. Curvature values for the pixels belonging to a combination are computed after smoothing it by a Gaussian kernel, as discussed in the previous section. The absolute difference in the curvature value over a window (of length 3) across the junction is calculated for all possible pairings. The minimum of these absolute differences is then compared to a preset threshold (0.8). If the minimum difference is found to be less than or equal to the threshold, then the corresponding segments are combined into a stroke. This operation is repeated for all the junctions. A sketch of this pairing rule is given after this subsection.

In order to render the feature extraction process invariant to the order of initial traversal of the segments, we followed a convention of numbering the segments at a junction before proceeding with the computation of curvature. The convention is to assign lower numbers to segments for which the x-coordinate of the starting pixel is smaller than the x-coordinate of the starting point of the other segments. For two segments having the same x-coordinate for their starting point, the lower number is assigned to the one having the smaller y-coordinate for its starting point. If at any junction a segment is found to be very short compared to the other segments (if its length is less than 0.2 of the length of the longest stroke), then that particular segment is deleted as it is considered insignificant as far as the shape of the character is concerned.

Thus, finally, there are two types of strokes: one type originating from the combination of segments and the other simply the uncombined segments. The loops which were previously logically deleted, so as not to participate in the process of segment combination, are also considered as strokes. All the identified strokes are stored in a "stroke data structure".
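The junction-wise pairing rule referred to above can be sketched as follows. This is illustrative Python, not the authors' code: the helper names are invented, each segment is assumed to be oriented so that its first pixels touch the junction, and the curvature-difference test is approximated by comparing mean curvature over a 3-point window on each side of the junction rather than over the smoothed combined curve.

```python
from itertools import combinations
import numpy as np

CURVATURE_JUMP_THRESHOLD = 0.8   # preset threshold quoted in Section 4.4

def combine_segments_at_junction(segments, curvature_of):
    """Return the pair of segments to merge into a stroke at one junction,
    or None if no pairing has a small enough curvature change."""
    best_pair, best_jump = None, np.inf
    for a, b in combinations(segments, 2):
        ka = curvature_of(a)[:3]                     # window of length 3 near the junction
        kb = curvature_of(b)[:3]
        jump = abs(float(ka.mean()) - float(kb.mean()))   # curvature change across the junction
        if jump < best_jump:
            best_pair, best_jump = (a, b), jump
    return best_pair if best_jump <= CURVATURE_JUMP_THRESHOLD else None
```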
4.5. Generation of the feature vectors

The feature vector for the strokes consists of the following feature fields:

(1) Number of points of curvature maxima.
(2) Number of points of curvature minima.
(3) Number of points of inflexion from −ve to +ve curvature.
(4) Number of points of inflexion from +ve to −ve curvature.
(5) Normalized positions with respect to the stroke length for the points considered in (1) (number of components = 4).
(6) Normalized positions with respect to the stroke length for the points considered in (2) (number of components = 4).
(7) Normalized positions with respect to the stroke length for the points considered in (3) (number of components = 4).
(8) Normalized positions with respect to the stroke length for the points considered in (4) (number of components = 4).

The values of these feature fields are computed as follows:

(1) A count of the number of "curvature maxima" is set up and initialized to 0. At each point along a stroke it is checked whether the curvature at that point is greater (in magnitude) than that of its two immediate neighbors on both sides. If the curvature is found to be greater, the count of the number of "curvature maxima" is incremented. When the entire stroke has been traversed, the count gives the final value of this feature field.

(2) The procedure is similar to that for (1), with the check being done for the curvature of the current point to be less, instead of greater, than that of its neighbors.

(3) A count of the number of −ve to +ve inflexion points is set up. At each point along a stroke a check is made to find out whether the sign of the curvature of two of its predecessor points is non-negative and the sign of the curvature of two of its successor points on the stroke is non-positive, and whether both curvatures are non-zero. If the check results in a success, the count is incremented. When the entire stroke has been traversed, the count gives the final value of this feature field.

(4) The procedure is similar to that for (3), with the check being done for the sign of the curvature of the predecessors to be non-positive and that of the successors to be non-negative.

For feature fields (5) through (8), the "normalized position with respect to stroke length" is calculated by first taking the ratios of the position of the point concerned (in pixels), measured from both ends of the stroke, to the length of the stroke in pixels, and then retaining only the minimum of the two values. This value, obviously, lies in the interval [0, 1]. This interval is divided into four equal parts. Depending on the quarter in which the actual value lies, a code is assigned. This coded value is used as the locational parameter for the feature.

Normalization of the positions of curvature maxima, minima, etc. with respect to the stroke length has made the feature vector scale invariant. Use of the coded values has made these components relatively invariant to small variations in the location of these curvature events on the strokes. Further, these localization parameters corresponding to a particular type of feature
(e.g. curvature maxima) are incorporated in the feature vector in descending order of their values. By considering the minimum of the two normalized positions for each feature and by arranging these values in a fixed order, the feature vector has been made invariant to the direction in which the stroke is scanned. Also, for the smoothed strokes the number of features detected never exceeded the chosen bounds.

The feature vector for a junction consists of the following feature fields:

(1) The number of strokes meeting at the junction.
(2) The normalized positions of the junction on each of the strokes meeting at the junction (maximum of 4).

These feature fields are computed as follows:

(1) The number of strokes (including loops) finally retained for a junction is taken as field (1). This number is also called the cardinality of the junction.
(2) A junction may appear at the end of a stroke as well as at any other position on the stroke. For each of the strokes meeting at a junction, the position of occurrence of the junction on it (normalized with respect to the stroke length) is computed, encoded (as described before) and stored according to the order of their values in the feature vector.
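A compact sketch of the stroke feature vector described above is given below. It is illustrative Python rather than the authors' code: the event detection is simplified to single-neighbor, signed comparisons, and the quartile coding (integer codes 1-4, zero-padded to four components per event type) is one plausible reading of the encoding described in the text.

```python
import numpy as np

def stroke_feature_vector(curvature):
    """Build a 20-component stroke feature vector: counts of curvature maxima, minima
    and the two kinds of inflexion points, followed by up to four quartile-coded
    normalized positions for each event type."""
    k = np.asarray(curvature, dtype=float)
    n = len(k)
    events = {"max": [], "min": [], "neg_to_pos": [], "pos_to_neg": []}
    for i in range(2, n - 2):
        if k[i] > k[i - 1] and k[i] > k[i + 1]:
            events["max"].append(i)
        elif k[i] < k[i - 1] and k[i] < k[i + 1]:
            events["min"].append(i)
        if k[i - 1] < 0 < k[i + 1]:
            events["neg_to_pos"].append(i)
        elif k[i - 1] > 0 > k[i + 1]:
            events["pos_to_neg"].append(i)

    def quartile_codes(indices):
        # Normalized position = min(distance from either end) / stroke length,
        # coded by the quarter of [0, 1] in which it falls, largest first.
        pos = sorted((min(i, n - 1 - i) / (n - 1) for i in indices), reverse=True)
        codes = [min(int(p * 4) + 1, 4) for p in pos[:4]]
        return codes + [0] * (4 - len(codes))          # pad to four components

    vector = [len(events[t]) for t in ("max", "min", "neg_to_pos", "pos_to_neg")]
    for t in ("max", "min", "neg_to_pos", "pos_to_neg"):
        vector.extend(quartile_codes(events[t]))
    return vector
```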
5. CLASSIFICATION AND RECOGNITION

The curvature-based feature extraction process generates representations of strokes and junctions in the form of feature vectors. These feature vectors are used for establishing the identity of the character samples. In the first phase of the recognition process, the feature vectors are classified into some distinct categories. These categories represent strokes and junctions which are common over the alphabet set. For this purpose a feed-forward neural net based classifier has been used. The net is trained by the well-known backpropagation algorithm.(19) The structural relations between the strokes connected via junctions are exploited for establishing the final identity of the characters. This is performed by the recognizer net, which encodes the structural relations in its topology itself. This network is also trained using the backpropagation algorithm.

In the backpropagation algorithm that is used, the termination of the iterations is kept conditional on two alternative conditions. One of the conditions is that the error at each of the individual output units should be less than a certain preset threshold (0.005), while the other condition is that the sum of the squared errors of all the output units should be less than another preset threshold (0.01). To avoid the problem of local error minima and oscillations, training samples were randomly chosen and a small learning rate (0.2) was used.

5.1. The classifier

The classifier used for the strokes and junctions is a standard feed-forward multilayer net. The net has one
hidden layer apart from the input and output layers. The number of nodes in the input layer is fixed by the number of feature fields in the feature vectors to be classified. In the classification stage, a particular input vector is attributed to the class denoted by the node at the output layer having maximal activation. The activation of the maximal node is considered as the input for the recognizer. For a character having more than one stroke and junction, we assume replication of the classification net as many times as there are primitives or junctions.

5.2. The recognizer

The output of the classifier is in the form of classified strokes and junctions. Thus, for example, the closed loops extracted from all samples (both printed and hand-written) of the Bengali character "kaw" (see Fig. 2) are classified into one category; the open curves from the same character are classified into another category. Also, for all other characters like "baw", "jhaw" and "dhaw" which have a closed loop similar to that found in "kaw", the respective closed loops are classified into the same category as that of "kaw". But for the junctions, the junctions coming from "kaw", "jhaw" and "dhaw" are classified into different classes. Also, the strokes of "kaw", "jhaw" and "dhaw" other than the closed loop are naturally classified into different categories.

All of the classified outputs of strokes and junctions for a character are fed at one time into the recognizer. The final result of the recognition appears at the output nodes of the recognizer. The exact topology of the recognizer is shown in Fig. 9.

Fig. 9. Architecture of the recognizer net.

The stroke inputs are fed into the nodes of the input layer, whereas the junction inputs are fed into
the nodes of the first hidden layer. For each type of stroke and junction, there exists at least one input node. Thus the first hidden layer acts as an input layer as well as a hidden layer. This is because, in addition to receiving connections from the input (stroke) nodes, the nodes of the first hidden layer receive direct inputs from the classified junctions. The second hidden layer thus receives connections from the actual input layer (stroke layer) as well as from the input-cum-hidden first layer (junction layer). The remainder of the network, i.e. the other hidden layers and the output layer, is kept similar to that in a standard multilayer net.

In the topology of the recognizer net described above, the junction nodes (nodes of the first hidden layer) form the link between the nodes (input layer nodes) for the strokes between which the junction exists. After sufficiently training the net with examples of the characters and numerals, the weights between the junction nodes and the stroke nodes are able to encode the structural relation that exists between the strokes of the test characters and numerals. This supports the suggestion made by Hinton.(20) At the same time, through the learning process, possible structural variations and permissible variations regarding the types of the junctions and/or primitives are also learned by the network. Use of the second hidden layer ensures integration of the structural and the primitive based information for identifying the classification boundary in the feature space.

In the actual implementation, the recognition of characters and numerals with only a single stroke could be carried out in the classification stage itself. For characters having multiple strokes, since the structural information is an essential component in the recognition, the recognizer net is used.
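A minimal sketch of the recognizer topology and its stopping rule, under stated assumptions, is given below. It is not the authors' implementation: PyTorch is used in place of the original C code, the layer sizes are illustrative, and the exact way the classified junction activations are injected into the first hidden layer is an interpretation of the description above.

```python
import torch
import torch.nn as nn

class RecognizerNet(nn.Module):
    """Sketch of the Fig. 9 topology: stroke activations enter the input layer,
    junction activations are injected directly into the first hidden (junction)
    layer, and a second hidden layer combines both before the output layer."""
    def __init__(self, n_strokes, n_junctions, n_hidden=15, n_chars=10):
        super().__init__()
        self.stroke_to_junction = nn.Linear(n_strokes, n_junctions)
        self.to_hidden = nn.Linear(n_strokes + n_junctions, n_hidden)
        self.to_output = nn.Linear(n_hidden, n_chars)

    def forward(self, strokes, junctions):
        # Junction nodes are driven by the stroke nodes plus the classified
        # junction inputs fed in directly (input-cum-hidden layer).
        j = torch.sigmoid(self.stroke_to_junction(strokes) + junctions)
        # The second hidden layer sees both the stroke layer and the junction layer.
        h = torch.sigmoid(self.to_hidden(torch.cat([strokes, j], dim=-1)))
        return torch.sigmoid(self.to_output(h))

def train(net, strokes, junctions, targets, lr=0.2, max_iter=10000):
    """Gradient training with the stopping rule quoted in Section 5: stop when every
    output-unit error is below 0.005 or the summed squared error is below 0.01."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(max_iter):
        opt.zero_grad()
        err = net(strokes, junctions) - targets
        loss = (err ** 2).sum()
        if (err.abs() < 0.005).all() or loss.item() < 0.01:
            break
        loss.backward()
        opt.step()
    return net
```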
6. EXPERIMENTAL RESULTS
A prototype recognition system was implemented in C on a Sun-4. For experimentation, samples were drawn from a number of different fonts and from the handwriting of different individuals. Examples of some of the data used are shown in Fig. 2. The character images were hand-scanned and processed individually. In the current implementation the parameter values used for feature extraction were the same as those mentioned before. Three different types of experiments were performed to evaluate the system.

In the first set of experiments (type I) the classification and recognition nets were trained with samples of the same character drawn from different printed fonts. In one experiment, only numerals were involved. In this case the output layer of the stroke classification net had 10 nodes corresponding to the different classes of strokes. The hidden layer of the classification net had 25 hidden units. Another classification net with two output nodes (corresponding to the types of junctions involved) was used for classifying the junctions. The recognition net had 10 output units (corresponding to the 10 numerals). The input-cum-hidden layer of this net had only two nodes corresponding to the types of junctions. The second hidden layer had 15 nodes. Both classification and recognition nets were trained with the classical backpropagation algorithm. On completion of training, it was found that the weights of the links were consistent with expectations. For example, Bengali "1" is a single stroke character: the input node corresponding to this stroke was connected to the junction nodes by links having negligible weights (between 0.005 and 0.001).

In this case the networks were trained with numerals drawn from two different printed fonts. For training, 10 samples of each numeral of each font were used. The training set of 20 real-life samples was enlarged to 100 by generating new instances through random rotation (angle of rotation between −15 and 15 deg) of the original samples. The recognition system was used for classifying samples which were drawn from the fonts learned. The experiment was done on 10 samples of each numeral of each font, and these instances were different from those learned.
Table 1. Results of type I experiments

Category        Training set        Test set
                font1   font2       font3   font4   hw1   hw2
numeral 1         97      98          93      94     78    74
        2         98      99          94      91     73    71
        3         96      97          96      90     75    77
        4         98      96          94      94     77    78
        5         97      98          91      94     76    72
        6         98      95          89      91     70    73
        7         95      97          88      90     76    71
        8         97      99          93      90     75    73
        9         99      98          90      92     73    71
alphabet          93      91          86      85     69    71
The net was also used for recognizing samples drawn from two other printed fonts and the handwriting of two different individuals. In these experiments character samples were obtained by randomly choosing instances from texts of particular fonts and handwritings. In each case 10 samples were used. In all the experiments the test set was enlarged to 100 by generating new samples through random rotation of the given samples. In the second experiment of this set, 10 different alphabetic characters (having not more than one junction and at most three strokes) were used. The configuration of the nets was identical and a similar training and testing process was followed. Summarized recognition results of this set of experiments are presented in Table 1. For the alphabetic characters, the overall recognition result is presented in the same table. The percentage values for the alphabetic characters were calculated in terms of the total number of correctly classified samples as against the total number of samples of the different characters considered. In the table, types of printed samples have been indicated as fonti (e.g. font1, font2, etc.) and hand-written ones as hwi (e.g. hw1, hw2, etc.). This convention is followed for the other tables also.

In the second set of experiments, the networks were trained with hand-written samples of different writers. For this experiment, the handwriting of three different writers was used. The training scheme followed was the same as that of the type I experiments. On completion of training, the recognition system was tested with samples obtained from the training set, from different handwritings and from printed characters. For experimentation 10 samples of each type were obtained and this set was expanded to 100 through random rotation. For the alphabetic characters, the experiment was done with the 25 most commonly used Bengali characters. In this set there were characters having at most five strokes and two junctions. There were 10 different types of strokes but five types of junctions involved. The recognition net was more complex; the second hidden layer of the recognition net in this case contained 40 nodes. Overall recognition results are presented in Table 2. The percentage values for each type of sample have been calculated by considering all the numerals (i.e. 10 in number) and all the alphabetic characters (25) together.

In the third set of experiments the network was trained with samples chosen from handwriting as well as printed material. In this case the handwriting of two different writers was used. The other training samples were chosen from instances of two fonts. The experiment was done on all the numerals and ten alphabetic characters (those used for the type I experiment). The configuration of the network was the same as that of the type I experiment. In this case the number of iterations required for convergence of the classification net was greater than 10,000, which was the maximum required for all the other cases. With 12,000 iterations the error for the recognition reduced to only 0.028. Results obtained from these experiments are presented in Table 3.
Table 2. Results of type II experiments

Category     Training set          Test set
             hw1   hw2   hw3       hw4   hw5   hw6   font1   font2
numerals      94    94    96        91    90    88     77      72
alphabet      95    95    93        90    89    86     74      70
Table 3. Results of type III experiments

Category     Training set                    Test set
             font1   font2   hw1   hw2       hw3   hw4   font3   font4
numerals       94      90     92    91        78    72     67      75
alphabet       92      91     89    90        72    74     75      71
In all the recognition experiments, the classes and/or the characters inferred corresponded to the nodes in the output layer having maximal activation, irrespective of the actual value of the activation. Activation values varied between 0.30 and 0.98.

Analysis of the results shows that the scheme performed properly when it was used for recognition of either hand-written or printed characters only after training with samples of the corresponding type of characters. Performance was not encouraging when the network was asked to recognize hand-written samples after being trained with printed samples, or vice versa, because of the conflicting characteristics of handwriting and printed fonts. But the generalizing capability of the recognition methodology was clearly evident from its ability to correctly recognize samples drawn from unknown fonts or handwritings. The invariant nature of the feature vectors has minimized the possible variations over different fonts, and the learning scheme, with its generalizing capability, has assimilated the nature of the variations for making correct decisions. If the set of features had undergone random variations for different fonts and writing styles, it would have been impossible for the network to learn the possible variations through the standard backpropagation learning scheme.
7. CONCLUSIONS

A new approach for the recognition of alpha-numeric Bengali characters is presented in this paper. Curvature related features have been used for characterizing the strokes which constitute the characters. Structural constraints imposed by the junctions have also been exploited. These descriptors are relatively invariant over different fonts and writing styles. A neural net based recognizer learns these descriptors and their possible variations for correctly recognizing samples belonging to unknown fonts and/or writing styles. The relatively invariant nature of the features used and the learning scheme have given the recognition methodology a natural generalizing capability. The experimental results show that, after being trained with either printed or hand-written samples, the recognizer can provide correct results for unknown samples (of different fonts or handwriting) in around 90% of the cases for numerals and 85% of the cases for alphabetic characters (cf. Tables 1 and 2). Obviously, there is scope for testing with a larger sample set and with all the alphabetic characters. At the same time, superior system performance for a particular font or handwriting can be guaranteed by suitably training the network with samples of that class.

Another advantage of this scheme is its modular design. For example, the scheme can be used for recognition of characters in some other Indian language scripts like Devanagari, which has a close similarity to Bengali with respect to the general structural features of its characters. For effecting this modification only the stroke and junction classifiers need to be retrained and no other changes are required. The scheme may also be used with other methods of data acquisition, for example an electronic tablet. Again, for this changeover, only the pre-processing stage needs to be modified, keeping the other modules unaltered. The present scheme also satisfies the standard criteria of rotation and scaling invariance. One limitation of the scheme is that with the present design it is not possible to recognize composite characters (Juktakshars), which are also used in Bengali.
REFERENCES
1. V. K. Govindan and A. P. Shivaprasad, Character recognition--a review, Pattern Recognition 23, 671-683 (1990).
2. C. C. Tappert, C. Y. Suen and T. Wakahara, The state of the art in online handwriting recognition, IEEE Trans. Pattern Analysis Mach. Intell. 12(8), 787-808 (1990).
3. S. Kahan and T. Pavlidis, Recognition of printed characters of any font and size, IEEE Trans. Syst. Man Cybern. 274-276 (1987).
4. L. Lam and C. Y. Suen, Structural classification and relaxation matching of totally unconstrained handwritten ZIP-code numbers, Pattern Recognition 21, 19-31 (1988).
5. F. Kimura and M. Shridhar, Handwritten numerical recognition based on multiple algorithms, Pattern Recognition 24, 969-983 (1991).
6. L.-H. Chen and J. R. Lieh, Handwritten character recognition using a 2-layer random graph model by relaxation matching, Pattern Recognition 23, 1189-1205 (1990).
7. P. Gader, B. Forester, M. Ganzberger, A. Gillies, B. Mitchell, M. Whalen and T. Yocum, Recognition of handwritten digits using template and model matching, Pattern Recognition 24, 421-431 (1991).
8. Sherif Sami El-Dabi, Rafat Ramsis and Aladin Kamel, A recognition system for printed Arabic text, Pattern Recognition 23, 485-497 (1990).
9. R. M. K. Sinha and H. Mahabala, Machine recognition of Devnagari script, IEEE Trans. Syst. Man Cybern. 9 (1979).
10. R. M. K. Sinha, Rule based contextual post-processing for Devnagari text recognition, Pattern Recognition 20, 475-485 (1987).
11. G. Siromoney, R. Chandrasekaran and M. Chandrasekaran, Machine recognition of printed Tamil characters, Pattern Recognition 10 (1978).
12. S. M. S. Rajasekaran and B. L. Dekshatulu, Recognition of printed Telugu characters, Comput. Graphics Image Process. 6 (1977).
13. A. K. Roy and B. Chatterjee, Design of nearest neighbour classifier for Bengali character recognition, J. IETE 30 (1984).
14. A. K. Dutta, A generalized formal approach for description and analysis for major Indian scripts, J. IETE 30 (1984).
15. A. Nacache and S. Singhal, SPTA--safe point thinning algorithm, IEEE Trans. Syst. Man Cybern. 14 (1984).
16. D. M. Wuescher and K. L. Boyer, Robust contour decomposition using a constant curvature criterion, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-13, 41-50 (1991).
17. K. Paton, An algorithm for finding a fundamental set of cycles of a graph, Commun. ACM 12, 514-518 (1969).
18. F. Mokhtarian and A. Mackworth, Scale based description and recognition of planar curves and two dimensional shapes, IEEE Trans. Pattern Analysis Mach. Intell. 34-44 (1986).
19. J. Hertz, A. Krogh and R. Palmer, Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, Massachusetts (1991).
20. G. E. Hinton, Mapping part-whole hierarchies into connectionist networks, Artif. Intell. 46, 47-75 (1990).
About the Author--ABHIJIT DUTTA was born in 1967 in Calcutta, India. He finished his B.E. in electronics and telecommunication engineering from Jadavpur University, Calcutta, in 1989 and his M.Tech. in automation and computer vision from the Indian Institute of Technology, Kharagpur, in 1991. Currently he is a doctoral student in the Department of Computer Science at Texas A&M University, Texas, U.S.A. His research interests include computer vision, parallel and distributed processing and fault-tolerant computing.
About the Author--SANTANU CHAUDHURY obtained his B.Tech. degree in electronics and electrical communication engineering in 1984 and Ph.D. in computer science and engineering in 1989 from I.I.T., Kharagpur. Since January 1992 he has been an assistant professor in the Department of Electrical Engineering, I.I.T., Delhi. His previous experience includes lecturership at I.I.T., Kharagpur, Computer Engineer at ISI, Calcutta, and Junior Scientific Officer at I.I.T., Kharagpur. His research interests are in the areas of computer vision, neural networks and artificial intelligence. He is the author of more than 30 publications in refereed journals and conference proceedings.