Pattern Recognition Letters 20 (1999) 293–303
Construction of partitioning paths for touching handwritten characters

Jianming Hu *, Donggang Yu, Hong Yan
Department of Electrical Engineering, University of Sydney, NSW 2006, Australia

Received 13 October 1997; received in revised form 4 December 1998
Abstract

In this paper we propose a new approach to the determination of the partitioning paths for handwritten numeral strings. The average stroke width of the connected numerals is estimated first, and then three algorithms are developed to deal with three different situations. Instead of using one partitioning path for each area of contact, we use two partitioning paths to ensure that the recovered strokes have at least the average stroke width. Experiments show that by using this new approach, the strokes can be recovered satisfactorily. © 1999 Published by Elsevier Science B.V. All rights reserved.

Keywords: Optical character recognition; Character segmentation; Boundary analysis; Unconstrained handwritten numerals
1. Introduction

Recognition of handwritten numeral strings is an important research topic in optical character recognition (OCR). Approaches to the problem fall into two categories. In the first category, the numeral string is segmented into individual numerals and each numeral is recognized by a recognition algorithm. In the second category, the numeral string is recognized as a whole. The method in the first category involves two steps: (a) finding the cut points and (b) constructing the partitioning paths based on these cut points. A number of algorithms have been proposed to identify the cut points (Casey and Lecolinet, 1996; Lu and Shridhar, 1996), but full investigations into how to construct the partitioning paths have not been reported in research papers. Constructing a partitioning path between two cut points is relatively easy if the length of the path is short. However, large distortions may arise in the separated numerals if the touching area is long and deep. Fig. 1 shows the boundaries of two connected numerals, the cut points ("S") and the line segment connecting them. In Fig. 1(a), the right part of "0" and the left part of "9" are distorted by the partitioning path. In Fig. 1(b), the partitioning path crosses the background region and the separation result is not correct. In the following sections we address these partitioning problems and develop algorithms to recover the strokes in the area of contact in various situations.

* Corresponding author. Tel.: 61 2 9351 4824; fax: 61 2 9351 3847; e-mail: [email protected]
0167-8655/99/$ – see front matter © 1999 Published by Elsevier Science B.V. All rights reserved. PII: S0167-8655(98)00148-2
Fig. 1. Examples of partitioning paths which cause large distortions for the separated characters: (a) the partitioning path does not cross the background region; (b) the partitioning path does cross the background region.
Fig. 2. Examples of character strings where only one cut point was found.
Table 1
The number n(l) of "stroke runs" of each length l for Fig. 1(a)

l      2   3   4   5   6   7   8   9  10  11  12  13
n(l)   0   0   3   2  15  23   3   9   4   7   4   3

l     14  15  16  17  18  19  20  21  22  23  24
n(l)   1   0   2   1   1   3   0   3   1   4   2
2. Problem formulation

Two connected numerals may have one or more areas of contact. To separate them, one partitioning path is required for each area of contact, and the algorithms are the same for each partitioning path. In this paper, we assume that the cut points have been found for each area of contact. Methods for finding the cut points can be found in a number of papers (Westall and Narasimha, 1993; Cheriet et al., 1992; Fenrich, 1992; Fujisawa et al., 1992; Strathy et al., 1993; Shridhar and Badreldin, 1987; Kimura and Shridhar, 1992). In the following, we work with the boundary image of the numerals to be partitioned because this simplifies the separation procedure.

For an area of contact, a pair of cut points can usually be found (Fig. 1). However, in some cases only one cut point may be found, on the lower or upper boundary. Fig. 2 shows two examples, in each of which there is only one cut point S. The partitioning paths for the two situations will be referred to as a two-point-path and a one-point-path, respectively.

Fig. 3. Flow diagram for the application of the three algorithms.

To construct a two-point-path, the simplest way is to connect the two cut points by a straight line; we shall call this method "the straight line method". To construct a one-point-path, the Hit and Deflect Strategy (HDS) (Shridhar and Badreldin, 1986) can be used. In both the straight line method and the HDS, the left and right numerals share the same partitioning path. If two strokes merge in the area of contact, large distortions may appear when only one partitioning path is used. To reduce distortion, two partitioning paths should be constructed: the path for the left numeral is called the "left partitioning path" and that for the right numeral the "right partitioning path". To recover the stroke shape, the average stroke width ws of the numerals must be known first; with it we can set the width of the recovered strokes to be at least ws. Three algorithms are designed to construct a partitioning path for the following situations: a two-point-path which does not cross the background region (Algorithm 1), a two-point-path which crosses the background region (Algorithm 2), and a one-point-path (Algorithm 3).
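As a reference point, the straight line method is easy to state precisely. The sketch below is our own rendering (the function name and the rounding to pixel positions are assumptions, not taken from the paper); it produces one path point per scanline between the two cut points:

```python
def straight_line_path(s1, s2):
    """Connect two cut points by a straight line, one path point per
    scanline -- the "straight line method" used here as a baseline.

    s1, s2: (x, y) cut points on the upper and lower boundaries.
    """
    (x1, y1), (x2, y2) = s1, s2
    if y1 > y2:  # walk from the upper cut point downwards
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
    if y1 == y2:
        return [(x1, y1), (x2, y2)]
    # linear interpolation of x, rounded to pixel positions
    return [(round(x1 + (x2 - x1) * (j - y1) / (y2 - y1)), j)
            for j in range(y1, y2 + 1)]
```

Both numerals share this single path, which is exactly why deep touching areas distort the result and why two separate paths are introduced.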
Fig. 4. Filling schemes for two adjacent lines where a black box is the position at a scanline after boundary tracing: (a) for the left partitioning path; (b) for the right partitioning path.
3. Stroke width estimation

We extract the boundary image of the connected numerals, and a one-pixel margin is set at its four sides. Suppose the origin of the image is located at the leftmost point of the first row, and that the height and width of the image are h and w, respectively. The pixel at point (x, y) is denoted as I(x, y); I(x, y) is set to 1 for a boundary pixel and 0 for all other pixels. For each horizontal scan line (x direction), we search for "stroke runs". A "stroke run" begins at a 0-to-1 transition and ends at the next 0-to-1 transition; each "stroke run" thus starts from an even-numbered 0-to-1 transition in a scan line, and its length approximates the stroke width of the numerals. For each length l (1 < l < w/3) of the "stroke run", its total number n(l) in the image is calculated. The results for Fig. 1(a) are listed in Table 1. To determine the stroke width ws, we search for three neighbouring non-zero numbers n(l0), n(l1) and n(l2) such that

n(l0) + n(l1) + n(l2) = max.   (1)
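A minimal sketch of this estimation, assuming the boundary image is stored as a list of 0/1 rows (the function name and the tie handling are ours, not from the paper):

```python
def estimate_stroke_width(boundary):
    """Estimate the stroke width from a binary boundary image.

    boundary: list of rows; 1 marks a boundary pixel, 0 anything else.
    A "stroke run" spans from an even-numbered 0-to-1 transition on a
    scan line to the next 0-to-1 transition, approximating the distance
    between the left and right boundaries of a stroke.
    """
    w = len(boundary[0])
    counts = {}  # n(l): number of stroke runs of length l
    for row in boundary:
        # x positions of 0-to-1 transitions along this scan line
        trans = [x for x in range(w)
                 if row[x] == 1 and (x == 0 or row[x - 1] == 0)]
        for k in range(0, len(trans) - 1, 2):  # even-numbered transitions
            l = trans[k + 1] - trans[k]
            if 1 < l < w / 3:
                counts[l] = counts.get(l, 0) + 1
    # Eq. (1): three neighbouring non-zero n(l0), n(l1), n(l2) with
    # maximum sum; the stroke width is the average of l0, l1, l2
    best, best_sum = None, -1
    for l in range(2, w):
        window = [counts.get(l + i, 0) for i in range(3)]
        if all(window) and sum(window) > best_sum:
            best, best_sum = l, sum(window)
    return None if best is None else best + 1  # average of l, l+1, l+2
```

For the counts in Table 1 this search selects l0 = 6, l1 = 7, l2 = 8.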
The stroke width is then taken as the average value of l0, l1 and l2. From Table 1, we see that n(6) + n(7) + n(8) = max, and therefore the estimated stroke width is 7. The algorithm estimates the stroke width of a numeral string or of a single numeral equally well; the estimated stroke widths for the numerals in Fig. 1(b), Fig. 2(a) and Fig. 2(b) are 7, 8 and 8, respectively.

4. Construction of partitioning paths

As discussed in Section 2, three algorithms are developed to deal with three different cases, as shown in Fig. 3. In all of them, the width of a recovered stroke must be at least the estimated stroke width while the stroke shape is recovered from nearby boundary segments. Details of the three algorithms are as follows.

Algorithm 1 (two cut points, no crossing).
· Input. The boundary image, the two cut points S1 and S2, and the line segment S1 S2 which does
Fig. 5. Sample numerals separated by Algorithm 1: (a) boundary image and initial partitioning path; (b) left and right partitioning paths represented by "l" and "r", respectively; (c) separation results.
Fig. 6. Illustration of Algorithm 2: (a) construction of the right partitioning path; (b) construction of the left partitioning path; (c) the case for rejection.
not cross the background region. Denote the y coordinates of S1 and S2 as y1 and y2 (y1 < y2), respectively, and denote the x coordinates of the leftmost and rightmost points of S1 S2 at scanline j (y1 < j < y2) as Lx(j) and Rx(j), respectively. If there is only one point of S1 S2 at scanline j, we have Lx(j) = Rx(j).
· Step 1. Estimate the stroke width ws of the image.
· Step 2. Set the initial left partitioning path as Pl(j) = Rx(j) (y1 < j < y2).
· Step 3. At each scanline j (y1 < j < y2), calculate the distance d between point Rx(j) and the first boundary point on the left. If d < ws, then

Pl(j) = Rx(j) + (ws - d).   (2)

Otherwise, Pl(j) is unchanged.
· Step 4. Check the 8-connectivity of points Pl(j) and Pl(j+1). If Pl(j+1) - Pl(j) > 1, then the points Pl(j)+1, ..., Pl(j+1)-1 are added to the path; if Pl(j) - Pl(j+1) > 1, then the points Pl(j+1)+1, ..., Pl(j)-1 are added to the path. Fig. 4(a) shows the two cases, where a black box is the initial point determined in Step 3 and the blank boxes are the points added to the partitioning path. The left partitioning path has now been determined.
· Step 5. Set the initial right partitioning path as Pr(j) = Lx(j) (y1 < j < y2).
· Step 6. At each scanline j (y1 < j < y2), calculate the distance d between point Lx(j) and the first boundary point on the right. If d < ws, then

Pr(j) = Lx(j) - (ws - d).   (3)

· Step 7. Similar to Step 4, check the 8-connectivity of points Pr(j) and Pr(j+1) and add new points to the path if necessary (Fig. 4(b)). The right partitioning path is then determined.

Fig. 5 shows an example: (a) is the image with line segment S1 S2 (ws = 7), (b) is the image with the left and right partitioning paths represented by "l" and "r", respectively, and (c) shows the separated images.

If the line segment S1 S2 crosses the boundary (Fig. 1(b)), the partitioning path cannot be determined by Algorithm 1. In this case, we first find the two intersection points I1 and I2. Fig. 6(a) and (b) shows the two cases when S1 S2 crosses a boundary (the boundary is traced anticlockwise). The boundary segment between I1 and I2, denoted B(I1, I2), is the segment to be traced. In Fig. 6(a), the right partitioning path consists of three segments: line segment S1 T1, curve T1 T2, and line segment T2 S2. To determine curve T1 T2, the leftmost point Lx(j) (yI1 ≤ j ≤ yI2) of B(I1, I2) is first determined. For each scanline j, a point is located on the left at a distance ws from Lx(j). Curve T1 T2 is then obtained by checking the connectivity of these points and applying the filling scheme in Fig. 4(b).
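Steps 2–4 of Algorithm 1 (construction of the left partitioning path) can be sketched as follows. This is a simplified reading of the paper: which of the two adjacent scanlines receives the filled pixels in Step 4 follows Fig. 4(a), and we assign them to one side arbitrarily here; all names are our own.

```python
def left_partitioning_path(boundary, Rx, ws, y1, y2):
    """Steps 2-4 of Algorithm 1: build the left partitioning path.

    boundary: binary image as a list of rows, 1 = boundary pixel.
    Rx: dict mapping scanline j -> rightmost x of segment S1S2 at j.
    ws: estimated stroke width; the scanlines covered are y1 < j < y2.
    Returns a dict mapping j -> list of path x positions at scanline j.
    """
    Pl = {}
    for j in range(y1 + 1, y2):
        x = Rx[j]
        # Step 3: distance d to the first boundary point left of Rx(j);
        # if no boundary point exists on the left, leave Pl(j) unchanged
        b = next((c for c in range(x - 1, -1, -1) if boundary[j][c]), None)
        d = x - b if b is not None else ws
        # Eq. (2): push the path right so the left stroke keeps width ws
        Pl[j] = x + (ws - d) if d < ws else x
    # Step 4: fill pixels so consecutive scanlines stay 8-connected
    # (the filling scheme of Fig. 4(a); the choice of which adjacent
    # scanline receives the filled pixels is ours)
    path = {j: [Pl[j]] for j in Pl}
    for j in range(y1 + 1, y2 - 1):
        a, c = Pl[j], Pl[j + 1]
        if c - a > 1:
            path[j + 1] = list(range(a + 1, c)) + path[j + 1]
        elif a - c > 1:
            path[j] += list(range(c + 1, a))
    return path
```

The right partitioning path (Steps 5–7) is symmetric: it scans rightwards from Lx(j) and subtracts (ws - d) instead, as in Eq. (3).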
Fig. 7. Sample numerals separated by Algorithm 2: (a) initial partitioning points "l" for the left numeral determined from B(I1, I2); (b) initial partitioning points "r" for the right numeral determined from B(I1, I2); (c) the left and right partitioning points after filling the path according to Fig. 4; (d) final separation results.
The left partitioning path, however, is more difficult to build because we cannot trace only B(I1, I2); the boundary on the left of B(I1, I2) must also be considered. For each scanline j, we search for the first boundary point Qx(j) on the left of Lx(j). If the distance between Lx(j) and Qx(j) is dj, the partitioning point Pl(j) is determined as follows:

Pl(j) = Qx(j) + ws,          if dj ≤ (3ws + 1)/2,
Pl(j) = Lx(j) - (ws + 1)/2,  otherwise.   (4)

The partitioning path between T1 and T2 is then obtained by checking the connectivity of Pl(j) and applying the filling scheme in Fig. 4(a). In this way, the partitioning segment T1 T2 is at least ws
away from the boundary points of the left numeral. The whole left partitioning path consists of the line segment S1 T1, the segment T1 T2 obtained above, and the line segment T2 S2. An example is shown in Fig. 7: the partitioning points for the left path (denoted "l") and the right path (denoted "r") are shown in Fig. 7(a) and (b), respectively. After filling the path according to Fig. 4 and adding S1 T1 and T2 S2, the left and right partitioning paths are shown in Fig. 7(c); the separated numerals are given in Fig. 7(d). The procedure for Fig. 6(b) is similar to that for Fig. 6(a) except that we use the rightmost point Rx(j) of B(I1, I2).

The boundary segment B(I1, I2) in Fig. 6(a) and (b) is required to be monotonic; otherwise, there will be two "r" points for the scanlines above the dashed line in Fig. 6(c). In this case, a smooth partitioning path between I1 and I2 cannot be constructed, and the pair of cut points is rejected by Algorithm 2 (see Section 5 for further discussion). The algorithm can be summarized as follows.

Algorithm 2 (two cut points, crossing).
· Input. The boundary image and the two cut points S1 and S2. Denote the y coordinates of S1 and S2 as y1 and y2 (y1 < y2), respectively.
· Step 1. Estimate the stroke width ws of the image.
· Step 2. Identify the two intersection points I1 and I2 and determine the case of the intersection (Fig. 6(a) or 6(b)).
· Step 3. Check the monotonicity of the boundary segment B(I1, I2). If it is monotonic, go to Step 4; otherwise, the algorithm stops and this pair of cut points is rejected.
· Step 4. Find the leftmost (rightmost) point of B(I1, I2) at each scanline.
· Step 5. Determine the boundary segment B(T1, T2) for the left (right) path according to the method described above.
· Step 6. Add line segments S1 T1 and T2 S2 to the path.

If in some cases there is only one cut point S (usually because the top boundary is flat, see Fig. 2), we can extend the HDS to incorporate the estimated
stroke width ws. The scanning point starts from S: if S is located on the upper boundary, the search is carried out downwards; otherwise, it is carried out upwards. We search for the first boundary point Pb on the left which is within the distance ws. The starting point Tl for the left boundary is determined as Tl = Pb + ws. A boundary segment is then located which can be traced for the left partitioning path. Fig. 8 shows the tracing procedure, where the boundary segment to be traced is represented by black boxes, the dotted line is the path of the scanning point, the blank box is the traced position, the boxes with a cross "×" inside are the pixels filled according to Fig. 4(a), and the final partitioning path is shown by the solid line. The right partitioning path can be determined similarly. The algorithm can be described as follows.

Algorithm 3 (one cut point).
· Input. The boundary image and the cut point S on the upper or lower boundary.
· Step 1. Estimate the stroke width ws of the image.
Fig. 8. Illustration of the starting point determination and the tracing procedure for Algorithm 3.
Fig. 9. Separation results for Fig. 2 where only one cut point can be found for each character string. (a) and (c) results by HDS; (b) and (d) results by Algorithm 3.
· Step 2. Identify Tl for the left boundary, and trace the left boundary segment until the scanning point reaches the other boundary (upper or lower). Add the line segment S Tl to the path.
· Step 3. Identify Tr for the right boundary, and trace the right boundary segment until the scanning point reaches the other boundary (upper or lower). Add the line segment S Tr to the path.
· Step 4. If both Tl and Tr have been found, the algorithm stops. If neither Tl nor Tr has been found, the algorithm fails to find the partitioning path. If only one partitioning path
has been located, the other partitioning path is derived from the located boundary by adjusting the points to ensure a minimum stroke width of ws.

Fig. 9 shows the separation results for Fig. 2, where (a) and (c) are the results of the HDS, and (b) and (d) the results of Algorithm 3.

5. Experiments

The data used in the experiments consist of two-digit strings extracted from the NIST database (Garris and Wilkinson, 1992). The training group and the testing group consist of 1200 and 3355 images, respectively. The original image is first smoothed (Hu et al., 1996) and the boundary is extracted. The boundary is smoothed further (Yu and Yan, 1997; Hu et al., 1998) and the cut points are identified using the method in (Hu and Yan, 1997), in which the separation results are verified by a recognition algorithm: if one numeral is rejected by the recognition algorithm, the next candidate cut point is selected. If two cut points are found, we calculate the distance d between the two points. In the following, the height of the boundary image is denoted as h. If d < h/4, the straight line method is used; otherwise, Algorithms 1 and 2 are used. If there is only one cut point, Algorithm 3 is applied.

In the training group of 1200 images, 45 images were partitioned by Algorithms 1–3 (the remaining images were partitioned by the straight line method). In the testing group of 3355 images, 106 (3.16%) images were partitioned by Algorithms 1–3, where Algorithms 1, 2 and 3 were applied to 86 (81.1%), 13 (12.3%) and 7 (6.6%) images, respectively. All the separation results were examined manually, and they showed that the strokes in the area of contact were recovered properly. Fig. 10 shows some original images and their separation results, where the separated numerals were smoothed by the boundary smoothing algorithms. The average time for one image was 0.01 s on a Sun Sparc 1 workstation; this includes stroke width estimation, construction of the partitioning paths, and separation of the two numerals.
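The selection logic of this section can be summarized as follows. This is a sketch: the names are ours, and we assume d is the Euclidean distance between the cut points, which the text does not state explicitly.

```python
def choose_partition_method(cut_points, h):
    """Select the partitioning method as described in Section 5.

    cut_points: one or two (x, y) cut points; h: boundary image height.
    """
    if len(cut_points) == 1:
        return "Algorithm 3"                      # one-point-path
    (x1, y1), (x2, y2) = cut_points
    d = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5  # assumed Euclidean
    if d < h / 4:
        return "straight line method"             # short area of contact
    # Algorithm 1 if segment S1S2 stays inside the foreground,
    # Algorithm 2 if it crosses the background (crossing test omitted)
    return "Algorithm 1 or 2"
```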
The above algorithms work well when the cut points are located correctly. Algorithm 1 is generally a better method of separating numerals than the straight line method, although when the length of contact is not too long there is little difference between the two. In our experiments, we found that the straight line method is sufficient when the distance between the two cut points is less than h/4; most images (96.84%) in the database could be separated by it. Where the straight line method is unable to separate the numerals, Algorithms 2 and 3 can be used. However, if the cut points are not located properly, Algorithms 2 and 3 can result in additional substitution errors. In Algorithm 2, the boundary segment between the two cut points is required to be monotonic. This requirement can reject wrongly located cut points, thereby reducing substitution errors for a recognition algorithm. Fig. 11(a) shows an example, where the boundary segment between the two intersection points is not monotonic; the algorithm therefore rejected the numeral string. However, if the cut points are not located correctly and the boundary segment between the two intersection points is monotonic, Algorithm 2 will
Fig. 10. Some examples of the numeral strings and their separation results by the proposed algorithms.
6. Conclusion

In this paper we proposed a new approach to the determination of partitioning paths for connected handwritten numerals. Three algorithms were developed to deal with three different situations, in which the stroke width estimation algorithm played an important part. Experiments showed that the algorithms were able to recover the joining strokes correctly. The use of two partitioning paths instead of one, together with the guarantee that each recovered stroke has at least the average stroke width, proved to be an improvement over the straight line method.

References
Fig. 11. Further illustration of Algorithm 2: (a) rejected case; (b) the case where the cut points are not correct; (c) separation results of (b).
still separate the numeral string. An example is shown in Fig. 11, where (b) shows the connected numerals and (c) the separation results.
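The monotonicity test behind this rejection rule (Step 3 of Algorithm 2) can be sketched as follows, assuming the traced boundary segment is given as a list of (x, y) points in tracing order; whether horizontal runs (equal y) should be allowed is our assumption:

```python
def is_monotonic(segment):
    """Check that a traced boundary segment B(I1, I2) never reverses
    its vertical direction (Algorithm 2, Step 3).  If it does, some
    scanline contributes two path points and the cut points must be
    rejected.  segment: list of (x, y) points in tracing order.
    """
    ys = [y for _, y in segment]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    # non-strict monotonicity: horizontal runs are tolerated here
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)
```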
Casey, R.G., Lecolinet, E., 1996. A survey of methods and strategies in character segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 18, 690–706.
Cheriet, M., Huang, Y.S., Suen, C.Y., 1992. Background region-based algorithm for the segmentation of connected digits. In: Proceedings of the International Conference on Pattern Recognition, pp. 619–622.
Fenrich, R., 1992. Segmentation of automatically located handwritten numeric strings. In: Impedovo, S., Simon, J.C. (Eds.), From Pixels to Features III: Frontiers in Handwriting Recognition, pp. 47–59.
Fujisawa, H., Nakano, Y., Kurino, K., 1992. Segmentation methods for character recognition: from segmentation to document structure analysis. Proceedings of the IEEE 80, 1079–1092.
Garris, M.D., Wilkinson, R.A., 1992. Handwritten segmented characters database. Technical Report Special Database 3, HWSC, National Institute of Standards and Technology (NIST).
Hu, J., Yan, H., 1997. A model-based segmentation method for handwritten numeral strings. Computer Vision and Image Understanding, accepted.
Hu, J., Yu, D., Yan, H., 1996. Algorithm for stroke width compensation of handwritten characters. Electronics Letters 32, 2221–2222.
Hu, J., Yu, D., Yan, H., 1998. A multiple point boundary smoothing algorithm. Pattern Recognition Letters 19, 657–668.
Kimura, F., Shridhar, M., 1992. Segmentation-recognition algorithm for zip code field recognition. Machine Vision and Applications 5, 199–210.
Lu, Y., Shridhar, M., 1996. Character segmentation in handwritten words – an overview. Pattern Recognition 29, 77–96.
Shridhar, M., Badreldin, A., 1986. Recognition of isolated and simply connected handwritten numerals. Pattern Recognition 19, 1–12.
Shridhar, M., Badreldin, A., 1987. Context-directed segmentation algorithm for handwritten numeral strings. Image and Vision Computing 5, 3–8.
Strathy, N.W., Suen, C.Y., Krzyzak, A., 1993. Segmentation of handwritten digits using contour features. In: Proceedings of the Second International Conference on Document Analysis and Recognition, pp. 577–580.
Westall, J.M., Narasimha, M.S., 1993. Vertex directed segmentation of handwritten numerals. Pattern Recognition 26, 1473–1486.
Yu, D., Yan, H., 1997. An efficient algorithm for smoothing, linearization and detection of structural feature points of binary image contours. Pattern Recognition 30 (1), 57–69.