Pattern Recognition 32 (1999) 1947–1959

A novel stroke-based feature extraction for handwritten Chinese character recognition

Hung-Pin Chiu, Din-Chang Tseng*

Institute of Computer Science and Information Engineering, National Central University, Chung-li, Taiwan 320

Abstract

A stroke-based approach to extract skeletons and structural features for handwritten Chinese character recognition is proposed. We first determine stroke directions based on the directional run-length information of binary character patterns. According to the stroke directions and their adjacent relationships, we split strokes into stroke and fork segments, and then extract the skeletons of the stroke segments, called skeleton segments. After all skeleton segments are extracted, fork segments are processed to find the fork points and fork degrees. Skeleton segments that touch a fork segment are connected at the fork point, and all connected skeleton segments form the character skeleton. According to the extracted skeletons and fork points, we can extract primitive strokes and stroke direction maps for recognition. A simple classifier based on the stroke direction map is presented to recognize regular and rotated characters to verify the ability of the proposed feature extraction for handwritten Chinese character recognition. Several experiments are carried out, and the experimental results show that the proposed approach can easily and effectively extract skeletons and structural features, and works well for handwritten Chinese character recognition. © 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Thinning; Character stroke; Stroke direction; Fork point; Primitive stroke; Handwritten Chinese character recognition

1. Introduction

A large number of studies on optical character recognition have been reported. One of the most difficult problems in the recognition is to resolve the shape variation of handwritten Chinese characters [1]. When rotation invariance is also required, the problem becomes even more difficult [2,3]. A survey of the pattern recognition literature [1,4] reveals that a pattern-recognition system must solve two problems: (i) how input patterns should be represented, and (ii) how input patterns should be classified based on the representation. In general, pattern representations vary with the extracted features [1]. Local features such as

* Corresponding author. E-mail address: [email protected] (D-C. Tseng)

cellular features [1,5] and global features such as moment invariants [6,7] are usually used as feature vectors; then statistical recognition methods [4,8,9], fuzzy-set approaches [2,3] or neural networks [2,5–7,10–12] are used to classify the feature vectors. When structural features such as strokes are used, a pattern is generally represented by a string [13], relational graph [14,15] or unordered stroke set [16–20], and then a dynamic programming algorithm [13,17], genetic algorithm [21], relaxation matching [14–16,18,22] or linear programming approach [19] is used to match the patterns.

The selection and extraction of features are important for a pattern-recognition system. A set of representative features should have enough information to effectively distinguish one pattern from others, and the success of a recognition system is greatly dependent on the stability and correctness of the extracted features. Since structural features can describe the geometrical structures of



characters more accurately and are less sensitive to noise and character shape variations, a structural-based recognition system is a powerful approach for the recognition of handwritten Chinese characters [1,23]. In this kind of approach, characters are represented and recognized by a set of structural features such as feature points [1,3,8], line segments [17,24] and strokes [1,14–16,18–23]. In off-line handwritten Chinese character recognition, structural-feature extraction is very difficult due to the following facts: (i) there are many types of strokes in Chinese characters and the character shapes are very complicated; (ii) the stroke thickness varies; (iii) the character strokes overlap each other, so strokes become ambiguous or misleading at intersections; and (iv) the boundaries of the character patterns are generally rough.

Two kinds of approaches have been proposed to extract satisfactory structural features. In the first kind of approach, a pixel-based thinning algorithm [23,25–29] was first employed to obtain character skeletons to solve the stroke-thickness variation problem. Fork points were then detected. Next, every skeleton segment between two fork points was extracted and further split into several straight-line segments with a line approximation [17,30] or curve fitting method [22,31]. Finally, these line segments were merged into strokes according to a number of predefined rules [19,22,30]. Since no global or structural information is utilized in a pixel-based thinning algorithm, the thinned results suffer from several problems such as the hairy problem [26], the shortening problem [26], the fork-point distortion problem [26,32,33] and failure to preserve stroke straightness [33]. Several additional processes such as hair-removal and fork-point-merging procedures [31,34,35] were utilized to remove these distortions and improve the performance of feature extraction. However, the additional processes made the approaches more unstable [34], inefficient [31], or complicated [35].

The second kind of approach [9,32,33,36–38] used algorithms other than pixel-based thinning to get thinned skeletons, and the thinned skeletons were generally represented by a set of line segments and their relationships from which strokes can easily be extracted. Baruch [36] introduced a technique called line following to skeletonize binary patterns. The method seems quite simple, but still generates distortion at intersections [32]. Lin and Chen [37] constructed graphs from the run-length encoding of characters. A modification process was then employed to refine the graph, and a skeleton was obtained based on the refined graph. Similarly, Chen [38] utilized the run-length-encoding graphs to represent characters and proposed several merging rules to extract strokes. In general, the run-length-based approaches involve numerous steps and cases; thus these approaches

are trifling and complicated. Moreover, only a one-pass scan was considered in these approaches; segmenting a character into black-run segments is therefore somewhat sensitive to boundary noise and stroke intersections, which in turn influences the following skeleton and feature extraction.

In summary, the hair-removal and fork-point-merging procedures are usually unstable, complicated, or time consuming. The non-pixel-based thinning algorithms try to utilize run-length or contour information to directly acquire better skeletons from which the features can be effectively extracted; however, the results are still somewhat sensitive to boundary noise and stroke intersections.

In this paper, we propose a stroke-based approach to effectively extract skeletons and structural features for handwritten Chinese character recognition. Based on the directional run-length information of characters, we split character strokes into non-overlapped stroke segments and overlapped fork segments to solve the stroke-intersection problem. Based on the run lengths of all segments, we estimate the stroke widths of characters, by which hairy branches can be effectively deleted. Several structural features including feature points, line segments, primitive strokes, and stroke direction maps are extracted in this study. A primitive stroke is defined as a limited-length straight line. Stroke direction maps are features for representing the geometrical structures of characters and are easier to extract than strokes. We extract these structural features of Chinese characters based on the stroke directions. Four directions are considered in this study: 0°, 45°, 90°, and 135°, encoded as direction codes 0, 1, 2, and 3, respectively, and called the 0-, 1-, 2-, and 3-directions as shown in Fig. 1a. The directions 180°, 225°, 270°, and 315° are identical to 0°, 45°, 90°, and 135°, respectively.

At first, one of the four direction codes is assigned to every black pixel according to the run-length information of character strokes. Connected black pixels with the same direction code form a block, and the blocks are then split into stroke and fork segments according to the direction codes of the blocks and the adjacent relationships among the blocks. The skeletons of all stroke segments, called skeleton segments, are extracted, and the fork segments are processed to find fork points and fork degrees. The skeleton segments that touch a fork segment are connected at the fork point, and all connected skeleton segments form the character skeleton. Based on the extracted skeletons and fork points, the primitive strokes and stroke direction maps of characters are easily extracted. One example of the processing steps is illustrated in Fig. 2. The extracted skeletons obtained by the proposed approach are compared with those of two traditional thinning algorithms to demonstrate that the skeleton-distortion problems can effectively be solved by our approach. Since only four stroke directions are considered, the stroke direction map can tolerate a small range of rotation. A simple classifier based on the stroke direction map is proposed to recognize regular and arbitrarily rotated handwritten Chinese characters to verify the ability of the proposed feature-extraction approach for handwritten Chinese character recognition.

The remainder of this paper is organized as follows. The splitting of strokes is described in Section 2. Approaches to extract structural features such as corner points, fork points, primitive strokes, and stroke direction maps are described in Section 3. Section 4 presents the simple classifier based on the stroke direction map to recognize regular and arbitrarily rotated handwritten Chinese characters. Experiments are presented in Section 5. Section 6 gives conclusions and future work.

2. Splitting of character strokes

The determination of direction codes and the splitting of stroke and fork segments are presented in this section.


2.1. Determination of direction codes

Fig. 1. The image coordinate system and definition of directions. (a) The four direction codes; and (b) the eight extension directions.

Fig. 2. Procedure of the proposed stroke-based feature extraction. (a) Direction-code determination: gray strokes are 2-blocks and white strokes are 0-blocks; (b) stroke splitting: white blocks are stroke segments and gray blocks are fork segments; (c) skeleton-segment extraction: black lines are the extracted skeleton segments; (d) fork-point extraction: "m" indicates a multi-fork point and "c" indicates a corner point; (e) the final skeleton: all connected skeleton segments compose the final skeleton; (f) the extracted horizontal and vertical strokes; and (g) the extracted stroke direction map.

Assume that f_{xy} is the gray level of pixel (x, y) in a binary character pattern; f_{xy} is "1" if (x, y) is a black pixel and "0" if (x, y) is a white pixel. A character is composed of black pixels. The direction code of a black pixel is determined from the lengths of the runs passing through the pixel along the four directions. Traditionally, a run is defined as a sequence of consecutive black pixels in the horizontal direction. Here, a d-directional run is defined as a sequence of consecutive black pixels along the d-direction, where d = 0, 1, 2, or 3. Let l_{xy}(d) be the length of the d-directional run passing through the black pixel (x, y). The direction code of pixel (x, y) is defined as the direction of the longest run passing through (x, y) and is denoted d_{xy}; that is,

l_{xy}(d_{xy}) ≥ l_{xy}(d),  d = 0, 1, 2, 3.  (1)

Connected black pixels with the same direction code form a block. If the direction code is d, the block is called a d-block. One example of the four d-block maps is shown in Fig. 3. There exist many small blocks (fragments) due to the rough borders of strokes, and these fragments affect the performance of the following processing. Thus we force the direction codes of black pixels in the fragments to be the same as those of their neighbors, in order to eliminate the fragments. The direction codes of all black pixels are adjusted in parallel. A 5×5 window centered at a black pixel (x, y) is considered. The total length of all d-directional runs passing through the black pixels in the window is defined as

S_{xy}(d) = Σ_{i=−2..2} Σ_{j=−2..2} l_{x+i, y+j}(d) λ(d, d_{x+i, y+j}),  (2)

where

λ(d, e) = 1 if d = e,
λ(d, e) = 0 if d ≠ e.

We adjust the direction code of (x, y) to be d_{xy} such that

S_{xy}(d_{xy}) ≥ S_{xy}(d),  d = 0, 1, 2, 3.  (3)

The fragments may not be absorbed completely in a single pass, so further iterations of the refinement are performed until no pixel is adjusted. One example of the refined results is shown in Fig. 4.


Fig. 3. Four block maps of a character. (a) The original pattern; (b) 0-block map; (c) 1-block map; (d) 2-block map; and (e) 3-block map.

Fig. 4. Block maps after refinement. (a) 0-block map; (b) 1-block map; (c) 2-block map; and (d) 3-block map.

(ii) d "1 and [( f "1 and d O1) or VW V\ W> V\ W> (f "1 and d O1)], V> W\ V> W\ (iii) d "2 and [( f "1 and d O2) or VW V\ W V\ W (f "1 and d O2)], and V> W V> W (iv) d "3 and [( f "1 and d O3) or VW V\ W\ V\ W\ (f "1 and d O3)]. V> W> V> W> Fig. 5. Splitting of fork and stroke segments. (a) The 2-block is split into stroke segments and fork segments, and the gray runs are fork runs; (b) the forked 0-block is split by the gray run that is a fork run; (c) the white stroke segment is still 8-connected; and (d) the white stroke segment is split by a fork segment.

decomposed into a sequence of consecutive d-runs in one of the four directions. We extract every p-directional d-run in a d-block and labeled it as a part of a stroke segment or a fork segment, where p-direction is perpendicular to d-direction; that is, if d"0, then p"2; d"1, p"3; d"2, p"0; and d"3, p"1. In this paper, if the direction of a d-run is not speci"ed, the default direction is the p-direction that is perpendicular to d-direction. A d-run is labeled as a fork run, if it touches other blocks or located at a fork of a block as shown in Fig. 5a and b; otherwise, the d-run is labeled as a stroke run. A d-run touches other blocks if its either endpoint (x, y) satis"es one of the following conditions: (i) d "0 and [( f "1 and d O0) or ( f "1 VW V W\ V W\ V W> and d O0)], V W>

A d-run is located at a fork of a block, if it is adjacent to more than one d-run in either side as shown in Fig. 5b. After the labeling process, all connected stroke runs form a stroke segment and all connected fork runs form a fork segment. In general, a fork segment appears at cross, fork or corner of a stroke. It means that a stroke is separated into two segments by a fork segment, or two strokes are crossed at a fork segment. In a few special cases, especially d"1 or 3, when a fork segment is just one-pixel width, the touched block or forked block is not truly split by the fork segment (the two stroke segments are 8connected) as shown in Fig. 5c. The problem will a!ect the result of the extraction and connection of skeleton segments in the following processing. Thus, when a d-run is labeled as a fork run, the next d-run is also labeled as a fork run to make a fork segment containing at least two d-runs to ensure that a block is truly split by a fork segment as shown in Fig. 5d. One example of the splitting of stroke and fork segments is shown in Fig. 6a and b. The stroke width will be used in the elimination of hairy branches. Here, we estimate the stroke width using
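A minimal sketch of the run-labeling rule, again in Python with invented helper names; it assumes the direction-code array of the previous sketch, represents a d-run by its list of pixel coordinates, and takes the number of adjacent same-block d-runs on each side as a precomputed input.

```python
# Perpendicular neighbour offsets used by conditions (i)-(iv): for a run of
# pixels coded d, the two pixels beyond its endpoints along the p-direction.
PERP = {0: ((0, -1), (0, 1)),
        1: ((-1, 1), (1, -1)),
        2: ((-1, 0), (1, 0)),
        3: ((-1, -1), (1, 1))}

def touches_other_block(img, codes, x, y):
    """Conditions (i)-(iv): the run endpoint touches a block of another code."""
    d = codes[y, x]
    h, w = img.shape
    for dx, dy in PERP[d]:
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h and img[ny, nx] == 1 and codes[ny, nx] != d:
            return True
    return False

def label_run(img, codes, run, adjacent_runs_per_side):
    """Label one d-run as a 'fork' run or a 'stroke' run.

    `run` is the list of pixel coordinates of the d-run; `adjacent_runs_per_side`
    is a pair giving how many d-runs of the same block touch it on each side
    (a value > 1 on either side means the run sits at a fork of the block).
    """
    (x0, y0), (x1, y1) = run[0], run[-1]
    if touches_other_block(img, codes, x0, y0) or touches_other_block(img, codes, x1, y1):
        return "fork"
    if max(adjacent_runs_per_side) > 1:
        return "fork"
    return "stroke"
```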


Fig. 6. Stroke, fork, and skeleton segments. (a) The stroke segments; (b) the fork segments; (c) the skeleton and fork segments; and (d) the final skeleton.

The stroke width will be used in the elimination of hairy branches. Here, we estimate the stroke width using the lengths of all d-runs in one character. A width (i.e., d-run length) histogram is constructed from all d-run lengths, and the width with the largest count in the histogram is taken as the estimated stroke width.
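A short sketch of this estimate, assuming the d-run lengths have already been collected into a list; the helper name is ours.

```python
from collections import Counter

def estimate_stroke_width(run_lengths):
    """Return the most frequent d-run length as the stroke-width estimate."""
    histogram = Counter(run_lengths)            # width -> number of d-runs
    width, _count = histogram.most_common(1)[0]
    return width
```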

3. Feature extraction

Structural features are extracted from the stroke and fork segments of characters. The structures of characters can be defined at different structure levels, including pixels [1], line segments [9,17,24], strokes [1,14–16,18–23,30,38,39], substructures [40], radicals [22,41] and characters. Line segments and strokes are the most basic primitives for representing the structures of characters. Substructures and radicals represent the higher-level structures, and can be constructed from line segments and strokes. Feature points such as corner points and multi-fork points are other popular structural features [1,3]. In our study, several structural features, namely corner points, multi-fork points, line segments, primitive strokes, and stroke direction maps, are extracted for character recognition.

In general, a stroke is defined as a continuous draw between the pen fall and the pen rise. In this sense, a stroke is treated as a list of line segments with different gradients; thus extracting such strokes is difficult. From the viewpoint of character recognition, it is not necessary to extract such strokes [39]. A number of basic directional strokes called primitive strokes have been defined to simplify the extraction process. Two, four, and eight directional primitive strokes have been used in many studies [30,39,42]. In this study, four primitive-stroke types are adopted: horizontal, vertical, left-diagonal, and right-diagonal strokes.

In another sense, a character pattern can be divided horizontally and vertically into many receptive fields called cells [1,5,10,11]. We use the directions of the line segments passing through a cell to define the cell's direction. The directional cells compose the stroke direction map of the character. Stroke direction maps can represent the geometrical structures of characters, and they are easier to extract than strokes. In this section, the extraction of skeleton segments, fork points, and character skeletons is first presented. Based on the extracted fork points and skeletons, the primitive strokes and stroke direction maps are then generated.

3.1. Extraction of skeleton segments

Skeleton segments are extracted from the stroke segments. Let S be a stroke segment whose direction code is d. S is decomposed into a sequence of consecutive d-runs. The two extreme d-runs of S are first found, and the midpoints of these two d-runs are connected with a straight line to form the skeleton of the stroke segment. Although skeleton segments are classified into four classes according to their direction codes, any direction of a skeleton segment is possible.

Noise and hairs should be removed. Let len be the length of a skeleton segment and w be the estimated stroke width. Let a be a parameter that controls the tolerated degree of shortness. If len < aw (i.e., the length is less than a proportion of the estimated stroke width), the skeleton segment is short. A short and isolated skeleton segment (neither endpoint touches a fork segment) is regarded as noise. A short skeleton segment is a hairy branch if one endpoint of the segment touches a fork segment and the other does not. One example of skeleton segments and fork segments is shown in Fig. 6c.

We record the related information of every extracted skeleton segment in a skeleton-segment table to facilitate the feature extraction; the information includes the direction code, and the coordinates and extension directions of the two endpoints, as illustrated in Fig. 7a and b. As with the classification of skeleton-segment directions, the extension directions of skeleton segments are divided into eight classes, as shown in Fig. 1b. With this classification, the cases in which skeleton segments touch a fork segment are easily examined to find the fork point of the fork segment, as described in the following section.
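The following sketch (our own names and data layout) builds a skeleton segment from the d-runs of one stroke segment and classifies short segments as noise or hairy branches using the len < aw rule; a = 0.9 is the value the paper reports using in its experiments.

```python
import math

def midpoint(run):
    """Midpoint of a d-run given as a list of (x, y) pixel coordinates."""
    xs = [p[0] for p in run]
    ys = [p[1] for p in run]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def skeleton_segment(d_runs):
    """Connect the midpoints of the two extreme d-runs of a stroke segment."""
    return midpoint(d_runs[0]), midpoint(d_runs[-1])

def classify_short_segment(endpoints, touches_fork, stroke_width, a=0.9):
    """Return 'noise', 'hair', or 'keep' for one skeleton segment.

    `touches_fork` is a pair of booleans telling whether each endpoint
    touches a fork segment.
    """
    (x0, y0), (x1, y1) = endpoints
    length = math.hypot(x1 - x0, y1 - y0)
    if length >= a * stroke_width:
        return "keep"
    if not any(touches_fork):
        return "noise"          # short and isolated
    if touches_fork[0] != touches_fork[1]:
        return "hair"           # short, touches a fork at one end only
    return "keep"
```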


3.2. Extraction of fork points and skeletons

Assume that a fork segment contains just one fork point. When a fork segment is found, the fork degree and the location of the fork point are first calculated; then all skeleton segments that touch the fork segment are extended to the fork point to form a complete character skeleton. Suppose that n_d is the number of skeleton segments that have direction code d and touch the same fork segment, where d = 0, 1, 2 and 3. Let n_t be the fork degree of the fork segment; then n_t = n_0 + n_1 + n_2 + n_3. Four cases of n_d are considered to find the location of the fork point.

(i) n_t = 0. There is no skeleton segment, so no operation occurs.
(ii) n_t = 1. Only one skeleton segment exists, so the skeleton segment is extended along its direction until a white pixel is met, and the last black pixel is taken to be the fork point.
(iii) n_t > 1, n_0 < 2, n_1 < 2, n_2 < 2, and n_3 < 2. In this case, there exist at least two skeleton segments, and at most one skeleton segment in each direction. Any two skeleton segments are chosen and extended along their directions, respectively. The location of the fork point is the intersection of the two extended lines.
(iv) n_0 ≥ 2, n_1 ≥ 2, n_2 ≥ 2, or n_3 ≥ 2. In this case, there exist at least two skeleton segments with the same direction code. If there exists a pair of d-direction skeleton segments with opposite extension directions, the indicator k_d is set to 1; otherwise, k_d is set to 0. Three sub-cases are considered to find the fork-point location.

Fig. 7. Demonstration of the proposed feature extraction. (a) skeleton and fork segments; (b) skeleton-segment table; (c) character skeleton; (d) fork-point table; (e) the primitive strokes (L_{ij} indicates the merging result of L_i and L_j); and (f) the final skeleton-segment table.

(a) k_0 + k_1 + k_2 + k_3 = 0. There exists no pair of skeleton segments with opposite extension directions. If there are two skeleton segments with different direction codes, the two segments are extended along their directions to intersect at the fork point. If all skeleton segments have the same direction code, one of the skeleton segments is extended along its direction by a proportion of the estimated stroke width, and the reached location is the desired fork point.
(b) k_0 + k_1 + k_2 + k_3 = 1. There is only one pair of skeleton segments with opposite extension directions. The pair of skeleton segments is connected by a straight line, and the midpoint of the straight line is taken to be the fork point.
(c) k_0 + k_1 + k_2 + k_3 > 1. There exist at least two pairs of skeleton segments with opposite extension directions. Any two pairs of the skeleton segments are chosen. The segments in each pair are connected by a straight line. The intersection of the two straight lines is the fork point.
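A compact sketch of the case dispatch only, with our own data layout: each touching skeleton segment is assumed to carry an `extension` attribute in 0-7 (the eight extension classes of Fig. 1b), and "opposite" is taken to mean differing by 4 modulo 8, which is our reading of that figure. The geometric constructions themselves (extension to the last black pixel, line intersection, midpoint of the joining line) are omitted.

```python
def has_opposite_pair(segments):
    """True if two segments in the list have opposite extension directions."""
    dirs = {s.extension for s in segments}
    return any((e + 4) % 8 in dirs for e in dirs)

def fork_point_case(segments_by_dir):
    """Return which of the cases (i)-(iv)/(a)-(c) applies to one fork segment.

    `segments_by_dir[d]` is the list of touching skeleton segments whose
    direction code is d.
    """
    n = [len(segments_by_dir[d]) for d in range(4)]     # n_0 .. n_3
    n_t = sum(n)                                        # fork degree
    if n_t == 0:
        return "i: no segment, no operation"
    if n_t == 1:
        return "ii: extend the single segment to the last black pixel"
    if max(n) < 2:
        return "iii: intersect two extended segments"
    # case (iv): at least two segments share a direction code
    k = [int(has_opposite_pair(segments_by_dir[d])) for d in range(4)]
    if sum(k) == 0:
        return "iv-a: extend one segment by a fraction of the stroke width"
    if sum(k) == 1:
        return "iv-b: midpoint of the line joining the opposite pair"
    return "iv-c: intersection of the two joining lines"
```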


The cases considered above are complete [43]; that is, all possible cases are included. After a fork point is extracted, the touched skeleton segments are connected at the fork point. All connected skeleton segments form the character skeleton, which is just the low-distortion thinned result of the character, as illustrated in Fig. 6d.

When processing fork segments, we also check whether a 2-fork point is a corner point. If the difference between the extension direction codes of the two touched skeleton segments is less than three or greater than five, the 2-fork point is a corner point; that is, the included angle between the two skeleton segments is less than 90°. The related information of every fork point is recorded in a fork-point table, as illustrated in Fig. 7c and d; the information includes the fork degree, fork type, location, and the information of all touched skeleton segments in the skeleton-segment table. The skeleton-segment table and the fork-point table will be used to facilitate the structural-feature extraction.

3.3. Extraction of structural features

The fork points in the fork-point table and the skeleton segments in the skeleton-segment table are the desired point and line-segment features. Furthermore, we extract primitive strokes based on the information in these two tables, and extract stroke direction maps based only on the skeleton-segment table.

We propose a simple merging method to merge skeleton segments into primitive strokes. The fork points in the fork-point table are sequentially tested; all touched skeleton segments of a fork point are examined, and if two skeleton segments are nearly collinear, the two segments are merged. Let θ_1 and θ_2 be the directions of two considered skeleton segments, and let

θ = min(|θ_1 − θ_2|, 360° − |θ_1 − θ_2|)

be the included angle of the two skeleton segments. If θ is greater than 157° (and less than 180°), the two skeleton segments are allowed to be merged. If a skeleton segment has more than one candidate segment to be merged with, the segment with the most similar direction is chosen. After merging, a new skeleton segment is generated to replace the two old skeleton segments by updating the endpoints in the skeleton-segment table and the fork-point table. The direction code of the new skeleton segment is set to d, where d is the one of the four pre-defined directions most similar to the direction of the new skeleton segment. After all fork points have been tested, the desired primitive strokes are extracted and their related information is recorded in the skeleton-segment table, as illustrated in Fig. 7e and f. One example of primitive strokes is given in Fig. 8.

According to the direction codes of skeleton segments, we create stroke direction maps to represent the geometrical structures of characters. An n×n character pattern is divided horizontally and vertically into k×k cells, in which the size of each cell is m×m and n = km. Every cell is associated with a direction code that is determined by the direction codes of the skeleton segments overlapping it. In a cell, we accumulate the numbers of skeleton pixels whose direction codes fall in each of the four direction classes, and the cell's direction code is labeled as the dominant direction code. Let d_{xy} be the direction code of skeleton pixel (x, y). The number of skeleton pixels with direction code d in cell c(i, j) is defined as

Z_{ij}(d) = Σ_{(x, y) ∈ c(i, j)} λ(d, d_{xy}),  (4)

where

λ(d, d_{xy}) = 1 if d = d_{xy},
λ(d, d_{xy}) = 0 if d ≠ d_{xy}.

The direction code of cell c(i, j) is defined as the direction code with maximum Z_{ij}(d), denoted c_{ij}; that is,

Z_{ij}(c_{ij}) ≥ Z_{ij}(d),  d = 0, 1, 2, 3.  (5)

If no skeleton segment passes through a cell, the cell's direction code is set to Null. The direction codes of all cells compose the desired stroke direction map.

Fig. 8. The primitive strokes. (a) The original characters; (b) the horizontal strokes; (c) the right-diagonal strokes; (d) the vertical strokes; and (e) the left-diagonal strokes.
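To illustrate the merging test and Eqs. (4) and (5), here is a small Python sketch with our own data layout: skeleton pixels are (x, y, code) triples, and the pattern is n×n with k×k cells of size m = n/k (the paper's experiments use n = 96 and 6×6 cells, i.e. a 16×16 map).

```python
import numpy as np

def merge_allowed(theta1, theta2, threshold=157.0):
    """Included-angle test for merging two skeleton segments (degrees)."""
    diff = abs(theta1 - theta2)
    angle = min(diff, 360.0 - diff)
    return threshold < angle < 180.0

def stroke_direction_map(skeleton_pixels, n=96, k=16):
    """Build the k x k stroke direction map of Eqs. (4) and (5).

    `skeleton_pixels` is an iterable of (x, y, code) with code in 0..3;
    cells with no skeleton pixel keep the value -1 ('Null').
    """
    m = n // k                                   # cell size (6 for 96/16)
    counts = np.zeros((k, k, 4), dtype=int)      # Z_ij(d)
    for x, y, code in skeleton_pixels:
        i, j = y // m, x // m
        counts[i, j, code] += 1
    cell_map = np.full((k, k), -1, dtype=int)
    occupied = counts.sum(axis=2) > 0
    cell_map[occupied] = counts.argmax(axis=2)[occupied]
    return cell_map
```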


Fig. 9. The re"nement of stroke direction maps. (a) The original skeleton segments; (b) the stroke direction map with many redundant direction codes; and (c) the stroke direction map after re"nement.

In some cases, a skeleton segment overlaps a cell with only a few pixels, and we regard such cells' direction codes as redundant, as illustrated in Fig. 9b. A simple refinement method is used to eliminate the redundant codes. The refinement is applied to all cells in parallel. If the direction codes of a cell and its upper neighboring cell are both 0, the direction code of the cell is redundant and is eliminated. If the direction code of a cell is 1, 2, or 3 and its left neighboring cell has the same code, the direction code of the cell is eliminated. One example of a refined result is shown in Fig. 9c.
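A sketch of this parallel refinement pass, assuming the map layout of the previous sketch (-1 for Null); that "upper" means the smaller row index follows the image coordinate system of Fig. 1 and is our assumption.

```python
def refine_direction_map(cell_map):
    """Remove redundant cell codes: 0-cells below a 0-cell, and 1/2/3-cells
    to the right of a cell with the same code."""
    k = cell_map.shape[0]
    refined = cell_map.copy()
    for i in range(k):
        for j in range(k):
            code = cell_map[i, j]
            if code == 0 and i > 0 and cell_map[i - 1, j] == 0:
                refined[i, j] = -1        # redundant horizontal code
            elif code in (1, 2, 3) and j > 0 and cell_map[i, j - 1] == code:
                refined[i, j] = -1        # redundant 1-, 2- or 3-code
    return refined
```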

4. Classification

To demonstrate the power of the proposed stroke-based feature extraction for handwritten Chinese character recognition, a simple classifier based on the stroke direction maps is designed to recognize handwritten Chinese characters. Only the stroke direction map is employed here, since it is sufficient to represent the geometric structure of characters and the recognition speed is fast.

In order to reduce the effect of shape variations of handwritten Chinese characters, many reference (or training) samples of a character are utilized to generate a representative reference direction map for the character. The direction-code variation of a cell in a reference direction map is represented by a five-entry vector instead of a single value to provide more information for recognition. The frequencies of the four direction codes are recorded in the first four entries, and that of the null direction code is recorded in the fifth entry. For simplicity, we consider the appearance instead of the frequency of a direction code in a cell for recognition. In our experience, the recognition result based on the direction-code frequency is not better than that based on the direction-code appearance when less training data are used.

Let R = [r_{ij}] be the direction map of a reference character sample, and M = [m_{ij}(d): d = 0, 1, 2, 3, Null] be the reference direction map of a character. The binary value m_{ij}(d) indicates whether there exists a reference sample R with r_{ij} = d. If so, m_{ij}(d) is set to "1"; otherwise m_{ij}(d) is set to "0". Let U = [u_{ij}] be the direction map of an unknown character sample, and M_C = [m^C_{ij}(d): d = 0, 1, 2, 3, Null] be the reference direction map of character C. The similarity between U and M_C is defined by

Sim(U, M_C) = Σ_i Σ_j m^C_{ij}(u_{ij}).  (6)

The unknown character sample U is recognized as character D if

Sim(U, M_D) = Max_C Sim(U, M_C).  (7)

Rotation-invariant features are hard to find for invariant handwritten Chinese character recognition. We have constructed ring data for invariant recognition using a fuzzy min-max neural network [2] and a fuzzy-set classifier [3]. However, their discrimination power is not strong for handwritten characters with high shape variation. Since only four stroke directions are considered, the stroke direction map can tolerate a small range of rotation; that is, a limited set of rotated character samples is sufficient to represent all arbitrarily rotated character samples for invariant recognition. Thus, we also consider the recognition of rotated characters by using the same classifier.

In order to recognize arbitrarily rotated characters, we use K reference direction maps at representative degrees to represent a character. Starting from zero degrees, we take a representative rotated sample every u degrees, where u = 360°/K. The reference direction map at a representative degree is constructed by using all rotated samples at that degree. Let M_{C,k} be the reference direction map of character C at the degree ku, where k = 0, 1, 2, ..., K−1. Then the maximum similarity between the unknown character sample U and reference character C is defined as

SR(U, M_C) = Max_k Sim(U, M_{C,k}).  (8)

The sample U is recognized as character D if

SR(U, M_D) = Max_C SR(U, M_C).  (9)
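A minimal Python sketch of Eqs. (6)-(9); the map encoding (-1 for Null) and the dictionary layout of the reference set are ours, not the paper's.

```python
import numpy as np

def build_reference_map(training_maps):
    """Binary appearance model m_ij(d): which codes occur in cell (i, j)
    over all training samples of one character (codes 0-3, -1 for Null)."""
    k = training_maps[0].shape[0]
    m = np.zeros((k, k, 5), dtype=np.uint8)        # fifth slot holds Null
    for dm in training_maps:
        for i in range(k):
            for j in range(k):
                m[i, j, dm[i, j]] = 1              # code -1 indexes the last slot
    return m

def similarity(u_map, ref):
    """Sim(U, M_C) of Eq. (6): count cells whose code appears in the reference."""
    k = u_map.shape[0]
    return sum(ref[i, j, u_map[i, j]] for i in range(k) for j in range(k))

def recognize(u_map, references):
    """Eqs. (7)-(9): `references[c]` is a list of K rotated reference maps."""
    best_char, best_score = None, -1
    for char, rotated_refs in references.items():
        score = max(similarity(u_map, ref) for ref in rotated_refs)   # Eq. (8)
        if score > best_score:
            best_char, best_score = char, score
    return best_char
```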


5. Experiments

Several experiments were carried out to verify the ability of the proposed feature-extraction approach for handwritten Chinese character recognition. We implemented the proposed approach in the C language and ran it on a 486DX33 PC. The size of the processed character patterns is 96×96. The parameter a, controlling the tolerated degree of shortness of skeleton segments, was conservatively set to 0.9.

In the execution of the approach, the determination of the direction codes of black pixels is time consuming since it is an iterative process. Fortunately, in each iteration only a few pixels have to update their direction codes, so we used a buffering technique to speed up the direction-code determination. If a pixel is updated in an iteration, only its 5×5 neighboring pixels need to be checked for updating in the next iteration. In the first iteration, the updating decision is performed on all black pixels; the coordinates and the new direction codes of the updated pixels are stored in a buffer. In the following iterations, the direction codes of all buffered pixels are first updated using the stored direction codes, and then the pixels in the 5×5 neighborhood of every buffered pixel are checked for updating. The neighboring regions of buffered pixels may be processed twice, so a flagging technique is used to avoid rechecking.

Forty Chinese characters used on Chinese land-register maps and checks were employed to evaluate the performance of the feature-extraction approach, as shown in Fig. 10. Each character has forty samples that were written by forty persons and supplied by ITRI/CCL in Taiwan for evaluation. The characters on Chinese land-register maps may vary arbitrarily in location, scale, and orientation. In order to test the rotation invariance, all forty samples of every character were rotated to generate the rotated samples.
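A sketch of the buffered iteration described above, reusing the recompute_code helper of the earlier sketch; the control flow and names are ours, and a set plays the role of the flag array.

```python
def refine_with_buffer(img, codes, recompute_code):
    """Iterate the direction-code refinement, revisiting only the 5x5
    neighborhoods of pixels whose codes changed in the previous iteration."""
    h, w = img.shape
    # First iteration: the updating decision is made for every black pixel.
    pending = [(x, y) for y in range(h) for x in range(w) if img[y, x] == 1]
    while pending:
        # Decide updates against the current codes (parallel semantics),
        # buffering the coordinates and new codes of the pixels that change.
        buffer = [(x, y, recompute_code(img, codes, x, y)) for x, y in pending]
        buffer = [(x, y, new) for x, y, new in buffer if new != codes[y, x]]
        for x, y, new in buffer:
            codes[y, x] = new
        # Next iteration: only the 5x5 neighbors of buffered pixels, each once.
        seen = set()
        pending = []
        for x, y, _ in buffer:
            for j in range(-2, 3):
                for i in range(-2, 3):
                    nx, ny = x + i, y + j
                    if (0 <= nx < w and 0 <= ny < h
                            and img[ny, nx] == 1 and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        pending.append((nx, ny))
    return codes
```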


The intermediate results (character skeletons) of the proposed feature-extraction approach are just the thinned results; however, they have fewer hairy and fork-point distortions. The extracted skeletons of eight character samples are shown in Fig. 11. In order to demonstrate that the skeleton distortions are effectively solved by the proposed approach, the thinned results of the Zhang-and-Suen method [27] and the Wu-and-Tsai method [28] are also given in the figure for comparison. As shown in the figure, the skeletons of the proposed approach have the best quality. The straightness of the stroke skeletons is good, and the hairy and shortening problems were eliminated by the proposed skeleton-segment extraction technique. With the aid of fork segments, the distortion at intersections of strokes was clearly reduced, as shown in the figure. Moreover, small holes in the border disappeared in our results, whereas the other two methods could not eliminate any hole.

The average execution time of the Zhang-and-Suen method is 0.36 s per character, that of the Wu-and-Tsai method is 0.97 s, and that of the proposed skeletonization is 0.64 s. Although the buffering technique was applied, the determination of the direction codes of black pixels still took 0.47 s on average (73% of the processing time). If the algorithm is executed on a parallel machine or a non-iterative determination method is used, the processing time should be greatly reduced. Besides, as the thickness of strokes increases, the advantage of the proposed approach over the other methods becomes more significant due to its lesser dependence on stroke width.

In the primitive-stroke extraction, the skeletons obtained by the proposed approach are good enough to efficiently extract the primitive strokes. The extracted primitive strokes of eight character samples are shown in Fig. 12.

Fig. 10. The samples of forty Chinese characters used in the experiments.


Fig. 12. The primitive strokes. (a) The original characters; (b) the horizontal strokes; (c) the right-diagonal strokes; (d) the vertical strokes; and (e) the left-diagonal strokes.

Fig. 11. Character skeletons. (a) The original characters; (b) skeletons of the proposed approach; (c) thinned results of the Wu-and-Tsai method; and (d) thinned results of the Zhang-and-Suen method.

However, the skeletons obtained by the other two methods must be intricately processed, with steps such as hair removal, fork-point merging, and line approximation, to extract primitive strokes, and thus take much more time.

For handwritten Chinese character recognition, two experiments were carried out to evaluate the performance of the proposed classifier for recognizing regular and rotated characters based on the stroke direction maps. Since the sizes of the character samples differed, a moment normalization [6,7] was used to normalize the character samples such that they are invariant to translation and scale. The cell size of a stroke direction map is 6×6; that is, the dimension of a direction map is 16×16.
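A sketch of one common form of moment normalization for translation and scale, using the centroid and second-order moments; the exact normalization used in the paper follows [6,7] and may differ, so the function, the target spread value, and the SciPy-based resampling below are our assumptions.

```python
import numpy as np
from scipy.ndimage import affine_transform

def moment_normalize(img, out_size=96, target_spread=0.25):
    """Center a binary character and rescale it so that its average
    second-order moment matches a fixed spread (illustrative choice)."""
    ys, xs = np.nonzero(img)
    cx, cy = xs.mean(), ys.mean()                     # centroid
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    spread = np.sqrt(mu20 + mu02)                     # size measure
    scale = (target_spread * out_size) / spread       # desired / current
    # affine_transform maps output coordinates back to input coordinates:
    # input = matrix @ output + offset, in (row, col) order.
    matrix = np.array([[1 / scale, 0.0], [0.0, 1 / scale]])
    offset = np.array([cy, cx]) - matrix @ np.array([out_size / 2, out_size / 2])
    out = affine_transform(img.astype(float), matrix, offset=offset,
                           output_shape=(out_size, out_size), order=1)
    return (out > 0.5).astype(np.uint8)
```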

Table 1
The recognition rates of regular characters (N is the number of training samples for each character)

N     Training set    Testing set    Average
1     100%            32.1%          33.8%
3     100%            63.2%          66.0%
5     100%            82.6%          84.8%
10    100%            92.0%          94.0%
15    100%            93.9%          96.2%
20    100%            95.0%          97.5%

In the first experiment, we arbitrarily selected several samples out of the forty samples of every character as the training samples to constitute the training set, and the remaining samples of all characters were taken to be the testing set. The recognition performances on the training and testing sets were evaluated separately. The recognition rate is defined as the ratio of the number of correctly recognized samples to the number of all samples. The recognition rates for a variety of training-sample numbers are given in Table 1. When the number of training samples per character is 1, the recognition rate is low. This means that one training sample for


Table 2
The recognition rates of regular and rotated characters (K is the number of representative degrees)

         Regular sets                                Rotated sets
K        Training set   Testing set   Average       Training set   Testing set   Average
8        100%           93.5%         96.8%         67.0%          64.2%         65.6%
12       100%           93.4%         96.7%         86.5%          81.0%         83.8%
16       100%           94.0%         97.0%         94.6%          88.0%         91.3%

a character does not capture sufficient variation information for recognition with the proposed classifier. When the number is 5, the recognition rate increases greatly. When the number is 10, the recognition rate is quite good (94%). The recognition rate for 20 training samples reaches 97.5%. The experimental results reveal that when the number of training samples per character is greater than ten, the direction maps capture sufficient variation information for recognition. However, the direction maps only represent the geometrical structures of characters; hence, the more similar two character structures are, the more easily misclassification occurs; for example, pairs of character samples with very similar structures always involve more misclassification.

In the study of Chen and Lieh [15], fourteen Chinese characters used on Chinese checks are represented by 2-layer attributed graphs and recognized by a relaxation matching method; the recognition rate is 96.3%. In the study of Hsieh [19], fifty-one Chinese postal characters are represented by unordered stroke sets and recognized by a linear programming approach; the recognition rate is 93%. Compared with these approaches, our approach is quite simple and the recognition rate is also good.

In the second experiment, the recognition performance on regular and arbitrarily rotated characters was evaluated. To recognize arbitrarily rotated characters, a character is represented by several reference direction maps at different representative degrees, as described in Section 4. We arbitrarily chose 20 samples out of the 40 samples of every character as the training samples to constitute the regular training set, and the remaining 20 samples of every character were taken to be the regular testing set. The 20 samples of every character in the training set were rotated to K representative degrees to generate the rotated training samples, and each character was represented by K direction maps for the K representative rotation angles. The 20 samples of every character in the testing set were arbitrarily rotated to generate the rotated testing samples. Thus, four different character sets were tested, respectively. The evaluation results are given in Table 2;

in the table, the recognition rates with three different numbers of representative degrees are given. The recognition rate of regular characters still remains about 97% on average; this means that the recognition performance is not affected by the reference direction maps at the other representative degrees. When 16 representative degrees were used, the recognition rate for arbitrarily rotated characters reached 91.3% on average.

For the invariant recognition, the experimental data set is a superset of the data set used in our previous work [2]. In that method, ring data and moment invariants are the employed rotation-invariant features. The performance of the proposed approach is much superior to that of the previous method (<71%), but the proposed approach needs a large storage space. These experimental results reveal that the rotation-invariant features are insufficient to overcome the large shape variation of handwritten Chinese characters, and that structural features can tolerate high writing variation [1,23] but are not rotation invariant. The proposed stroke direction maps are structural features and can tolerate a small degree of rotation; thus a limited number of rotated samples at representative degrees can cover the full range of rotation to recognize arbitrarily rotated characters, at the expense of a large storage space.

6. Conclusions

To cope with the high complexity and variability of handwritten Chinese characters, the characters are generally represented by a set of structural features such as feature points, line segments, and strokes. However, the character strokes overlap each other, and strokes are usually ambiguous or misleading at intersections. Together with the rough boundaries of strokes, this greatly degrades the quality of the extracted skeletons and strokes. To solve the problem, we split character strokes into stroke and fork segments and treat the fork segments specially to extract skeletons and structural features for handwritten Chinese character recognition. The proposed skeleton-extraction approach itself avoids the hairy problem and the fork-point distortion problem, so no additional process is needed to refine the


skeletons. As shown in the experiments, the proposed approach produced fine skeletons. The quality mainly comes from the reduction of distortion at fork points and the insensitivity to noise. The geometrical structures of crossing lines, sharp corners, and straight lines are mostly preserved. The fine skeletons facilitate the extraction of structural features: corner points, line segments, primitive strokes, and stroke direction maps.

Invariant handwritten Chinese character recognition is a very difficult problem [2,3]. The proposed classifier based on the stroke direction maps worked well for recognizing regular and rotated handwritten Chinese characters. The recognition results for arbitrarily rotated characters are clearly superior to those of the previous invariant recognition approaches in which ring data and moment invariants were used as the invariant features [2,3].

In the proposed approach, the determination of the direction codes of black pixels is a bottleneck. The process is time consuming since it is iterative. However, the process is parallel for all pixels; if the algorithm is executed on a parallel machine, the processing time can be greatly reduced. In the current implementation, we use a buffering technique to reduce the processing time. If a non-iterative determination method is used, the processing time should be greatly improved. Furthermore, if a fuzzy theory or relaxation approach can be incorporated into the direction-code determination method, the proposed approach should be able to skeletonize low-quality or gray-level patterns [33,44,45].

Only four types of primitive strokes are extracted in this study. Based on the extracted skeleton segments and primitive strokes, we can further extract other types of strokes [30] with the aid of the relevant stroke knowledge. Studying knowledge-based stroke extraction methods and the related recognition techniques for handwritten Chinese characters is a worthwhile research topic.

The character set used in our experiments is small. When a large set of characters is encountered, a clustering technique [1,3,4,40,41] may be applied. The extracted features, including the numbers of strokes and feature points, are suitable for preclassification [3]. Line segments, strokes, and stroke direction maps can be used for preclassification and classification. It is worth studying the use of part of the extracted features to preclassify a large set of characters and the remaining features to classify the characters within a cluster. Recently, preclassification and classification of handwritten Chinese characters based on basic stroke substructures [40] and radicals [22,41] have been proposed and have attracted researchers' attention. In the future, we will try to use the extracted strokes to compose stroke substructures or radicals, and to employ these features to preclassify and recognize a large set of handwritten Chinese characters.

References

[1] T.H. Hildebrand, W. Liu, Optical recognition of handwritten Chinese characters: Advances since 1980, Pattern Recognition 26 (2) (1993) 205–225.
[2] H.-P. Chiu, D.-C. Tseng, Invariant handwritten Chinese character recognition using fuzzy min-max neural networks, Pattern Recognition Letters 18 (1997) 481–491.
[3] D.-C. Tseng, H.-P. Chiu, J.-C. Cheng, Invariant handwritten Chinese character recognition using fuzzy ring data, Image Vision Comput. 14 (9) (1996) 647–657.
[4] J.T. Tou, R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1974.
[5] B. Hussain, M.R. Kabuka, A novel feature recognition neural network and its application to character recognition, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1) (1994) 98–106.
[6] S.J. Perantonis, P.J.G. Lisboa, Translation, rotation, and scale invariant pattern recognition by high-order neural networks and moment classifiers, IEEE Trans. Neural Networks 3 (2) (1992) 241–251.
[7] A. Khotanzad, J.H. Lu, Classification of invariant image representations using a neural network, IEEE Trans. ASSP 38 (6) (1990) 1028–1038.
[8] T.A. Mai, C.Y. Suen, A generalized knowledge-based system for the recognition of unconstrained handwritten numerals, IEEE Trans. Systems Man Cybernet. 20 (4) (1990) 835–848.
[9] T. Pavlidis, A vectorizer and feature extractor for document recognition, Comput. Vision Graphics and Image Process. 35 (1986) 111–127.
[10] Y. LeCun, B. Boser, L.D. Jackel, Handwritten digit recognition application of neural network chips and automatic learning, IEEE Commun. (1989) 41–46.
[11] K.T. Blackwell, T.P. Vogl, S.D. Hyman, G.S. Barbour, D.L. Alkon, A new approach to handwritten character recognition, Pattern Recognition 25 (6) (1992) 655–666.
[12] C. Yuceer, K. Oflazer, A rotation, scaling, and translation invariant pattern classification system, Pattern Recognition 26 (5) (1993) 687–710.
[13] X. Ying, S. Chengjian, Recognizing restricted handwritten Chinese characters by structure similarity method, Pattern Recognition Lett. 11 (1990) 67–73.
[14] X. Huang, J. Gu, Y. Wu, A constrained approach to multifont Chinese character recognition, IEEE Trans. Pattern Anal. Mach. Intell. 15 (8) (1993) 838–843.
[15] L.-H. Chen, J.-R. Lieh, Handwritten character recognition using a 2-layer random graph model by relaxation matching, Pattern Recognition 23 (11) (1990) 1189–1205.
[16] C.H. Leung, Y.Y. Cheung, Y.L. Wong, A knowledge-based stroke-matching method for Chinese character recognition, IEEE Trans. Systems Man Cybernet. 17 (6) (1987) 993–1003.
[17] B. Chen, H.-J. Lee, Recognition of handwritten Chinese characters via short line segments, in: Proc. Int. Computer Symp., Hsinchu, Taiwan, Dec. 1990, pp. 117–122.
[18] F.H. Cheng, New stroke merging and matching for handwritten Chinese character recognition, in: Proc. 5th National Workshop on Character Recognition and Document Analysis, Chung-li, Taiwan, Jan. 1996, pp. 105–112.
[19] A.J. Hsieh, Extension of bipartite weighted matching problem and their application to handwritten Chinese


character recognition. Ph.D. Dissertation, Institute of Computer Science and Information Engineering, National Central University, Chung-li, Taiwan, 1995.
[20] C.-W. Liao, J.-S. Huang, A transformation invariant matching algorithm for handwritten Chinese character recognition, Pattern Recognition 23 (11) (1990) 1167–1188.
[21] Y.K. Wang, Pattern recognition applications of genetic algorithms. Ph.D. Dissertation, Institute of Computer Science and Information Engineering, National Central University, Chung-li, Taiwan, 1995.
[22] A.B. Wang, Radical-based handwritten Chinese character recognition by hierarchical matching. Ph.D. Dissertation, Institute of Computer Science and Information Engineering, National Central University, Chung-li, Taiwan, 1996.
[23] H. Ogawa, K. Taniguchi, Thinning and stroke segmentation for handwritten Chinese character recognition, Pattern Recognition 15 (4) (1982) 299–308.
[24] F.H. Cheng, W.H. Hsu, M.Y. Chen, Recognition of handwritten Chinese characters by modified Hough transform techniques, IEEE Trans. Pattern Anal. Mach. Intell. 11 (4) (1989) 429–439.
[25] L. Lam, S.-W. Lee, C.Y. Suen, Thinning methodologies - a comprehensive survey, IEEE Trans. Pattern Anal. Mach. Intell. 14 (9) (1992) 869–885.
[26] Y.-S. Chen, W.-H. Hsu, A modified fast parallel algorithm for thinning digital patterns, Pattern Recognition Lett. 7 (1988) 99–106.
[27] T.Y. Zhang, C.Y. Suen, A fast parallel algorithm for thinning digital patterns, Commun. ACM 27 (3) (1984) 236–239.
[28] R.-Y. Wu, W.-H. Tsai, A new one-pass parallel thinning algorithm for binary images, Pattern Recognition Lett. 13 (1992) 715–723.
[29] L. Lam, C.Y. Suen, An evaluation of parallel thinning algorithms for character recognition, IEEE Trans. Pattern Anal. Mach. Intell. 17 (9) (1995) 914–919.
[30] H.H. Liu, Extraction of strokes in off-line handwritten Chinese characters, Master Thesis, Institute of Computer Science and Information Engineering, National Chiao-Tung University, Hsinchu, Taiwan, 1994.
[31] C.W. Liao, J.S. Huang, Stroke segmentation by Bernstein-Bezier curve fitting, Pattern Recognition 23 (5) (1990) 475–484.


[32] C. Chouinard, R. Plamondon, Thinning and segmenting handwritten characters by line following, Mach. Vision Appl. 5 (1992) 185–197.
[33] S.-S. Yu, W.-H. Tsai, A new thinning algorithm for gray-scale images by relaxation technique, Pattern Recognition 23 (10) (1990) 1067–1076.
[34] P.-N. Chen, Y.-S. Chen, W.-H. Hsu, Stroke relation coding - a new approach to the recognition of multi-font printed Chinese characters, Comput. Process. Chinese Oriental Languages 3 (3) (1988) 319–330.
[35] S.W. Lu, H. Xu, False stroke detection and elimination for character recognition, Pattern Recognition Lett. 13 (1992) 745–755.
[36] O. Baruch, Line thinning by line following, Pattern Recognition Lett. 8 (1988) 271–276.
[37] J.Y. Lin, Z. Chen, A Chinese-character thinning algorithm based on global features and contour information, Pattern Recognition 28 (4) (1995) 493–512.
[38] L.H. Chen, A new approach for handwritten character stroke extraction, Comput. Process. Chinese Oriental Languages 6 (1) (1992) 1–17.
[39] L.Y. Tseng, C.T. Chuang, An efficient knowledge-based stroke extraction method for multi-font Chinese characters, Pattern Recognition 25 (12) (1992) 1445–1458.
[40] R.H. Cheng, C.W. Lee, Z. Chen, Preclassification of handwritten Chinese characters based on basic stroke substructures, Pattern Recognition Lett. 16 (1995) 1023–1032.
[41] C.C. Han, Y.L. Tseng, K.C. Fan, A.B. Wang, Coarse classification of Chinese characters via stroke clustering method, Pattern Recognition Lett. 16 (1995) 1079–1089.
[42] K.W. Gan, K.T. Lua, A new approach to stroke and feature point extraction in Chinese character recognition, Pattern Recognition Lett. 12 (1991) 381–387.
[43] H.-P. Chiu, D.-C. Tseng, A feature-preserved thinning algorithm for handwritten Chinese characters, Signal Processing 58 (2) (1997) 203–214.
[44] L. Coetzee, E.C. Botha, Fingerprint recognition in low quality images, Pattern Recognition 26 (10) (1993) 1441–1460.
[45] S.-S. Chen, F.Y. Shih, Skeletonization for fuzzy degraded character images, IEEE Trans. Image Process. 5 (10) (1996) 1481–1485.

About the Author: HUNG-PIN CHIU received the B.S. degree in Computer Science from Tamkang University, Taiwan, in 1987, and the M.S. degree in Information Engineering from National Central University, Taiwan, in 1989. After receiving his M.S. degree, he worked in industry for several years. He received his Ph.D. degree in Computer Science and Information Engineering from National Central University, Taiwan, in 1997. His current research interests include pattern recognition, image processing, fuzzy theory, neural networks, object-oriented programming, and Internet-related technologies.

About the Author: DIN-CHANG TSENG was born in Taipei, Taiwan, in 1957. He received his Ph.D. degree in Information Engineering from National Chiao-Tung University, Hsinchu, Taiwan, in June 1988. He is currently a Professor in the Department of Computer Science and Information Engineering at National Central University, Chung-li, Taiwan. He is a member of the IEEE, the Chinese Image Processing and Pattern Recognition Society, and the Chinese Geographic Information Association. His current research interests include image processing, virtual reality, computer vision, and computer graphics, especially the topics of multispectral/hyperspectral remote-sensing image processing, model-based halftone printing, laparoscopic surgical simulation, and multi-resolution terrain modeling.