Optimal stroke-correspondence search method for on-line character recognition


Pattern Recognition Letters 23 (2002) 601–608 www.elsevier.com/locate/patrec

Jung-pil Shin *

Department of Computer Software, The University of Aizu, Tsuruga, Ikki-machi, Aizu-Wakamatsu City, Fukushima 965-8580, Japan

Received 26 March 2001; received in revised form 3 July 2001

Abstract

This paper describes an optimal stroke-correspondence search method that enables stroke-order-free on-line character recognition. The search uses both conventional intrastroke-information, i.e., the shape and position of each stroke, and interstroke-information, i.e., the mutual relationships among strokes. The optimal path search for stroke-correspondence is carried out systematically under an optimality criterion that includes both kinds of information. The search is kept to a reasonable level partly by using information about the actually occurring stroke-order; because this information is statistically stable, it does not hinder the framework of stroke-order-free recognition. A large improvement in both computation time and recognition-accuracy was achieved in the experiments. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: On-line character recognition; Stroke-correspondence; Stroke-order information; Dynamic programming; Markov cube search

1. Introduction

For on-line recognition of large-alphabet languages such as Chinese or Japanese, much research has addressed three major issues: stroke-order-free recognition, stroke-number-free recognition, and robustness to stroke deformation (Nakagawa, 1990; Tappert et al., 1990). Previous works on stroke-order-free Chinese character recognition

* Tel.: +81-242-37-2704; fax: +81-242-37-2731. E-mail address: [email protected] (J.-p. Shin).

have been carried out by Odaka et al. (1982) and Wakahara et al. (1983, 1996). They define a stroke-distance matrix whose entries are the distances between each stroke of the input pattern and each stroke of the reference pattern. Based on this matrix, the correspondence between input-pattern and reference-pattern strokes is determined. The problem with this approach, however, is that although it fixes a rule for stroke-correspondence, the input-pattern stroke closest to each reference-pattern stroke, i.e., the one with the smallest stroke-distance, is selected locally (Odaka et al., 1982). Hence, the correspondence selection can be unstable. Though previous

0167-8655/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-8655(01)00136-2


research (Wakahara et al., 1983, 1996) has identified a global evaluation standard, namely making the stroke-distance summation the smallest, the computation is still carried out using local correspondence. If the correct solution is determined by an exhaustive search, the computational cost becomes $O(N!)$ for a stroke-count of $N$.

Since the above-described pioneering works, Sakoe and Shin (1997) have investigated pragmatic algorithms for "minimizing the stroke-distance summation" as used in Wakahara et al. (1983), taking into account recent developments in computers. The problem of optimal correspondence is formalized as an optimal-path problem on a cube-shaped graph, with the stroke-distance as the cost. When the computation is carried out with dynamic programming (DP), the computational order is $O(N \cdot 2^N)$ for an $N$-stroke character. This search method is called the Cube Search.

Character-recognition research typically divides the features of a character into intrastroke information, representing the shapes and positions of the individual strokes, and interstroke information, representing relative relations among strokes, e.g., relative positions, length ratios, and topological relations. In on-line character recognition, however, the major emphasis has been on using intrastroke information to match stroke curves (Yoshida and Sakoe, 1982; Wong and Fallside, 1985; Lin et al., 1993; Nakagawa and Akiyama, 1994; Hsieh et al., 1995; Wakahara et al., 1996), to analyze or classify stroke codes (Terai and Nakata, 1973; Yhap and Greanias, 1981; Yurugi et al., 1985; Shiau et al., 1988), and so on. In these methods, interstroke information can be applied as postprocessing. That is, the relative features between strokes of an input pattern can be evaluated once the stroke-correspondence between input and reference patterns has been determined by some stroke-correspondence processing that uses intrastroke-information (Wakahara and Odaka, 1997; Shin and Sakoe, 1999). In a recognition system that performs structural analysis through a stroke-correspondence search, however, obtaining high accuracy for both stroke-correspondence and character recognition requires the use of interstroke-information even during the search process.

As one modification of the Cube Search, this paper describes a practical search algorithm of exponential order that incorporates evaluation of interstroke-information among three adjoining strokes during the stroke-correspondence search. The search is kept to a reasonable level in part by using information about the actual stroke-order, which, being statistically stable, does not hinder the framework of stroke-order-free recognition. The algorithm that partially incorporates stroke-order information is expected to reduce the computation time and to improve the recognition-accuracy by neglecting unrealistic stroke-correspondences. Experiments demonstrate a large improvement in both computation time and recognition-accuracy.

2. Stroke-order-free recognition principle

2.1. Stroke-correspondence problem

An on-line input character is expressed as an ordered series of writing strokes, i.e.,

\[ A = A_1 A_2 \cdots A_k \cdots A_N, \tag{1} \]

where the $k$th stroke $A_k$ is the time-sequence representation of the local features $a_{ik}$ of the character, e.g., x–y coordinates or stroke direction, expressed as

\[ A_k = a_{1k} a_{2k} \cdots a_{ik} \cdots a_{Ik}, \quad I = I(k). \tag{2} \]

The reference pattern is similarly expressed as

\[ B = B_1 B_2 \cdots B_l \cdots B_M, \tag{3} \]

\[ B_l = b_{1l} b_{2l} \cdots b_{jl} \cdots b_{Jl}, \quad J = J(l). \tag{4} \]

Finally, $N$ is the stroke-number of the input pattern and $M$ the stroke-number of the reference pattern, with $N$ equal to $M$ when the input has the correct stroke-number. The dissimilarity between the input-pattern stroke $A_k$ and the reference-pattern stroke $B_l$ is calculated using intrastroke-information on shape and position; it is denoted $d(k, l)$ and referred to as the stroke-distance.

One-to-one stroke-correspondence is defined by a bijection $\{l(k)\}$ from the stroke-number $k$ of the input pattern to the stroke-number $l$ of the reference pattern. As the evaluation standard for optimum correspondence, the sum total of the stroke-distances $d(k, l)$ is used. More specifically, based on intrastroke-information, the solution of the following minimization problem is taken as the optimal stroke-correspondence, and the minimum value $D(A, B)$ is used as the measure of matching, i.e., the dissimilarity (Wakahara et al., 1996):

\[ D(A, B) = \min_{\{l(k)\}} \left[ \sum_{k=1}^{N} d(k, l(k)) \right]. \tag{5} \]
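As a point of reference for Eq. (5), the following is a minimal Python sketch of the exhaustive search over all bijections, whose cost grows as $O(N!)$. The function name `exhaustive_correspondence`, the 0-based indexing, and the 3 × 3 distance matrix are illustrative assumptions, not data from this paper.

```python
from itertools import permutations


def exhaustive_correspondence(d):
    """Minimize sum_k d[k][l(k)] over all bijections l(k), as in Eq. (5).

    d[k][l] is the stroke-distance between input stroke k and reference
    stroke l (0-based here, 1-based in the text). All N! bijections are
    tried, so this is only usable for small N.
    """
    n = len(d)
    best_cost, best_l = float("inf"), None
    for l in permutations(range(n)):
        cost = sum(d[k][l[k]] for k in range(n))
        if cost < best_cost:
            best_cost, best_l = cost, l
    return best_cost, best_l


# Hypothetical stroke-distance matrix (made-up values for illustration).
d_example = [
    [1.0, 4.0, 5.0],
    [6.0, 2.0, 7.0],
    [3.0, 8.0, 9.0],
]
best_cost, best_l = exhaustive_correspondence(d_example)
print(best_cost, best_l)  # 10.0 (2, 1, 0)
```

In this toy matrix, picking the locally smallest distance first (input stroke 0 to reference stroke 0, cost 1.0) forces a total of 12.0, whereas the global minimum is 10.0. This is the kind of instability of local selection that motivates a global search.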

Solving this stroke-correspondence determination problem provides a structural analysis of the pattern; the dissimilarity $D(A, B)$ and the stroke-correspondence $\{l(k)\}$ between patterns are both obtained as results.

2.2. First-order Markov cube search

Sakoe and Shin (1995, 1997) have proposed a method that uses a cube graph (Fig. 1) as the stroke-correspondence search graph when only intrastroke-information is used (Eq. (5)). This graph is an $N$-dimensional hypercube in which each node corresponds to one state. The numbers within the brackets under each node are the state numbers, and the numbers inside each node are their binary representations. Each state comprises $N$ bits, and each bit position corresponds to a reference-pattern stroke-number $l$, with the least significant bit denoting the first stroke. A state transition involving the $l$th reference stroke from stage $k - 1$ to stage $k$ inverts the $l$th bit from '0' to '1'. The stroke-distance $d(k, l)$ is attached as the cost to the edge on which the $l$th bit is inverted. The stroke-order determination problem thus becomes an optimal-path search problem from the initial node $0 = (00 \cdots 0)$ to the final node $2^N - 1 = (11 \cdots 1)$. Because of its similarity to a simple Markov model (Fig. 1), we call this prototype a first-order Markov cube search and denote it as FMCS.

Fig. 1. First-order Markov cube-search graph ($N = 4$).

2.3. Use of interstroke information

In addition to intrastroke-information, information on the relative positions between strokes captures topological properties of a character and makes character recognition more effective. In particular, the length ratios between strokes are important features for discriminating certain characters. Such features are commonly referred to as interstroke-information. Vectors between start or end points can reflect both the relative position and the length ratio between strokes. As shown in Fig. 2(a), the two Chinese characters can be discriminated using relative-position information between the first and second strokes included in the vector $d_{ss}$, while the characters in (b) can be discriminated by considering that in $d_{es}$ and $d_{ee}$. Here $d_{ss}$, $d_{ee}$, and $d_{es}$ are the vectors between start points, between end points, and between end and start points, respectively. In Fig. 2(c), the characters can be discriminated using length-ratio information between the first and third strokes included in $d_{ss}$ and $d_{ee}$, while the characters in (d) can be discriminated using length-ratio information in $d_{ss}$ and $d_{ee}$.

Fig. 2. Interstroke-information used for discriminating typical Chinese characters.

The use of interstroke-information is known to be effective for both recognition accuracy and stroke-correspondence accuracy. Its effectiveness already saturates when only the interstroke-information among three adjacent strokes in the stroke-order is added to the evaluation, which is then called a "third-order model". In other words, performance comparable to that of the model using all possible stroke-pairs, called the "full-order model", is obtained, so the interstroke-information of the third-order model is nearly sufficient (Shin et al., 1999). By considering both inter- and intrastroke-information, the stroke-correspondence determination problem is formulated as

\[ D(A, B) = \min_{\{l(k)\}} \left[ \sum_{k=1}^{N} \left\{ (1 - w) \cdot d(k, l(k)) + w \cdot \sum_{p=k-2}^{k-1} \rho(k, l(k), p, l(p)) \right\} \right], \tag{6} \]

where $w$ is a weighting coefficient and $\rho(k, l, p, q)$ is the evaluated difference between
1. the interstroke-information between the $k$th and $p$th strokes of the input pattern, and
2. the interstroke-information between the $l$th and $q$th strokes of the reference pattern.

The interstroke-information itself may be computed according to any chosen definition. Note that in the full-order model the inner sum becomes $\sum_p = \rho(k, l(k), k-1, l(k-1)) + \cdots + \rho(k, l(k), 1, l(1))$, and that if only the stroke-distance is used, i.e., $\sum_p = 0$, the model is called a "first-order model" (Eq. (5)).

2.4. Third-order Markov cube search

Based on the results of Section 2.3, an optimal stroke-correspondence search algorithm is used as a practical algorithm to solve Eq. (6) on the search graph of Fig. 3. Note that, for clarity, not all edges and nodes are shown. Each node of the search graph of Fig. 1 is subdivided into subnodes, one for each permutation of a stroke-pair chosen from among the already corresponding strokes. The subnode memorizes the last two of the already matched strokes of the reference pattern, namely $l(k-1)$ and $l(k)$. The cost

\[ (1 - w) \cdot d(k, l(k)) + w \cdot \sum_{p=k-2}^{k-1} \rho(k, l(k), p, l(p)) \]

is attached to the edge that represents the inversion of the $l$th bit from '0' to '1' during the transition from stage $k - 1$ to stage $k$. For example, subnode $\langle 1\ 2 \rangle$ of node (1 0 1 1) in Fig. 3 indicates that
1. the first, second, and fourth strokes of the reference pattern have already been matched, and
2. the second and third strokes of the input pattern correspond to the first and second strokes of the reference pattern, respectively.

The cost of the edge leading to subnode $\langle 1\ 2 \rangle$ of (1 0 1 1) from subnode $\langle 4\ 1 \rangle$ of (1 0 0 1) is $(1 - w) \cdot d(3, 2) + w \cdot \{\rho(3, 2, 1, 4) + \rho(3, 2, 2, 1)\}$. By this process, Eq. (6) is calculated. This prototype is called a third-order Markov cube search and is denoted as TMCS.

Fig. 3. Third-order Markov cube-search graph ($N = 4$).
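The two searches of Sections 2.2 and 2.4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: indices are 0-based, a state is a bitmask of matched reference strokes, a TMCS subnode is represented by the pair of the last two matched reference strokes, interstroke terms with $p < 1$ are simply omitted, and the names `fmcs`, `tmcs`, `d_demo`, and `q_demo` as well as all numeric values are hypothetical.

```python
INF = float("inf")


def fmcs(d):
    """First-order cube search: DP over bitmasks of matched reference strokes.

    d[k][l] is the stroke-distance between input stroke k and reference
    stroke l. Returns (minimum cost, correspondence l(k)) for Eq. (5).
    """
    n = len(d)
    full = (1 << n) - 1
    G = [INF] * (1 << n)   # accumulated cost per node
    back = [-1] * (1 << n)  # reference stroke whose bit was set last
    G[0] = 0.0
    for mask in range(1 << n):
        if G[mask] == INF or mask == full:
            continue
        k = bin(mask).count("1")  # index of the next input stroke (stage)
        for l in range(n):
            if mask & (1 << l):
                continue
            nxt = mask | (1 << l)
            cost = G[mask] + d[k][l]
            if cost < G[nxt]:
                G[nxt], back[nxt] = cost, l
    # Backtrack from the final node (11...1) to recover l(k).
    corr, mask = [0] * n, full
    for k in range(n - 1, -1, -1):
        corr[k] = back[mask]
        mask ^= 1 << corr[k]
    return G[full], corr


def tmcs(d, q, w):
    """Third-order cube search: each node is split into subnodes that
    remember the last two matched reference strokes (l(k-2), l(k-1)).

    q(k, l, p, m) is the interstroke difference between input strokes
    (k, p) and reference strokes (l, m). Returns the minimum of Eq. (6).
    """
    n = len(d)
    stage = {(0, (None, None)): 0.0}  # (bitmask, subnode) -> accumulated cost
    for k in range(n):
        nxt_stage = {}
        for (mask, (l2, l1)), g in stage.items():
            for l in range(n):
                if mask & (1 << l):
                    continue
                inter = 0.0
                if l1 is not None:
                    inter += q(k, l, k - 1, l1)
                if l2 is not None:
                    inter += q(k, l, k - 2, l2)
                cost = g + (1 - w) * d[k][l] + w * inter
                key = (mask | (1 << l), (l1, l))
                if cost < nxt_stage.get(key, INF):
                    nxt_stage[key] = cost
        stage = nxt_stage
    return min(stage.values())


# Hypothetical data: a made-up distance matrix and a toy interstroke term
# that penalizes pairs whose order offset differs between the two patterns.
d_demo = [[1.0, 4.0, 5.0], [6.0, 2.0, 7.0], [3.0, 8.0, 9.0]]


def q_demo(k, l, p, m):
    return float(abs((k - p) - (l - m)))


print(fmcs(d_demo))               # (10.0, [2, 1, 0])
print(tmcs(d_demo, q_demo, 0.0))  # 10.0, identical to FMCS when w = 0
```

With $w = 0$ the interstroke term vanishes and `tmcs` reduces to `fmcs`, which mirrors the relation between Eq. (6) and Eq. (5). The subnode key keeps only two reference strokes, so the number of states stays far below that of a full-order model, which would have to remember the whole matching history.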


2.5. Beam search DP

Although the above stroke-correspondence searches with the first- and third-order models are optimal-path searches using DP, the calculation is made more efficient by including a beam search within the DP process (Sakoe and Shin, 1997). That is, once processing is completed at any stage $k$, the minimum $G_{\min}$ among the accumulated evaluation values computed at the nodes/subnodes of stage $k$ is determined. Next, a margin constant $\lambda$ is added to $G_{\min}$ to obtain the threshold value $h(k) = G_{\min} + \lambda$, which is used in the beam-search pruning operation.
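The pruning step can be sketched as follows for the first-order search. This is a minimal illustration that assumes nodes whose accumulated value exceeds the threshold are dropped before the next stage is expanded; the function name `fmcs_beam`, the 0-based indexing, and the distance matrix are hypothetical.

```python
INF = float("inf")


def fmcs_beam(d, margin):
    """Stage-wise first-order cube search with beam pruning.

    After each stage, nodes whose accumulated cost exceeds
    G_min + margin are discarded (assumed pruning rule).
    Returns the accumulated cost at the final node (11...1).
    """
    n = len(d)
    stage = {0: 0.0}  # bitmask of matched reference strokes -> cost
    for k in range(n):
        nxt = {}
        for mask, g in stage.items():
            for l in range(n):
                if mask & (1 << l):
                    continue
                key = mask | (1 << l)
                cost = g + d[k][l]
                if cost < nxt.get(key, INF):
                    nxt[key] = cost
        g_min = min(nxt.values())
        threshold = g_min + margin  # threshold for this stage
        stage = {m: g for m, g in nxt.items() if g <= threshold}
    return stage[(1 << n) - 1]


# Hypothetical stroke-distance matrix (made-up values for illustration).
d_beam = [[1.0, 4.0, 5.0], [6.0, 2.0, 7.0], [3.0, 8.0, 9.0]]
print(fmcs_beam(d_beam, 0.0))  # 12.0: a zero margin degenerates to greedy
print(fmcs_beam(d_beam, 4.0))  # 10.0: the optimal path survives pruning
```

The two calls illustrate the tradeoff controlled by the margin: a small margin expands fewer nodes but may prune the optimal path, while a sufficiently large margin recovers the exact DP result.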

3. Use of stroke-order information

The algorithm described above has the inherent advantage of being completely free with regard to stroke-order. However, a large percentage of the stroke-correspondences it examines are unrealistic. A reasonable level of stroke-correspondence searching can be expected if information about the actually occurring stroke-order is used. Because this information is statistically stable, its use does not hinder the framework of stroke-order-free recognition. A recognition framework incorporating stroke-order information is therefore expected to reduce computation time and to improve recognition-accuracy by neglecting unrealistic stroke-correspondences.

Stroke-order variation among writers is caused primarily by personal writing style. Two questions are considered with regard to stroke-order variations:
1. How far does the stroke-order fluctuate from the standard stroke-order?
2. Does the stroke-order change occur completely at random?

For the purpose of using stroke-order information in the stroke-order-free framework, the stroke-order changes in real data are investigated from these viewpoints.


The investigation data included 90 samples for each of 2965 Chinese character categories, written by 90 subjects. The subjects were directed to write cursively in a normal manner. To analyze the stroke-order, we used the 172 428 characters with the correct stroke-number. First, using the Cube-Search method, the correct stroke-correspondence between the input and reference patterns was searched for automatically by backtracking. Some of the resulting stroke-correspondences were wrong because strokes had been written at extremely different positions; these were converted manually to the appropriate stroke-correspondences by inspecting the characters.

An example of the analysis result is shown as the occurrence frequencies in Table 1. $C_{kl}$ is the number of input patterns in which the $l$th stroke of the reference pattern is written as the $k$th stroke of the input pattern. In this example, 9 of 59 people wrote the third stroke as the first stroke, so the deviations of these nine writers appear in the distribution of the first, second, and third strokes. In the results of the stroke-order analysis, 76% of all characters were written in accordance with the standard stroke-order. Further, the distribution of the remaining characters is concentrated along the diagonal. In the occurrence table (Table 1), if the maximum distance from the diagonal of any non-zero element is $r$, the character is said to have an r-gap; Fig. 4 shows the percentage of characters having each r-gap.

Table 1
Occurrence $C_{kl}$ of correct stroke-correspondence between (a) input-pattern stroke $k$ and (b) reference-pattern stroke $l$
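The r-gap defined above can be computed directly from an occurrence matrix. The sketch below is illustrative only: the function name `r_gap` and the 3 × 3 counts are hypothetical and are not the values of Table 1.

```python
def r_gap(C):
    """Return the largest |k - l| over all non-zero entries C[k][l].

    C[k][l] counts the input patterns in which reference stroke l was
    written as input stroke k (0-based here). A result of 0 means every
    writer followed the standard stroke-order.
    """
    return max(
        (abs(k - l) for k, row in enumerate(C) for l, c in enumerate(row) if c != 0),
        default=0,
    )


# Hypothetical occurrence counts for a three-stroke character, 10 writers.
C_example = [
    [8, 2, 0],
    [2, 7, 1],
    [0, 1, 9],
]
print(r_gap(C_example))  # 1: all deviations are swaps of adjacent strokes
```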


Fig. 4. Stroke correspondence distribution.

The ranges $r = 2$ and $r = 3$ cover 95.7% and 98.1% of the characters, respectively. Based on the above stroke-correspondence results, the following conditions are introduced.
1. If there is no case in which the $l$th stroke of the reference pattern is written as the $k$th stroke of the input pattern, i.e., $C_{kl} = 0$, the $(k, l)$ correspondence is excluded by setting $d(k, l) = \infty$ and $\rho(k, l, p, l(p)) = \infty$.
2. Based on the tendency of the stroke-correspondences to concentrate along the diagonal, if $|k - l| > r$, the $(k, l)$ correspondence is excluded by setting $d(k, l) = \infty$ and $\rho(k, l, p, l(p)) = \infty$.

In the example, only the 15 pairs with $C_{kl} \neq 0$, or the 34 pairs within $r = 2$, are considered as candidates in the correspondence search.
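The two exclusion conditions can be sketched as a preprocessing step on the stroke-distance matrix. This is a simplified illustration: the function names `restrict_distances` and `allowed_pairs` are hypothetical, indices are 0-based, and only $d$ is masked, on the assumption that $w < 1$ so that an infinite stroke-distance already makes every path through an excluded pair infinite.

```python
INF = float("inf")


def restrict_distances(d, C=None, r=None):
    """Return a copy of d with excluded (k, l) pairs set to infinity.

    Condition 1: C[k][l] == 0 (the pair never occurred in the training data).
    Condition 2: |k - l| > r (the pair lies too far from the diagonal).
    Either condition may be disabled by passing None.
    """
    n = len(d)
    out = [row[:] for row in d]
    for k in range(n):
        for l in range(n):
            never_seen = C is not None and C[k][l] == 0
            too_far = r is not None and abs(k - l) > r
            if never_seen or too_far:
                out[k][l] = INF
    return out


def allowed_pairs(d):
    """Count the (k, l) pairs that remain candidates for the search."""
    return sum(1 for row in d for x in row if x != INF)


# Hypothetical 4-stroke example: a band of width r = 1 keeps 10 of 16 pairs.
d_full = [[1.0] * 4 for _ in range(4)]
d_band = restrict_distances(d_full, r=1)
print(allowed_pairs(d_band))  # 10
```

Because excluded pairs carry an infinite cost, a search that skips or prunes infinite-cost edges never expands them, which is where the reduction in search time comes from.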

4. Experiments

The usefulness of the presented method was demonstrated by recognition experiments performed on a PC (Pentium III processor, 700 MHz). Training data consisted of 2965 Chinese character categories, the same data used for the investigation in Section 3. Reference patterns were generated from the training data by rearranging the extracted strokes into the correct stroke-order and storing the average values of the loci of their feature points. Test data were similarly provided by another 24 writers and consisted of the same character categories as the training data (total = 69 900 characters). Limited to characters with the correct stroke-number, the test data totaled 48 631 characters.

The input character is transformed onto a 128 × 128 mesh plane by the preprocessing steps of redundancy elimination, smoothing, size normalization, and feature-point extraction. As feature information, the x–y coordinates and the movement direction vector from one point to the next are extracted from the character data. The feature information of the input and reference patterns is placed into $a_{ik}$ and $b_{jl}$, respectively. Using DP-matching, the stroke-distance $d(k, l)$ is calculated as the weighted sum of
1. the distance between x–y coordinate sequences, and
2. the distance between directional vector sequences (Sakoe and Shin, 1997).

An asymmetric DP equation is employed for DP-matching, i.e.,

\[ g(i, j) = d(i, j) + \min \left[ \begin{array}{l} g(i-1, j) \\ g(i-1, j-1) \\ g(i-1, j-2) \end{array} \right], \tag{7} \]

where $d(i, j)$ is the Euclidean distance or the directional difference between feature points $a_{ik}$ and $b_{jl}$. The evaluation value $\rho$ of the difference between interstroke-information is determined using the distance

\[ \rho(k, l, p, q) = \frac{1}{4} \left[ R(d_{ss}(A_p, A_k), d_{ss}(B_q, B_l)) + R(d_{se}(A_p, A_k), d_{se}(B_q, B_l)) + R(d_{es}(A_p, A_k), d_{es}(B_q, B_l)) + R(d_{ee}(A_p, A_k), d_{ee}(B_q, B_l)) \right]. \tag{8} \]
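The asymmetric DP-matching recurrence of Eq. (7) can be sketched as follows. The boundary conditions are assumptions, since they are not stated here: the path is taken to start at the first feature points of both strokes and to end at their last feature points. The function name `dp_match` and the sample point sequences are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def dp_match(a, b, dist):
    """Asymmetric DP matching following Eq. (7).

    a, b: feature-point sequences of an input and a reference stroke.
    dist: local distance between two feature points.
    Each step advances the input index i by exactly one and the
    reference index j by 0, 1, or 2. Returns the accumulated distance
    at the last points, or infinity if that cell is unreachable.
    """
    I, J = len(a), len(b)
    g = [[math.inf] * J for _ in range(I)]
    g[0][0] = dist(a[0], b[0])  # assumed start condition
    for i in range(1, I):
        for j in range(J):
            best = g[i - 1][j]
            if j >= 1:
                best = min(best, g[i - 1][j - 1])
            if j >= 2:
                best = min(best, g[i - 1][j - 2])
            if best < math.inf:
                g[i][j] = dist(a[i], b[j]) + best
    return g[I - 1][J - 1]


# Hypothetical x-y feature points of two short horizontal strokes.
a_pts = [(0, 0), (1, 0), (2, 0)]
b_pts = [(0, 0), (2, 0)]
print(dp_match(a_pts, b_pts, math.dist))  # 1.0
```

Since the recurrence sums exactly one local distance per input point, the accumulated value needs no path-length normalization on the input side, which is a common reason for choosing an asymmetric form.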

In Eq. (8), $l$ denotes the stroke-number, $B_l$ the $l$th stroke of the reference pattern, $d_{ss}(A_p, A_k)$ the vector from the start point of $A_p$ to the start point of $A_k$, $d_{se}(\cdot, \cdot)$ the vector from start point to end point, $d_{es}(\cdot, \cdot)$ the vector from end point to start point, and $d_{ee}(\cdot, \cdot)$ the vector from end point to end point. $R(\cdot, \cdot)$ is the weighted sum of the directional difference (h) and the longitudinal difference (e) between vectors (Fig. 5).

Fig. 5. Vectors used as interstroke-information.

Recognition results were obtained using a forced decision that selects the candidate with the minimum dissimilarity $D(A, B)$. Optimal weighting factors were determined for each experiment. Since the margin constant $\lambda$, which controls the intensity of pruning, trades recognition rate against stroke-correspondence search time, a preliminary experiment was performed and $\lambda = 91$ was used. Only a slight change occurs in the recognition rate with a value greater than 91. The average value of the cost, i.e., the weighted sum of $d$ and $\rho$, was 214.

Table 2 summarizes the experimental results using (a) FMCS, (b) TMCS, (c) the full-order model, and (d) TMCS incorporating stroke-order information. The improvement in recognition rate from FMCS to TMCS is remarkable (98.14% to 98.93%), and performance comparable to the full-order model (98.96%) is obtained. Moreover, a further large improvement, to 99.28%, is achieved by using stroke-order information with TMCS. Using stroke-order information also reduces the stroke-correspondence search time of TMCS to approximately one third (from 0.74 s to 0.24 s). Note that the full-order model lacks applicability, as it requires a prohibitively long computation time. The total recognition time is the sum of approximately 0.51 s, which is the total calculation time of $d$ and $\rho$, and the search time listed in Table 2. The reasons for the remaining misrecognition are
1. the existence of extremely similar characters, and
2. heavy deformation of characters.

Table 2
Recognition results using (a) FMCS, (b) TMCS, (c) full-order and (d) TMCS incorporating stroke-order information

Model                   (a)      (b)      (c)      (d)
Search time (s)         0.31     0.74     13.68    0.24
Recognition rate (%)    98.14    98.93    98.96    99.28

5. Conclusion

As a means of realizing stroke-order-free character recognition, a new systematic stroke-correspondence search algorithm has been described and evaluated in this paper. The algorithm has the novel advantage of efficiently searching for the optimal stroke-correspondence under an optimality criterion that includes both intra- and interstroke-information. That is, the search algorithm identifies the stroke-correspondence by using intra- and interstroke-information simultaneously. The improvement in recognition-accuracy is remarkable when interstroke-information among three adjacent strokes is used, achieving performance comparable to that obtained with a full-order method. Within this stroke-order-free framework, a reasonable level of stroke-correspondence search can be achieved by using information about the actually occurring stroke-order. In the experiments, a large improvement in both computation time and recognition-accuracy was demonstrated, based on a simple comparison of the degree of dissimilarity using 2965 different Chinese characters.

Since this paper has focused on the feasibility of using interstroke-information, the relation between model orders, and the effectiveness of using stroke-order information, the investigation was limited to experiments with a fixed stroke-number. The problem of stroke-number-free recognition could be addressed by incorporating the interstroke-information evaluation into a multilayered TMCS structure.

References

Hsieh, A., Fan, K., Fan, T., 1995. Bipartite weighted matching for on-line handwritten Chinese character recognition. Pattern Recognition 28 (2), 143–151.
Lin, C.K., Fan, K.C., Lee, F.T.P., 1993. On-line recognition by deviation-expansion model and dynamic programming. Pattern Recognition 26 (2), 259–268.
Nakagawa, M., 1990. Non-keyboard input of Japanese text – on-line recognition of handwritten characters as the most hopeful approach. IPSJ Trans. Jpn. 13 (1), 15–34.
Nakagawa, M., Akiyama, K., 1994. A linear-time elastic matching for stroke number free recognition of on-line handwritten characters. In: Proc. 4th IWFHR, December, pp. 48–56.
Odaka, K., Wakahara, T., Masuda, I., 1982. Stroke order free on-line handwritten character recognition. IECE Trans. Jpn. J65-D (6), 679–686 (in Japanese).
Sakoe, H., Shin, J., 1995. A stroke order search algorithm for on-line character recognition. Techn. Rep. IEICE PRU-95 (59), 55–60 (in Japanese).
Sakoe, H., Shin, J., 1997. A stroke order search algorithm for online character recognition. Res. Rep. on Information Science and Electrical Engineering of Kyushu University 2 (1), 99–104 (in Japanese).
Shiau, S.L., Chen, J.W., Hsieh, A.J., Kung, S.J., 1988. On-line handwritten Chinese character recognition by string matching. In: Proc. 1988 Internat. Conf. on Comput. Processing on Chinese and Oriental Languages, pp. 76–80.
Shin, J., Sakoe, H., 1999. Stroke correspondence search method for stroke-order and stroke-number free on-line character recognition – multilayer cube search. IEICE Trans. Inform. Syst. J82-D-II (2), 230–239 (in Japanese).
Shin, J., Ali, M.M., Katayama, Y., Sakoe, H., 1999. Stroke order free on-line character recognition algorithm using inter-stroke information. IEICE Trans. Inform. Syst. J82-D-II (3), 382–389 (in Japanese).
Tappert, C.C., Suen, C.Y., Wakahara, T., 1990. The state of the art in on-line handwriting recognition. IEEE Trans. Pattern Anal. Machine Intell. 12 (8), 787–808.
Terai, H., Nakata, K., 1973. On-line real-time recognition of handwriting Chinese characters and Japanese katakana syllabary. IECE Trans. Jpn. J56-D (May), 312–319 (in Japanese).
Wakahara, T., Odaka, K., 1997. On-line cursive Kanji character recognition using stroke-based affine transformation. IEEE Trans. Pattern Anal. Machine Intell. 19 (12), 1381–1385.
Wakahara, T., Odaka, K., Umeda, M., 1983. Stroke number and order free on-line character recognition by selective stroke linkage method. IECE Trans. Jpn. J66-D (5), 593–600 (in Japanese).
Wakahara, T., Suzuki, A., Nakajima, N., Miyahara, S., Odaka, K., 1996. Stroke-number and stroke-order free on-line Kanji character recognition as one-to-one stroke correspondence problem. IEICE Trans. Inform. Syst. E79-D (5), 529–534.
Wong, K.H., Fallside, F., 1985. Dynamic programming in the recognition of connected handwritten script. In: Proc. Second Conf. on Artificial Intell. Appl. IEEE Computer Societies, Silver Spring, MD, pp. 666–670.
Yhap, E.F., Greanias, E.C., 1981. An on-line Chinese character recognition system. IBM J. Res. Dev. 25 (May), 187–195 (see also, p. 282).
Yoshida, K., Sakoe, H., 1982. Online handwritten character recognition for a personal computer system. IEEE Trans. Consumer Electron. 28 (3), 202–208.
Yurugi, M., Nagata, S., Onuma, K., Kubota, K., 1985. Online character recognition by hierarchical analysis method. IECE Trans. Jpn. J68-D (6), 1320–1327 (in Japanese).