Handwritten Chinese character recognition using nonlinear active shape models and the Viterbi algorithm

Pattern Recognition Letters 23 (2002) 1853–1862 www.elsevier.com/locate/patrec



Daming Shi, Steve R. Gunn, Robert I. Damper *

Department of Electronics and Computer Science, Image, Speech and Intelligent Systems (ISIS) Research Group, University of Southampton, Southampton SO17 1BJ, UK

Received 27 July 2001; received in revised form 13 February 2002

Abstract

Since Chinese characters are composed from a small set of fundamental shapes (radicals), the problem of recognising large numbers of characters can be converted to that of extracting a small number of radicals and then finding their optimal combination. In this paper, radical extraction is carried out by nonlinear active shape models, in which kernel principal component analysis is employed to capture the nonlinear variation. Treating Chinese character composition as a discrete Markov process, we also propose an approach to recognition with the Viterbi algorithm. Our initial experiments are conducted on off-line recognition of 430,800 loosely-constrained characters, comprising 200 radical categories covering 2154 character categories from 200 writers. The writer-independent recognition rate is 93.5% of characters correct. Consideration of published figures for existing radical approaches suggests that our method achieves superior performance. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Handwritten Chinese character recognition; Active shape model; Kernel principal component analysis; Viterbi algorithm

* Corresponding author. Fax: +44-2380-594588. E-mail addresses: [email protected] (D. Shi), srg@ecs.soton.ac.uk (S.R. Gunn), [email protected] (R.I. Damper).

0167-8655/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-8655(02)00158-7

1. Introduction

Chinese characters follow a hierarchical structure. A graph of the Chinese writing system stands not for a unit of pronunciation but for a morpheme, a minimal meaningful unit of the Chinese language (Sampson, 1985). These simple graphs are known as radicals (Chang, 1973; Suen and Huang, 1984), and can compose many different Chinese characters. Radical approaches decompose Chinese characters into a small set of categories, so the complex character recognition problem is converted to a simpler problem of radical extraction and optimisation of some combination of the radicals. In Fig. 1, if the redundant information is ignored, the four complex Chinese characters can be classified by recognising the four simple radicals shown there, which should be far easier than recognising the whole characters. In this research, we consider position-dependent radicals, i.e., what might nominally appear to be the same-shaped radicals in different positions within a character are treated as different radicals. We also assume a character is composed by adding radicals (up to four in this work) in a particular sequence as detailed later.

Fig. 1. Examples of complex Chinese characters decomposed into idealised radicals. They look similar, but they are entirely different (one-word) characters which mean tomb, dusk, curtain and subscription, respectively.

Off-line handwritten Chinese character recognition is one of the most difficult pattern recognition problems because it concerns complex structure, serious interconnection among the components, numerous pattern variations, and a large number of characters.

Existing radical extraction methods for handwritten Chinese characters work either from a character skeleton or on the basis of strokes. Skeleton-based methods, such as Fukushima et al. (1991) and Chung and Ip (2001), treat a radical as a subimage of the character skeleton image. These methods aim to discover the relationship among the hierarchically represented graphs and capture their variations. Stroke-based methods, such as Liao and Huang (1990) and Wang and Fan (2001), decompose a radical further into its primitive structural parts, i.e., straight-line strokes, and then recognise the whole character by structural analysis. The advantage of the latter approach is that it requires far less computation than skeleton-based methods, but it suffers from ambiguity when strokes intersect: at a point of intersection, it is unclear which of the radiating lines should be associated together, so some extracted strokes may be spurious. Also, the assumption that strokes consist solely of straight lines, although common, is a considerable simplification.

Fukushima et al. (1991) proposed a skeleton-based radical approach to handwritten Japanese (Kanji) character recognition with the neocognitron (Fukushima, 1982), which is capable of recognising distorted patterns as well as tolerating translation. When a composite stimulus consisting of two patterns or more is presented, the neocognitron focuses its attention selectively on one of them, and recognises it. It has so far been difficult to bring neocognitron-based methods into practical use, because too much expert domain knowledge is required to design the training patterns. Although genetic algorithms were incorporated with the neocognitron to search for optimal parameters in previous work (Shi et al., 1999), that approach is considered unsuitable for Chinese character recognition, in which a great number of training patterns is involved.

Chung and Ip (2001) used snakes (Kass et al., 1987) to segment Chinese characters into radicals by energy functional minimisation. The external energy in their work consists of two different functionals, i.e., displacement and intersection functionals. The displacement functional is employed to prevent the snake from deviating too much from the original template. The intersection functional is devised to avoid the intersection of character strokes and the template, by means of a three-level mechanism defined over a window of size 7 × 7 pixels around a pixel on the snake. The closer a stroke is to the snake, the higher the resulting intersection energy. Experiments were conducted on 100 character categories written by 10 people, and the correct radical partitioning rate ranges from 75% to 95% depending on which of six structural categories (compositions) is considered. They did not discuss how to deal with false salient features caused by broken strokes and thinning algorithms.
Liao and Huang (1990) described a method for radical extraction which is not confused by spurious strokes due to stroke interconnection and the inherent defects of thinning algorithms. It consists of three parallel matching algorithms. The first can extract radicals from a Chinese character on the basis of stroke segmentation. The second considers the fork- and end-points in the thinned image character, and matches radicals by all the possible unions of the line segments. The third uses a small number of points to represent a stroke, and it can extract radicals even when inflection points have not been found by the stroke segmentation techniques. However, this method is quite time-consuming. One further practical issue is that it is difficult to make an overall decision, given the results from these three algorithms.

Wang and Fan (2001) proposed a radical-based optical character recognition system for recognising handwritten Chinese characters. Their recursive hierarchical radical extraction consists of three layers. Layer 1 is character pattern detection, which classifies a given character into a shape pattern, such as left-right, up-down, etc. Layer 2 is straight cut-line detection, which detects gaps among radicals. A stroke clustering technique is used in Layer 3 to decompose characters that are left-right or up-down patterns into radicals. Their hierarchical radical-matching scheme also consists of three matching phases. The first phase is radical matching, which is based on a modified relaxation method to match each radical with templates in the radical database. The second phase is matching with the knowledge database. The third phase is the matching of the whole character. Using their methodology, the complexity of off-line handwritten Chinese character recognition, the templates and the size of the radical database are all greatly reduced.

Cootes et al. (1995) proposed active shape models (ASMs) to capture shape variation and exploit the linear formulation of point distribution models (PDMs) in an iterative search procedure, capable of locating the modelled structures in noisy, cluttered images, even if they are partially occluded. ASMs have similarities to snakes, in which a contour is fitted to the image evidence by minimising an energy function. However, a snake only has generic prior knowledge, such as smoothness.
A much greater amount of prior information can be recovered from training sets and encoded within an ASM. In our previous work (Shi et al., 2001a), ASMs were applied to radical modelling, which can be regarded as a special case of skeleton-based approaches. However, the original ASMs are only suitable for representing linear variations within the PDMs. In fact, nonlinear shape variations are common in handwriting, such as different writing styles from person to person, and different image distortion from time to time.

In radical extraction, each class has a corresponding recognition score with respect to a given input character. The next step is to combine these radical candidates to produce a ranking of possible characters. In our research, the Viterbi algorithm (Viterbi, 1967; Forney, 1973) is applied to carry out character composition/decomposition with radicals. The Viterbi algorithm is a dynamic programming technique used to derive an optimal path with time complexity linear in the length of the input sequence. In its most general form, it may be viewed as a solution to the problem of maximum a posteriori probability estimation of the state sequence of a finite-state discrete Markov process observed in memoryless noise. Recently, Tseng and Lee (1999) used the Viterbi algorithm for on-line handwritten Chinese character recognition, and Jung and Kim (2000) applied it to on-line recognition of cursive Korean characters.

The remainder of this paper is organised as follows. In Section 2, nonlinear active shape modelling is described, in which kernel principal component analysis (PCA) is employed to capture the nonlinear handwriting variations. In Section 3, we describe how the Viterbi algorithm is used to find the optimal radical combination for character composition. Experiments and their results are given in Section 4, followed by conclusions in Section 5.

2. Radical extraction with nonlinear active shape modelling

Active shape modelling extracts the eigenvectors, $U$, of the training examples by PCA. Then a model radical $C$ can be generated by adjusting shape parameters, $b$, corresponding to the principal modes of variation:

$$C = W + Ub, \qquad (1)$$

where $W$ is the mean vector of the training examples.
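As an illustration of Eq. (1), the following is a minimal sketch of a linear point distribution model restricted to a single mode of variation. It is not the authors' implementation: the function names, the use of power iteration to obtain the leading eigenvector, and the toy landmark data are all assumptions made for the example.

```python
import math

def mean_shape(examples):
    """Mean vector W of equally long landmark vectors."""
    m = len(examples)
    return [sum(col) / m for col in zip(*examples)]

def first_mode(examples, iters=200):
    """Leading eigenvector of the sample covariance, found by power iteration.

    Assumes the uniform start vector is not orthogonal to the leading mode.
    """
    w = mean_shape(examples)
    devs = [[x - wx for x, wx in zip(e, w)] for e in examples]
    dim = len(w)
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # Covariance times u, computed as (1/M) * sum_k dev_k * (dev_k . u).
        proj = [sum(di * ui for di, ui in zip(d, u)) for d in devs]
        v = [sum(p * d[i] for p, d in zip(proj, devs)) / len(devs)
             for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    return u

def generate(w, u, b):
    """Eq. (1) with a single mode: C = W + U b."""
    return [wx + b * ux for wx, ux in zip(w, u)]

# Toy data (hypothetical): two landmarks (x0, y0, x1, y1); only x1 varies.
examples = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 3.0, 0.0]]
w = mean_shape(examples)
u = first_mode(examples)
shape = generate(w, u, b=1.0)  # mean shape moved one unit along the first mode
```

With several modes, `u` would become a matrix of eigenvectors and `b` a vector of shape parameters, one per mode.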

To generalise ASMs to the nonlinear case, Sozou et al. (1995a) introduced polynomial regression. The basis for the polynomial regression model is to reduce further the residuals once a linear model has been extracted, by fitting a polynomial along the direction of the principal components. However, the polynomial regression model requires that the second eigenvector be modelled as a function of the first, otherwise implausible shapes may be generated. To solve this problem, they also applied a multi-layer perceptron (MLP) (Sozou et al., 1995b) to find the nonlinear functional relationships among the shape parameters, $b$, in Eq. (1). The disadvantages of the MLP method are possible over- or under-fitting and the sensitivity to initial weight settings and number of neurons.

In this section, we develop nonlinear ASMs with kernel PCA and apply them to radical extraction. In training, the handwriting variation is captured by nonlinear kernel PCA, and in recognition, each radical model is fitted to the target character by adjusting the shape parameters.

2.1. Training phase

The first step is image thinning (Jang and Chin, 1992) to get character skeletons. Then landmark points are labelled manually to represent a radical, and kernel PCA is applied to capture the radical variations.

PCA is a technique for extracting structure from possibly high-dimensional data sets. It is readily performed by solving an eigenvalue problem, or by using iterative algorithms which estimate principal components (Jolliffe, 1986). PCA is useful when the original pattern space can be accurately described by a subspace spanned by the first several principal eigenvectors. Often the data lie in a subspace, and if this is linear then a small number of principal components is sufficient to account for most of the variation in the data. Kernel PCA (Schölkopf et al., 1998a,b) provides a way to extend linear PCA to nonlinear subspaces of the data.
Here, linear PCA is performed in some high-dimensional feature space $F$, which is related to the input space by a nonlinear map $\Phi : \mathbb{R}^N \to F$. The number of nonlinear components obtained by kernel PCA can be greater than the original input dimension. The dimension of the feature space is specified by the number of training examples. However, the method will confer no advantage if the data lie in a linear subspace. The main challenge with this method is how to choose an appropriate nonlinear transformation.

Given a set of examples for a particular character $\{e_1, e_2, \ldots, e_M\}$, which are represented by $N$ landmark points, i.e., $e_k = (x_{k0}, y_{k0}, \ldots, x_{k(N-1)}, y_{k(N-1)})^T$, the mean vector of the set is defined by $W = (1/M) \sum_{k=1}^{M} e_k$. In the feature space, the covariance matrix, written $\bar{C}$ here to keep it distinct from the model radical $C$ of Eq. (1), takes the form:

$$\bar{C} = \frac{1}{M} \sum_{j=1}^{M} \Phi(e_j)\,\Phi(e_j)^T.$$

The $k$th eigenvector $V^k$ of the covariance matrix $\bar{C}$ and its eigenvalue $\lambda_k$ are solutions to $\lambda_k V^k = \bar{C} V^k$. Since all solutions with $\lambda_k \neq 0$ lie in the span of $\Phi(e_1), \ldots, \Phi(e_M)$, there exist coefficients $\alpha_i^k$ $(i = 1, \ldots, M)$ such that

$$V^k = \sum_{i=1}^{M} \alpha_i^k\, \Phi(e_i).$$

We then solve the eigenvalue problem (see Schölkopf et al., 1998a,b for details)

$$M \lambda_k \alpha^k = K \alpha^k,$$

where $K$ is an $M \times M$ matrix, $K_{ij} = \Phi(e_i)^T \Phi(e_j)$, and $\alpha^k$ is a column vector of length $M$. For each non-zero $\lambda_k$, the eigenvector expansion coefficients $\alpha^k$ are normalised, which leads to the corresponding vectors $V^k$ in $F$ being normalised. The projections of a data point $e$ onto the eigenvectors $V^k$ in $F$ can be defined as

$$\beta_k(e) = (V^k)^T \Phi(e) = \sum_{i=1}^{M} \alpha_i^k\, \Phi(e_i)^T \Phi(e) = \sum_{i=1}^{M} \alpha_i^k\, K(e_i, e),$$

for $k = 1, \ldots, M'$, where $M'$ is the number of principal components (modes) retained. Any example in the training set can be approximated using the mean vector and a weighted sum of these deviations obtained from the first $M'$ modes.

Our purpose is to build up models for each handwriting class, which requires approximate representations of the data in input space rather than in feature space. To this end, by introducing shape parameters, we can also generate ASMs on the basis of kernel PCA with the following two steps:

(1) Generate ASMs in feature space with the mean vector $W$. We define an operator $P_{M',b}$ by

$$P_{M',b}\,\Phi(W) = \sum_{k=1}^{M'} \beta_k(W)\, b_k\, V^k.$$

(2) Find the active model $C$, a pre-image of this feature-space point (see Schölkopf et al., 1998a,b), so as to minimise

$$\rho(C) = \| P_{M',b}\,\Phi(W) - \Phi(C) \|^2 = K(C, C) - 2 \sum_{k=1}^{M'} b_k\, \beta_k(W) \sum_{i=1}^{M} \alpha_i^k\, K(e_i, C) + K(W, W). \qquad (2)$$
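The kernel PCA quantities above can be sketched in a few lines. The code below is an illustrative sketch only, under several assumptions not taken from the paper: a Gaussian kernel with an arbitrary width, a single retained mode (M' = 1), power iteration to obtain the leading eigenvector of the kernel matrix, and no centring in feature space (matching the covariance as written in the text). It computes the expansion coefficients, the projection of a point onto the first feature-space eigenvector, and the objective of Eq. (2).

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian kernel; the kernel choice and gamma are assumptions."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def leading_alpha(examples, kernel, iters=500):
    """Coefficients alpha of the first eigenvector V = sum_i alpha_i Phi(e_i)."""
    m = len(examples)
    K = [[kernel(a, b) for b in examples] for a in examples]
    a = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        v = [sum(K[i][j] * a[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        a = [x / norm for x in v]
    # Rayleigh quotient: eigenvalue of K (M times the covariance eigenvalue).
    mu = sum(a[i] * sum(K[i][j] * a[j] for j in range(m)) for i in range(m))
    # Rescale so that V has unit length in feature space: alpha^T K alpha = 1.
    return [x / math.sqrt(mu) for x in a]

def projection(e, examples, alpha, kernel):
    """Projection of e onto V: sum_i alpha_i K(e_i, e)."""
    return sum(a_i * kernel(e_i, e) for a_i, e_i in zip(alpha, examples))

def rho(c, w, b, examples, alpha, kernel):
    """Objective of Eq. (2) restricted to one mode (M' = 1)."""
    beta_w = projection(w, examples, alpha, kernel)
    cross = b * beta_w * projection(c, examples, alpha, kernel)
    return kernel(c, c) - 2.0 * cross + kernel(w, w)

# Toy training set (hypothetical values).
examples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
alpha = leading_alpha(examples, rbf)
w = [sum(col) / len(examples) for col in zip(*examples)]
value = rho(w, w, 1.0, examples, alpha, rbf)
```

A model radical would then be obtained by minimising `rho` over candidate shapes `c` for a chosen shape parameter `b`; the optimiser itself is left out of this sketch.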

2.2. Recognition phase

Radical models can be generated by adjusting the shape parameters in Eq. (2). The recognition phase consists of performing a chamfer distance transform (Barrow et al., 1977; Borgefors, 1988) on the target image and searching for the shape parameters via gradient descent with a dynamic tunnelling algorithm. Optimal shape parameters are obtained by minimising the mean-square distance between the model and the input character in the input space. Fig. 2 shows this procedure.

Fig. 2. Illustration of radical extraction with nonlinear active shape models.

In our previous work, the chamfer distance transform was introduced to enhance the basin of attraction for gradient descent search for optimal shape parameters (Shi et al., 2001a). A significant property of this transform is its ability to handle noisy and distorted data, as the edge points of one image are transformed by a set of parametric transformations, which describe how the images can be geometrically distorted in relation to one another. For full details, including the computation of chamfer distances between test images and models, see Shi et al. (2001a). The search criterion for the optimal shape parameters is to minimise the chamfer distance between each model and a target image.

Shi et al. (2001b) used a dynamic tunnelling algorithm (DTA; Yao, 1989) to overcome problems with local minima when employing gradient descent in conjunction with linear ASMs. Starting from a local minimum, the DTA can jump to another basin of attraction where the new initial search point is even lower in energy. In the nonlinear case, the shape parameters are considered in the feature space. However, the parameter vectors are not orthogonal in the input space, and there is no direct gradient information. Hence, the application of DTA in the nonlinear case is achieved by multi-point sampling.
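To make the matching criterion concrete, the sketch below computes a chamfer distance transform of a binary skeleton image and the mean-square distance of a set of model landmark points under it. It is a generic two-pass 3-4 chamfer transform in the style of Borgefors, given as an assumption about how such a transform could be implemented rather than as the authors' code; the function names and the toy image are hypothetical.

```python
def chamfer_transform(image):
    """Two-pass 3-4 chamfer distance transform (1 marks a skeleton pixel)."""
    rows, cols = len(image), len(image[0])
    inf = 10 ** 9
    d = [[0 if image[r][c] else inf for c in range(cols)] for r in range(rows)]
    # Forward pass: propagate distances from the top-left corner.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 3)
                if c > 0:
                    d[r][c] = min(d[r][c], d[r - 1][c - 1] + 4)
                if c < cols - 1:
                    d[r][c] = min(d[r][c], d[r - 1][c + 1] + 4)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 3)
    # Backward pass: propagate distances from the bottom-right corner.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 3)
                if c > 0:
                    d[r][c] = min(d[r][c], d[r + 1][c - 1] + 4)
                if c < cols - 1:
                    d[r][c] = min(d[r][c], d[r + 1][c + 1] + 4)
            if c < cols - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 3)
    return d

def mean_square_distance(dist, points):
    """Mean squared chamfer value under the model points, given as (row, col)."""
    return sum(dist[r][c] ** 2 for r, c in points) / len(points)

# Toy target image (hypothetical): a single skeleton pixel in the centre.
image = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
dist = chamfer_transform(image)
score = mean_square_distance(dist, [(1, 1), (0, 1)])
```

Because the transform is computed once per target image, each candidate model only needs a table lookup per landmark, which is what makes a repeated search over shape parameters affordable.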

3. Character composition with the Viterbi algorithm

The output of the radical extraction stage is a set of radicals ranked by their chamfer distance to the given character. Treating Chinese character composition as a discrete Markov process corresponding to a sequence of radicals, the optimal radical combination is equivalent to the 'best' path in a graph made up of all possible radical combinations. The best path is determined according to estimated probabilities of initial state, transitions between states, and symbol probabilities at each state (see below).

3.1. Markov process of character composition

As previously stated, any character considered here consists of up to four radicals. Each radical corresponds to a state, $x_k$, $k \in \{1, \ldots, K\}$, with $K = 4$ in this work. The composition process is Markov in the sense that the probability of being in state $x_{k+1}$ at index $k + 1$, given all states up to index $k$, depends only on the state $x_k$ at index $k$: $P(x_{k+1} \mid x_0, x_1, \ldots, x_k) = P(x_{k+1} \mid x_k)$. With such a model, Chinese characters can be viewed as the outputs of an $m^K$-state Markov process, where $m$ is the number of distinguishable radicals ($m = 200$ here). This is a simplification, because radicals are position-dependent and cannot appear anywhere other than their characteristic position. In practice, the total number of states will be considerably less than $200^4 = 16 \times 10^8$, i.e., there will be many zero-probability transitions.

Transition probabilities are defined according to the allowable sequences of radicals, or decompositions of the Chinese characters. Here, to allow the powerful Markov formalism to be used, we make the important assumption that such sequences actually exist whereas, in fact, sequential information is entirely absent from an off-line character image. The assumed sequence in this work is L, U, R, D, TL, TR, BR, BL, SU, which denote left, up, right, down, top-left, top-right, bottom-right, bottom-left and surrounding radicals, respectively. This is the order in which the composition of characters in terms of radicals is entered into our lexicon of allowable characters. We then find transition probabilities by frequency counts in the lexicon.

It is important to emphasise that our assumed sequence is arbitrary. In particular, a different choice would almost certainly have led to different results. Our intuition is that the difference in performance is likely to be small, but this issue remains to be investigated.

3.2. Search algorithm

In this research, Chinese character recognition is associated with a graph where the nodes contain radical recognition scores (i.e., chamfer distances). A one-to-one correspondence exists whereby every path through the graph corresponds to a particular legal segmentation of the input character into radicals, and conversely, every possible legal segmentation of the input character corresponds to a particular path through the graph.

A lexicon is compiled in which each character consists of nine codes: 1xx, 2xx, ..., 9xx, representing the nine types of radical: L, U, R, D, etc. Fig. 3(a) shows an example character entry. Note that these codes are used by the domain expert to represent characters in the lexicon, but for convenience they are replaced by serial numbers (1–200) during computation. Fig. 3(b) shows the graph representation of this lexical entry, in which the rows indicate the position index of a possible radical and the columns indicate the specific radicals in terms of their code.
Fig. 3. Chinese characters composed by radicals. (a) A Chinese character in the lexicon consists of nine codes, representing the radicals L, U, R, D, TL, TR, BR, BL and SU, respectively; (b) its graph representation.

The Viterbi algorithm (Viterbi, 1967; Forney, 1973) provides a convenient method for rapidly determining the best-scoring path (corresponding to an interpretation for a character). Given a character, the symbol probability of the $j$th radical is estimated by

$$f(j) = 1 - \frac{\text{chamfer distance of radical } j}{\text{sum of chamfer distances of all radicals in the same position as } j}.$$

Transition probabilities are estimated as

$$P((i, j) \mid (a, b)) = \frac{\text{number of transitions from } (a, b) \text{ to } (i, j)}{\text{total number of transitions from } (a, b)}.$$

Table 1 shows an example of non-zero transition probabilities for radical number 10, coded 110 in the lexicon. Here, the number of transitions from $(1, 10)$ is equal to the number of characters with first code 10; the number of transitions from $(1, 10)$ to $(2, j)$ is equal to the number of characters whose first and second codes are 10 and $j$, respectively.

The initial state probability is estimated as

$$p(j) = \frac{\text{number of characters beginning with radical } j}{\text{total number of characters}}.$$
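The three estimates above are simple ratios, which the following sketch makes explicit. It is an illustration under assumptions: the lexicon is represented as a list of radical serial-number sequences, nodes are (row, radical) pairs, and the toy entries and distances are invented for the example rather than taken from the paper's lexicon.

```python
from collections import Counter

def symbol_probs(distances):
    """f(j) = 1 - d_j / (sum of d over all radicals in the same position)."""
    total = sum(distances.values())
    return {j: 1.0 - d / total for j, d in distances.items()}

def initial_probs(lexicon):
    """p(j) = (characters beginning with radical j) / (all characters)."""
    counts = Counter(seq[0] for seq in lexicon)
    return {j: n / len(lexicon) for j, n in counts.items()}

def transition_probs(lexicon):
    """Relative frequency of moving from node a to node b, keyed by (a, b).

    Nodes are (row, radical) pairs with rows numbered from 1.
    """
    pair, out = Counter(), Counter()
    for seq in lexicon:
        for i in range(len(seq) - 1):
            a, b = (i + 1, seq[i]), (i + 2, seq[i + 1])
            pair[(a, b)] += 1
            out[a] += 1
    return {(a, b): n / out[a] for (a, b), n in pair.items()}

# Hypothetical lexicon of four two-radical characters (serial numbers).
lexicon = [(10, 83), (10, 88), (10, 88), (12, 83)]
p = initial_probs(lexicon)
t = transition_probs(lexicon)
# Hypothetical chamfer distances of two candidate radicals in one position.
f = symbol_probs({83: 2.0, 88: 6.0})
```

Note that the symbol probabilities defined this way favour small chamfer distances but do not sum to one over the radicals in a position; they act as scores within the path search.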

Table 1
Non-zero transition probabilities, $P((2, -) \mid (1, 10))$, from radical with serial number 10

Position   Lexical code   Serial number   Transition prob.
R          303            83              0.053
R          307            87              0.031
R          308            88              0.111
R          319            99              0.048
TR         605            165             0.105
TR         606            166             0.005
BR         709            179             0.012
BR         710            180             0.143

A survivor is defined as the shortest path leading to a node. According to the dynamic programming principle (Bellman, 1957), only survivors need be considered in determining optimal paths. Let us define the following symbols: $x(i, j)$ is the node at the $i$th row and $j$th column; $\hat{x}(i, j)$ is the survivor path ending at $x(i, j)$; $L(i, j)$ is the survivor path value; $K$ (= 4) is the total number of rows in the graph; $J$ (= 200) is the total number of columns in the graph. A formal statement of the algorithm, modified from Jung and Kim (2000), is as follows:

STEP 1. Initialisation: $L(i, j) = 0$, $\forall i, j \neq 0$; $j = 1$.
STEP 2. $L(1, j) = p(j)\, f(j)$; $\hat{x}(1, j) = (1, j)$.
STEP 3. $i = 2$.
STEP 4. Calculate: $L(i, j) = \max_{1 \le m \le J} [L(i-1, j)\, P((i, m) \mid \hat{x}(i-1, j))\, f(j)]$; $\hat{x}(i, j) = (i, m)$, s.t. $\max_{1 \le m \le J} [L(i-1, j)\, P((i, m) \mid \hat{x}(i-1, j))]$.
STEP 5. $i$++; repeat Step 4 while $i \le K$.
STEP 6. $j$++; go to Step 2 while $j \le J$.
STEP 7. Termination and backtracking: the best path is $\hat{x}(1, v), \hat{x}(2, v), \ldots, \hat{x}(K, v)$ s.t. $L(K, v) = \max_{1 \le j \le J} L(K, j)$.
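For comparison with the step listing above, here is a sketch of the textbook Viterbi recursion over a graph of (row, radical) nodes. It should be read as an assumption-laden illustration and not as the authors' exact procedure: it keeps one survivor per node of the current row and backtracks through stored pointers, whereas the listing above indexes survivors by starting column. The argument layout, the toy probabilities and the restriction to paths that visit every row are all choices made for the example.

```python
def viterbi(init, trans, symbol):
    """Best-scoring radical sequence and its score.

    init[j]       : initial probability of radical j in the first row
    trans[(a, b)] : probability of moving from node a to node b,
                    where nodes are (row, radical) pairs
    symbol[i][j]  : symbol probability of radical j in row i
    """
    rows = sorted(symbol)
    score = {j: init.get(j, 0.0) * f for j, f in symbol[rows[0]].items()}
    back = []
    for prev_row, row in zip(rows, rows[1:]):
        new_score, pointers = {}, {}
        for m, f in symbol[row].items():
            best_j, best = None, 0.0
            for j, s in score.items():
                cand = s * trans.get(((prev_row, j), (row, m)), 0.0) * f
                if cand > best:
                    best_j, best = j, cand
            if best_j is not None:  # keep only nodes reachable with non-zero score
                new_score[m] = best
                pointers[m] = best_j
        score = new_score
        back.append(pointers)
    if not score:
        return [], 0.0
    last = max(score, key=score.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical two-row example (serial numbers and probabilities invented).
init = {10: 0.75, 12: 0.25}
trans = {((1, 10), (2, 83)): 1.0 / 3.0,
         ((1, 10), (2, 88)): 2.0 / 3.0,
         ((1, 12), (2, 83)): 1.0}
symbol = {1: {10: 0.6, 12: 0.4}, 2: {83: 0.75, 88: 0.25}}
path, best = viterbi(init, trans, symbol)
```

The work per row is proportional to the number of non-zero transitions, so the many zero-probability transitions noted in Section 3.1 keep the search far smaller than the nominal state count suggests.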

4. Experiments and results

We have conducted recognition experiments using the database collected by Harbin Institute of Technology and Hong Kong Polytechnic University. The complete database comprises a collection of 751,000 loosely constrained handwritten Chinese characters, consisting of 3755 categories written by 200 different writers (Shi et al., 2001a,b). Here we use a subset of this database, as detailed below.

Training uses $M = 60$ examples of each radical. This means that some radicals encountered during test have actually been seen during training, but the proportion is tiny. (We choose not to remove training radicals from the test set because they form part only of complete characters.) Hence, to a first approximation, the test set examples can be considered unseen.

We are now in a position to compare some representative works on radical extraction with our proposed nonlinear active shape modelling method. As our database is different from that used by previous authors, and we do not have implementations of their work available to us, we can only compare their published figures with our results. Hence, the comparison can only be indicative rather than definitive.

Method 1: Nonlinear ASMs with Viterbi algorithm. The experiments are conducted on 200 radicals covering 2154 loosely constrained Chinese character categories written by 200 different writers (i.e., 430,800 characters).

Method 2: Stroke-based approach (Wang and Fan, 2001). Their experiments for radical extraction were conducted on just 1856 test characters.

Method 3: Snake-fitting approach (Chung and Ip, 2001). Their character image database consists of 100 character categories written by 10 people (i.e., 1000 test examples only). They considered and reported results on the six most common radical combination schemes, namely vertical, left-down, surrounding, horizontal, up-left, and cover. Radical recognition results are fed into a structure-based character recogniser to give a final performance figure.

From Table 2, we can see that our method using nonlinear ASMs and the Viterbi algorithm achieves the highest test-set character recognition rate among the radical approaches compared. It deals with the largest number of radicals on a test set which is significantly larger than other works have used, and still achieves the best correct matching rate. We believe that the lower correct rate of Method 2 results mainly from the problem of ambiguity when strokes intersect. At a point of intersection, it is unclear which of the radiating lines should be grouped together, so some strokes may be spurious. Method 3 suffers from false salient features due to broken strokes and thinning algorithms.

Table 2
Performance comparison of different radical approaches to Chinese character recognition (a)

Method                     Test set size (characters)   Number of radicals trained   % Radicals correct   % Characters correct
Method 1 (ASM, Viterbi)    2154 × 200                   200                          96.5                 93.5
Method 2 (stroke-based)    1856                         32                           92.5                 98.2 (train. set), 80.9 (test set)
Method 3 (snake fitting)   1000                         20                           75–95                79.1

(a) The range of values for Method 3 reflects individual results for six different characteristic radical positions.

The advantage of our method is its capability to handle individual writer variations with only a small number of shape parameters. Our models also avoid stroke extraction, as mentioned above, which is difficult in handwriting recognition as there will be considerable interconnection among the strokes as well as many broken strokes. The disadvantages are the relatively high computational complexity of the matching process, the need for landmark labelling and the need to assume a sequential order (L, U, R, D, ...).

5. Conclusions

In this paper, an approach to handwritten Chinese character recognition is proposed in which radicals are extracted by nonlinear active shape models, and then character composition is carried out based on the Viterbi algorithm. In training, nonlinear ASMs capture the handwriting variations by kernel PCA. In radical matching, the chamfer distance transform and the DTA are employed to search for the optimal shape parameters. Treating Chinese character composition as a discrete Markov process, it can be represented by a graph, in which the rows indicate the index of a radical in a presumed sequence, and the columns indicate specific (position-dependent) radicals. Hence, the character composition sequence is obtained by finding the best path with the Viterbi algorithm. The symbol probabilities can be estimated from the chamfer distance at the radical extraction level, whereas the transition probabilities and initial state probabilities can be calculated from the lexicon. Experiments are conducted on 200 radicals covering 2154 loosely constrained characters from 200 writers, and the recognition rate obtained is 93.5% characters correct. These highly promising results benefit from the avoidance of (straight-line) stroke extraction, as well as the ability to capture the nonlinear handwriting variations by only a small number of shape parameters.

A benchmarking database of Chinese characters is required on which different authors can report results. Until now, no such established database has existed. To correct this situation, we are making the database used here freely available to other researchers on CD-ROM from the corresponding author.

References

Barrow, H.G., Tenenbaum, J.M., Bolles, R.C., Wolf, H.C., 1977. Parametric correspondence and chamfer matching: two new techniques for image matching. In: Proc. 5th Internat. Joint Conf. on Artificial Intelligence, Cambridge, MA, pp. 659–663.
Bellman, R., 1957. Dynamic Programming. Princeton University Press, Princeton, NJ.
Borgefors, G., 1988. Hierarchical chamfer matching: a parametric edge matching algorithm. IEEE Trans. Pattern Anal. Machine Intell. 10 (6), 849–865.
Chang, S.K., 1973. An interactive system for Chinese character generation and retrieval. IEEE Trans. Systems Man Cybernet. 3 (3), 257–265.
Chung, F., Ip, W.W.S., 2001. Complex character decomposition using deformable model. IEEE Trans. Systems Man Cybernet. Part C Appl. Rev. 31 (1), 126–132.
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J., 1995. Active shape models - their training and application. Computer Vision and Image Understanding 61 (1), 38–59.
Forney, G.D., 1973. The Viterbi algorithm. Proc. IEEE 61 (3), 268–278.
Fukushima, K., 1982. Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition 15 (6), 455–469.
Fukushima, K., Imagawa, T., Ashida, E., 1991. Character recognition with selective attention. In: Proc. Internat. Joint Conf. on Neural Networks, IJCNN'91, Seattle, WA, pp. A593–A598.
Jang, B.K., Chin, R.T., 1992. One-pass parallel thinning: analysis, properties and quantitative evaluation. IEEE Trans. Pattern Anal. Machine Intell. 14 (11), 1129–1140.
Jolliffe, I.T., 1986. Principal Component Analysis. Springer-Verlag, New York, NY.
Jung, K., Kim, H.J., 2000. On-line recognition of cursive Korean characters using graph representation. Pattern Recognition 33 (3), 399–412.
Kass, M., Witkin, A., Terzopoulos, D., 1987. Snakes: active contour models. Internat. J. Comput. Vision 1 (4), 321–331.
Liao, C.W., Huang, J.S., 1990. A transformation invariant matching algorithm for handwritten Chinese character recognition. Pattern Recognition 23 (11), 1167–1188.
Sampson, G., 1985. Writing Systems. Hutchinson, London, UK.
Schölkopf, B., Mika, S., Smola, A., Rätsch, G., Müller, K., 1998a. Kernel PCA pattern reconstruction via approximate pre-images. In: Niklasson, L., Boden, M., Ziemke, T. (Eds.), Proc. 8th Internat. Conf. Artificial Neural Networks. Springer-Verlag, Berlin, Germany, pp. 147–152.
Schölkopf, B., Smola, A.J., Müller, K., 1998b. Kernel principal component analysis. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (Eds.), Advances in Kernel Methods. MIT Press, Cambridge, MA, pp. 327–352.
Shi, D., Dong, C., Yeung, D.S., 1999. Neocognitron parameter tuning by genetic algorithms. Internat. J. Neural Systems 9 (6), 497–509.
Shi, D., Gunn, S.R., Damper, R.I., 2001a. Active radical modeling for handwritten Chinese characters. In: Sixth Internat. Conf. on Document Analysis and Recognition, ICDAR'01, Seattle, WA, pp. 236–240.
Shi, D., Gunn, S.R., Damper, R.I., 2001b. A radical approach to handwritten Chinese character recognition using active handwriting models. In: Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, vol. 1, pp. 670–675.
Sozou, P.D., Cootes, T.F., Taylor, C.J., Mauro, E.C.D., 1995a. Non-linear generalization of distribution models using polynomial regression. Image Vision Comput. 13 (5), 451–457.
Sozou, P.D., Cootes, T.F., Taylor, C.J., Mauro, E.C.D., 1995b. Non-linear point distribution modelling using a multi-layer perceptron. In: Proc. British Machine Vision Conf., Birmingham, UK, pp. 107–116.
Suen, Y., Huang, E.M., 1984. Computational analysis of the structural compositions of frequently used Chinese characters. Comput. Process. Chinese Oriental Languages 1 (3), 1–10.
Tseng, Y.H., Lee, H.J., 1999. Recognition-based handwritten Chinese character segmentation using a probabilistic Viterbi algorithm. Pattern Recognition Lett. 20 (8), 791–806.
Viterbi, A.J., 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inform. Theor. IT-13 (2), 260–269.
Wang, A.B., Fan, K.C., 2001. Optical recognition of handwritten Chinese characters by hierarchical radical matching method. Pattern Recognition 34 (1), 15–35.
Yao, Y., 1989. Dynamic tunneling algorithm for global optimization. IEEE Trans. Systems Man Cybernet. 19 (5), 1222–1230.
