Fuzzy approach to solve the recognition problem of handwritten chinese characters

Pattern Recognition, Vol. 22, No. 2, pp. 133 141, 1989. 0031 3203/89 $3.00 + .00 Pergamon Press plc Pattern Recognition Society Printed in Great Bri...


Pattern Recognition, Vol. 22, No. 2, pp. 133-141, 1989.

0031-3203/89 $3.00 + .00 Pergamon Press plc Pattern Recognition Society

Printed in Great Britain.

FUZZY APPROACH TO SOLVE THE RECOGNITION PROBLEM OF HANDWRITTEN CHINESE CHARACTERS

FANG-HSUAN CHENG, WEN-HSING HSU* and CHIEN-AN CHEN

Institute of Electrical Engineering, National Tsing Hua University, 101 Sec. 2, Kuang Fu Rd, Hsinchu, Taiwan 30043, R.O.C.

*The author is also with the Institute of Information Science, Academia Sinica, Taipei, Taiwan.

(Received 11 December 1987; in revised form 20 May 1988; received for publication 7 June 1988)

Abstract--A method based on the concept of fuzzy sets for handwritten Chinese character (HCC) recognition is proposed in this paper. Chinese characters can be viewed as a collection of line segments, called strokes. Since the strokes under consideration here are fuzzy in nature, the concept of fuzzy sets is utilized in the similarity measure. Two membership functions are defined for the location measure and type measure between two strokes, and a function of fuzzy entropy is used as the information measure. Although the recognition problem can be reduced to the assignment problem, some modifications are still necessary. The similarities between corresponding strokes are chosen by solving the assignment problem with the fuzzy entropy as cost function, and are then averaged to derive the score of similarity between two Chinese characters. 881 classes of Chinese characters in ETL-8 (160 variations/class) are used as the test patterns, and the recognition rate is about 96%. In addition, experiments on the effects of the membership function, based on the class separability, are also discussed in this paper.

Similarity measure    Membership function    Assignment problem    Fuzzy entropy    Probabilistic metric space    Class separability

1. INTRODUCTION

The use of computers for document and data processing is one of the main trends of office automation (OA). However, computer processing of Chinese text has special difficulties in the input procedure due to the large number and great complexity of Chinese characters. Any input method through the keyboard is very inefficient. Therefore, a Chinese reading machine, Chinese OCR (optical character reader), is used to provide a more convenient way of inputting. Among the different branches of character recognition,(1,2) it is much easier to recognize alphabetic characters and numerals than Chinese characters; moreover, machine recognition of handprints is considered a much harder problem. In recognition of handwritten Chinese characters (HCC), the most troublesome problem is the great variation among handprints. The objective of research on recognition of HCCs is to deal with this variation effectively.

Many pattern recognition methods are basically probabilistic. Several probability density functions have to be employed, which are sometimes known, but some parameters often have to be estimated from a training set. Obviously any assumption about a particular parametric probability density function would be highly dubious here. A nonparametric method, not assuming any particular function, would surely have a better empirical basis.

Another way to solve recognition problems of this kind is to leave the probabilistic concept entirely. It seems very impractical to deal with the variations in handwriting by means of probability densities. The variations can hardly be reconciled with a frequentist concept of probability; they fit a subjectivist concept better. This concept, first proposed by Zadeh,(3) can be interpreted directly as a function -- the fuzzy membership function, representing the degree of possibility with which the actual pattern is a member of the fuzzy set.

There are some arguments in comparing the statistical and fuzzy techniques.(4,5) Although statistical methods have played an important role in the field of pattern recognition, we select the fuzzy concept instead of statistics in this paper for the following reasons.

(1) Fuzzy set theory is a well defined algebraic system providing a set of operations which are easier and simpler to compute than Bayesian statistics.(6,7)
(2) The features (or strokes) that we measure are fuzzy in nature. It is impossible to find the actual distribution of any measurement without guessing something.
(3) A fuzzy value is a suitable way to retain the information of all the possible matches between strokes and to estimate the degree of possibility for each match to be a correct one.


According to the above descriptions, a recognition method based on the fuzzy concept is proposed in this paper. Because the strokes constituting a Chinese character are fuzzy in nature, fuzzy membership functions can be defined to measure the similarity of strokes. The recognition problem, which can be reduced to the assignment problem in this study, is to find the matches of strokes between two Chinese characters. In addition, an information measure, called fuzzy entropy, is used to solve the assignment problem. After solving the assignment problem, a score of similarity between the two characters is obtained. To recognize a character, the unknown input character is matched with all the reference characters and is assigned to the class of the reference character with the highest score of similarity.

The remainder of this paper is divided into seven sections. Section 2 goes into more detail about the problem at hand. Section 3 gives some modifications of the Hungarian method for solving the assignment problem. Section 4 outlines the similarity measure between strokes. Section 5 determines the parameters of the membership functions defined in Section 4. Section 6 states the recognition scheme. Section 7 provides experimental results that demonstrate the usefulness of this approach. Finally, Section 8 contains the summary and conclusion of this study.

2. FORMULATION OF THE PROBLEM

Since every Chinese character can be viewed as a collection of line segments, called strokes, these line segments can be used in the similarity measure. From the similarity between strokes, the similarity between two characters can also be measured. The problem is to find the optimal combination of matches among these strokes to measure the similarity between two characters.

The recognition procedure can be stated as follows. The unknown input character is first matched with a known reference character according to its collection of strokes, and an optimal combination of matches is chosen for the similarity measure between the input character and the reference character. After being matched with all the reference characters in the database, the unknown input character is recognized as the one with the highest similarity if this similarity exceeds a set threshold, and is rejected if the similarity is below this threshold. A recognition error occurs when the wrong character is recognized without rejection.

Now the problem is to find the optimal combination of matches between the input character and the reference characters. For simplicity, suppose the input and reference characters have m strokes ($y_1, y_2, \ldots, y_m$) and n strokes ($z_1, z_2, \ldots, z_n$), respectively. We want to find the optimal one-to-one matches between $\{y_i \mid i = 1 \ldots m\}$ and $\{z_j \mid j = 1 \ldots n\}$. Let $\chi_{ij} = 1$ denote selecting a match for $y_i$ and $z_j$; then a matching matrix $M = [\chi_{ij}]$ can be constructed as shown in Fig. 1(a). Let $\mu_{ij}$ denote the similarity of the match $\chi_{ij}$; then a scoring matrix $S = [\mu_{ij}]$ is constructed as shown in Fig. 1(b). The problem is to select the optimal matches in M such that the maximum score of similarity in S is achieved. Each row or column in M has only one match, because a stroke $y_i$ can be matched with only one line segment $z_j$. The average of the entries of the scoring matrix S selected by the matching matrix M is then the score of similarity between the input and reference characters.

This problem can be formalized as follows. Select $M = [\chi_{ij}]$ to

$$\text{maximize} \quad \sum_i \sum_j \mu_{ij}\,\chi_{ij} \tag{1}$$

subject to

$$\sum_i \chi_{ij} = 1 \ \text{for each } j, \qquad \sum_j \chi_{ij} = 1 \ \text{for each } i,$$

where $\chi_{ij} = 0$ or 1.
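As a small illustration of equation 1, the sketch below solves the matching by exhaustive search over all one-to-one assignments. This is only feasible for a handful of strokes and is not the Hungarian method used in the paper; the function name `best_matching` is hypothetical, and the 3 x 3 scoring matrix is an illustrative example built from the entries that remain legible in Fig. 1(b).

```python
from itertools import permutations

def best_matching(scores):
    """Exhaustively solve equation (1) for a small square scoring matrix.

    scores[i][j] is the similarity mu_ij between input stroke i and
    reference stroke j. Returns (assignment, total) where assignment[i]
    is the reference stroke matched to input stroke i.
    """
    n = len(scores)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        # Each permutation selects exactly one entry per row and per column.
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total

# Illustrative scoring matrix (an assumption, not data from the experiments).
scores = [[0.5, 0.9, 0.4],
          [0.8, 0.1, 0.3],
          [0.6, 0.3, 1.0]]
assignment, total = best_matching(scores)
print(assignment, round(total / len(scores), 3))
```

The average of the selected entries, `total / len(scores)`, plays the role of the similarity score between the two characters.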

In the above formulation, $\mu_{ij}$ can be defined according to the strokes' properties, as described in Section 4. We now seek a solution matrix $[\chi_{ij}]$ that maximizes the similarity $\sum_i \sum_j \mu_{ij}\,\chi_{ij}$.

3. PROBLEM SOLVING

The above problem can be reduced to an assignment problem in linear programming by some modifications, so that the Hungarian method (8,9) can be applied to solve it. Three cases must be considered when applying the Hungarian method to this recognition problem:

(1) The Hungarian method deals with a minimum criterion, but the recognition problem deals with a maximum criterion.
(2) The Hungarian method deals with a square matching matrix, but the recognition problem usually has a non-square matrix. (The numbers of strokes are often unequal for the input and reference characters.)
(3) Appearance or disappearance of strokes may lead to mismatches in the recognition problem.

[Fig. 1. (a) Matching matrix. (b) Scoring matrix.]

For the first problem, the Hungarian algorithm only handles the minimum criterion. Therefore, a modification that achieves the maximum criterion using the Hungarian method is required. Let

$$\mu_m = \max_{i,j} \mu_{ij}, \qquad \bar{\mu}_{ij} = \mu_m - \mu_{ij}.$$

Thus, the problem in equation 1 can be reduced to

$$\text{minimize} \quad \sum_i \sum_j \bar{\mu}_{ij}\,\chi_{ij} \tag{2}$$

because maximizing $\sum_i \sum_j \mu_{ij}\,\chi_{ij}$ is equivalent to minimizing $\sum_i \sum_j \bar{\mu}_{ij}\,\chi_{ij}$. In general, any monotonic decreasing function may serve this purpose. In Section 4.3, the function of fuzzy entropy is used instead of the cost function $\bar{\mu}_{ij}$ to solve the assignment problem.

The second problem occurs when the two sets $\{y_i\}_{i=1}^{m}$ and $\{z_j\}_{j=1}^{n}$ are of unequal size ($m \neq n$, i.e. a non-square matching matrix). The Hungarian method cannot be applied directly to this situation. We may add dummy columns or rows to the generalized matching matrix (see Fig. 2) so as to make it square.

            1       2       3       4
     1    0.50    0.70    0.40    1.00
     2    0.00    0.60    0.10    1.00
     3    1.00    0.83    1.00    0.02
    *4    0.00    0.00    0.00    0.00

Fig. 2. Generalized matching matrix with dummy row (row-4).

For the last problem, a pattern sometimes loses some strokes and at the same time gains other strokes that the reference lacks, yet still has exactly the same number of strokes as the reference. An example is shown in Fig. 3. The two characters have the same number of strokes, but stroke a in character No. 1 is not found in character No. 2, and stroke b in No. 2 is not found in No. 1. Wrong matches may be obtained if the Hungarian algorithm is applied directly to this case. This is an inherent deficiency of the mathematical model. We may add some (one, in many cases) rows and columns to the generalized matching matrix with a particular value, say 0.5, instead of zero to avoid mismatching. This problem can also be solved before recognition if the unstable strokes are deleted in the preprocessing stage.

4. SIMILARITY MEASURE OF TWO STROKES

In equation 1, $\mu_{ij}$ is determined according to the similarity between the strokes $y_i$ and $z_j$, which belong to the input and reference character, respectively. A method to obtain this similarity measure is now presented.

A stroke can be identified by its type (typically its slope or direction) and its location. It is therefore appealing to define two fuzzy sets, "location" and "type", and to assign to the stroke a degree of membership in each; one obtains two membership values in the interval [0, 1]:

$$\mu_{\text{"location"}}: x \to [0, 1], \qquad \mu_{\text{"type"}}: x \to [0, 1].$$

Because of the basically subjective concept of the fuzzy set, the actual fuzzy membership function assigned to the location and type measure is a heuristic choice. The similarity measure of two strokes is then taken as the fuzzy "and" operation (10,11) of their location and type measures (the "and" operator being defined as a min operator):

$$\mu(x) = \min[\mu_{\text{"location"}};\ \mu_{\text{"type"}}] \tag{3}$$

4.1. Location similarity

The location $M_i$ of a stroke is defined by its middle point, i.e. $M_i = (s_i + e_i)/2$, where $s_i$ and $e_i$ are the starting and ending points of stroke i, respectively. The distance between the locations of two strokes $y_i$, $z_j$ is then naturally defined as (see Fig. 4(a))

$$d_{ij} = \lVert M_i - M_j \rVert. \tag{4}$$

The membership function

$$\xi_{ij} = \exp(-k_1 \cdot d_{ij}) \tag{5}$$

can be defined as the location similarity, which indicates the degree of membership with which stroke $y_i$ is located at the same position as stroke $z_j$; $k_1$ is a parameter to be determined.

[Fig. 3. Mismatching due to algorithm's deficiency.]

[Fig. 4. Definitions of (a) distance $d_{ij}$, (b) angle $\theta_{ij}$.]

4.2. Type similarity

According to our experience, the length of a stroke is not reliable. This is particularly serious when strokes are mismerged (see Fig. 5). So we define another stroke similarity that concerns only the stroke type. Two strokes are said to be of the same type if the angle they form is smaller than a threshold. That is to say, the possibility that two strokes are of the same type decreases as the angle between them grows (Fig. 4(b)):

$$\theta_{ij} = \lVert \mathrm{Arg}(y_i) - \mathrm{Arg}(z_j) \rVert. \tag{6}$$

Arg(·) represents the argument of a vector (here, a stroke). We can define the membership function for two strokes $y_i$, $z_j$ being of the same type as

$$\varphi_{ij} = \exp(-k_2 \cdot \theta_{ij}) \tag{7}$$

where $k_2$ is a parameter to be determined.

[Fig. 5. Mismerging causes unreliable length of stroke.]

4.3. Combined similarity

Two strokes are said to be the same if both their types and locations are "the same". Based on the fuzzy "and" operation, the degree of membership of two strokes being the same can be represented as

$$\mu_{ij} = \min[\xi_{ij};\ \varphi_{ij}]. \tag{8}$$

If two strokes being the same is considered as an event, then the information in this fuzzy event can also be measured by the fuzzy entropy (12,13,14) as

$$H_{ij} = H(\mu_{ij}) = -\mu_{ij}\log(\mu_{ij}) - (1 - \mu_{ij})\log(1 - \mu_{ij}). \tag{9}$$

The variation of the fuzzy entropy $H(\mu_{ij})$ in equation 9 as a function of $\mu_{ij}$ is plotted in Fig. 6. However, a monotonic decreasing function is needed to solve the assignment problem, and only the right half of the fuzzy entropy function in Fig. 6 satisfies this. If we put $\mu'_{ij} = 0.5 + 0.5\cdot\mu_{ij}$ instead of $\mu_{ij}$ into equation 9, then $H(\mu'_{ij})$ in equation 10 becomes a monotonic decreasing function of $\mu_{ij}$, because $1/2 < \mu'_{ij} \le 1$ holds when $0 < \mu_{ij} \le 1$:

$$H'_{ij} = H(\mu'_{ij}) = -\mu'_{ij}\log(\mu'_{ij}) - (1 - \mu'_{ij})\log(1 - \mu'_{ij}) \tag{10}$$

where $\mu'_{ij} = 0.5 + 0.5\cdot\mu_{ij}$.

[Fig. 6. Fuzzy entropy $H(\mu_{ij})$ as a function of degree of membership $\mu_{ij}$.]

If $\bar{\mu}_{ij}$ in equation 2 is replaced by $H'_{ij}$, the problem in equation 1 becomes: select $M = [\chi_{ij}]$ to

$$\text{minimize} \quad \sum_i \sum_j H'_{ij}\,\chi_{ij} \tag{11}$$

subject to

$$\sum_i \chi_{ij} = 1 \ \text{for each } j, \qquad \sum_j \chi_{ij} = 1 \ \text{for each } i,$$

where $\chi_{ij} = 0$ or 1. This problem can be solved according to the method described in Section 3.
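The chain from equations 4 to 11 can be sketched in a few lines. This is a minimal illustration under several assumptions, not the authors' implementation: strokes are given as pairs of end points, angles are measured in degrees and strokes are treated as undirected lines, the natural logarithm is used (the base does not change which assignment is minimal), the values k1 = 0.012 and k2 = 0.04 are the ones the paper reports in Section 5.2, and an exhaustive search over permutations stands in for the Hungarian method. All function names are hypothetical.

```python
import math
from itertools import permutations

K1 = 0.012  # location parameter k1 (value reported in Section 5.2)
K2 = 0.04   # type parameter k2, per degree (value reported in Section 5.2)

def midpoint(stroke):
    (x0, y0), (x1, y1) = stroke
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def location_similarity(a, b, k1=K1):
    """Equations (4)-(5): exp(-k1 * distance between stroke midpoints)."""
    (ax, ay), (bx, by) = midpoint(a), midpoint(b)
    return math.exp(-k1 * math.hypot(ax - bx, ay - by))

def direction_deg(stroke):
    """Direction in [0, 180) degrees; undirected strokes are an assumption."""
    (x0, y0), (x1, y1) = stroke
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0

def type_similarity(a, b, k2=K2):
    """Equations (6)-(7): exp(-k2 * angle between the two strokes)."""
    diff = abs(direction_deg(a) - direction_deg(b))
    return math.exp(-k2 * min(diff, 180.0 - diff))

def similarity(a, b):
    """Equation (8): fuzzy 'and' (min) of location and type similarity."""
    return min(location_similarity(a, b), type_similarity(a, b))

def entropy_cost(mu):
    """Equation (10): fuzzy entropy of mu' = 0.5 + 0.5*mu, decreasing in mu."""
    p = 0.5 + 0.5 * mu
    if p >= 1.0:
        return 0.0  # limit of the entropy as mu' approaches 1
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def match_strokes(inp, ref):
    """Equation (11) on a matrix padded to square with dummy entries (mu = 0).

    Returns the matched (input index, reference index) pairs, dummies dropped.
    """
    m, n = len(inp), len(ref)
    size = max(m, n)
    mu = [[similarity(inp[i], ref[j]) if i < m and j < n else 0.0
           for j in range(size)] for i in range(size)]
    cost = [[entropy_cost(v) for v in row] for row in mu]
    best = min(permutations(range(size)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(size)))
    return [(i, best[i]) for i in range(m) if best[i] < n]

# Hypothetical strokes on a 64 x 64 grid: ((x_start, y_start), (x_end, y_end)).
inp = [((10, 10), (50, 10)), ((30, 5), (30, 60))]
ref = [((30, 8), (30, 58)), ((12, 12), (52, 12)), ((10, 50), (50, 50))]
pairs = match_strokes(inp, ref)
print(pairs)
```

In this example the input has two strokes and the reference three, so one dummy row is added; the horizontal input stroke is paired with the nearby horizontal reference stroke and the vertical one with the vertical reference stroke, leaving the third reference stroke unmatched.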

5. DETERMINATION OF PARAMETERS

In equations 5 and 7, there are two parameters, $k_1$ and $k_2$, to be determined. Choosing the best combination of these parameters is itself a difficult problem. Our purpose is to choose the parameters so that $H_{ij}$ is the optimal feature. In this section, we discuss the method for determining them. First, the class separability measure, which is a good criterion for feature selection, is presented.

[Fig. 7. Class separability. (a) Completely separated classes. (b) Completely overlapped classes.]

5.1. Class separability measure

A good feature leads to good class separability, so class separability is a good measure for feature selection. Let $p(\xi \mid \omega_i)$, $i = 1, 2, \ldots, c$, be the class-conditional probability density functions (p.d.f.s) and $P_i$ be the a priori probability of class i. In the two-class case, i.e. c = 2, the two classes are separable when

$$p(\xi \mid \omega_1) = 0 \ \text{if } p(\xi \mid \omega_2) > 0, \quad \text{and} \quad p(\xi \mid \omega_2) = 0 \ \text{if } p(\xi \mid \omega_1) > 0. \tag{12}$$

This case is shown in Fig. 7(a). On the other hand, the two classes are completely overlapping when

$$p(\xi \mid \omega_1) = p(\xi \mid \omega_2) \quad \text{for every } \xi. \tag{13}$$

This case is shown in Fig. 7(b). In general, the overlapping of density functions can be assessed by measuring the "distance" between $p(\xi \mid \omega_1)$ and $p(\xi \mid \omega_2)$. Any function J(·) satisfying the definition of a probabilistic metric space (15) can be used as a probabilistic distance measure of class separability.(16) An information-theoretic approach may be used for the class separability measure:(15)

$$J_s(\xi) = -\int \Big[ \sum_i P(\omega_i \mid \xi)\,\log P(\omega_i \mid \xi) \Big]\, p(\xi)\, d\xi. \tag{14}$$

For the case of two classes, if the classes are separable and the $P(\omega_i \mid \xi)$ satisfy equation 12, then $J_s$ will be zero; if the classes are completely overlapping and the $P(\omega_i \mid \xi)$ satisfy equation 13, then $J_s$ will be unity; otherwise $J_s$ lies in the interval (0, 1). The optimal feature set is obtained by minimizing the criterion $J_s$, i.e.

$$J_s(x) = \min_{\xi} J_s(\xi) \tag{15}$$
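A discrete version of equation 14 can make the two limiting cases concrete. The sketch below estimates $J_s$ for two classes from histograms over common bins; it assumes equal prior probabilities by default and a base-2 logarithm, which is what makes the completely overlapping case evaluate to unity. The function name `class_separability` and the histograms are hypothetical.

```python
import math

def class_separability(hist1, hist2, prior1=0.5, prior2=0.5):
    """Discrete estimate of J_s (equation 14) for two classes.

    hist1, hist2: nonnegative counts per bin of the feature, one histogram
    per class, over the same bins. Uses log base 2 (an assumption).
    """
    s1, s2 = float(sum(hist1)), float(sum(hist2))
    js = 0.0
    for a, b in zip(hist1, hist2):
        joint1 = prior1 * a / s1   # P(w1) * p(bin | w1)
        joint2 = prior2 * b / s2   # P(w2) * p(bin | w2)
        p_bin = joint1 + joint2    # p(bin)
        if p_bin == 0.0:
            continue               # empty bin contributes nothing
        for joint in (joint1, joint2):
            if joint > 0.0:
                posterior = joint / p_bin  # P(w_i | bin)
                js -= p_bin * posterior * math.log2(posterior)
    return js

separated = class_separability([5, 5, 0, 0], [0, 0, 5, 5])
overlapped = class_separability([1, 2, 3], [1, 2, 3])
partial = class_separability([4, 1], [1, 4])
print(separated, overlapped, round(partial, 4))
```

With disjoint histograms the estimate is zero, with identical histograms it is one, and partially overlapping histograms fall in between, matching the behaviour described for equations 12 and 13.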

Here x denotes the optimal choice of feature set.

5.2. One-by-one determination

We now want to determine the parameters $k_1$ and $k_2$ in the membership functions, based on the class separability measure $J_s$. The response surface strategy (17) can be applied to solve this multi-variable optimization problem, but it is very time-consuming. If the two parameters are mutually independent, we may use the one-by-one determination method described as follows.

1. Choose $k_1$, say $k_{01}$, for $\xi_{ij}$ to minimize $J_s$.
2. Put $k_1 = k_{01}$ into $\xi_{ij}$, then choose $k_2$ to minimize $J_s$.

By this method, the parameters $k_1$ and $k_2$ in the membership functions can be determined. The results are listed in Table 1, from which $k_1 = 0.012$ and $k_2 = 0.04$ are obtained.

Table 1. Experimental results for determination of parameters k1 and k2 (the selected value is marked *)

      k1       Js            k2      Js
     0.002   0.0658         0.01   0.1767
     0.004   0.0627         0.02   0.1784
     0.006   0.0620         0.03   0.1673
     0.008   0.0614        *0.04   0.1623
     0.010   0.0598         0.05   0.1693
    *0.012   0.0592         0.10   0.1752
     0.014   0.0614         0.15   0.1911
     0.016   0.0657         0.20   0.1928
     0.018   0.0650         0.40   0.2098
     0.020   0.0641         0.60   0.2140
                            0.80   0.2253

[Fig. 8. The fuzzy entropy functions $\xi_{ij}$ and $\varphi_{ij}$: (a) position (pixel), (b) type (degree).]

6. RECOGNITION SCHEME

After determining the parameters of the membership functions and applying the Hungarian method with some modifications to solve the assignment problem, a set of optimal matches between the strokes of the input and reference characters can be selected. Though we use the fuzzy entropy $H'_{ij}$ to solve the assignment problem, the fuzzy value $\mu_{ij}$ can also be selected and used for the measure. We now identify the unknown input character by a recognition scheme based on the set of $\mu_{ij}$ or $H'_{ij}$ selected beforehand.

Suppose there are an unknown input character $I = \{y_i \mid i \in [1, m]\}$ and a reference character $R = \{z_j \mid j \in [1, n]\}$, where the $y_i$ and $z_j$ are strokes. A matching matrix $M = [\chi_{ij}]$ between I and R is then constructed. After all the matching pairs are selected by solving the assignment problem, a fuzzy set $C = \{(\omega, \alpha) \mid \omega = \chi_{ij},\ \alpha = \mu_{ij},\ \forall \chi_{ij} = 1\}$ is formed. Each element $\omega$ in C has the property that stroke $y_i$ in I and stroke $z_j$ in R are the same, with degree of membership $\alpha = \mu_{ij}$. Let $\mu(C)$ be the degree of membership of C, i.e. the possibility that I and R are the same character. By definition 10 in Ref. (14), $\mu(C)$ can be obtained as the average of all $\mu_{ij}$ in C:

$$\mu(C) = \frac{1}{\max(m, n)} \sum_{\mu_{ij} \in C} \mu_{ij}. \tag{16}$$

Let $\mu(C_k)$ be the membership function of I and $R_k$ being of the same character class, where $R_k$ is one of the reference characters. The unknown input I is classified into class $R_m$ if

$$\mu(C_m) = \max_k \mu(C_k); \tag{17}$$

we then conclude that the unknown input character I is recognized as character $R_m$. This completes the recognition scheme.

It is noted that if the degree of membership $\mu_{ij}$ is replaced by the fuzzy entropy $H'_{ij}$, then we obtain the average information measure H(C) between the input and reference character as

$$H(C) = \frac{1}{\max(m, n)} \sum_{H'_{ij} \in C} H'_{ij}. \tag{18}$$

H(C) is the information measure of C, which has the property that I and R are the same character. In the same way, let $H(C_k)$ be the information measure of I and $R_k$ being of the same character class; then I is recognized as $R_m$ if

$$H(C_m) = \min_k H(C_k). \tag{19}$$

7. EXPERIMENTAL RESULTS AND ANALYSIS

7.1. Test database

For lack of a standard database in Taiwan, we use the ETL-8 database currently used in Japan for the following experiments. This database contains 881 character classes (160 variations/class) and is divided into two equal parts: one for learning and the other for testing.
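Before turning to the evaluation, the decision rule of Section 6 (equations 16 and 17), as it would be applied to each test pattern, can be sketched as follows. This is an illustration only: the function names, the class labels and the similarity values are hypothetical, and the rejection threshold mentioned in Section 2 is modelled as an optional parameter whose value the paper does not state.

```python
def character_score(matched_mu, m, n):
    """Equation (16): sum of the selected stroke similarities over max(m, n)."""
    return sum(matched_mu) / max(m, n)

def recognize(scores_by_class, threshold=0.0):
    """Equation (17): choose the reference class with the highest score.

    Returns None (rejection) if the best score is below the threshold.
    """
    best = max(scores_by_class, key=scores_by_class.get)
    if scores_by_class[best] < threshold:
        return None
    return best

# Hypothetical input with m = 3 strokes matched against two references.
scores_by_class = {
    "A": character_score([0.9, 0.8, 1.0], m=3, n=3),  # all strokes matched
    "B": character_score([0.7, 0.6, 0.5], m=3, n=4),  # one reference stroke unmatched
}
print(recognize(scores_by_class))
print(recognize(scores_by_class, threshold=0.95))
```

Dividing by max(m, n) rather than by the number of matched pairs penalizes references with surplus or missing strokes, which is why the second reference scores lower even before its individual similarities are compared.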

7.2. Evaluation of the membership function

The membership functions $\xi_{ij}$ and $\varphi_{ij}$ with the parameters $k_1 = 0.012$ and $k_2 = 0.04$, chosen by minimizing the class separability $J_s$, are shown in Fig. 8. We find that if

$$\xi_{ij} = e^{-0.012x} = 0.5$$

then x = 57.8. Because the picture size is normalized to 64 × 64, the extreme case (the longest possible distance) is $64\sqrt{2} = 90.5$. Substituting this value into $\xi_{ij}$ gives

$$\xi_{ij} = e^{-0.012 \times 90.5} = 0.337$$

which means that the boundary of the fuzziness (in the extreme case) does not really approach zero. This explains why the range of the scores (the difference between the highest and lowest scores) does not approach unity, which in fact affects the recognition results.

7.3. Recognition results

We can fill the parameters $k_1 = 0.012$, $k_2 = 0.04$ into the membership functions $\xi_{ij}$, $\varphi_{ij}$, and then use the test data set to check the recognition rate. Two experiments are conducted in this paper. One uses a statistical method to estimate the recognition rate and the other counts the actual recognition rate.

In the first experiment, the matching results are divided into two classes: within-class matching (for the same character class) and between-class matching (for different character classes). If the abscissa denotes the matching score and the ordinate denotes the probability of two characters having a certain score, then two distributions for within-class and between-class matching can be derived, which are shown in Fig. 9. The overlapping area of the two distributions stands for the error probability. By statistical theory, the estimated recognition rate is about 93.3%. However, the two-class estimation does not really reflect the true recognition rate.

In order to get the actual recognition rate, in the second experiment we use equation 17 or equation 19 as the recognition scheme to test all the patterns in ETL-8. Some results are shown in Table 2, and the recognition rate is about 96%. It is noted that in the cases of misrecognition the correct class is still among the first three candidates (the three highest scores). So the recognition error approaches zero if the first three candidates are considered.

Table 2. Candidate lists of some matching results (error matching is marked *)

    Input                            Reference classes                                    First order
    class     0      1      2      3      4      5      6      7      8      9          rank
    0       0.910  0.697  0.421  0.437  0.573  0.586  0.612  0.624  0.429  0.569         0
    1       0.592  0.826  0.546  0.450  0.568  0.471  0.671  0.657  0.434  0.715         1
    2       0.322  0.380  0.850  0.699  0.540  0.295  0.455  0.408  0.708  0.462         2
    2*      0.383  0.376  0.759  0.719  0.520  0.272  0.496  0.369  0.795  0.390         8
    3       0.341  0.387  0.720  0.874  0.518  0.302  0.487  0.383  0.757  0.423         3
    4       0.453  0.499  0.648  0.606  0.903  0.401  0.576  0.549  0.539  0.572         4
    5       0.650  0.597  0.399  0.390  0.452  0.905  0.550  0.517  0.356  0.491         5
    6       0.562  0.586  0.563  0.505  0.675  0.458  0.850  0.749  0.505  0.821         6
    6*      0.512  0.585  0.509  0.444  0.665  0.458  0.765  0.692  0.527  0.810         9
    7       0.532  0.501  0.540  0.489  0.688  0.488  0.713  0.876  0.491  0.688         7
    8*      0.344  0.431  0.742  0.710  0.563  0.321  0.542  0.387  0.740  0.467         2
    8       0.330  0.425  0.743  0.784  0.555  0.283  0.483  0.379  0.875  0.449         8
    9       0.462  0.607  0.605  0.496  0.617  0.435  0.780  0.718  0.520  0.895         9
    9*      0.499  0.537  0.533  0.501  0.709  0.442  0.674  0.686  0.543  0.681         4
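The observation about the first three candidates can be checked directly against the four misrecognized rows of Table 2. The sketch below copies those rows and ranks the reference classes by score; the helper name `top_candidates` is hypothetical.

```python
# Scores of the four misrecognized inputs, copied from the starred rows of Table 2.
# Key: true class of the input; value: scores against reference classes 0..9.
misrecognized = {
    2: [0.383, 0.376, 0.759, 0.719, 0.520, 0.272, 0.496, 0.369, 0.795, 0.390],
    6: [0.512, 0.585, 0.509, 0.444, 0.665, 0.458, 0.765, 0.692, 0.527, 0.810],
    8: [0.344, 0.431, 0.742, 0.710, 0.563, 0.321, 0.542, 0.387, 0.740, 0.467],
    9: [0.499, 0.537, 0.533, 0.501, 0.709, 0.442, 0.674, 0.686, 0.543, 0.681],
}

def top_candidates(scores, k=3):
    """Return the k reference classes with the highest scores, best first."""
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]

for true_class, scores in misrecognized.items():
    top3 = top_candidates(scores)
    print(true_class, top3, true_class in top3)
```

For each starred row the first candidate reproduces the "First order rank" column of Table 2, and the true class appears among the three best candidates.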

8. CONCLUSIONS

In this paper, a method based on the fuzzy set theory, instead of probabilistic viewpoints in other studies, is proposed to deal with the problem of handwritten Chinese character recognition. A Chinese character is considered as a collection of strokes. But due to the uncertainty of their locations and types, strokes are considered as fuzzy features.

[Fig. 9. Score distribution functions of within-class matching and between-class matching (abscissa: similarity).]

The similarity between two strokes is viewed as the combination of the location and type similarity. The recognition problem in our study is to find the matches of the strokes between two characters. We have shown that the recognition problem can be reduced to the assignment problem, so that the Hungarian method with some modifications can be applied to solve it. After all the matched pairs between strokes have been chosen by the Hungarian method, a score of similarity between the two characters is obtained by averaging the similarities of the chosen pairs. The unknown input character is assigned to the class of the reference character with the highest score of similarity among all the reference characters.

From the experimental results, it is noted that the shape of the membership function may affect the recognition results. So, how to choose a good membership function remains a problem. Although the exponential function is not a good indicator of the degree of membership, as discussed in Section 7.2, a recognition rate of 96% is still acceptable.

The advantages of using the fuzzy approach are summarized as follows.

1. No distribution information is needed. The similarity between strokes is indeed intuitive.
2. It provides a set of operations for the inference of the group property (the property of a fuzzy set).
3. It needs less computing time.

In conclusion, the fuzzy approach has proven to be a useful tool for OCR research.

SUMMARY

Machine recognition of handprints is considered a very hard problem. In recognition of handwritten Chinese characters (HCC), the most troublesome problem is the great variation among handprints. The objective of research on recognition of HCCs is to deal with this variation effectively. Two approaches, statistics and fuzzy set theory, have often been used to deal with this problem. Although statistical methods have played an important role in the field of pattern recognition, we select the fuzzy set concept instead of statistics in this paper for the following reasons.

(1) Fuzzy set theory is a well defined algebraic system providing a set of operations which are easier and simpler to compute than Bayesian statistics.
(2) The features (or strokes) that we measure are fuzzy in nature. It is impossible to find the actual distribution of any measurement without guessing something.
(3) A fuzzy value is a suitable way to retain the information of all the possible matches between strokes and to estimate the degree of possibility for each match to be a correct one.

According to the above descriptions, a method based on the concept of fuzzy sets for handwritten Chinese character (HCC) recognition is proposed in this paper. Chinese characters can be viewed as a collection of line segments, called strokes. Since the strokes under consideration here are fuzzy in nature, the concept of fuzzy sets is utilized in the similarity measure. Two membership functions are defined for the location measure and type measure between two strokes, and a function of fuzzy entropy is used as the information measure. Although the recognition problem can be reduced to the assignment problem, some modifications are still necessary. The similarities between corresponding strokes are chosen by solving the assignment problem with the fuzzy entropy as cost function, and are then averaged to derive the score of similarity between two Chinese characters. 881 classes of Chinese characters in ETL-8 (160 variations/class) are used as the test patterns, and the recognition rate is about 96%. In addition, experiments on the effects of the membership function, based on the class separability, are also discussed in this paper.

Acknowledgements--This work was supported by the National Science Council of Republic of China, project no. NSC76-0408-E007-12. In addition, the authors are very grateful to Mrs Ming Y. Shih-Hsu for her valuable suggestions.

REFERENCES

1. S. Mori, K. Yamamoto and M. Yasuda, Research on machine recognition of handprinted characters, IEEE Trans. Pattern Analysis Mach. Intell. PAMI-6, 386-405 (1984).
2. J. Mantas, An overview of character recognition methodologies, Pattern Recognition 19, 425-430 (1986).
3. L. A. Zadeh, Fuzzy sets, Inf. Control 8, 338-353 (1965).
4. W. Stallings, Fuzzy set theory versus Bayesian statistics, IEEE Trans. Syst. Man Cybernetics SMC-7, 216-219 (1977).
5. R. Jain, Comments on fuzzy set theory versus Bayesian statistics, IEEE Trans. Syst. Man Cybernetics SMC-8, 332-333 (1978).
6. P. Siy and C. S. Chen, Fuzzy logic for handwritten numeral character recognition, IEEE Trans. Syst. Man Cybernetics SMC-4, 570-575 (1974).
7. W. J. M. Kickert and H. Koppelaar, Application of fuzzy set theory to syntactic pattern recognition of handwritten capitals, IEEE Trans. Syst. Man Cybernetics SMC-6, 148-151 (1976).
8. J. P. Ignizio, Linear Programming in Single and Multiple Objective Systems, Chapter 14. Prentice-Hall, Englewood Cliffs, NJ (1982).
9. K. Murty, Linear and Combinatorial Programming. Wiley, New York (1983).
10. A. Kaufmann, Introduction to the Theory of Fuzzy Subsets, Vol. 1. Academic Press, New York (1975).
11. H. J. Zimmermann, Fuzzy Set Theory--and its Applications. Kluwer-Nijhoff, Boston (1985).
12. C. E. Shannon and W. Weaver, The Mathematical Theory of Communication. University of Illinois Press, Urbana (1963).
13. Y. Horibe, Entropy and correlation, IEEE Trans. Syst. Man Cybernetics 15, 641-642 (1985).
14. A. D. Luca and S. Termini, An information measure for fuzzy sets, IEEE Trans. Syst. Man Cybernetics 14, 151-156 (1984).
15. B. Schweizer and A. Sklar, Probabilistic Metric Spaces. North-Holland, New York (1983).
16. P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Prentice-Hall, London (1982).
17. G. E. P. Box, W. G. Hunter and J. S. Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, Chapter 15 (1978).

About the Author--FANG-HSUAN CHENG was born in Hsinchu, Taiwan, on 13 June 1960. He graduated with a B.E. degree from the Department of Electrical Engineering, National Cheng-Kung University, Taiwan, and an M.E. degree from the Institute of Electrical Engineering, National Tsing Hua University, Taiwan, in 1982 and 1984, respectively. Mr Cheng is now a Ph.D. candidate in the Institute of Electrical Engineering, National Tsing Hua University in Taiwan. His research interests include signal processing, image processing and pattern recognition. Mr Cheng is a member of the IEEE Computer Society.

About the Author--CHIEN-AN CHEN was born in Taipei, on 11 November 1962. He received the B.E. degree in Nuclear Engineering and the M.E. degree in Electrical Engineering from National Tsing Hua University, Taiwan, in 1985 and 1987, respectively. His current interests include image processing and pattern recognition.

About the Author--WEN-HSING HSU was born in Taipei, on 17 May 1950. He received the B.E. degree in Electrical Engineering from National Cheng-Kung University, Taiwan, in 1972 and the M.E. and Doctor of Engineering degrees in Electrical Engineering from Keio University, Tokyo, Japan, in 1978 and 1982, respectively. His research interests include image processing, pattern recognition and parallel processing algorithms. Since 1982, he has been an Associate Professor in the Department of Electrical Engineering at National Tsing Hua University, Taiwan, and a Research Associate at the Institute of Information Science, Academia Sinica, R.O.C. He received the award of the Sun Yat-Sen Cultural Foundation in November 1986, and was honored as an outstanding person in Information Science of R.O.C. in 1986. Dr Hsu is a member of the Institute of Electronics, Information and Communication Engineers (IEICE), the Information Processing Society of Japan and the IEEE Computer Society of U.S.A.
