Application of Fuzzy Set for Recognition of Handwritten English Characters


Copyright © IFAC Theory and Application of Digital Control, New Delhi, India 198~

N. Sunderesan and B. N. Chatterji
Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India

Abstract. The concept of fuzzy sets has been found to be very promising in the area of character recognition. In this paper the concept of fuzzy similarity relation is used for the recognition of uppercase English characters. The method requires standard English alphabets. To generate the standard English alphabets, each character is given an ideal linguistic definition composed of eight standard features. The concept of the fuzzy membership function is used to make a handwritten character a standard one.

INTRODUCTION

The concept of fuzzy set was introduced by Zadeh (1965). It rests on the view that the elements of human thinking pass from membership to nonmembership gradually rather than abruptly, and are therefore better described as fuzzy sets than as numbers. The concept became very popular and has found applications in many areas. Character recognition is one area where it has been explored extensively; examples include the work of (i) Tamura, Higuchi and Tanaka (1971), (ii) Siy and Chen (1974), (iii) Shimura (1974), (iv) Kickert and Koppelaar (1976), (v) Pal, Dutta Majumdar and Chaudhuri (1977) and (vi) Gupta, Saridis and Gaines (1977). This approach to character recognition describes the variations in handwriting through uncertainty factors, expressed by fuzzy membership functions, rather than through the probability concept used in other methods.

The concept of 'fuzzy similarity relation' was introduced by Zadeh (1971). In this paper an attempt has been made to use this concept as a tool for the classification of handwritten English characters. The feature vector consists of the distances of the pattern from eight different points on the frame of the pattern, and the fuzzy similarity relation is obtained using this feature vector. For classification, standard English characters are needed; these are generated using the concept of fuzzy set.

FEATURE EXTRACTION

The handwritten character is first coded into a (20x20) binary matrix of '1's and '0's. In the (20x20) frame of the pattern, the bit '1' represents the presence of the character. The distance of the character is measured from eight different points of the frame:

(i) Left-hand top corner point of the frame, i.e., the (1,1) position of the matrix.
(ii) Top middle point, i.e., the (1,10) point of the matrix.
(iii) Right-hand top corner of the frame, i.e., the (1,20) position of the matrix.
(iv) Right-hand middle point of the frame, i.e., the (10,20) position of the matrix.
(v) Right-hand bottom corner of the frame, i.e., the (20,20) position of the matrix.
(vi) Bottom middle point, i.e., the (20,10) point of the matrix.
(vii) Left bottom corner point of the frame, i.e., the (20,1) position of the matrix.
(viii) Left-hand middle point of the frame, i.e., the (10,1) position of the matrix.
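The eight distances listed above can be sketched in Python as follows. The paper does not state which distance measure is used, so this sketch assumes the Euclidean distance from each frame point to the nearest '1' cell of the 20x20 matrix, with the same 1-based (row, column) positions as in the list. It also includes the division by the largest distance that is used to normalize the feature vector. The names `FRAME_POINTS`, `frame_distances` and `normalize`, and the single-bar test pattern, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# The eight frame points (i)-(viii), as 1-based (row, column) matrix positions.
FRAME_POINTS: List[Tuple[int, int]] = [
    (1, 1), (1, 10), (1, 20), (10, 20),
    (20, 20), (20, 10), (20, 1), (10, 1),
]


def frame_distances(grid: Sequence[Sequence[int]]) -> List[float]:
    """Return [d1, ..., d8] for a 20x20 binary matrix.

    Assumption (not stated in the paper): each d is the Euclidean distance
    from a frame point to the nearest cell holding a '1'.
    """
    # Collect the 1-based positions of every '1' bit in the pattern.
    ones = [(r + 1, c + 1)
            for r, row in enumerate(grid)
            for c, bit in enumerate(row)
            if bit == 1]
    if not ones:
        raise ValueError("pattern contains no '1' bits")
    return [min(math.hypot(pr - r, pc - c) for r, c in ones)
            for pr, pc in FRAME_POINTS]


def normalize(distances: Sequence[float]) -> List[float]:
    """Divide every distance by the largest one: a_i = d_i / Max(d_i)."""
    d_max = max(distances)
    if d_max == 0:
        raise ValueError("all distances are zero")
    return [d / d_max for d in distances]


if __name__ == "__main__":
    # Hypothetical test pattern: a single vertical bar in column 10 (an 'I').
    grid = [[1 if c == 9 else 0 for c in range(20)] for _ in range(20)]
    d = frame_distances(grid)
    print(d)             # [9.0, 0.0, 10.0, 10.0, 10.0, 0.0, 9.0, 9.0]
    print(normalize(d))  # [0.9, 0.0, 1.0, 1.0, 1.0, 0.0, 0.9, 0.9]
```

For the vertical bar, the top and bottom middle points touch the stroke (distance 0), while the right-hand points are farthest away, so they set the normalizing maximum.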

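These eight distances $(d_1, d_2, \ldots, d_8)$ form the unnormalized feature vector. The distances need not be the same for different samples of the same character, so they are normalized. The normalization technique consists of

(i) determination of the maximum distance of the pattern from the frame, i.e., the maximum $\mathrm{Max}(d_i)$ of the distances $d_1, d_2, \ldots, d_8$;

(ii) division of the distances $d_i$, $i = 1, \ldots, 8$, by $\mathrm{Max}(d_i)$, giving

$$a_i = \frac{d_i}{\mathrm{Max}(d_i)}, \qquad i = 1, 2, \ldots, 8. \tag{1}$$

Thus the normalized feature vector is given by

$$[a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8]. \tag{2}$$

FUZZY SIMILARITY FUNCTION

Let us represent the normalized feature vectors by $X$, i.e.,

$$\{X_i\} \in X \quad \text{or} \quad X_i \in X, \tag{3}$$

where the $k$th normalized component of $X_i$ is written $a_i^k$. A fuzzy similarity function for the feature vectors is given by

$$\mu_s(X_i, X_j) = 1 - \Big[\sum_{k=1}^{8} \big(a_i^k - a_j^k\big)^2\Big]^{1/2} \tag{4}$$

for $X_i$ and $X_j \in X$. This function must be (a) reflexive, (b) symmetric and (c) transitive. Since

$$\mu_s(X_i, X_i) = 1 - \Big[\sum_{k=1}^{8} \big(a_i^k - a_i^k\big)^2\Big]^{1/2} = 1,$$

the function is reflexive. Again,

$$\mu_s(X_i, X_j) = 1 - \Big[\sum_{k=1}^{8} \big(a_i^k - a_j^k\big)^2\Big]^{1/2} = 1 - \Big[\sum_{k=1}^{8} \big(a_j^k - a_i^k\big)^2\Big]^{1/2} = \mu_s(X_j, X_i),$$

which proves that $\mu_s(X_i, X_j)$ is symmetric.

To prove the transitive property we have to prove

$$\mu_s(X_i, X_j) \ge \bigvee_{X_l} \big[\mu_s(X_i, X_l) \wedge \mu_s(X_l, X_j)\big], \tag{5}$$

where $\vee$ stands for maximum and $\wedge$ stands for minimum. Let us define

$$\mu_{F_{ij}}(X_i, X_j) = \Big[\sum_{k=1}^{8} \big(a_i^k - a_j^k\big)^2\Big]^{1/2} \tag{6}$$

and the fuzzy sets

$$F_{ij} = (X_i, X_j)/\mu_{F_{ij}}(X_i, X_j), \tag{7}$$

$$F_{il} = (X_i, X_l)/\mu_{F_{il}}(X_i, X_l), \tag{8}$$

$$F_{lj} = (X_l, X_j)/\mu_{F_{lj}}(X_l, X_j). \tag{9}$$

Since $\mu_s = 1 - \mu_F$ by (4) and (6), the inequality (5) modifies to

$$1 - \mu_{F_{ij}}(X_i, X_j) \ge \bigvee_{X_l}\big[(1 - \mu_{F_{il}}(X_i, X_l)) \wedge (1 - \mu_{F_{lj}}(X_l, X_j))\big].$$

Writing $F_{ij}$, $F_{il}$ and $F_{lj}$ for the corresponding membership values, this becomes, in turn,

$$1 - F_{ij} \ge \bigvee_{X_l}\big[(1 - F_{il}) \wedge (1 - F_{lj})\big],$$

$$1 - F_{ij} \ge \bigvee_{X_l}\big[1 - (F_{il} \vee F_{lj})\big],$$

$$1 - F_{ij} \ge 1 - \bigwedge_{X_l}(F_{il} \vee F_{lj}) \quad \text{[Ref. Siy and Chen, 1972]},$$

or

$$F_{ij} \le \bigwedge_{X_l}(F_{il} \vee F_{lj}) \quad \text{[Ref. Siy and Chen, 1972]}. \tag{10}$$

In case $\mu_{F_{il}} > \mu_{F_{lj}}$, the inequality (10) becomes

$$F_{ij} \le F_{il}. \tag{11}$$

For $l = j$ the inequality (11) becomes an identity and hence is valid. Similarly, when $\mu_{F_{lj}} > \mu_{F_{il}}$, the inequality (10) becomes

$$F_{ij} \le F_{lj}. \tag{12}$$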


Again, for $l = i$, the inequality (12) becomes an identity and hence is valid. Thus the inequality (10) is valid; in other words, the fuzzy similarity function given by (4) is transitive.

The reflexive, symmetric and transitive properties of the fuzzy similarity function were also verified by simulation on a TDC 316 digital computer. A set of all the English handwritten uppercase alphabets was considered, and a fuzzy similarity relation matrix $S$ was formed whose $(i,j)$th element is the fuzzy similarity function between the $i$th and $j$th English alphabets. The diagonal elements of $S$ were observed to be unity, which verifies the reflexive property, and $S$ is symmetric, which verifies the symmetric property. To verify the transitive property, the matrix $S \circ S$ was formed, where the operation '$\circ$' is defined by the right-hand side of the inequality (5). It was found that $S \circ S = S$, and hence the fuzzy similarity function given by (4) is transitive.

CHARACTER GENERATION

As will be seen in the next section, standard (or ideal) English alphabets are required during the classification stage. The standard English alphabets are generated by a pattern recognition technique using the concept of the fuzzy set membership function. The uppercase English characters can be generated using eight basic features: (i) horizontal stroke (-), (ii) vertical stroke (|), (iii) right slant stroke (/), (iv) left slant stroke (\), (v) A curve, (vi) U curve, (vii) C curve and (viii) D curve. For each alphabet, node points are defined as the points where two or more of these features meet. The different English characters have different linguistic definitions in terms of these basic features. For example, the character A can be represented as two left slant strokes, two right slant strokes and one horizontal stroke. The linguistic definitions of all 26 English alphabets are given in Table 1.
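Because every entry of Table 1 is a count of the eight basic features, the coded definitions can be stored as eight-component tuples in the feature order [f1, ..., f8] (horizontal, vertical, right slant, left slant, A curve, U curve, C curve, D curve). Checking a handwritten sample then reduces to comparing two tuples. The Python sketch below assumes the stroke counts have already been extracted; the node-point and stroke-identification steps are not modeled, and the names `TABLE_1`, `follows_definition` and `letters_with_code` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Feature order f1..f8: horizontal stroke, vertical stroke, right slant,
# left slant, A curve, U curve, C curve, D curve.
Code = Tuple[int, int, int, int, int, int, int, int]

# Coded linguistic definitions transcribed from Table 1.
TABLE_1: Dict[str, Code] = {
    "A": (1, 0, 2, 2, 0, 0, 0, 0), "B": (0, 2, 0, 0, 0, 0, 0, 2),
    "C": (0, 0, 0, 0, 0, 0, 1, 0), "D": (0, 1, 0, 0, 0, 0, 0, 1),
    "E": (3, 2, 0, 0, 0, 0, 0, 0), "F": (2, 2, 0, 0, 0, 0, 0, 0),
    "G": (0, 0, 0, 0, 0, 0, 1, 1), "H": (1, 4, 0, 0, 0, 0, 0, 0),
    "I": (0, 1, 0, 0, 0, 0, 0, 0), "J": (2, 1, 0, 0, 0, 1, 0, 0),
    "K": (0, 2, 1, 1, 0, 0, 0, 0), "L": (1, 1, 0, 0, 0, 0, 0, 0),
    "M": (0, 2, 1, 1, 0, 0, 0, 0), "N": (0, 2, 0, 1, 0, 0, 0, 0),
    "O": (0, 0, 0, 0, 1, 1, 0, 0), "P": (0, 2, 0, 0, 0, 0, 0, 1),
    "Q": (0, 0, 0, 2, 0, 0, 1, 1), "R": (0, 2, 0, 1, 0, 0, 0, 1),
    "S": (0, 0, 0, 0, 0, 0, 1, 1), "T": (2, 1, 0, 0, 0, 0, 0, 0),
    "U": (0, 0, 0, 0, 0, 1, 0, 0), "V": (0, 0, 1, 1, 0, 0, 0, 0),
    "W": (0, 0, 2, 2, 0, 0, 0, 0), "X": (0, 0, 2, 2, 0, 0, 0, 0),
    "Y": (0, 1, 1, 1, 0, 0, 0, 0), "Z": (2, 0, 1, 0, 0, 0, 0, 0),
}


def follows_definition(letter: str, code: Sequence[int]) -> bool:
    """True if the feature counts extracted from a sample of `letter` match Table 1."""
    return TABLE_1[letter] == tuple(code)


def letters_with_code(code: Sequence[int]) -> List[str]:
    """All letters whose Table 1 code equals `code` (several may share one)."""
    return sorted(letter for letter, ref in TABLE_1.items() if ref == tuple(code))


if __name__ == "__main__":
    print(follows_definition("A", [1, 0, 2, 2, 0, 0, 0, 0]))  # True
    print(letters_with_code([0, 2, 1, 1, 0, 0, 0, 0]))        # ['K', 'M']
```

Some codes in Table 1 are shared (G and S, K and M, W and X). For that reason this sketch uses the code only to check a sample of a known letter, as the text describes, and not as a classifier on its own.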
For the generation of a standard English alphabet, the handwritten character is checked against the definition given in Table 1. To do this, the node points of the character are first determined, and the strokes between the different node points are then determined.

Each stroke is identified as one of the eight features with the help of the membership functions for the different features (Pal, Dutta Majumdar and Chaudhuri, 1977). After the features have been extracted in this way, the character is given its linguistic definition in the coded form $[f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8]$, where $f_1$ and $f_2$ are the numbers of horizontal and vertical strokes in the character, $f_3$ and $f_4$ are the numbers of right slant and left slant strokes, and $f_5$, $f_6$, $f_7$ and $f_8$ are the numbers of 'A', 'U', 'C' and 'D' curves. The coded definition of the character is then checked against the definition given in Table 1. If it does not match, the character is written again and the whole checking process is repeated.

CLASSIFICATION

For classification, the standard English alphabets are generated for all 26 characters. For the unknown character to be recognized, the feature vector is first determined. The fuzzy similarity values between the unknown pattern and all 26 standard English alphabets are then computed using eq. (4), and the unknown character is assigned to the English alphabet for which the fuzzy similarity value is maximum. Let $B$ be an unknown handwritten character to be classified. The feature vector of $B$ is given by

$$F(B) = [b^1, b^2, \ldots, b^8]. \tag{13}$$

The fuzzy similarity values are given by

$$\mu_s(B, X_i) = 1 - \Big[\sum_{k=1}^{8} \big(b^k - a_i^k\big)^2\Big]^{1/2}, \tag{14}$$

$$\forall\, X_i \in \text{set of all standard English alphabets}, \tag{15}$$

where

$$F(X_i) = [a_i^1, a_i^2, \ldots, a_i^8]. \tag{16}$$

The unknown character $B$ is classified as the $m$th English character iff

$$\mu_s(B, X_m) > \mu_s(B, X_i) \quad \text{for all } X_i \ne X_m. \tag{17}$$
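Equations (4) and (17), together with the max-min composition used to form $S \circ S$, can be sketched in Python as below. The sketch computes the fuzzy similarity between two normalized feature vectors, classifies an unknown vector by the maximum similarity, and builds the similarity matrix and its composition. The function names and the two "standard" vectors in the example are hypothetical placeholders, not data from the paper.

```python
import math
from typing import Dict, List, Sequence


def similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (4): one minus the Euclidean distance between two feature vectors.

    With no further scaling, the value drops below 0 once the distance exceeds 1.
    """
    return 1.0 - math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def classify(unknown: Sequence[float], standards: Dict[str, Sequence[float]]) -> str:
    """Eq. (17): return the standard alphabet with the largest similarity.

    (17) assumes a unique maximum; on a tie, max() keeps the first key.
    """
    return max(standards, key=lambda letter: similarity(unknown, standards[letter]))


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """S[i][j] = similarity between the ith and jth vectors."""
    return [[similarity(u, v) for v in vectors] for u in vectors]


def max_min_compose(s: Sequence[Sequence[float]]) -> List[List[float]]:
    """(S o S)[i][j] = max over t of min(S[i][t], S[t][j]), the right-hand side of (5).

    The index t plays the role of the intermediate pattern X_l.
    """
    n = len(s)
    return [[max(min(s[i][t], s[t][j]) for t in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Placeholder standard vectors (illustrative only, not from the paper).
    standards = {
        "I": [0.9, 0.0, 1.0, 1.0, 1.0, 0.0, 0.9, 0.9],
        "L": [0.0, 0.0, 1.0, 1.0, 0.5, 0.5, 0.0, 0.0],
    }
    unknown = [0.9, 0.1, 1.0, 1.0, 1.0, 0.0, 0.9, 0.9]
    print(classify(unknown, standards))  # I
    s = similarity_matrix(list(standards.values()))
    print(max_min_compose(s) == s)
```

Comparing `max_min_compose(s)` with `s` mirrors the $S \circ S = S$ check described for the simulation; whether equality holds depends on the particular set of vectors supplied.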

RESULTS AND CONCLUSION

As mentioned in the section 'Feature extraction', a 20x20 frame was used for writing the alphabets. From this a 20x20 binary matrix was obtained, and from this matrix the features were extracted. Standard English alphabets were generated for all 26 characters and their features were determined. To test the character recognition method, different persons were asked to write the characters in the 20x20 frame. The features were extracted from their handwriting and the characters were recognized by the procedure outlined in the previous section. The method gave very good results, with a recognition accuracy of about 96%. A TDC 316 computer was used for the simulation study, and the recognition time was found to be less than 1 msec. The method has the advantage that it requires no threshold value for classification and places fewer restrictions on the shape of the alphabets. However, it requires standard English alphabets and a large memory capacity to store the feature vectors of the standard characters.

TABLE 1 Linguistic Definitions of Uppercase English Alphabets
(Code = f1 f2 f3 f4 f5 f6 f7 f8 = numbers of horizontal strokes, vertical strokes, right slants, left slants, A curves, U curves, C curves and D curves)

Character   Linguistic definition                                        Code
A           Two left slants + two right slants + one horizontal stroke   1 0 2 2 0 0 0 0
B           Two D curves + two vertical strokes                          0 2 0 0 0 0 0 2
C           One C curve                                                  0 0 0 0 0 0 1 0
D           One vertical stroke + one D curve                            0 1 0 0 0 0 0 1
E           Three horizontal strokes + two vertical strokes              3 2 0 0 0 0 0 0
F           Two horizontal strokes + two vertical strokes                2 2 0 0 0 0 0 0
G           One C curve + one D curve                                    0 0 0 0 0 0 1 1
H           One horizontal stroke + four vertical strokes                1 4 0 0 0 0 0 0
I           One vertical stroke                                          0 1 0 0 0 0 0 0
J           Two horizontal strokes + one vertical stroke + one U curve   2 1 0 0 0 1 0 0
K           Two vertical strokes + one left slant + one right slant      0 2 1 1 0 0 0 0
L           One horizontal stroke + one vertical stroke                  1 1 0 0 0 0 0 0
M           Two vertical strokes + one left slant + one right slant      0 2 1 1 0 0 0 0
N           Two vertical strokes + one left slant                        0 2 0 1 0 0 0 0
O           One A curve + one U curve                                    0 0 0 0 1 1 0 0
P           Two vertical strokes + one D curve                           0 2 0 0 0 0 0 1
Q           Two left slants + one C curve + one D curve                  0 0 0 2 0 0 1 1
R           Two vertical strokes + one left slant + one D curve          0 2 0 1 0 0 0 1
S           One C curve + one D curve                                    0 0 0 0 0 0 1 1
T           Two horizontal strokes + one vertical stroke                 2 1 0 0 0 0 0 0
U           One U curve                                                  0 0 0 0 0 1 0 0
V           One right slant + one left slant                             0 0 1 1 0 0 0 0
W           Two right slants + two left slants                           0 0 2 2 0 0 0 0
X           Two right slants + two left slants                           0 0 2 2 0 0 0 0
Y           One vertical stroke + one left slant + one right slant       0 1 1 1 0 0 0 0
Z           Two horizontal strokes + one right slant                     2 0 1 0 0 0 0 0

REFERENCES

Gupta, M. M., G. N. Saridis and B. R. Gaines (1977). Fuzzy Automata and Decision Processes. North Holland, New York.
Kickert, W. J. M. and H. Koppelaar (1976). Application of Fuzzy Set Theory to Syntactic Pattern Recognition of Handwritten Capitals. IEEE Trans. Syst. Man Cybernet., SMC-6, 530.
Pal, S. K., D. Dutta Majumdar and B. B. Chaudhuri (1977). Fuzzy Set in Handwritten Character Recognition. Proc. Recent Developments in Pattern Recognition and Digital Techniques, Calcutta, 63.
Shimura, M. (1975). Applications of Fuzzy Sets Theory to Pattern Recognition. J. JAACE, 43, 243.
Siy, P. and C. S. Chen (1972). Minimization of Fuzzy Functions. IEEE Trans. Comp., C-21, 100.
Siy, P. and C. S. Chen (1974). Fuzzy Logic for Handwritten Numerical Character Recognition. IEEE Trans. Syst. Man Cybernet., SMC-4, 57.
Tamura, S., S. Higuchi and K. Tanaka (1971). Pattern Classification Based on Fuzzy Relations. IEEE Trans. Syst. Man Cybernet., SMC-1, 71.
Zadeh, L. A. (1965). Fuzzy Sets. Information and Control, 8, 338.
Zadeh, L. A. (1971). Similarity Relations and Fuzzy Ordering. Information Sci., 177.
