A simplified approach to the HMM based texture analysis and its application to document segmentation

Pattern Recognition Letters 18 (1997) 993-1007

A simplified approach to the HMM based texture analysis and its application to document segmentation

Jia-Lin Chen
Department of Computer Science, Chung-Hua Polytechnic Institute, Hsinchu 30067, Taiwan

Received 4 June 1996; revised 29 July 1997

Abstract

In this paper, we address a simplified approach to HMM based texture analysis. The model complexity is reduced significantly by a simplified set of directional macro-masks and the use of stationary HMMs. The difficult problem of texture feature design is eased by the proposed scheme. We also successfully apply the scheme to a difficult document segmentation problem: text/textured-background separation. © 1997 Elsevier Science B.V.

Keywords: Directional texture features; Hidden Markov model (HMM); Document segmentation; Text/textured-background separation

1. Introduction

Texture related problems have been investigated widely in the past decades, and many techniques have been proposed (Haralick, 1979; Geman and Geman, 1984; Chellappa and Chatterjee, 1985; Jain and Farrokhnia, 1991; Chen and Kundu, 1995). Model based paradigms such as Markov random fields are reported to have good texture modeling capability, but usually require a huge amount of computation. The feature based paradigm, on the other hand, usually has the advantage of fast computation, but there is a trade-off between a feature's discriminative capability and its required computation. In this paper, we prefer the feature based

1. Electronic Annexes available. See http://www.elsevier.nl/locate/patrec.
2. This work was supported by the National Science Council, Republic of China under Grant NSC84-2213-E-216-006.
3. Email: [email protected].

paradigm because of its advantage of fast computation. Our study is then focused on the investigation of a simple but robust feature design method for texture analysis. Generally speaking, in the feature based paradigm, there are two stages of texture feature extraction: (1) transformation of a texture image to the associated feature images; and (2) computation of texture features from the feature images. Many techniques have been proposed for the feature image transformation, such as spatially local masks of fast computation (Frei and Chen, 1977; Laws, 1980; Unser and Eden, 1989), e.g. Laws' masks, and localized spatial filters for characterizing various frequency responses and simulating human visual neurons (Coggins and Jain, 1985; Bovik et al., 1990; Jain and Farrokhnia, 1991; Ohanian and Dubes, 1992), e.g. Gabor filters. The energy over a square area centered at a pixel on a feature image is usually computed as a texture feature for that pixel. For some approaches, the design and/or implementation



for the feature image transformation needs to be sophisticated so that the energy feature alone is sufficient for discrimination (Bovik et al., 1990; Jain and Farrokhnia, 1991). However, for some applications such as texture segmentation, the feature image transformation is usually designed for fast computation to reduce the computational load (Unser and Eden, 1989). In this case, a simple feature such as energy following the fast feature image transformation may not be sufficiently discriminative for texture identification. Subsequently, more sophisticated features are usually required for discrimination. The discrimination performance is strongly coupled with the feature image transformation and the associated features, which leads to the difficult problem of texture feature design. Can we ease the design problem by using

simple but discriminative features that can work with any fast feature image transformation? In our previous work (Chen and Kundu, 1995), an approach called the directional macro-masks was proposed to compute directional texture features from feature images so as to improve the effectiveness of the extracted features. This approach resulted in a sequence of feature vectors for a pixel rather than a single feature vector. Accompanied by a particular classifier, the hidden Markov model (HMM) (Rabiner, 1989), which is good at processing sequential data, this approach achieved good segmentation results. Also, the directional macro-masks could enhance the discriminative capability of the features; therefore, the selection of the feature image transformation was not crucial and was indeed "exchangeable". A set of spatially local masks, called Laws' masks (Laws, 1980), was employed for the feature image transformation in the experiments. Though it had the advantage of fast transformation, this approach required a huge amount of memory and computation because it employed non-stationary HMMs. In this paper, to improve the efficiency, we propose a simplified approach to compute the directional texture features, which reduces the redundancy of the computed features. In this new scheme, stationary first order HMMs are used instead. In Section 2, we evaluate the discrimination performance of the improved scheme by texture classification experiments.

Jain and Bhattacharjee (1992) argued that the texts, graphics and images appearing in documents have distinct textural properties. Hence, texture segmentation techniques are applicable to the problem of document segmentation by analyzing the dissimilarities among various textures. The texture segmentation based technique has some advantages over the traditional structure based techniques (Wahl et al., 1982; Nadler, 1984; Fletcher and Kasturi, 1988). Texture segmentation is based on the spatial relationship of a pixel with respect to its neighborhood; therefore, the segmented regions are not limited to pre-defined layouts such as rectangles, which is a requisite for text areas in the structure based methods. Consequently, no skew correction is necessary when processing slanted document images. Good experimental results were reported in (Jain and Bhattacharjee, 1992). The approaches to document segmentation proposed thus far do not take into account background interference on the printed materials. Schürmann et al. (1992) pointed out that document segmentation becomes extremely difficult if the background of a document has textural properties that cannot be eliminated simply by intensity thresholding. They also suggested that a possible solution to this problem would be based on texture analysis techniques. However, no research had addressed this problem since then. In this paper, we apply our proposed texture analysis scheme to the problem of text/textured-background separation, and show the advantages of our HMM based scheme for this problem. The proposed procedures for document segmentation are described in Section 3, and the conclusions are drawn in Section 4.

2. Directional texture features and hidden Markov model classifier

Following a proper feature image transformation, the energy in a square area centered at a pixel on a feature image is usually computed as a texture feature for that pixel. However, computing the energy feature in a square area is indeed an averaging procedure which cannot distinguish the directional difference in the square area on a feature image. In


our previous work, we proposed to compute directional energy features from 16 overlapping areas in a square window on the feature image (shown in Fig. 1(a)), which represent the textural information in 16 directions around the central pixel of that square window. The mask covering one area is called a directional macro-mask. Consequently, the number of features was increased from 1 (non-directional energy) to 16 (directional energies). However, the overlaps among the areas covered by these directional macro-masks are so large that there is redundancy among the energy features. For instance, there is about 86.67% overlap between the areas covered by the first and second directional macro-masks, and 55.67% between the first and fourth. Therefore, the inter-relationship among the directional energy features needs to be processed and modeled properly, which results in modeling complexity. In this paper, to reduce the redundancy of the directional energy features and the modeling complexity, we modify the way the directional features are computed, as shown in Fig. 1(b). The number of directional features is 8. In this modified scheme, there is 40% overlap between the first and second directional macro-masks, and 4% between the first and fourth. The redundancy of the texture features is thus reduced significantly.

2.1. Texture feature vector versus sequence of texture feature vectors

The proposed modified directional macro-masks are rotated counterclockwise, every 45°, to compute the texture features. In this case, a sequence of the computed features can be formed in terms of the rotation angles; subsequently, there is a sequential relationship between the features of the sequence. If the sequential relationship is not taken into account, the proposed modified directional texture features, i.e., 8 feature vectors of dimension n × 1, can be concatenated into a single feature vector of dimension 8n × 1. In this case, traditional classifiers for pattern recognition such as the Bayesian classifier and the multi-layer perceptron (MLP) neural network can be used (Schalkoff, 1992). On the other hand, if the sequential relationship between the directional texture features is incorporated,


the hidden Markov model (HMM) classifier is a natural candidate (Chen and Kundu, 1995). The directional texture features ordered sequentially are treated as being generated by a Markov source; therefore, an HMM can be used to model the sequential relationship of texture features, which is similar to the application of HMM to the speech modeling problem (Rabiner, 1989).
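The two representations above can be sketched as follows; the feature dimension n = 4 and the random values are illustrative assumptions, not the paper's data:

```python
import numpy as np

# 8 rotation angles x n = 4 feature images (e.g. from 4 Laws' masks):
# one directional feature vector per 45-degree rotation step.
rng = np.random.default_rng(0)
directional_features = rng.random((8, 4))

# Option 1: ignore the rotation order and concatenate into a single
# 8n x 1 vector for a conventional classifier (Bayes, MLP).
concatenated = directional_features.reshape(-1)
assert concatenated.shape == (32,)

# Option 2: keep the rotation order as a length-8 sequence of n-D
# observation vectors, the input form expected by an HMM classifier.
sequence = [directional_features[t] for t in range(8)]
assert len(sequence) == 8 and sequence[0].shape == (4,)
```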

2.2. Stationary hidden Markov model classifier

In the HMM based scheme, the HMM exploits the sequential relationship between the directional feature vectors and is used as a model for a texture; there is one model for each type of texture. In the training stage, a maximum-likelihood estimation algorithm called the Baum-Welch algorithm, λ* = argmax_λ P(O | λ), which is widely used in HMM related problems (Rabiner, 1989), is employed to obtain the best matched model parameters λ* for a given feature sequence O. In the testing stage, an unclassified feature sequence is fed into the M trained HMMs, and the HMM giving the highest likelihood identifies the class of the feature sequence, i.e., m* = argmax_{m=1,...,M} P(O | λ_m). The Forward-Backward algorithm is used in this stage (Rabiner, 1989). There exists a well-defined metric known as the discrimination information (DI) that provides a similarity measure between two HMMs (Rabiner, 1989; Chen and Kundu, 1995),

DI(λ_i, λ_j) = [D(λ_i, λ_j) + D(λ_j, λ_i)]/2,   (1)

where

D(λ_i, λ_j) = (1/K) Σ_k |log P(O_k^i | λ_i) − log P(O_k^i | λ_j)|,   (2)

where O_k^i = {O_k1, O_k2, ..., O_kT} is the kth of the K feature sequences generated from HMM λ_i and T is the length of the sequence. When HMMs λ_i and λ_j are similar, DI(λ_i, λ_j) is small; otherwise, DI(λ_i, λ_j) is large.
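A minimal Monte-Carlo sketch of Eqs. (1) and (2) for discrete-observation HMMs follows; the function names and the toy two-state models are our own assumptions (the paper's HMMs observe continuous feature vectors, but the DI computation is the same):

```python
import numpy as np

def forward_loglik(A, B, pi, obs):
    """log P(O | lambda) for a discrete HMM via the scaled forward algorithm.
    A: (N,N) transitions, B: (N,V) emissions, pi: (N,) initial probs."""
    alpha = pi * B[:, obs[0]]
    logp = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()
        logp += np.log(c)
        alpha /= c
    return logp

def sample_sequence(A, B, pi, T, rng):
    """Draw one observation sequence of length T from the model."""
    s = rng.choice(len(pi), p=pi)
    obs = []
    for _ in range(T):
        obs.append(rng.choice(B.shape[1], p=B[s]))
        s = rng.choice(len(pi), p=A[s])
    return obs

def discrimination_info(m1, m2, K=200, T=8, seed=0):
    """Estimate DI(l1,l2) = [D(l1,l2) + D(l2,l1)]/2 of Eq. (1), with
    D(li,lj) = (1/K) sum_k |log P(O_k^i|li) - log P(O_k^i|lj)| of Eq. (2)."""
    rng = np.random.default_rng(seed)
    def D(mi, mj):
        total = 0.0
        for _ in range(K):
            o = sample_sequence(*mi, T, rng)
            total += abs(forward_loglik(*mi, o) - forward_loglik(*mj, o))
        return total / K
    return 0.5 * (D(m1, m2) + D(m2, m1))

# Toy usage: a sticky model versus a uniform one.
m1 = (np.array([[0.9, 0.1], [0.1, 0.9]]),
      np.array([[0.8, 0.2], [0.2, 0.8]]), np.array([0.5, 0.5]))
m2 = (np.full((2, 2), 0.5), np.full((2, 2), 0.5), np.array([0.5, 0.5]))
assert discrimination_info(m1, m1) == 0.0   # identical models -> DI = 0
assert discrimination_info(m1, m2) > 0.0    # dissimilar models -> DI > 0
```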

2.2.1. Stationary versus non-stationary HMMs

In our previous work, first order HMMs were used for modeling textures. However, the directional texture features were highly correlated due to the way they were computed. To avoid rough modeling,


J.-L. Chert~Pattern Recognition Letters 18 (1997) 993-1007

Fig. 1. Two sets of directional macro-masks: (a) 16 areas of a square window for computing directional features in our previous scheme; (b) 8 areas of a square window for computing directional features in our proposed method. The darkest block represents the central pixel with weight 1. Pixels belonging to gray regions have a weight of 1; otherwise the weight is 0.

Fig. 2. 10 textures (b1)-(b10) for classification experiments.


the non-stationary first order HMMs were used. However, there are some limitations in using non-stationary HMMs. The non-stationary model incorporates information about the sequential order of the data sequence, so the number of model parameters is proportional to the length of the data sequence. When the number of parameters is large, in addition to the huge amount of computation required, the training procedure tends to stop at a poor local optimum. Besides, the non-stationary model is not a good choice if a large set of training sequences is not available or the sequence lengths are short. In this paper, the redundancy among the proposed modified directional texture features is reduced significantly; therefore, there is no need to employ non-stationary HMMs, and we use stationary models, which do not have these limitations.
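The growth in model size can be illustrated by counting only the transition parameters; this back-of-the-envelope sketch assumes the usual row-stochastic constraint (each row of a transition matrix has N − 1 free entries) and a time-varying transition matrix per step for the non-stationary case:

```python
def transition_param_count(n_states, seq_len, stationary=True):
    """Free transition parameters of a first order HMM. A stationary
    model has one NxN matrix; a non-stationary model has a separate
    matrix for each of the seq_len - 1 transitions."""
    per_matrix = n_states * (n_states - 1)  # rows sum to 1
    return per_matrix if stationary else per_matrix * (seq_len - 1)

# A length-8 directional feature sequence with 6-state models:
stat = transition_param_count(6, 8, stationary=True)      # 30
nonstat = transition_param_count(6, 8, stationary=False)  # 210
assert nonstat == (8 - 1) * stat
```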

2.3. Experiments for texture classification

In this experiment, we plan to justify the following items: (1) performance of the proposed scheme: justify the discriminative capability of the modified directional feature vectors based on one particular technique for fast feature image transformation (Laws' masks) and the stationary HMM classifier; (2) performance of a feature vector sequence versus concatenated feature vectors: compare the discrimination performance of the stationary HMM classifier (feature vector sequence) and the MLP neural network (concatenated feature vector) based on the transformation used in (1) and the modified directional texture features; (3) performance of various fast feature image transformation techniques: justify the "exchangeability" of fast transformations based on the modified directional texture features and the stationary HMM classifier.

10 texture images are used for the experiments, as shown in Fig. 2, including a text image, 2 synthesized textures and 7 natural textures (Brodatz, 1965). Each image is of size 256 × 256 and has 256 gray levels. From each texture image, we randomly extract 36 non-overlapping 32 × 32 subimages, 18 for training and the others for testing. To obtain more samples for training, we split every subimage

Fig. 3. Three fast feature image transformations: (a) Laws' masks; (b) discrete Hadamard transform masks; (c) Frei-Chen edge detection masks.
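The 5 × 5 Laws' masks named in Fig. 3(a) are conventionally built as outer products of the standard 1-D Laws kernels (Laws, 1980); the sketch below assumes the usual Level/Edge/Spot/Ripple kernels, since the figure itself did not survive extraction:

```python
import numpy as np

# Standard 1-D Laws kernels: Level, Edge, Spot, Ripple.
L5 = np.array([1, 4, 6, 4, 1])
E5 = np.array([-1, -2, 0, 2, 1])
S5 = np.array([-1, 0, 2, 0, -1])
R5 = np.array([1, -4, 6, -4, 1])

def laws_mask(a, b):
    """5x5 Laws mask as the outer product of two 1-D kernels."""
    return np.outer(a, b)

E5L5 = laws_mask(E5, L5)
R5R5 = laws_mask(R5, R5)
assert R5R5[2, 2] == 36  # centre weight of the ripple-ripple mask

# Convolving the input image with each mask yields the feature images
# used before the directional macro-mask step, e.g.:
#   feature_img = scipy.signal.convolve2d(img, E5L5, mode='same')
```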


Table 1
The experimental results of classification accuracy for Laws' masks and the 6-state HMM. The entry in each row represents the number of classified samples against the classes in each column

      b1   b2   b3   b4   b5   b6   b7   b8   b9   b10
b1    72    0    0    0    0    0    0    0    0    0
b2     0   72    0    0    0    0    0    0    0    0
b3     0    0   72    0    0    0    0    0    0    0
b4     0    0    0   66    1    2    0    3    0    0
b5     0    0    0    0   66    0    0    0    6    0
b6     0    0    0    0    0   71    0    0    0    1
b7     0    0    0    0    0    0   69    2    1    0
b8     0    0    0    0    0    0    0   72    0    0
b9     0    0    0    1    2    0    0    4   65    0
b10    0    0    0    0    0    0    0    0    0   72

Table 2
The experimental results of classification accuracy for Laws' masks and the MLP neural network with learning rate 0.01. The entry in each row represents the number of classified samples against the classes in each column

      b1   b2   b3   b4   b5   b6   b7   b8   b9   b10
b1    68    0    4    0    0    0    0    0    0    0
b2     0   72    0    0    0    0    0    0    0    0
b3     0    0   72    0    0    0    0    0    0    0
b4     0    0    0   66    1    1    0    1    3    0
b5     0    0    7    0   62    0    0    0    3    0
b6     0    0    0    0    1   70    0    0    0    1
b7     0    0    0    0    1    0   67    0    4    0
b8     0    0    1    0    0    1    0   69    1    0
b9     0    0    3    2    2    0    3    0   62    0
b10    0    0    0    1    0    0    0    0    0   71

Table 3
The experimental results of classification accuracy for discrete Hadamard transform masks and the 6-state HMM. The entry in each row represents the number of classified samples against the classes in each column

      b1   b2   b3   b4   b5   b6   b7   b8   b9   b10
b1    72    0    0    0    0    0    0    0    0    0
b2     0   72    0    0    0    0    0    0    0    0
b3     0    0   72    0    0    0    0    0    0    0
b4     0    0    0   64    0    0    0    0    8    0
b5     0    0    0    0   66    0    0    0    3    3
b6     0    0    0    0    3   58    0    5    6    0
b7     0    0    0    0    0    0   72    0    0    0
b8     0    0    0    0    0    0    0   69    3    0
b9     0    0    0    0    3    0    0    0   69    0
b10    0    0    0    0    2    0    0    0    0   70


Table 4
The experimental results of classification accuracy for Frei-Chen edge detection masks and the 6-state HMM. The entry in each row represents the number of classified samples against the classes in each column

      b1   b2   b3   b4   b5   b6   b7   b8   b9   b10
b1    72    0    0    0    0    0    0    0    0    0
b2     0   72    0    0    0    0    0    0    0    0
b3     0    0   72    0    0    0    0    0    0    0
b4     0    0    0   70    0    0    0    0    2    0
b5     0    0    0    0   68    0    0    0    4    0
b6     0    0    0    0    0   67    0    0    5    0
b7     0    0    0    0    0    0   72    0    0    0
b8     0    0    0    0    0    2    2   68    0    0
b9     0    0    0    0    0    0    0    0   72    0
b10    0    0    0    0    0    0    0    0    0   72

into 4 blocks and extract one set of directional feature vectors from each block; that is, 4 feature samples are computed from every subimage. In total, we have 72 training and 72 testing samples for each texture. For the experiments, the HMM classifiers have 4, 6 and 8 states; the MLP neural networks have 32 input nodes, 10 output nodes and one hidden layer of 16 nodes, trained with learning rates of 0.1 and 0.01, respectively; and the fast feature image transformation techniques are Laws' masks (Fig. 3(a)), discrete Hadamard transform masks (Fig. 3(b)) (Unser and Eden, 1989) and Frei-Chen edge detection masks (Fig. 3(c)) (Frei and Chen, 1977). The classification results are shown in Tables 1-4. Table 1 shows the results for Laws' masks and the 6-state HMM; Table 2 the results for Laws' masks and the MLP neural network with a 0.01 learning rate; Table 3 the results for discrete Hadamard transform masks and the 6-state HMM; and Table 4 the results for Frei-Chen edge detection masks and the 6-state HMM. It should be noted that the 4-state and 8-state HMMs and the MLP neural network with a 0.1 learning rate have similar performance, so the detailed results are not shown. The averaged classification accuracy for the HMM classifiers is 96.11% for 4 states, 96.81% for 6 states

Fig. 4. The 8-connected neighbors of block i.

and 95.14% for 8 states; 94.31% for the MLP neural network with a 0.01 learning rate and 92.36% for a 0.1 learning rate; and 95% for discrete Hadamard transform masks and 97.92% for Frei-Chen edge detection masks, both with the 6-state HMM. The results show: (1) High classification accuracy is obtained with the proposed directional texture features when the inter-relationship among the features is exploited by an appropriate classifier such as the HMM classifier or the MLP neural network. (2) The discrete Hadamard transform masks and Frei-Chen edge detection masks achieve classification accuracy comparable to Laws' masks with the 6-state HMM, which shows the "exchangeability" of the fast feature image transformation techniques. (3) The classification accuracy of our proposed method is comparable to that of existing feature based approaches such as Gabor filters (Ohanian and Dubes, 1992), which usually require sophisticated selection of the filter parameters (Bovik et al., 1990; Jain and Farrokhnia, 1991; Jain and Bhattacharjee, 1992; Ohanian and Dubes, 1992). Our approach is free of this feature design difficulty while maintaining high, comparable classification accuracy.
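As a consistency check, the averaged accuracy quoted above can be recomputed from the Table 1 confusion matrix (720 test samples, 72 per class):

```python
import numpy as np

# Confusion matrix of Table 1 (Laws' masks, 6-state HMM); rows are
# true classes b1..b10, columns are assigned classes.
cm = np.array([
    [72, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 72, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 72, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 66, 1, 2, 0, 3, 0, 0],
    [0, 0, 0, 0, 66, 0, 0, 0, 6, 0],
    [0, 0, 0, 0, 0, 71, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 69, 2, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 72, 0, 0],
    [0, 0, 0, 1, 2, 0, 0, 4, 65, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 72],
])
accuracy = np.trace(cm) / cm.sum()           # correct / total
assert cm.sum() == 720
assert abs(accuracy - 0.9681) < 1e-3         # matches the reported 96.81%
```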

Fig. 5. The 4 subwindows w1, w2, w3, w4 for central pixel x.


3. Application to document segmentation problems

In contrast to most document segmentation approaches, which assume that the background of a document can be easily separated by intensity thresholding, in this paper we apply our proposed texture analysis scheme to a particular document segmentation problem, text/textured-background separation, where intensity thresholding is no longer applicable for separating the background. In this problem, text extraction and background separation must be accomplished simultaneously; consequently, the structure based approaches may not be applicable. In contrast, supervised texture segmentation is feasible if reasonable constraints are imposed, e.g. the types of textured background (such as the patterned paper used) are known a priori. Obviously, the effectiveness of the segmentation relies on robust texture discrimination.


3.1. Supervised segmentation procedure

The implementation of supervised texture segmentation is based on texture classification for every pixel. However, performing segmentation on a pixel basis requires a huge amount of computation. Instead, we speed up the procedure by a coarse-to-fine strategy: a block-based coarse segmentation is performed first to segment out possible text areas, then a pixel-based fine segmentation is performed only on the pixels in the candidate text areas. As shown in the experiments in Section 2, based on the directional texture feature vectors, the HMM classifier and the MLP neural network both have excellent and comparable performance. However, the MLP neural network learns all texture classes in one network; that is, if a new class of textures is added, the network has to be re-trained for all classes. In contrast, one HMM is designed and trained for each texture class, so if a new class is added, only the new class need be trained. Consequently, we prefer the HMM classifier for supervised texture segmentation. Our system can be described as follows:

Stage 1: Feature image transformation. We first compute the directional texture feature vectors using Laws' masks for the feature image transformation. Each pixel in the feature image is represented by a sequence of length 8 with each feature vector of dimension 4 × 1. As shown in the previous section, other feature image transformation techniques can be used as well.

Stage 2: Coarse segmentation. This stage is a block-based segmentation which quickly locates the candidate blocks of text. Three types of blocks are defined at this stage: text, background, and boundary between text and background. The candidate blocks of text are thus the text and boundary blocks, which are processed in detail in the next stage.

Stage 3: Fine segmentation. This stage is a pixel-based segmentation which processes only the candidate blocks of text. The contextual information for each pixel is exploited so as to label the text pixels correctly.

Fig. 6. Text and texture images: D1 is a text image; E1, E2 and E3 are structural texture images.


3.2. Coarse-to-fine segmentation

3.2.1. Coarse segmentation

At the coarse segmentation stage, we hope to locate candidate areas of text or, conversely, to locate strictly "pure" background. Hence, only blocks having a homogeneous textural property are labeled as either background or text areas. Blocks located near textural boundaries usually cover both text and background and therefore have a non-homogeneous textural property; we label such non-homogeneous blocks as fuzzy blocks and process them in the later stage. At the output of this stage, the text area candidates are the blocks of pure text and the fuzzy blocks. We label a block as a specific texture only if all its 8-connected neighboring blocks are labeled as the same texture. This is based on the realization that texture is a phenomenon of local homogeneity observed within a local area, and the central part of this local area is more likely to be a "pure" texture. The procedure of coarse segmentation can be described as follows:

Step 1: Split the feature image into numerous small non-overlapping blocks. An 8 × 8 block size is used in our scheme.

Fig. 7. Various text images (D1-D8) with various sizes, fonts, rotations and languages.


Step 2: Label each block i:
1. Take the feature sequence of the central pixel of block i as the representative feature sequence for the whole block, denoted O_i.
2. For each pre-trained HMM λ_m, compute P(O_i | λ_m) and P(O_j | λ_m), j = 1, ..., 8, where blocks j are the 8-connected neighbor blocks of block i, as shown in Fig. 4.
3. Label each neighbor block j as group m* if

m* = argmax_{m=1,...,M} p_jm,   j = 1, 2, ..., 8,   (3)

where

p_jm = [P(O_j | λ_m) + P(O_i | λ_m)]/2.   (4)

4. Label block i as group m* if all blocks j are labeled as group m*; otherwise, label block i as fuzzy.

3.2.2. Fine segmentation

The output of the coarse segmentation stage is the candidate areas of text, where fine segmentation is performed to segment out the text more accurately. The contextual maximum likelihood probability, proposed in our previous work (Chen and Kundu, 1995) for labeling each pixel, has the advantage of incorporating neighborhood information while preserving directionality. We adopt this procedure for our fine segmentation, which can be described briefly as follows:

Step 1: Split a square window centered at pixel x into 4 subwindows, named w1, w2, w3 and w4, respectively, as shown in Fig. 5. Each subwindow is of size 3 × 3 in our scheme. For each subwindow w_i, i = 1, 2, 3, 4, compute the mean and variance of P(O | λ_m) over all pixels in w_i against HMM λ_m, denoted μ_i and σ_i², respectively.

Step 2: Replace P(O | λ_m) of pixel x by μ_{i*} if σ²_{i*} is the minimum over the 4 subwindows, i.e.,

P_x(O | λ_m) = μ_{i*},   i* = argmin_{i=1,...,4} σ_i².   (5)

Step 3: Label pixel x as group m* if

m* = argmax_{m=1,...,M} P_x(O | λ_m).   (6)

Fig. 8. Text extraction with textured background 1. (a) Original document image. (b) Result of coarse segmentation. (c) Result of fine segmentation.
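The labeling rules of Eqs. (3)-(6) can be sketched as follows; the array shapes and the per-model variance minimization in Eq. (5) are our assumptions, since the paper does not give an implementation:

```python
import numpy as np

def coarse_label(p_i, p_nbrs):
    """Eqs. (3)-(4): p_i is an (M,) array of P(O_i | lam_m) for block i;
    p_nbrs is (8, M) for its 8-connected neighbors. Returns the class
    index m* if all neighbors agree, else None (fuzzy block)."""
    p_jm = 0.5 * (p_nbrs + p_i)          # eq. (4), broadcast over j
    m_star_j = np.argmax(p_jm, axis=1)   # eq. (3) for each neighbor j
    if np.all(m_star_j == m_star_j[0]):  # step 4: all 8 neighbors agree
        return int(m_star_j[0])
    return None

def fine_relabel(p_win):
    """Eqs. (5)-(6): p_win is a (4, 9, M) array of P(O | lam_m) over the
    4 subwindows (9 pixels each, 3x3). Returns the label of pixel x."""
    mu = p_win.mean(axis=1)              # (4, M) subwindow means
    var = p_win.var(axis=1)              # (4, M) subwindow variances
    i_star = np.argmin(var, axis=0)      # eq. (5): least-variance window
    p_x = mu[i_star, np.arange(mu.shape[1])]
    return int(np.argmax(p_x))           # eq. (6)

# Toy usage with M = 2 models: unanimous neighbors -> labeled; one
# dissenting neighbor -> fuzzy.
p_i = np.array([0.9, 0.1])
agree = np.tile([0.8, 0.2], (8, 1))
assert coarse_label(p_i, agree) == 0
mixed = agree.copy()
mixed[0] = [0.0, 1.0]
assert coarse_label(p_i, mixed) is None
```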

3.3. Experiments of text separation from textured background

For the supervised text/textured-background separation, we first conduct an experiment analyzing the

Table 5
The DI values between the text image D1 and the texture images shown in Fig. 6

      E1       E2       E3
D1    239.57   208.83   47.24

discrimination information (DI) metrics of various textures, to realize the dissimilarities among the various to-be-segmented textures (HMMs), including various types of text and background. Fig. 6 shows 4 different textures: a text image and 3 structural textures which are used as examples of backgrounds. Table 5 shows the corresponding DI values of the text image (D1) against the three texture images. Fig. 7 shows 8 different types of text, including

Fig. 9. Text extraction with textured background 2. (a) Original document image. (b) Result of coarse segmentation. (c) Result of fine segmentation.


Table 6
The DI values between the text images shown in Fig. 7. Each entry is the DI value between D1 and Dj, j = 2, ..., 8

      D2     D3     D4     D5     D6     D7     D8
D1    5.04   5.42   7.38   4.73   8.32   8.43   13.2

different sizes, fonts and rotations of English and Chinese text. Table 6 shows the corresponding DI values of one text image (D1) against the other text images (D2-D8). From Tables 5 and 6, it is obvious that the DI values between the various text images are small compared to those between text and the textures shown in Fig. 6. That is, one HMM can represent a variety of texts since they are "similar" in the sense of the HMM. Thus, we need only train one HMM with text of one size, font, rotation, and even one language, to represent text; there is no need to learn all kinds of text.

In the following experiments, we use real examples of document images to show the performance of our proposed scheme. All the document images in the experiments are of size 256 × 256 and have 256 gray levels. As mentioned previously, only one size and font of text need be trained for the HMM of text; we use an HMM trained on the D1 image shown in Fig. 6 as the text model for all images throughout the segmentation experiments. Figs. 8-10 show document images with different textured backgrounds. Fig. 8(a), Fig. 9(a) and Fig. 10(a) are the original images. Fig. 8(b), Fig. 9(b) and Fig. 10(b) are the coarsely segmented images: the fuzzy blocks have the darkest gray level; the background blocks are the gray areas; and the text areas are represented by the original

(b)

, •

.~ •

~ "

................................ '

~

'

T " "

"

~

(c)

Fig. 10. Text extraction with textured background 3. (a) Original document image. (b) Result of coarse segmentation. (c) Result of fine segmentation.

1006

J.-L. Chen / Pattern Recognition Letters 18 (1997) 993-1007

image. Fig. 8(c), Fig. 9(c) and Fig. 10(c) show the results after the fine segmentation. It is obvious that our scheme can locate the text areas accurately with various textured backgrounds.

Fig. 11 is an un-trained Chinese text image with 45°, 90° and 180° rotations. The image in Fig. 12 has two different textured backgrounds and two different languages (Chinese and English) of various rotations, fonts and sizes. The fonts and sizes of the texts in Figs. 11 and 12 are not pre-trained. Fig. 11(a) and Fig. 12(a) are the original document images. Fig. 11(b) and Fig. 12(b) are the segmented results, which show that all text areas, including the slanted text areas, can be located accurately even though the texts are not trained a priori. Also, the text areas in the document images with multiple languages and multiple textured backgrounds can be located correctly.

It should be noted that the Gabor filter based texture segmentation techniques may also be applicable to the problem of text/textured-background separation (Bovik et al., 1990; Jain and Farrokhnia, 1991; Jain and Bhattacharjee, 1992). However, as shown in this paper, our proposed HMM based scheme is very robust in modeling many types of text without careful selection of the texture features, and only one type of text is required for training. Besides, the DI values provide a very natural insight for analyzing the similarities among texts, which will be beneficial to the development of an automatic scheme. These advantages of our proposed scheme, however, have not been seen in the Gabor filter based and other approaches.

Fig. 11. Document image with un-trained Chinese fonts and various rotations. (a) Original document image. (b) Result of fine segmentation.

Fig. 12. Document image with two different languages, various fonts, rotations and two different backgrounds. (a) Original document image. (b) Result of fine segmentation.
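For comparison, the Gabor filter alternative mentioned above amounts to filtering the image with a bank of oriented band-pass kernels. A minimal sketch of such a kernel follows; the frequency, orientation set, sigma and kernel size are illustrative defaults, not parameters from the cited papers:

```python
import numpy as np

def gabor_kernel(frequency, theta, sigma=2.0, size=9):
    """Real (cosine) part of a Gabor kernel at the given spatial
    frequency (cycles/pixel) and orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid varies along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * frequency * xr)

# A small 4-orientation filter bank at one radial frequency.
bank = [gabor_kernel(0.25, t)
        for t in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
```

In Gabor based schemes, channel energies of such a bank serve as the texture features; selecting the frequencies and orientations well is exactly the feature-design burden that the HMM based scheme avoids.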

4. Conclusions

In this paper, we propose a simplified set of directional macro-masks for computing directional texture features. The proposed scheme has the advantage of little dependence on the feature image transformation, and performs excellently with various fast transformations. Therefore, the difficult problem of feature design is relaxed. Together with an HMM classifier for exploiting the inter-relationships between the computed directional texture features, our scheme has steady and excellent performance on texture classification. Applied to a document segmentation problem, text/textured-background separation, our scheme can locate the text areas accurately even in the presence of multiple textured backgrounds. It is an advantage of our scheme that only one HMM is capable of modeling a variety of texts, including various languages, fonts, sizes and rotations. In the future, more work should be devoted to (1) unsupervised text/textured-background separation, and (2) more detailed separation of texts, graphics and images.
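The directional feature computation summarized above can be sketched with generic 3×3 line masks; the paper's actual macro-masks are larger and not reproduced here, so the masks and the mean-absolute-response feature below are illustrative assumptions only. The resulting per-block feature sequences would then be fed to the HMM classifier (not shown).

```python
import numpy as np

# Generic 3x3 directional line masks (horizontal, vertical, two
# diagonals) standing in for the paper's directional macro-masks.
MASKS = {
    "horizontal": np.array([[-1, -1, -1],
                            [ 2,  2,  2],
                            [-1, -1, -1]], dtype=float),
    "vertical":   np.array([[-1,  2, -1],
                            [-1,  2, -1],
                            [-1,  2, -1]], dtype=float),
    "diag45":     np.array([[-1, -1,  2],
                            [-1,  2, -1],
                            [ 2, -1, -1]], dtype=float),
    "diag135":    np.array([[ 2, -1, -1],
                            [-1,  2, -1],
                            [-1, -1,  2]], dtype=float),
}

def conv_valid(img, mask):
    """Plain 'valid'-mode correlation; small and dependency-free."""
    h, w = mask.shape
    H, W = img.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * mask)
    return out

def directional_features(block):
    """Mean absolute response to each directional mask: one scalar per
    direction (the paper's exact feature definition may differ)."""
    return {name: float(np.mean(np.abs(conv_valid(block, m))))
            for name, m in MASKS.items()}
```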

Acknowledgements

The author would like to thank the anonymous reviewers for their very careful reading of the manuscript and their sound advice.

References

Bovik, A.C., Clark, M., Geisler, W.S., 1990. Multichannel texture analysis using localized spatial filters. IEEE Trans. Pattern Anal. Machine Intell. 12, 55-73.
Brodatz, P., 1965. Textures - A Photographic Album for Artists and Designers. Dover, New York.
Chellappa, R., Chatterjee, S., 1985. Classification of textures using Gaussian Markov random fields. IEEE Trans. Acoust. Speech Signal Process. 33, 959-963.
Chen, J.-L., Kundu, A., 1995. Unsupervised texture segmentation using multichannel decomposition and hidden Markov model. IEEE Trans. Image Process. 4, 603-619.
Coggins, J.M., Jain, A.K., 1985. A spatial filtering approach to texture analysis. Pattern Recognition Lett. 3, 195-203.
Fletcher, L.A., Kasturi, R., 1988. A robust algorithm for text string separation from mixed text/graphics image. IEEE Trans. Pattern Anal. Machine Intell. 10, 910-918.
Frei, W., Chen, C., 1977. Fast boundary detection: A generalization and a new algorithm. IEEE Trans. Comput. 26, 988-998.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distribution, and Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6, 721-741.
Haralick, R.M., 1979. Statistical and structural approaches to texture. Proc. IEEE 67, 786-804.
Jain, A.K., Bhattacharjee, S.K., 1992. Text segmentation using Gabor filters for automatic document processing. Machine Vision Appl. 5, 169-184.
Jain, A.K., Farrokhnia, F., 1991. Unsupervised texture segmentation using Gabor filters. Pattern Recognition 24, 1167-1186.
Laws, K.L., 1980. Rapid texture identification. Proc. SPIE 238, 376-380.
Nadler, M., 1984. A survey of document segmentation and coding techniques. Comput. Vision Graphics Image Process. 28, 240-262.
Ohanian, P.P., Dubes, R.C., 1992. Performance evaluation for four classes of textural features. Pattern Recognition 25, 819-833.
Rabiner, L.R., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257-286.
Schalkoff, R., 1992. Pattern Recognition: Statistical, Structural and Neural Approaches. Wiley, New York.
Schürmann, J., Bartneck, N., Bayer, T., Franke, J., Mandler, E., Oberlander, M., 1992. Document analysis - From pixels to contents. Proc. IEEE 80, 1101-1119.
Unser, M., Eden, M., 1989. Multiresolution feature extraction and selection for texture segmentation. IEEE Trans. Pattern Anal. Machine Intell. 11, 717-728.
Wahl, F.M., Wong, M.K.Y., Casey, R.G., 1982. Block segmentation and text extraction in mixed text/image documents. Comput. Vision Graphics Image Process. 20, 375-390.