Page segmentation using texture analysis


Pattern Recognition, Vol. 29, No. 5, pp. 743-770, 1996. Elsevier Science Ltd. Copyright © 1996 Pattern Recognition Society. Printed in Great Britain. All rights reserved. 0031-3203/96 $15.00 + .00

Pergamon

0031-3203(95)00131-X

PAGE SEGMENTATION USING TEXTURE ANALYSIS

ANIL K. JAIN and YU ZHONG

Department of Computer Science, Michigan State University, East Lansing, MI 48824, U.S.A.

(Received 11 April 1995; in revised form 7 August 1995; received for publication 29 August 1995)

Abstract--We propose a new texture-based language-free page segmentation algorithm which automatically extracts the text, halftone, and line-drawing regions from input greyscale document images. This approach utilizes a neural network to train a set of masks which is optimal for discriminating the three main texture classes in the page segmentation problem: halftone, background, and text and line-drawing regions. The text and line-drawing regions are further discriminated based on connectivity analysis. We have applied the algorithm to successfully segment English and Chinese document images. We also demonstrate that the masks can perform language separation (English/Chinese) when appropriately trained.

Document analysis    Neural network    Page segmentation    Texture    Learning

1. INTRODUCTION

Document image understanding is an important research area which is receiving increasing attention. Page segmentation is a document processing technique which is used to automatically determine the format of a page. A scanned page image is first divided into blocks which are then classified as text, halftone, or line-drawing. Segmentation of an input image into coherent regions is the first step before applying a classification algorithm. Therefore, many higher level analyses of the input document image are based on the output of a page segmentation system. For example, the extracted text regions can be the input to an intelligent character recognition (ICR) system to retrieve the ASCII characters printed on the page. The spatial relationships of the segmented blocks along with other features can be used in logical page organization analysis to group the components of a document image appropriately and recover the correct reading order.

Many techniques for page segmentation have been proposed in the literature.(1-4) Early approaches to page segmentation, based on the assumption that an input document image consists of right rectangular blocks, were typically applied to binary images. These methods can be generally classified as either top-down(2,5) or bottom-up approaches.(3,6) When the segmentation is performed in a top-down strategy, a page is first split into major regions, and major regions into subregions, and so on. On the other hand, in a bottom-up strategy, connected components are merged into small regions based on local evidence, and the small regions are then successively merged into larger regions. Recent hybrid approaches(1,7,8) to page segmentation have utilized both local and global strategies to cope with complicated page segmentation problems. A survey of page segmentation techniques and document image understanding can be found in Haralick.(9)

In this paper, the page layout segmentation is posed as a texture segmentation problem. A neural network is employed to train a set of "optimal" texture discrimination masks which minimizes the classification error for the given texture classes in the page segmentation problem. Texture features are obtained by convolving the trained masks with the input image. These features are then used in the classification. Because the text in most languages consists of individual characters which are formed by small curves and dots, it represents a relatively consistent texture which is quite different from that of the halftone and background regions. Therefore, a texture-based page segmentation algorithm can be designed to be insensitive to the language of the document. We propose a simple language-free page segmentation algorithm which is based on texture analysis. This method is directly applied to the input gray level document images, which avoids the loss of information during binarization.

The proposed segmentation method also shows potential in the problem of language separation, where texts of different languages need to be discriminated and identified. There exist some textural differences between languages because of the possible differences in the construction of fundamental elements (letters, characters, etc.) of languages or the placement rules of the elements. If the textural difference is large enough to be captured by the texture discrimination masks, then the two languages can be separated.

The proposed page segmentation method is distinguished from other published algorithms by the following desirable properties: (i) we are able to successfully segment the input image into four distinct types of regions (text, halftone, graphics and background), (ii) we directly process the input gray level image instead of first converting it to a binary image,

(iii) it is robust to the language of the document; acceptable segmentation results on both English and Chinese document images are achieved at a low scanning resolution (100 dpi), and (iv) it can be trained to perform language separation.

The rest of the paper is organized as follows. In Section 2, we describe the learned masks-based page segmentation approach. This approach is used to perform both page segmentation and language separation. A postprocessing algorithm is also introduced which calculates the layout bounding boxes based on the page segmentation results. Experimental results on English and Chinese document images are presented in Section 3 to illustrate the performance of our algorithm. Section 4 concludes the paper with a discussion.

Fig. 1. Three-layer neural network for texture classification. [Diagram: an M x M window of the input image feeds Layer 1 (Masks), followed by Layer 2 and the Output Layer (Classes).]

2. SEGMENTATION BASED ON TEXTURE ANALYSIS

An image region is textured if it contains some repeating gray level patterns.(10) Therefore, text regions possess a unique texture because they typically follow a specific arrangement rule: each region consists of text lines of the same orientation with approximately the same spacings between them and each text line consists of characters of approximately the same size. This specific texture property makes text regions distinguishable from the non-text regions.(11) At the same time, the non-text regions, such as halftone and background, can be considered to represent different textures. The above observations suggest that the primary components of page layout (halftone, background, and text and line-drawings) can be discriminated by texture segmentation or classification methods.

A widely used texture segmentation method is the multichannel Gabor filtering technique.(12) In this method, different Gabor filters are tuned to capture desired local spatial frequency and orientation characteristics of a textured region. Jain and Bhattacharjee have successfully used the multichannel Gabor filters

to extract text and halftone regions from the input document image.(11) They employed a set of twenty Gabor filters with four different orientations and five different spatial frequencies to extract the texture features. These feature vectors are then used to segment the input image.

Fig. 2. Page segmentation algorithm. [Flow: Three-class segmentation using Gabor filters or neural network -> Smooth the labeled image -> Merge nearby regions -> Remove small components -> Separate text from line drawings -> Place bounding boxes around regions (page skew is known) -> Input image with bounding boxes around different regions.]

The Gabor filtering technique, while popular in texture analysis because of its optimal localization in both the spatial and spatial frequency domains, is not guaranteed to be the best for a given texture segmentation or classification task. Also, certain heuristics and domain-specific information have to be used to find a set of filters which gives good performance in a specific application. By "domain-specific information" we mean the characteristic and discriminative features of the classes in this particular segmentation task. A general filter bank with predefined parameters may not be able to capture these distinctive features, and as a result, may not be effective for this segmentation. Instead of the general filters, such as the Gabor filters,

there may exist a smaller number of "special purpose" filters, whose classification accuracy is comparable to that of the Gabor filters, but with a large reduction in computational requirements and processing time. In an attempt to design a small number of simple "special purpose" filters optimized for the given textures, Jain and Karu(13) have used a multilayer feed-forward neural network that is trained to minimize texture classification error. The filters which the network learns perform the same feature extraction task as the Gabor filters, but now, as opposed to a general filter bank, they are tuned for the given texture classification problem. The reduction in computational resources is due not only to the smaller number of filters, but also to the fact that the learned masks are smaller than Gabor filters,* and the relatively slow k-nearest neighbor classifier is replaced by a faster multilayer feed-forward neural network.

*Convolution with Gabor filters is usually performed via Fourier transform.

Fig. 3. Training image for supervised page segmentation. (a) Training image of size 1754 x 1488 (100 dpi); (b) ground truth image of (a) [three classes: text and line-drawing (dark gray), halftone (light gray) and background (black)].

2.1. Learning texture discrimination masks

The above approach for learning texture discrimination masks has been applied to the page segmentation and language separation problem as follows. A set of masks which best discriminates between halftone, background, and text and line-drawing regions can be obtained by training a neural network on sample data from these three classes. During the classification stage, each pixel in the input image, based on its neighborhood information, is classified by the network into one of the three classes.

A schematic diagram of the neural network classifier is shown in Fig. 1. The input layer of the network is presented with pixels in a spatial configuration represented by the bold dots inside an M x M window of the input image as shown in Fig. 1. It is assumed that the image texture is homogeneous in this window. Choosing this specific configuration instead of the entire M x M window reduces the number of connection weights (parameters) and results in improved generalization performance and classification speed. Each node in the first hidden layer corresponds to a mask, where the weights of the links from the input nodes to the hidden layer nodes are the coefficients of the mask. Each node in the output layer corresponds to one of the output classes. Analogous to the Gabor filtering, the first layer of the network performs feature extraction, while the part of the network above the first hidden layer acts as a classifier.

The neural network operates as a standard multilayer perceptron and it is trained with the back-propagation algorithm. We start with a neural network with 20 masks, and then see if there is any mask which does not contribute much to the classification. If

so, this node is removed. We used the node pruning method described by Mao et al.(14) The algorithm defines the saliency of each node as the increase in the mean-square error E when the node is removed. The saliency of a node is approximated in a back-propagation like manner by propagating the first- and second-order derivatives of the error E with respect to the activation values Vti back in the network. After the training algorithm has converged, the node with the lowest saliency can be pruned. Each node removal, of course, requires retraining the network. By running the node pruning algorithm, we determine an empirical relationship between the classification error and the number of nodes used in the network. Then we decide the appropriate number of nodes as the one which gives acceptable correct classification rate with the smallest number of input nodes. The node pruning process results in a compact and efficient network. It finds a smaller set of masks which has better or equivalent performance compared to Gabor filters. The number of masks we actually used in classification is the number of nodes remaining in the first hidden layer after node pruning.

Fig. 4. Learned masks for three-class page segmentation. (a) Frequency responses of the sixteen masks; (b) average feature vectors for background, text and halftone regions using the set of 16 learned masks. [Panel (b) plot, titled "Average Feature Vectors in 3-class Page Segmentation": one curve each for background, text and halftone, plotted against the mask index.]

One important parameter of the system is the size of the texture discrimination masks. A larger mask means that pixels in a larger neighborhood participate in classifying the pixel in the center, and as a result, the mask becomes less discriminative. In addition, inappropriately large masks usually smear the border of the segmented regions. On the other hand, as image texture is a neighborhood property, the mask must be large enough to capture the texture characteristics. Our initial experimental results showed that masks of size 7 x 7 are sufficient for discriminating the texture classes in our problem.

The training session of the neural network automatically tunes the weights of the masks so that the classification error is minimized. Domain-specific information is incorporated in the network during the training. We have trained two sets of masks, one for the three-class page segmentation, and another for the two-class English/Chinese text separation. These sets of masks are then convolved with the input image to give the feature vectors for classification.

2.2. Page segmentation algorithm

We have used a hierarchical approach to classify text, line-drawing, halftone, and background regions. In this approach, text and line-drawing regions are

first put in the same category, because of the textural similarity of text and line-drawings in a 7 x 7 neighborhood. The three classes (text and line-drawing, halftone, and background) are discriminated by a neural network with a set of 7 x 7 masks. Text and line-drawings are further discriminated based on connectivity features. One reliable discriminator for text versus line-drawings is given by Wahl,(15) who used the border-to-border distance within a connected component of the binarized document image. We have exploited a similar measurement motivated by the observation that line-drawing regions usually include irregular lines and curves which form connected components with larger sizes than those of individual characters. For each region classified as text or line-drawing, we threshold it to get a binary image and then perform a connected component analysis. If a region includes connected components whose sizes (either in the horizontal direction or in the vertical direction) are larger than a threshold, then that region is classified as a line-drawing region; otherwise, it is classified as a text region. The threshold size is empirically determined based on the size of the characters in the text.

Fig. 5. Segmentation using texture discrimination masks on image UW:S029GRY (sixteen 7 x 7 masks). (a) Input image (1080 x 780, 100 dpi); (b) three-class classification result; (c) final page layout.

This simple heuristic (as well as all the processing done before) is invariant to rotation. Therefore, we can approximately locate areas of text, graphics and halftone in the input image even if they are tilted or form irregular regions. The robustness of the whole system, however, depends on the postprocessing that is used to construct the bounding boxes surrounding different components of the image.

The classification results obtained using the learned texture discrimination masks are not directly suitable

for ICR processing or image compression because of the presence of noise and spurious regions. The classification stage only transforms the image into a representation from which the desired blocks can be easily extracted. The postprocessing consists of two main steps. The first step removes small noisy elements and merges neighboring regions. The second step places bounding boxes around the labeled image regions. A block diagram of the complete page segmentation algorithm is shown in Fig. 2. To speed up the processing without degrading the classification performance, we have applied all the postprocessing operations on a 512 x 380 (50 dpi) subsampled segmented image.

Fig. 6. Segmentation using texture discrimination masks on image UW:S04NGRY (sixteen 7 x 7 masks). (a) Input image (1080 x 780, 100 dpi); (b) three-class classification result; (c) final page layout.
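As a rough illustration of the classification stage whose output is postprocessed here, the sketch below classifies a single pixel from its M x M neighbourhood with a three-layer network. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sigmoid activations, the use of the full window (rather than the sparse pixel configuration of Fig. 1) and every weight value are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mask_response(image, mask, y, x):
    """Weighted sum of the M x M neighbourhood centred on pixel (y, x)."""
    m = len(mask)
    r = m // 2
    return sum(
        mask[j][i] * image[y + j - r][x + i - r]
        for j in range(m)
        for i in range(m)
    )


def dense_layer(inputs, weights, biases):
    """Fully connected layer: one sigmoid unit per row of `weights`."""
    return [
        sigmoid(b + sum(w * v for w, v in zip(row, inputs)))
        for row, b in zip(weights, biases)
    ]


def classify_pixel(image, masks, mask_biases, w2, b2, w3, b3, y, x):
    """Return the index of the winning output class for pixel (y, x)."""
    # Layer 1: each node is a mask; its activation is one texture feature.
    features = [
        sigmoid(b + mask_response(image, mask, y, x))
        for mask, b in zip(masks, mask_biases)
    ]
    hidden = dense_layer(features, w2, b2)   # Layer 2
    outputs = dense_layer(hidden, w3, b3)    # output layer, one node per class
    return max(range(len(outputs)), key=outputs.__getitem__)
```

In practice the mask coefficients and the remaining weights would come from back-propagation training; applying `classify_pixel` at every (possibly subsampled) interior pixel would yield the three-class label image that the postprocessing operates on.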

The segmentation result using texture discrimination masks is first smoothed: each pixel is assigned to the majority class among its 3 x 3 neighbors. This process eliminates the speckle-like noise in the segmented image. In order to merge neighboring regions (such as text lines) and smooth the corners, a morphological closing operation with a 1 x 3 structuring element is applied first, followed by an opening with a 2 x 2 square. The final postprocessing step is to remove small components: halftone, text and line-drawing components with an area less than 10 pixels are labeled as background. The preceding steps ensure that the remaining connected components are sufficiently large coherent regions. Text and line-drawing regions are separated according to the size of their connected components, as described in the previous section. Finally, under the assumption that document images are made up of right rectangular blocks, we can now easily place bounding boxes around different regions based on the refined segmentation results. The minimum and maximum vertical and horizontal coordinates of each region are used as the coordinates of the top-left and bottom-right corners of the bounding box.
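The postprocessing steps above can be sketched in plain Python as follows. This is an illustrative sketch only: the label encoding, the function names and the choice of 4-connectivity are assumptions, and the morphological closing and opening are omitted for brevity.

```python
from collections import Counter, deque

BACKGROUND = 0  # hypothetical encoding: 0 background, 1 text/line-drawing, 2 halftone


def majority_smooth(labels):
    """Assign each pixel the most frequent label in its 3 x 3 neighbourhood."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            votes = Counter(
                labels[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = votes.most_common(1)[0][0]
    return out


def components(labels):
    """Yield (label, pixel list) for every 4-connected non-background component."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if seen[y][x] or labels[y][x] == BACKGROUND:
                continue
            lab, pixels, queue = labels[y][x], [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and labels[ny][nx] == lab):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield lab, pixels


def remove_small(labels, min_area=10):
    """Relabel components with fewer than `min_area` pixels as background."""
    out = [row[:] for row in labels]
    for _, pixels in components(labels):
        if len(pixels) < min_area:
            for y, x in pixels:
                out[y][x] = BACKGROUND
    return out


def bounding_boxes(labels):
    """Return (label, (top, left), (bottom, right)) for each remaining component."""
    boxes = []
    for lab, pixels in components(labels):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        boxes.append((lab, (min(ys), min(xs)), (max(ys), max(xs))))
    return boxes
```

A typical call order would be `bounding_boxes(remove_small(majority_smooth(labels)))`, with the closing and opening inserted between the smoothing and the small-component removal.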

Fig. 7. Segmentation using texture discrimination masks on an image from the ACM Computing Surveys (sixteen 7 x 7 masks). (a) Input image (939 x 606, 100 dpi); (b) three-class classification result; (c) final page layout.

3. EXPERIMENTAL RESULTS

We have applied the texture discrimination masks-based segmentation algorithm to a number of English and Chinese document images. These test images were either scanned using a Sharp JX-300 flat-bed scanner or taken from the University of Washington (UW) English document image database.(16) The images come from a variety of journals and magazines including the IEEE Transactions on Pattern Analysis and Machine Intelligence, Reviews of Geophysics, ACM Computing Surveys, and Reader's Digest (Chinese). Initial page segmentation experiments based on the texture discrimination masks gave acceptable results for 100 dpi images. So, the resolution of the images from the UW image database was reduced from the original 300 dpi to 100 dpi to reduce the computational requirement.

3.1. English and Chinese document page segmentation

In this subsection, we use the proposed algorithm to perform the four-class segmentation of English and Chinese document images. Here, both English and Chinese texts are considered as belonging to the same texture class. The algorithm partitions an input image into four types of distinct regions: text, line-drawing, halftone, and background.

Fig. 8. Segmentation using texture discrimination masks on a Chinese document image from the Reader's Digest (sixteen 7 x 7 masks). (a) Input image (760 x 496, 100 dpi); (b) three-class classification result; (c) final page layout.

A collage of portions of scanned pages (100 dpi) from the IEEE Transactions on Pattern Analysis and Machine Intelligence [Fig. 3(a)] was used to train the texture discrimination masks in the neural network. The corresponding ground truth image of the three classes (text and line-drawing, halftone, and background) is presented in Fig. 3(b).

In the training session, 1,000,000 training configuration patterns, as illustrated in Fig. 1, were randomly selected with replacement from the training image in Fig. 3. They are then fed to the neural network and the weights are modified using the back-propagation algorithm. We start the network with twenty nodes in each of the first and second hidden layers, and three nodes in the output layer. After node pruning, we kept sixteen nodes in the first hidden layer. This means that only 16 texture discrimination masks were used. Each of the masks, presumably, captures some discriminative feature of the page segmentation task. To better investigate the properties of the masks, we show in Fig. 4(a) the frequency responses of these 16 masks. For each input configuration as described in Fig. 1, we obtain a 16-dimensional feature vector, which consists of the convolution outputs from the 16 masks. This vector encodes the characteristic and discriminative information at the configuration, i.e., in the neighborhood of the pixel

of interest. The average feature vectors calculated from the training image (Fig. 3) for the three classes are shown in Fig. 4(b).

The test images include both English and Chinese document images. Note that there is no overlap between the training image and the test images. Furthermore, the test images are different from the training images in scanning conditions, and sometimes, in language.

Fig. 9. Segmentation using texture discrimination masks on a Chinese document image from the Reader's Digest (sixteen 7 x 7 masks). (a) Input image (760 x 496, 100 dpi); (b) three-class classification result; (c) final page layout.

Fig. 10. Training image for English/Chinese text segmentation. (a) Training image of size 332 x 264 (100 dpi); (b) ground truth image of the training image (two classes: Chinese text and English text).

The sixteen 7 x 7 texture discrimination masks which were learnt are used to segment the input document image into three classes: (i) halftone, (ii) background and (iii) text and line-drawing regions. The resulting text and line-drawing regions are then thresholded at a gray level value of 150 to obtain the binary images of characters, line segments, and curves. Further segmentation of text and line-drawings is based on the sizes of the connected components in the region. The threshold for text/line-drawing separation is determined empirically based on the distribution of the size of the connected components in the text and line-drawing regions in our document image database. We select the threshold value so that it minimizes the misclassification between the text and line-drawing regions. If a connected component in a region exceeds 46 pixels both horizontally and vertically at the 100 dpi resolution, then this region is labeled as line-drawing, otherwise

text. We must point out that this threshold assumes that the image has been binarized properly, so that no characters touch each other, and line-drawings are not broken because of the binarization. When these conditions do not hold, our separation method will not work.

A few representative page segmentation results using masks trained by the neural network are presented in Figs 5-9. In part (a) of each figure, the input image is shown. The 3-class segmentation based on the texture discrimination masks is presented in part (b). Part (c) displays the resulting page layout with bounding boxes for each labeled region, where white, gray, and black bounding boxes are used to denote halftone, line-drawing, and text regions, respectively. These results show that the final page segmentations are correct and consistent for the test images considered here, even though the test images are different from the training image in scanning conditions, source journal, fonts, and language.

As the page classification results consist of solid blocks with small speckles, a classification based on appropriately subsampled pixels is adequate. Therefore, we only classify pixels in a 540 x 390 subsampled subimage of the original 1080 x 780 image (100 dpi). The postprocessing steps are applied to the subsampled image as well. Excluding the training time for the neural network, it takes approximately

35 s to obtain the 3-class classification using the neural network and another 25 to 50 s to label the text and line-drawing regions and compute the bounding boxes on a Sun SPARC 20 workstation.

Fig. 11. Learned masks for language separation. (a) Frequency responses of the fifteen masks; (b) average feature vectors for English and Chinese text using the set of learned masks. [Panel (b) plot, titled "Average Mask Responses of English/Chinese Text": one curve each for Chinese and English, plotted against the mask index (1 to 15).]
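The text/line-drawing rule used in this section (binarize at gray level 150, then test connected-component extents against 46 pixels) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, and it assumes that ink is darker than the threshold and that components are 8-connected, neither of which is stated in the paper.

```python
from collections import deque


def component_extents(region, gray_threshold=150):
    """Return (height, width) of every 8-connected ink component in a region.

    `region` is a list of rows of gray levels; pixels below `gray_threshold`
    are treated as ink (an assumption about the image polarity).
    """
    h, w = len(region), len(region[0])
    ink = [[value < gray_threshold for value in row] for row in region]
    seen = [[False] * w for _ in range(h)]
    extents = []
    for y in range(h):
        for x in range(w):
            if not ink[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top = bottom = y
            left = right = x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if ink[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            extents.append((bottom - top + 1, right - left + 1))
    return extents


def label_region(region, size_threshold=46, gray_threshold=150):
    """Label a region 'line-drawing' if any component exceeds the size
    threshold both horizontally and vertically, otherwise 'text'."""
    for height, width in component_extents(region, gray_threshold):
        if height > size_threshold and width > size_threshold:
            return "line-drawing"
    return "text"
```

The 46-pixel default matches the 100 dpi setting reported above; at another resolution the threshold would presumably need to be rescaled.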

3.2. Language separation

In Section 3.1, we view texts of different languages as belonging to the same texture class. However, languages may differ from each other by the shape of the individual characters, and the way they are grouped into words, words into sentences, etc. For example, Chinese text usually consists of columns of square characters, where each character is more complex than an English letter. Also, there is no spacing between Chinese characters, except at the punctuation marks. These properties make Chinese text distinguishable from English text using texture analysis.

Since the texture discrimination masks are capable of learning the texture characteristics and differences in a specific application, we may be able to train a set of masks to capture the subtle differences between the English text and text of another language. Some initial experiments have been conducted to explore this possibility of language separation. The experimental results on English/Chinese text segmentation are encouraging.

Fig. 12. English/Chinese text segmentation. (a) Test image 1 of size 253 x 253 (100 dpi); (b) segmentation result of image 1 using fifteen learned masks; (c) test image 2 of size 160 x 320 (100 dpi); (d) segmentation result of image 2 using fifteen learned masks. (Segmentation results shown here have been processed by a 5 x 5 median filter and removal of small connected components.)

We used the neural network architecture in Fig. 1 to train a small number of masks which discriminate between the two languages. The training image and the corresponding two-class ground truth image are presented in Fig. 10. Again, we used 1,000,000 patterns (with replacement) in the training session. After training and node pruning, fifteen 7 x 7 masks were retained according to the node saliency and the classification error. The frequency responses of the masks and the average feature vector values for the two classes are visualized in Fig. 11. This set of fifteen learned masks is then applied to segment English text from Chinese text. Figure 12(a) is a test image which has a block of Chinese text nested inside the English text. The segmentation result is given in Fig. 12(b). Figure 12(c) presents another image which contains text of both languages; the corresponding segmentation result is shown in Fig. 12(d). Note that the output of the neural network classifier has

been processed by a 5 x 5 median filter to smooth the 2-class segmentation result shown in Fig. 12(b). Small connected components are also removed to enhance the segmentation results.

4. CONCLUSIONS AND DISCUSSION

We have presented a new method for page segmentation which partitions an image into regions containing text, halftone, line-drawing and background. In our approach, the page segmentation is regarded as a gray-level texture discrimination problem. A neural network is used to train a moderate number of texture discrimination masks which optimally discriminate the three texture categories (text + line-drawing, halftone, and background). These masks are then used to provide a robust and efficient classification of the textured regions. This approach is relatively insensitive to the types of languages presented in the document. Preliminary experimental results on both English and Chinese document pages from five different journals and magazines show that, regardless of the language used in the document, text, halftone and line-drawing regions can be identified correctly. The page segmentation algorithm can also perform language separation, utilizing the textural difference between texts of different languages. Preliminary experimental results to segment Chinese text from English text are encouraging.

The proposed texture-based page segmentation method is able to accommodate a variety of fonts, formats, and types of languages of document images from different journals. Although we currently assume that the text skew is known a priori, this is not an inherent limitation of our approach since the segmentation is based on the texture information, which can be considered as rotation invariant. The computational requirement is moderate. It takes approximately 70 seconds to segment an input image (1080 x 780). Note that the feed-forward neural network can either be implemented in hardware, or ported to commercially available accelerators to achieve "real-time" performance.

Page segmentation using texture analysis

769

Fig. 12. (continued) Panels (c) and (d).

An objective and quantitative performance evaluation of a page segmentation system is necessary and important. Unfortunately, this is not an easy problem.(9,17,18) Page segmentation is only the first step in a document image processing system, so the evaluation cannot be independent of the rest of the system, especially the ICR system which is used to recognize characters in the extracted text regions.(19) It is important that the page segmentation algorithm be able to locate all the text regions in a document image and provide their correct reading order. So far, we have only done a qualitative evaluation of our page segmentation algorithms from this perspective. The proposed approach can correctly locate the text regions in our test images. In addition, our segmentation results have the following properties: (i) there is no horizontal merging of the two text columns; such a merging will pose a serious problem to an ICR system; (ii) vertically neighboring paragraphs are not merged into a larger

block unless the spacing between them is small. Even if such a merger takes place, it does not affect the performance of the ICR system since the reading order is still maintained; (iii) there is no merging between titles and paragraphs, captions and paragraphs, or between sections, so the logical structure of the page can be recovered easily; and (iv) the halftone regions and line-drawing regions are correctly identified.

Acknowledgement--The authors would like to thank Ms Yao Chen for conducting some initial experiments. The authors are also grateful to Mr Kalle Karu for his help in programming and editing the manuscript.

REFERENCES

1. J. L. Fisher, S. C. Hinds and D. P. D'Amato, A rule-based system for document image segmentation, Proc. 10th Int. Conf. Pattern Recognition (ICPR), 567-572, Atlantic City, New Jersey (June 1990).


2. G. Nagy, S. Seth and M. Viswanathan, A prototype document image analysis system for technical journals, IEEE Comput. 25, 10-22 (July 1992).
3. Lawrence O'Gorman, The document spectrum for page layout analysis, IEEE Trans. Pattern Anal. Mach. Intell. 15, 1162-1173 (November 1993).
4. T. Pavlidis and J. Zhou, Page segmentation and classification, CVGIP: Image Understanding 54, 484-486 (November 1992).
5. D. Wang and S. N. Srihari, Classification of newspaper image blocks using texture analysis, Comput. Vision Graphics Image Process. 20, 327-352 (1989).
6. K. Y. Wong, R. G. Casey and F. M. Wahl, Document Analysis System, IBM J. Res. Devel. 6, 642-656 (November 1982).
7. O. T. Akindele and A. Belaid, Page segmentation by segment tracing, Proc. 2nd Int. Conf. Document Anal. Recog. 91-94, Tsukuba Science City, Japan (October 1993).
8. D. J. Ittner and H. S. Baird, Language-free layout analysis, Proc. 2nd Int. Conf. Document Anal. Recog. 336-340, Tsukuba Science City, Japan (October 1993).
9. Robert M. Haralick, Document image understanding: geometric and logical layout, Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition (CVPR), 385-390, Seattle, Washington (June 1994).
10. M. Tuceryan and A. K. Jain, Texture analysis. In Handbook of Pattern Recognition and Computer Vision, C. H. Chen, L. F. Pau and P. S. P. Wang, eds, pp. 235-276, World Scientific Publishing (1994).

11. A. K. Jain and S. Bhattacharjee, Text segmentation using Gabor filters for automatic document processing, Mach. Vision Appl. 5, 169-184 (1992).
12. A. K. Jain and F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, Pattern Recognition 24, 1167-1186 (1991).
13. A. K. Jain and Kalle Karu, Learning texture discrimination masks, IEEE Trans. Pattern Anal. Mach. Intell. (submitted).
14. J. Mao and A. K. Jain, Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern Recognition 25, 173-188 (1992).
15. F. M. Wahl, A new distance mapping and its use for shape measurement on binary image data, IBM Res. Report, RJ3438, San Jose, California (1982).
16. I. T. Phillips, S. Chen and R. M. Haralick, CD-ROM document database standard, Proc. 2nd Int. Conf. Document Anal. Recog. 478-483, Tsukuba Science City, Japan (October 1993).
17. J. Kanai, T. A. Nartker, S. V. Rice and G. Nagy, Performance metrics for document understanding systems, Proc. 2nd Int. Conf. Document Anal. Recog. 424-427, Tsukuba Science City, Japan (October 1993).
18. S. Randriamasy and L. Vincent, Benchmarking page segmentation algorithms, Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recog. (CVPR), 411-416, Seattle, Washington (June 1994).
19. Oivind Trier and A. K. Jain, Goal-directed evaluation of binarization methods, IEEE Trans. Pattern Anal. Mach. Intell. 17, 1191-1201 (December 1995).

About the Author--ANIL JAIN is a University Distinguished Professor in the Department of Computer Science at Michigan State University. Dr Jain has made contributions in the following areas: statistical pattern recognition, exploratory pattern analysis, neural networks, Markov random fields, texture analysis, remote sensing, interpretation of range images, and 3D object recognition. He received the best paper awards in 1987 and 1991, and received certificates for outstanding contributions in 1976, 1979 and 1992 from the Pattern Recognition Society. Dr Jain served as the Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence (1991-1994), and currently serves on the editorial boards of Pattern Recognition journal, Pattern Recognition Letters, Journal of Mathematical Imaging, Journal of Applied Intelligence and IEEE Transactions on Neural Networks. He is the co-author of the book Algorithms for Clustering Data (Prentice-Hall, 1988), has edited the book Real-Time Object Measurement and Classification (Springer-Verlag, 1988), and has co-edited the books Analysis and Interpretation of Range Images (Springer-Verlag, 1989), Neural Networks and Statistical Pattern Recognition (North-Holland, 1991), Markov Random Fields: Theory and Applications (Academic Press, 1993), and 3D Object Recognition (Elsevier, 1993). Dr Jain is a Fellow of the IEEE and is currently serving as a speaker in the IEEE Computer Society Distinguished Visitors Program for Asia-Pacific.

About the Author--YU ZHONG received the B.S. and M.S. degrees in computer science and engineering from Zhejiang University, Hangzhou, China in 1988 and 1991, and the M.S. degree in statistics from Simon Fraser University, Burnaby, Canada in 1993. She is currently a doctoral student in the Pattern Recognition and Image Processing group of Michigan State University. Her research interests include image processing and machine vision.