Document image binarization based on topographic analysis using a water flow model


Pattern Recognition 35 (2002) 265-277

In-Kwon Kim, Dong-Wook Jung, Rae-Hong Park*
Department of Electronic Engineering, Sogang University, C.P.O. Box 1142, Seoul 100-611, South Korea
Received 7 April 2000; accepted 18 December 2000

Abstract

This paper proposes a local adaptive thresholding method based on a water flow model, in which an image surface is considered as a three-dimensional (3-D) terrain. To extract characters from backgrounds, we pour water onto the terrain surface. Water flows down to the lower regions of the terrain and fills valleys. Then, the thresholding process is applied to the amount of filled water for character extraction, in which the proposed thresholding method is applied to gray level document images consisting of characters and backgrounds. The proposed method based on a water flow model shows the property of locally adaptive thresholding. Computer simulation with synthetic and real document images shows that the proposed method yields effective adaptive thresholding results for binarization of document images. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Water flow model; Local adaptive thresholding; Binarization; Document image

1. Introduction

Segmentation subdivides an image into a number of parts or objects. Segmentation algorithms for binarization are based on discontinuity or similarity of gray values [1-6]. Algorithms using discontinuity segment an image based on abrupt changes in gray level, whereas algorithms using similarity are based on thresholding [2-5], region growing, and region splitting and merging. Edge detection [7-10] is one of the techniques for detecting discontinuity, and thresholding is one of the important approaches to image segmentation. Segmentation algorithms using the watershed concept have been presented [11-16]. Watershed algorithms are based on the behavior of water, which always flows down to lower regions, and the watershed divides regions based on

This work was supported in part by the Ministry of Information and Communication, Korea.
* Corresponding author. Tel.: +82-2-705-8463; fax: +82-2-705-8629. E-mail address: [email protected] (R.-H. Park).

the minima that water approaches. This paper proposes a binarization method for document images based on a water flow model, in which a document image is divided into two regions, characters and backgrounds, by thresholding the amount of filled water. Binarization of gray level images is a special case of segmentation with two labels [17-20]. For example, to extract characters from document images for character recognition, binarization of an input image is needed [19]. In document images, two distinct regions are defined as characters and backgrounds. Characters are objects that we desire to extract, recognize, and represent. The remaining regions are backgrounds of these objects. In most cases, digital images are obtained by a scanner or a camera. The quality of the digital images depends on image acquisition conditions such as illumination, characteristics of the input device, discriminating characteristics of objects, and so on [19]. In some applications, the obtained gray level image is converted to a binary image for the next main processing step such as image analysis or recognition [21-23]. The preprocessing step, binarization, determines the success of the main processing step to some extent. In most cases, preprocessing and main processing steps are processed

0031-3203/01/$20.00 © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(01)00027-9


independently, so it is difficult to evaluate the effectiveness of preprocessing. In this paper, character extraction is applied to our binarization results to compare the performance of several binarization methods. The proposed thresholding method shows a locally adaptive characteristic [24]. To mimic the behavior of water flow, a gray level image is treated as a three-dimensional (3-D) terrain that is composed of mountains and valleys, where the mountains represent background regions and the valleys correspond to the character regions. If water falls onto the terrain, it travels down the terrain and fills valleys. Then we can extract the valleys of a terrain by the amount of filled water, where the amount of filled water is defined by the difference between the water-filled terrain and the original terrain. Valleys in a terrain with a large amount of filled water are extracted as the objects. The proposed water flow model is applied to extract characters in a document image. The rest of the paper is structured as follows. In Section 2, the proposed thresholding method based on a water flow model is presented and its properties are shown by computer simulations with synthetic images. Simulation results of the proposed method applied to real documents and the performance comparison with other binarization methods are shown in Section 3. Finally, conclusions are given in Section 4.

2. Proposed local thresholding based on a water flow model

The proposed method is based on the property of water that it always flows down to lower regions. In gray level images, we assume that lower gray levels (darker regions) denote characters whereas higher gray levels (brighter regions) represent backgrounds, or vice versa. The proposed method consists of two main processes: an enhancement process and a thresholding process. The former extracts the local characteristic of an input gray level image by simulating the behavior of rainfall. Rainfall is the process that fills lower regions with water, and it creates many ponds on the terrain. In equilibrium, the regions of filled water correspond to local valleys, whereas the regions of dry plain are backgrounds. After the water flowing process, the final result is obtained by thresholding the amount of filled water. Because the amount of filled water represents the local characteristic of an original terrain, the proposed method yields locally adaptive thresholding results. In thresholding the amount of filled water, for simplicity, Otsu's algorithm [2], a nonparametric and unsupervised automatic threshold selection method, is used to automatically select an optimal threshold value. The threshold value selected by Otsu's method for segmentation can be considered as the amount of vaporization, in the sense that after vaporization by the amount

of the threshold value, the regions with remaining water are regarded as objects. In Otsu's method [2], the optimal threshold $k^*$ is selected so that it maximizes the variance $\sigma_b^2$ between two histogram segments, which is defined by

$$\sigma_b^2 = a_1 a_2 (m_1 - m_2)^2, \qquad (1)$$

where $m_1$ and $m_2$ represent the means of segments 1 and 2, respectively. The ratio $a_j$ of the area of segment $j$ to the total area is given by

$$a_j = \sum_{i \in C_j} p_i, \quad j = 1, 2, \qquad (2)$$

where $p_i$ denotes the normalized histogram of gray level $i$, i.e., $\sum_{i=0}^{I-1} p_i = 1$, with $I$ representing the total number of gray levels. $C_1$ ($C_2$) represents the set where the pixel value is less than or equal to (greater than) the threshold $k$. Note that the segment mean $m_j$ is given by

$$m_j = \sum_{i \in C_j} i \cdot p_i / a_j, \quad j = 1, 2. \qquad (3)$$

The optimal threshold value $k^*$ is determined by searching for the peak of $\sigma_b^2$. The criterion measure proposed by Otsu is equivalent to the uniformity measure [25].

If the water level overflows the maximum height $f_{top}$ of the terrain, the thresholding result of the proposed method is equivalent to that of Otsu's method, except for the inversion of a terrain. Inversion of a terrain $f(x, y)$ gives $f_{top} - f(x, y)$, resulting in region exchange of characters and backgrounds. Because the amount of filled water is related to the inverted version of the original terrain, the amount of water $w_0$ needed to overflow the top of the terrain is given by

$$w_0 = \left\lceil \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left( f_{top} - f(x, y) \right)}{M \cdot N} \right\rceil, \qquad (4)$$

where $M$ and $N$ represent the horizontal and vertical maximum distances of a terrain, respectively, and $\lceil q \rceil$ denotes the smallest integer larger than or equal to $q$. For simplicity, $w_0$ is regarded as an integer, $0 \le w_0 \le f_{top}$. Thus Otsu's method can be considered as a special case of the proposed method with $w = w_0$ and inversion of a terrain. The final results depend on $w$, which is determined experimentally.

2.1. Water flow model

To extract characters, we first pour water onto an image surface which is considered as a 3-D terrain. Then water flows down to valleys representing object regions, increasing the height of water at valleys. Water does not fill plains that represent the relatively smooth regions, further running down to neighboring lower regions. In the proposed thresholding method, the amount of filled water, i.e., the difference between an original terrain and


Fig. 2. Algorithm of the proposed water flow model.
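As a rough illustration of the rainfall step described in Section 2.1, the following minimal Python sketch (using NumPy) drops water on every pixel and lets each drop descend to the lowest position in its mask until it reaches a local minimum. The function name `rainfall`, the row-major scan order, the clipping of the mask at the image border, and the rule that a drop stays where it is on ties are assumptions of this sketch rather than details given in the paper; the pond labeling and leveling step is omitted.

```python
import numpy as np


def rainfall(f, w=1, s=1):
    """Return a water-filled copy of terrain f after w drops per pixel.

    f : 2-D integer array of gray levels (the terrain).
    w : amount of rainfall (drops per pixel).
    s : mask half-size, so the mask is (2s+1) x (2s+1).
    """
    g = np.asarray(f).astype(np.int64)  # working copy; f is left untouched
    rows, cols = g.shape
    for _ in range(w):
        for y in range(rows):
            for x in range(cols):
                cy, cx = y, x
                while True:
                    # Mask around the current position, clipped at the border
                    # (border handling is an assumption of this sketch).
                    y0, y1 = max(cy - s, 0), min(cy + s + 1, rows)
                    x0, x1 = max(cx - s, 0), min(cx + s + 1, cols)
                    window = g[y0:y1, x0:x1]
                    if window.min() >= g[cy, cx]:
                        break  # the centre is already the lowest position
                    dy, dx = np.unravel_index(np.argmin(window), window.shape)
                    cy, cx = y0 + dy, x0 + dx  # move the mask centre downhill
                g[cy, cx] += 1  # the drop raises the terrain where it settles
    return g


# Toy terrain: a flat plain of height 10 with one deep pixel in the middle.
terrain = np.full((5, 5), 10)
terrain[2, 2] = 0
filled_water = rainfall(terrain, w=1, s=1) - terrain
print(filled_water)
```

The difference between the returned array and the input is the amount of filled water, which is the quantity that is subsequently thresholded. Each move goes to a strictly lower position, so the inner loop always terminates.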

Fig. 1. Flowchart of the proposed binarization method.

a water-filled terrain, is thresholded by a global thresholding method such as Otsu's method [2]. Other global thresholding techniques [4,5] can also be employed. Fig. 1 shows the flowchart of the proposed thresholding method, where the value of parameter $w$ is selected experimentally. Each pond connected with a water-filled pixel is labeled and its average water level is computed. Then the pond's water level is set to this average. After this process, a thresholding step follows, based on the amount of filled water. If the amount of water $w$, $0 < w < w_0$, is sprinkled on the terrain, the water flows into the valleys, which correspond to low gray level regions in the document image. As a consequence of the water flow, water-filled ponds are generated. These water-filled ponds are merged into one segment, or small ponds vanish, depending on

the amount of filled water and the location of each pond. Then a labeling process is applied to the generated ponds. After the labeling process is completed, the average water level of each pond is calculated and assigned to the pond. In this way, the character regions of low gray level are raised, and the gray levels of the original image are subtracted from the gray levels of the water-filled image. The result thus reflects locally adaptive characteristics of the document image. Finally, Otsu's thresholding method is applied to the water-filled ponds, which gives the locally adaptive thresholding result. Fig. 2 describes the proposed algorithm based on the water flow model.

If a drop of water falls at $(x, y)$ of an image, the height of the terrain $f(x, y)$ is compared with the heights $f(x+i, y+j)$, $-s \le i, j \le s$, of neighboring positions, where the size of a mask is set to $(2s+1) \times (2s+1)$. The position $(m, n)$ in the mask with the lowest height is selected as the next position to which water flows. This process is repeated by moving the center position of the mask to $(m, n)$, until water reaches the position with the locally lowest height value. For each mask, $(2s+1) \times (2s+1)$ comparisons are needed, and for each drop (or pixel), one addition is needed to generate a water drop. The proposed method requires $NM[(2s+1)^2 D + w]$ additions, where one comparison is counted as one addition, $D$ denotes the average number of mask operations, and $N \times M$ represents the image size. Fig. 3 shows an example of the search process. If a drop of water falls at position 1, mask A finds the lowest gray level around position 1. If position 2 is the position having the lowest gray level in mask A, the drop of water flows to position 2 and a new mask B detects the new position with the lowest gray level in the mask. This process is repeated until the center position of the mask has the lowest gray level. In Fig. 3, position 4 has the lowest gray level in mask D, thus water dropped at position 1 is accumulated at position 4. Then the water drop fills the lower terrain at $(m, n)$, i.e.,

$$f(m, n) = f(m, n) + 1,$$


Fig. 3. Searching process of the lowest gray level ($s = 2$).

where $f(m, n)$ represents a water-filled terrain. This procedure is applied $w$ times to all pixels of an image by sequential scanning, where the parameter $w$ represents the amount of rainfall. If the central position of a mask is surrounded by a small hill, then water will tunnel under the hill. Thus, as $s$ increases, the effect of small noise decreases. However, if $s$ is too large, small object signals cannot be detected, and more time is needed to mimic the behavior of water flow. The amount of rainfall depends on the distribution of gray levels in the document image. If the variance of intensity is large, $w$ becomes large. In the proposed thresholding method, the amount of rainfall, $w$, and the mask size parameter, $s$, are determined experimentally.

2.2. Binarization based on a water flow model

The performance of the proposed thresholding algorithm is tested with two synthetic images expressed as



$$S_1(x, y) = A \, \mathrm{abs}\!\left( -\frac{r}{U} + B \cos \frac{r}{V} \right), \qquad (5)$$

$$S_2(x, y) = A \left( -\frac{r}{U} + B + C \cos \frac{r}{V} \right), \qquad (6)$$

where $S_1(x, y)$ and $S_2(x, y)$ represent synthetic images 1 and 2, respectively. Constants $U$ and $V$ determine the terrain characteristics and $r$ is equal to $x + y$, with $(0, 0)$ corresponding to the top left point. Constants $A$, $B$, and $C$ are selected to adjust the gray level to the range between 0 and 255, i.e., $I = 256$. Fig. 4 shows two synthetic images, where $A = 255$, $B = 1.0$, $U = 1.5 \times 10^{\ldots}$, and $V = 5 \times 10^{\ldots}$ are used to generate $S_1(x, y)$, whereas $A = 127$, $B = 1.5$, $C = 0.5$, $U = 9 \times 10^{\ldots}$, and $V = 2 \times 10^{\ldots}$ are used to obtain $S_2(x, y)$.

Fig. 4. Synthetic images for binarization. (a) Synthetic image 1 ($S_1$), (b) synthetic image 2 ($S_2$).

White lines superimposed in Fig. 4 denote the diagonal lines represented by $(x, x)$, along which terrain profiles are shown in Fig. 5. Solid lines in Figs. 5(a) and (b) show profiles of the original terrains of synthetic images 1 and 2 along the diagonal direction, i.e., $S_1(x, x)$ and $S_2(x, x)$, respectively. Results of water flow with $w = 10$ are also shown in Fig. 5, where the horizontal line between the valleys represents the water-filled terrain, with the amount of filled water at valleys being represented by dotted regions. The water-filled terrain is obtained by postprocessing that makes the water level flat. The difference between the horizontal line and the solid line corresponds to the amount of filled water, and it has a large value in valley regions. For $S_1(x, y)$, to detect all valleys, a threshold value must be set as small as possible. But with too small threshold values, the region of the

deepest valley at small $r$ is extracted too thin or disconnected. On the other hand, if the threshold value is too high, valleys at large $r$ are merged into a single segment. For $S_2(x, y)$, positions of detected valleys vary depending on the threshold value. If the threshold value is too large, valleys at large $r$ are merged into a single segment. On the other hand, if the threshold value is too small, valleys at small $r$ are merged into a single segment. Thus global thresholding methods using a single threshold value over an entire image cannot effectively detect local valleys of these two synthetic images. From simulation results, it is observed that most of the dropped water flows down to deep valleys. We can detect all valleys by thresholding the amount of filled water instead of thresholding the original terrain value. Note that the filled water represents the local property of an original terrain. Figs. 6(a), (b) and (c) show final results of the proposed thresholding method with $w$ equal to 1, 5, and $w_0$, respectively, for $S_1(x, y)$, whereas Figs. 6(d), (e) and (f) show those for $S_2(x, y)$. The parameter value $w_0$ is equal to 140 and 125 for $S_1(x, y)$ and $S_2(x, y)$, respectively. The threshold values for Figs. 6(a), (b) and (c) are 8, 18 and 140, whereas those for Figs. 6(d), (e) and (f) are 4, 11 and 122, respectively, where the optimal threshold value $k^*$ is chosen by Otsu's method. Figs. 6(c) and (f) show the results of overflow cases, which are the same as those of Otsu's method except for the exchange of objects and backgrounds. For $S_1(x, y)$, the result with $w = 1$ shows valleys that are too thin, as shown in Fig. 6(a). Similarly, for $S_2(x, y)$, the thresholding result with $w = 1$ shows comparatively thin valleys, as shown in Fig. 6(d). But with $w = 5$, all valleys of $S_1(x, y)$ ($S_2(x, y)$) are well extracted by the proposed algorithm as shown in Fig. 6(b) (Fig. 6(e)), whereas valleys at large $r$ are not properly detected by Otsu's global thresholding algorithm (see Figs. 6(c) and (f)). Large $w$ yields thick detected valleys. If we select a proper $w$ value for each input gray level image, we can effectively convert it to a binary one even if the gray level varies locally.

Fig. 5. Profiles of the original terrain and water-filled terrain of synthetic images $S_1$ and $S_2$ ($w = 10$). (a) $S_1$, (b) $S_2$.

3. Simulation results and discussions

For experiments, input document images are obtained by a GT-9500 scanner at 200-300 dots per inch (dpi). The 256 × 256 test images are quantized to eight bits to preserve intensity information. We run experiments on a Pentium II 400 MHz using programs written in the C language. To illustrate the efficiency of the proposed algorithm, we experiment with several document images, some of which are shown in Fig. 7. In Fig. 7(a), the background has a wide range of gray levels. In particular, owing to the upper left background, it is difficult to separate characters from the background. The histogram of the document image in Fig. 7(a) is shown in Fig. 9(a), in which the gray level is widely spread because of complex backgrounds. Fig. 7(b) shows test image 2, which contains uniform backgrounds and undesirable graphic regions that complicate character extraction. The histogram of Fig. 7(b), shown in Fig. 10(a), indicates that both characters and graphic regions give a small peak, thus separation of characters and graphic regions is difficult. Fig. 7(c) shows test image 3, which contains alternating background levels along each character region and an uneven gray level distribution. The histogram of Fig. 7(c) is shown in Fig. 11(a), in which the gray level is widely distributed on the whole. For efficient binarization of document images, we propose a locally adaptive thresholding method based on a water flow model. The performance of the proposed method is compared with that of four binarization methods: one global and three local thresholding methods. In Otsu's method, the global threshold is determined by histogram analysis [2]. Note that global thresholding methods derive a single threshold value for the whole image. Three local thresholding methods are tested for performance comparison. Local thresholding methods set a threshold value for each pixel on the basis of the information of neighboring pixels. In Nakagawa and Rosenfeld's method, the original image is partitioned into a number of smaller subimages [3]. In Niblack's method, the threshold value varies over the


Fig. 6. Binarization results of the proposed method for synthetic images $S_1$ and $S_2$. (a) $w = 1$ ($k^* = 8$, $S_1$), (b) $w = 5$ ($k^* = 18$, $S_1$), (c) $w = w_0 = 140$ ($k^* = 140$, $S_1$), (d) $w = 1$ ($k^* = 4$, $S_2$), (e) $w = 5$ ($k^* = 11$, $S_2$), (f) $w = w_0 = 125$ ($k^* = 122$, $S_2$).
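The threshold values quoted in the caption of Fig. 6 are selected by Otsu's method, and the overflow cases use the amount of water from Eq. (4). As a hedged illustration, the sketch below implements the search of Eqs. (1)-(3) and the ceiling average of Eq. (4) in Python with NumPy. The function names `otsu_threshold` and `overflow_amount` are hypothetical, the input is assumed to hold non-negative integers below `levels`, and the top of the terrain is assumed to be the largest value present in the image.

```python
import numpy as np


def otsu_threshold(values, levels=256):
    """Return the threshold k maximizing a1 * a2 * (m1 - m2)^2, Eqs. (1)-(3)."""
    hist = np.bincount(np.asarray(values).ravel(), minlength=levels)[:levels]
    p = hist / hist.sum()            # normalized histogram p_i
    i = np.arange(levels)
    best_k, best_var = 0, -1.0
    for k in range(levels - 1):
        a1 = p[: k + 1].sum()        # area ratio of segment 1 (values <= k)
        a2 = 1.0 - a1                # area ratio of segment 2 (values > k)
        if a1 == 0.0 or a2 == 0.0:
            continue                 # one segment is empty: variance undefined
        m1 = (i[: k + 1] * p[: k + 1]).sum() / a1
        m2 = (i[k + 1:] * p[k + 1:]).sum() / a2
        var = a1 * a2 * (m1 - m2) ** 2
        if var > best_var:           # keep the first peak of the criterion
            best_k, best_var = k, var
    return best_k


def overflow_amount(f):
    """Ceiling of the mean of (f_top - f), following Eq. (4).

    Assumption of this sketch: f_top is taken as the maximum of the image.
    """
    f = np.asarray(f)
    return int(np.ceil(np.mean(f.max() - f)))


# Usage on a hypothetical array of filled-water amounts.
water = np.array([[0, 0, 1, 9], [0, 1, 8, 9], [0, 0, 1, 7]])
k = otsu_threshold(water)
characters = water > k               # pixels holding more water than k
print(k, overflow_amount(water))
```

In the proposed method the histogram would be taken over the amount of filled water rather than over the original gray levels, which is what makes the resulting binarization locally adaptive.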

Fig. 7. Real test images for binarization. (a) Test image 1 ($T_1$), (b) test image 2 ($T_2$), (c) test image 3 ($T_3$).

image, which is based on the local mean and standard deviation of the intensity [26,27]. A postprocessing step in Yanowitz and Bruckstein's method can be incorporated into Niblack's method, in which the postprocessing step is used to improve the quality of binary images by removing "ghost" objects [28]. For Liu and Srihari's method, the threshold value of document images is determined by text features and the run-length histogram of binary images [22]. For performance evaluation of binarization results, two methods are used. The results of several binarization

methods are evaluated using visual criteria [23]. We extract characters from binarized images and use the character extraction rate as a quantitative measure [29].

3.1. Visual criteria

We use visual criteria for performance evaluation of each binary image obtained by five different thresholding methods. The evaluation process is a blind test. The results of binarized images are evaluated using the


following four criteria [23]:
(1) Broken text: Existence of undesirable gaps in text. Small gaps are given high scores.
(2) Blurring of text: Low rate of blurring is desirable.
(3) Loss of complete text: A large number of losses is not desirable.
(4) Noise in background area: A small number of false objects is desirable.

Based on the above criteria, each item is graded from 1 to 5 points, with 5 being the highest. With four items, the maximum score of each binarized image is 20 points, and the maximum total score for each method with ten images is 200 points. To apply the proposed thresholding method based on a water flow model, we need the preprocessing step, in which we sprinkle the document image with rainfall. In that case, water flows into the local valleys having lower gray levels in the document image. Fig. 8(a) shows the profiles of the original terrain and water-filled terrain corresponding to the vertical white line superimposed in Fig. 7(a). Even if the change of gray level is large, water flows into the text region. After the preprocessing step, regions of filled water correspond to local valleys, i.e., text regions, whereas regions of dry plain are backgrounds. In this case, an excess of dropped water overflows into the background because of the small difference in gray level between the text and the background. We experimentally set the rainfall parameter $w$ to 10 and the mask parameter $s$ to 3. Fig. 8(b) shows the profiles of the original terrain and water-filled terrain corresponding to the vertical white line in Fig. 7(b). In Fig. 7(b), on the whole the background does not have filled water whereas text and graphic regions do. Because the text region has a lower gray level than the graphic region, a large amount of water is filled in the text region. Experimentally selected parameters are used for the water flow model: $w = 20$ and $s = 3$. Fig. 8(c) shows the profiles of the original terrain and water-filled terrain corresponding to the white vertical line of Fig. 7(c). In Fig. 8(c), water fills text regions and parts of the background with lower gray level, where we experimentally set the rainfall parameter $w$ to 20 and the mask parameter $s$ to 3. The first experimental results, for Fig. 7(a), are shown in Fig. 9. Fig. 9(a) shows the histogram of Fig. 7(a). Fig. 9(b) shows the result of Otsu's thresholding algorithm, a global thresholding method, whose performance is unsatisfactory, as expected. Fig. 9(c) shows the result of Nakagawa and Rosenfeld's method, in which many false objects are detected at the upper left corner. Fig. 9(d) shows the result of Niblack's method with postprocessing, where considerably improved results are obtained compared with Fig. 9(c). Fig. 9(e) shows the result of Liu and Srihari's method, in which many false objects are detected at the upper left corner as in Fig. 9(c). Fig. 9(f)

Fig. 8. Profiles of the original terrain and water-filled terrain of test images $T_1$, $T_2$, and $T_3$ ($s = 3$). (a) $T_1$ ($w = 10$), (b) $T_2$ ($w = 20$), (c) $T_3$ ($w = 20$).

shows the result of the proposed local thresholding method based on a water flow model. In the upper left region in Fig. 7(a), water searches for lower regions and flows into the text regions as stated above. The thresholding process is based on the amount of filled water. Note that the proposed algorithm yields the best result. The second experimental results, for Fig. 7(b), are shown in Fig. 10. Although distinct differences in gray level are observed between the text and the background, graphic regions coexist in backgrounds, so it is hard to separate the text from graphic regions. Fig. 10(a) shows the histogram of Fig. 7(b). Figs. 10(b) and (c) show the


Fig. 9. Binarization results of Fig. 7(a). (a) Histogram, (b) Otsu's method, (c) Nakagawa and Rosenfeld's method, (d) Niblack's method with postprocessing, (e) Liu and Srihari's method, (f) proposed method.

results of Otsu's and Nakagawa and Rosenfeld's methods, respectively, in which the text coexists with graphic regions. Fig. 10(d) shows the result of Niblack's method with postprocessing, in which an improved result is observed, compared with Figs. 10(b) and (c). However, edges of graphic regions are still observable after thresholding. Fig. 10(e) shows the result of Liu and Srihari's method, in which most of the text is not preserved. Fig. 10(f) shows the result of the proposed thresholding method based on a water flow model. The text is preserved and the undesirable graphic region is also removed, because water flows into the regions of lower gray level, where text and graphic regions coexist, and the threshold is applied to the relative amount of filled water. In this case, the proposed algorithm yields the best result. The third experimental results, for Fig. 7(c), are shown in Fig. 11. Fig. 11(a) shows the histogram of Fig. 7(c). Fig. 11(b) shows the result of Otsu's method, in which, because of the gray level similarity between the text and the alternating background level along each character region, the global thresholding method gives a bad result. Fig. 11(c) shows the result of Nakagawa and Rosenfeld's method, in which an improved result is observed, compared with Fig. 11(b), with the background still coexisting in the text region. Fig. 11(d) shows the result of Niblack's method with postprocessing, where a good result

is obtained except for small broken characters. Fig. 11(e) shows the result of Liu and Srihari's method, in which the undesirable background is removed but a large number of broken characters are obtained. Fig. 11(f) shows the result of the proposed method. The proposed method eliminates the undesirable background and preserves text regions well because of effective thresholding based on the amount of filled water. The text regions contain a relatively large amount of filled water compared with that of the dark background area, so they are extracted as the objects of the binarized image. Noise around text depends on the statistical characteristics of document images. For evaluating the performance of various segmentation methods, visual criteria are used. Table 1 shows the performance comparison of various binarization methods in terms of the quantitative measure. Each score denotes the sum of scores based on the four visual criteria. For evaluating the score, we show the images obtained by different binarization methods in arbitrary order, and ten students score relative points. For each criterion, the scores of the 10 students are averaged. In Table 1, the scores of the test images $T_1$, $T_2$, and $T_3$ are obtained from Figs. 9, 10, and 11, respectively. As shown in Table 1, the proposed local thresholding method yields the highest score. Using the property that water tries to find its level, the text and the background can be successfully separated by the proposed method.


Fig. 10. Binarization results of Fig. 7(b). (a) Histogram, (b) Otsu's method, (c) Nakagawa and Rosenfeld's method, (d) Niblack's method with postprocessing, (e) Liu and Srihari's method, (f) proposed method.
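Panel (d) of Figs. 9-11 uses Niblack's method, which the text describes as deriving the threshold from the local mean and standard deviation. A minimal sketch of that idea, without the postprocessing step, is given below in Python with NumPy. It uses the commonly quoted form threshold = mean + k × standard deviation; the window half-size `half`, the weight `k = -0.2`, the reflective border padding, and the function name `niblack_binarize` are assumptions of this sketch and are not values reported in the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def niblack_binarize(img, half=7, k=-0.2):
    """Mark pixels darker than (local mean + k * local std) as characters."""
    img = np.asarray(img, dtype=float)
    # Reflective padding so every pixel has a full (2*half+1)^2 window
    # (the border treatment is an assumption of this sketch).
    padded = np.pad(img, half, mode="reflect")
    windows = sliding_window_view(padded, (2 * half + 1, 2 * half + 1))
    local_mean = windows.mean(axis=(-2, -1))
    local_std = windows.std(axis=(-2, -1))
    threshold = local_mean + k * local_std
    return img < threshold  # True where the pixel is classified as character


# Toy page: bright background with one dark vertical stroke.
page = np.full((32, 32), 200.0)
page[10:22, 15:17] = 50.0
mask = niblack_binarize(page)
print(mask.sum())
```

In perfectly flat regions the local standard deviation is zero and nothing is marked, but with real scanner noise such a rule tends to produce spurious objects in the background, which is the kind of artifact the postprocessing step mentioned in the text is meant to remove.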

Fig. 11. Binarization results of Fig. 7(c). (a) Histogram, (b) Otsu's method, (c) Nakagawa and Rosenfeld's method, (d) Niblack's method with postprocessing, (e) Liu and Srihari's method, (f) proposed method.


Table 1
Scores for quantitative performance comparison of each binarization method

Test image     M1    M2    M3    M4    M5
T1             14    14    15    14    17
T2             14    14    15    12    16
T3             14    15    16    14    16
T4             15    16    16    17    18
T5             19    19    17    19    19
T6             15    15    16    14    15
T7             15    15    16    16    17
T8             19    19    18    20    20
T9             17    17    17    17    18
T10            14    15    16    14    16
Total score   156   159   162   157   172

M1: Otsu's method, M2: Nakagawa and Rosenfeld's method, M3: Niblack's method with postprocessing, M4: Liu and Srihari's method, M5: proposed method.

The amount of filled water in the proposed method depends on $w$, $s$, and the shape of the terrain. As $s$ increases, the effect of small noise decreases. For large $s$, however, small object signals may be missed, and a high computational load is required to mimic the behavior of water flow.

3.2. Character extraction

As another performance evaluation, we use the character extraction rate for binarized document images obtained by five different binarization methods. This character extraction process separates a graphic image part from a text region by considering chain codes of connected components [29]. We compare the performance of various methods in terms of the character extraction rate, which is defined as the ratio of the number of correctly extracted characters to the total number of characters considered. We obtain the character extraction rate with ten test document images after the extraction process. The extracted characters are surrounded by boxes. The character extraction results of Fig. 9 are shown in Fig. 12. We cannot extract characters from Fig. 9(b) due to severe loss in the text area, thus Fig. 9(b) is not considered in character extraction experiments. Figs. 12(a), (b) and (c) are obtained from Figs. 9(c), (d), and (e), respectively. Because of the false objects in the text area, the character extraction rate is not high. The character extraction result of Fig. 9(f) is shown in Fig. 12(d), in which the character extraction rate is higher than that in Figs. 12(a)-(c), because most of the undesirable background is eliminated by the proposed locally adaptive thresholding method. The character extraction results of Fig. 10 are shown in Fig. 13. Figs. 13(a) and (b) are obtained from Figs. 10(b)

Fig. 12. Character extraction results. (a) Character extraction from Fig. 9(c), (b) character extraction from Fig. 9(d), (c) character extraction from Fig. 9(e), (d) character extraction from Fig. 9(f).


Fig. 13. Character extraction results. (a) Character extraction from Fig. 10(b), (b) character extraction from Fig. 10(c), (c) character extraction from Fig. 10(d), (d) character extraction from Fig. 10(e), (e) character extraction from Fig. 10(f).

Fig. 14. Character extraction results. (a) Character extraction from Fig. 11(b), (b) character extraction from Fig. 11(c), (c) character extraction from Fig. 11(d), (d) character extraction from Fig. 11(e), (e) character extraction from Fig. 11(f ).


Table 2
Character extraction rate for binarized document images obtained by each binarization method (%)

Test image     M1    M2    M3    M4    M5
T1              *    67    70    66    86
T2             39    43    51    19    64
T3             62    65    93    44    89
T4             87    89    86    85    90
T5             84    86    86    83    85
T6             33    35    63    11    70
T7             76    76    77    69    85
T8             97    97    92    98    98
T9             91    87    82    74    88
T10            36    38    77    21    37
Average rate   67    68    78    57    79

M1: Otsu's method, M2: Nakagawa and Rosenfeld's method, M3: Niblack's method with postprocessing, M4: Liu and Srihari's method, M5: proposed method.

and (c), respectively. The character extraction results are not satisfactory because graphic regions are still observed in Figs. 10(b) and (c). Fig. 13(c) is obtained from Fig. 10(d). Because of edges in the graphic regions, it is difficult to determine which edges of the graphic regions are regarded as text. In consequence, a large number of incorrectly extracted characters are obtained. In Fig. 13(d), the character extraction result of Fig. 10(e), a large number of broken characters are observed. Fig. 13(e) represents the character extraction result of Fig. 10(f). Due to the small character size, a small number of spots are obtained, resulting in difficulty in character extraction. Except for the spots, a good extraction result is observed, with the highest character extraction rate. The character extraction results of Fig. 11 are shown in Fig. 14. Figs. 14(a) and (b) show character extraction results of Figs. 11(b) and (c), respectively. Text information is lost due to the dark repetitious background. Fig. 14(c) shows the character extraction result of Fig. 11(d), which yields the highest character extraction rate. Fig. 14(d) shows the character extraction result of Fig. 11(e), in which several parts of the text are incorrectly extracted. The character extraction result of Fig. 11(f) is shown in Fig. 14(e), which shows results similar to those in Fig. 14(c), except for a few text parts. Additional experimental results of character extraction are listed in Table 2, in which the character extraction rate is obtained for 10 test document images. In Table 2, the character extraction rates of the test images $T_1$, $T_2$, and $T_3$ are obtained from Figs. 12, 13 and 14, respectively. On the whole, the proposed method gives better results than the other conventional methods considered.
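The average rates in the last row of Table 2 can be rechecked from the per-image rates. The short Python sketch below does this arithmetic; it assumes that the entry marked * for M1 on the first test image (the case excluded from character extraction) is simply left out of that method's average, which is consistent with the reported value but is not stated explicitly in the text. The names `rates` and `average_rate` are hypothetical.

```python
# Character extraction rates (%) from Table 2, one list per method,
# ordered by test image 1 to 10. None stands for the entry marked "*".
rates = {
    "M1": [None, 39, 62, 87, 84, 33, 76, 97, 91, 36],
    "M2": [67, 43, 65, 89, 86, 35, 76, 97, 87, 38],
    "M3": [70, 51, 93, 86, 86, 63, 77, 92, 82, 77],
    "M4": [66, 19, 44, 85, 83, 11, 69, 98, 74, 21],
    "M5": [86, 64, 89, 90, 85, 70, 85, 98, 88, 37],
}


def average_rate(column):
    """Mean of the available entries, rounded to the nearest integer."""
    available = [v for v in column if v is not None]
    return round(sum(available) / len(available))


averages = {method: average_rate(column) for method, column in rates.items()}
print(averages)
```

Under that assumption the recomputed averages agree with the tabulated row: the proposed method (M5) and Niblack's method with postprocessing (M3) differ by about one percentage point on average, while the per-image rates differ much more widely.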

4. Conclusions

This paper presents a water flow approach to thresholding of a document image, where an image is regarded as a 3-D terrain and its local property is characterized by a water flow model. The water flow model locally detects the valleys, which correspond to regions that are lower than their neighboring regions. The deep valleys are filled with dropped water, whereas the smooth plain regions remain dry. This physical property of water can be applied to the effective extraction of objects such as characters, tools, and so on. Experimental results with various test images show that the proposed approach is effective for document images, especially those with local or uneven illumination. Further research will focus on selecting the optimal amount of water.
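The terrain idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the 3x3 neighborhood, the rule that a drop which never moves leaves no water behind, and the names `pour_water`, `binarize`, `passes`, `drop`, and `min_water` are all assumptions made for illustration. Gray levels are treated as heights, each pixel receives one drop per pass, and a drop slides to the lowest neighbor until no neighbor is strictly lower, where it raises the terrain.

```python
import numpy as np


def pour_water(image, passes=4, drop=1.0):
    """Return the water retained at each pixel (hypothetical helper)."""
    original = np.asarray(image, dtype=float)
    terrain = original.copy()
    height, width = terrain.shape
    for _ in range(passes):
        for y0 in range(height):
            for x0 in range(width):
                y, x = y0, x0
                moved = False
                while True:
                    # 3x3 neighborhood around the drop, clipped at the border.
                    rows = slice(max(y - 1, 0), min(y + 2, height))
                    cols = slice(max(x - 1, 0), min(x + 2, width))
                    window = terrain[rows, cols]
                    dy, dx = np.unravel_index(np.argmin(window), window.shape)
                    ny, nx = rows.start + dy, cols.start + dx
                    # Stop when no neighbor is strictly lower than the drop.
                    if terrain[ny, nx] >= terrain[y, x]:
                        break
                    y, x = ny, nx
                    moved = True
                # Assumption: only drops that flowed downhill are retained,
                # so flat plains stay dry.
                if moved:
                    terrain[y, x] += drop
    return terrain - original


def binarize(image, passes=4, min_water=0.0):
    """Mark pixels holding more than min_water as character pixels."""
    return pour_water(image, passes) > min_water


# Toy page: bright flat background with one dark vertical stroke.
page = np.full((7, 7), 200.0)
page[1:6, 3] = 50.0
mask = binarize(page)
```

On this toy page only the five stroke pixels retain water, because drops landing on the flat background never move. On real images, noise produces shallow puddles on the background, which is why the amount of water itself has to be thresholded and why the choice of `passes` (the amount of water) matters.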

References

[1] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Addison-Wesley, New York, 1992.
[2] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Systems Man Cybernet. 9 (1) (1979) 62–66.
[3] Y. Nakagawa, A. Rosenfeld, Some experiments on variable thresholding, Pattern Recognition 11 (1979) 191–204.
[4] S.U. Lee, S.Y. Chung, R.-H. Park, A comparative performance study of several global thresholding techniques for segmentation, Comput. Vision Graphics Image Process. 52 (2) (1990) 171–190.
[5] P.K. Sahoo, S. Soltani, A.K.C. Wong, Y.C. Chen, A survey of thresholding techniques, Comput. Vision Graphics Image Process. 41 (2) (1988) 233–260.
[6] T. Pavlidis, Y.-T. Liow, Integrating region growing and edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 12 (3) (1990) 225–233.
[7] G.S. Robinson, Edge detection by compass gradient masks, Comput. Vision Graphics Image Process. 6 (5) (1977) 492–501.
[8] P.V. Henstock, D.M. Chelberg, Automatic gradient threshold determination for edge detection, IEEE Trans. Image Process. 5 (5) (1996) 784–787.
[9] P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Mach. Intell. 12 (7) (1990) 629–639.
[10] D.J. Park, K.M. Nam, R.-H. Park, Edge detection in noisy images based on the co-occurrence matrix, Pattern Recognition 27 (6) (1994) 765–775.
[11] L. Vincent, P. Soille, Watersheds in digital space: an efficient algorithm based on immersion simulations, IEEE Trans. Pattern Anal. Mach. Intell. 13 (6) (1991) 583–598.
[12] H. Ancin, T.E. Dufresne, G.M. Ridder, J.N. Turner, B. Roysam, An improved watershed algorithm for counting objects in noisy, anisotropic 3-D biological images, Proceedings of ICIP 95, Washington DC, USA, 1995, pp. 172–175.
[13] T. Géraud, J.-F. Mangin, I. Bloch, H. Maître, Segmenting internal structures in 3D MR images of the brain by Markovian relaxation on a watershed based adjacency graph, Proceedings of ICIP 95, Washington DC, USA, 1995, pp. 548–551.
[14] M. Baccar, L.A. Gee, R.C. Gonzalez, A. Abidi, Segmentation of range images via data fusion and morphological watersheds, Pattern Recognition 29 (10) (1996) 1673–1687.
[15] P.T. Jackway, Gradient watersheds in morphological scale-space, IEEE Trans. Image Process. 5 (6) (1996) 913–921.
[16] L. Najman, M. Schmitt, Geodesic saliency of watershed contours and hierarchical segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 18 (12) (1996) 1163–1173.
[17] C.K. Lee, S.P. Wong, Morphological approach for thresholding noisy images, in: Proceedings of SPIE, Visual Communications and Image Processing, Vol. 2501, Taipei, Taiwan, 1995, pp. 1773–1784, SPIE, Bellingham, WA, USA.
[18] N. Papamarkos, B. Gatos, A new approach for multilevel threshold selection, Graphical Models Image Process. 56 (5) (1994) 357–370.
[19] L. O'Gorman, Binarization and multithresholding of document images using connectivity, Graphical Models Image Process. 56 (6) (1994) 494–506.
[20] A. Beghdadi, A.L. Négrate, P.V. De Lesegno, Entropic thresholding using a block source model, Graphical Models Image Process. 57 (3) (1995) 197–205.
[21] Ø.D. Trier, T. Taxt, A.K. Jain, Recognition of digits in hydrographic maps: binary versus topographic analysis, IEEE Trans. Pattern Anal. Mach. Intell. 19 (4) (1997) 399–404.
[22] Y. Liu, S.N. Srihari, Document image binarization based on texture features, IEEE Trans. Pattern Anal. Mach. Intell. 19 (5) (1997) 540–544.
[23] Ø.D. Trier, T. Taxt, Evaluation of binarization methods for document images, IEEE Trans. Pattern Anal. Mach. Intell. 17 (3) (1995) 312–315.
[24] I.K. Kim, R.-H. Park, Local adaptive thresholding based on a water flow model, Proceedings of Second Japan-Korea Joint Workshop on Computer Vision, Takamatsu, Japan, 1996, pp. 21–27.
[25] W.S. Ng, C.K. Lee, Comment on using the uniformity measure for performance measure in image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 18 (9) (1996) 933–934.
[26] W. Niblack, An Introduction to Digital Image Processing, Prentice-Hall, New Jersey, 1986.
[27] Ø.D. Trier, A.K. Jain, Goal-directed evaluation of binarization methods, IEEE Trans. Pattern Anal. Mach. Intell. 17 (12) (1995) 1191–1201.
[28] S.D. Yanowitz, A.M. Bruckstein, A new method for image segmentation, Comput. Vision Graphics Image Process. 46 (1) (1989) 82–95.
[29] D.-G. Sim, Y.K. Ham, I.K. Kim, R.-H. Park, Analysis of mixed Korean document using the branch and bound algorithm based on DP matching, Comput. Vision Image Understanding 71 (3) (1998) 373–384.

About the Author: IN-KWON KIM received the B.S. degree in Physics and the M.S. and Ph.D. degrees in Electronic Engineering from Sogang University, Seoul, Korea, in 1992, 1994, and 1998, respectively. He was a senior research engineer with Hyundai Electronics Industry Co., Ltd. from 1998 to 2000. He is currently a senior research engineer with Varovision Co., Ltd. His current research interests are image coding and video communication.

About the Author: DONG-WOOK JUNG was born in Seoul, Korea, in 1971. He received the B.S. degree in Electronic Engineering from Sogang University, Seoul, Korea, in 1999. He is currently working towards his M.S. degree in Electronic Engineering at Sogang University, Seoul, Korea. His current research interests are pattern recognition and computer vision.

About the Author: RAE-HONG PARK received the B.S. and M.S. degrees in Electronics Engineering from Seoul National University, Seoul, Korea, in 1976 and 1979, respectively. He received the M.S. and Ph.D. degrees in Electrical Engineering from Stanford University, Stanford, California, in 1981 and 1984, respectively. He joined the faculty of the Department of Electronic Engineering at Sogang University, Seoul, Korea, in 1984, where he is currently a professor. In 1990, he spent his sabbatical year at the Computer Vision Laboratory of the Center for Automation Research, University of Maryland, College Park, Maryland, as a visiting associate professor. He received a scholarship from the Korean Government, Ministry of Education, from 1979 to 1983, and a Post-Doctoral Fellowship from the Korea Science and Engineering Foundation (KOSEF) in 1990. He received the Academic Award in 1987 from the Korea Institute of Telematics and Electronics (KITE) and the Haedong Paper Award in 2000 from the Institute of Electronic Engineers of Korea (IEEK). He also received the First Sogang Academic Award in 1997 and the Professor Achievement Excellence Award in 1999 from Sogang University. He served as Editor for the KITE Journal of Electronics Engineering in 1995-1996. His current research interests are computer vision, pattern recognition, and video communication.