A hidden Markov model-based character extraction method

Songtao Huang*, Majid Ahmadi, M.A. Sid-Ahmed
Electrical and Computer Engineering, University of Windsor, Essex Hall, 401 Sunset Avenue, Windsor, Ontario, Canada N9B 3P4

Pattern Recognition 41 (2008) 2890–2900, www.elsevier.com/locate/pr
Received 22 June 2006; received in revised form 12 January 2008; accepted 3 March 2008

Abstract

In this paper a hidden Markov model (HMM)-based binarization algorithm is presented. The algorithm performs well for images with a nonuniform background. To test the usefulness of the proposed technique, images of composite documents of printed characters were used. The characters were extracted with the proposed binarization algorithm and fed to a commercial OCR engine. A comparative study of various binarization techniques is also presented. Crown Copyright © 2008 Published by Elsevier Ltd. All rights reserved.

Keywords: Binarization; Thresholding; HMM; OCR; Stroke

1. Introduction

Document analysis has been studied by many scholars over the past few decades. A general document processing and analysis system is depicted in Fig. 1. The binarization stage of document processing is vital for the performance of the whole system: for example, OCR performance degrades if the characters in the binarized image are broken, blurred or overlapping. Many factors affect the performance of a binarization algorithm, such as complex signal-dependent noise and variable background intensity, which are caused by nonuniform illumination, shadow, smear, smudge or low contrast.

Thresholding algorithms have been used for the extraction of objects from the background [1,2]. In this approach the optimal thresholds for the entire image, or for parts of the image under test, are determined from the image histogram. The effectiveness of these approaches depends on the bimodality of the image histogram. This unfortunately is not always the case for real-world images, and as a result histogram-based image binarization techniques are not very effective. Alternative methods have been proposed in the literature to alleviate this problem, such as clustering-based methods [3–5], object attribute-based methods [6–11] and neural network-based binarization [12]. Comparative

* Corresponding author. Tel.: +1 519 2533000 2592; fax: +1 519 971 3695.

E-mail addresses: [email protected], [email protected] (S. Huang).

studies of these techniques have been presented by many researchers [13–20]. The comparison is carried out in terms of the speed, performance and robustness of the presented algorithms. It is shown that while some of the techniques work well for certain classes of images, they can fail for others. In this paper a binarization method for document images of text on a watermarked background is presented using an HMM.

The paper is organized as follows. Section 2 briefly reviews some existing thresholding techniques. Section 3 describes the proposed HMM-based binarization algorithm. Section 4 demonstrates experimental results. Conclusions are presented in Section 5.

2. Survey of binarization techniques

The binarization methods reported in the literature can generally be categorized into four classes:

(1) Histogram-based methods [1,2].
(2) Clustering-based methods [3,4].
(3) Object attribute-based methods [6,7].
(4) Discrimination based on local pixel characteristics [12,21].

Histogram-based methods are divided into two types: histogram entropy-based algorithms [22] and histogram shape-based algorithms [23]. Histogram entropy-based algorithms consider

doi:10.1016/j.patcog.2008.03.004


Fig. 1. A common procedure of document analysis: Document Acquisition → Pre-Processing → Binarization → Page Segmentation (Layout Analysis) → Character Recognition or Object Recognition → Post-Processing.

certain measures of the entropy of the original image and of the binarized image. Various types of entropy-based binarization have been proposed, including entropic thresholding [22,24], fusion of three different entropies [25], co-occurrence matrices for second-order entropies [26], normalized entropy [27], utilization of an information measure [28], entropic thresholding with a block source model [29], and minimum cross entropy [30,31].

The shape of the histogram can also be used to determine the threshold level for binarization. These methods use peak detection [23], valley-seeking threshold selection [32], histogram concavity analysis [33], histogram shape analysis [34], valley detection using the wavelet transform [35], histogram modification by enhancing concavity [36], and histogram modification via partial differential equations [37].

The above methods are regarded as global thresholding methods, which yield good performance when the histogram of the image is clearly bimodal. Unfortunately, this attribute can be missing for images with non-uniform backgrounds or images degraded by noise. For these types of images, local histogram analysis methods have been proposed [38]. Local histogram methods divide an image into different zones through layout analysis so that the pixels in one zone are homogeneous. These methods are known to outperform the global methods; their major drawbacks are the determination of the correct window size and overcoming the block effect. Some additional algorithms with higher complexity have also been proposed, such as adaptive [39], iterative [40,41] and multi-level thresholding methods [42,43]. However, no single thresholding method can be applied successfully to all types of images.
In the clustering-based methods, the gray-level samples are clustered into two parts, background and foreground, or alternatively modeled as a mixture of two Gaussians. The Otsu method [3] is the most referenced thresholding method. In this method the weighted sum of the within-class variances of the foreground and background pixels is minimized to establish an optimum threshold; minimizing the within-class variance is equivalent to maximizing the between-class scatter. This method gives satisfactory results when the numbers of pixels in the two classes are similar. Additional methods include iterative thresholding [43], minimum error thresholding [4,5,44,45] and fuzzy clustering thresholding [46]. In Ref. [47] the clustering is based on the information contained in a small window around each pixel.

Object attribute-based methods search for a measure of similarity between the gray-level and the binarized image. The threshold value is based on some attribute quality or similarity measure between the original image and its binarized version. These attributes can take the form of edge matching [6], shape compactness [48,49] or gray-level moments [50–52]. Other algorithms directly evaluate the resemblance of the original gray-level image to the binary image with a fuzzy measure [53–55], the resemblance of the cumulative probability distributions [56], or the quantity of information revealed as a result of segmentation [57].

For locally adaptive algorithms, a threshold is calculated at each pixel depending on some local statistics such as the range, variance or surface-fitting parameters of the neighboring pixels. A surface fitted to the gray-level landscape can serve as a local threshold, as in Yanowitz and Bruckstein [21] and Shen and Ip [58]. Traditional classifiers such as neural networks [12] have also been introduced to discriminate pixels into background and foreground according to the characteristics around every pixel. It should be mentioned that all of the algorithms described above rest on the hypothesis that the foreground is much darker than the background, which is true in most cases.
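The within-class/between-class variance trade-off used by the Otsu method can be sketched as follows. This is a minimal NumPy illustration written for this survey (the function name and structure are ours, not code from any of the cited papers); it exploits the fact that maximizing the between-class variance is equivalent to minimizing the weighted within-class variance.

```python
import numpy as np

def otsu_threshold(image):
    """Return the gray level t that maximizes the between-class variance
    (equivalently, minimizes the weighted sum of within-class variances)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)

    best_t, best_sigma_b = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * prob[:t]).sum() / w0  # class means
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        sigma_b = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if sigma_b > best_sigma_b:
            best_t, best_sigma_b = t, sigma_b
    return best_t
```

As the survey notes, the criterion behaves best when the two classes hold comparable numbers of pixels, since the weights w0 and w1 then balance the two means.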
Though much effort has been devoted to binarization, the extraction of characters from noisy document images is still a big challenge because of the complexity of real-world images. In the next section a new binarization scheme for robust character extraction is presented.

3. The proposed binarization method

The proposed algorithm is composed of two stages. In the first stage, a coarse global threshold discriminates the bright part of the whole image from the foreground pixels, which have lower values. In the second stage, the remaining pixels, which are expected to be a mixture of foreground and background, are passed to the HMM pixel classifier to obtain the attribute of each pixel. In this way only part of the image's pixels need further testing, so the overall processing time is much lower without sacrificing quality.

3.1. First stage

Since the first stage of the proposed technique is regarded as pre-processing for the binarization, speed rather than accuracy is the major concern in this stage. The only requirement is that the threshold should be high enough to avoid classifying any foreground as background. The mean value can be


regarded as the preliminary threshold of the whole process, and for 8-bit gray-level images it is calculated as

Mean = Σ_{i=0}^{255} i·h(i) / Σ_{i=0}^{255} h(i)    (1)

where h(i) is the number of pixels in the image with gray level i,

0 ≤ i ≤ 255    (2)

Since this method does not take the histogram shape into account, the result is obviously suboptimal. For documents with texture information, however, the number of foreground pixels is usually smaller than the number of background pixels, so the mean value results in under-thresholding, which is precisely the requirement in this case.

3.2. The second stage

3.2.1. Feature extraction

Instead of considering only the pixel value, as in conventional histogram-based binarization, the pixel classification strategy bases thresholding on features around each pixel, such as the range, variance, or surface-fitting parameters of the pixel neighborhood. A proper feature is critical for the performance of any classification algorithm. Features utilized by other researchers [59,60] include intensities, statistical features, features from gradient and compass operators, local contrast features and moments. In this paper we propose a novel feature extraction method suitable for HMM-based image binarization.

Inside an N × N window we take four feature vectors V1, V2, V3, V4 along four directions through the central pixel (horizontal, upper-left to lower-right, vertical, upper-right to lower-left), each containing N elements. Various window sizes have been tested. The simulation results show that a smaller window yields smoother strokes at a lower computational cost; however, it fails to eliminate some specks in the background and makes the algorithm vulnerable to noise. For N = 7 the window is shown in Table 1. For example, the seven elements of the first vector are obtained as

v1(0) = (P(0,−3) + P(0,−2) + P(0,−1))/3 − P(0,0)
v1(1) = (P(0,−2) + P(0,−1))/2 − P(0,0)
v1(2) = P(0,−1) − P(0,0)
v1(3) = P(0,0)
v1(4) = P(0,1) − P(0,0)
v1(5) = (P(0,1) + P(0,2))/2 − P(0,0)
v1(6) = (P(0,1) + P(0,2) + P(0,3))/3 − P(0,0)

Features along the other three directions are extracted similarly. Except for the center element, which keeps the original pixel value, each element is the difference between the average of a sliding sub-window and the central pixel. The general equations are given in Eqs. (3)–(6), where j runs from 1 up to i − 3 when i > 3, and from −1 down to i − 3 when i < 3. Taking this approach reduces the computation cost in comparison with conventional features such as the variance [61] and moments [39].

v1(i) = Σ_j P(0,j)/|i−3| − P(0,0),  i = 0, …, 6, i ≠ 3    (3)

v2(i) = Σ_j P(j,j)/|i−3| − P(0,0),  i = 0, …, 6, i ≠ 3    (4)

v3(i) = Σ_j P(j,0)/|i−3| − P(0,0),  i = 0, …, 6, i ≠ 3    (5)

v4(i) = Σ_j P(j,−j)/|i−3| − P(0,0),  i = 0, …, 6, i ≠ 3    (6)

To make the features more independent of illumination variation, note that after the first-stage thresholding the candidate pixel values range from the minimum value Mini of the given image to the mean value Mean of the image pixels, so that

vi(3) ∈ [Mini, Mean],  i = 1, …, 4    (7)

Noting that the largest pixel value is 255, an enhancement process is proposed as follows:

vi(3) = (vi(3) − Mini)² × 255/(Mean − Mini)²    (8)

From Figs. 2–4 one can conclude that the 3D surface around pixels in a stroke (Fig. 4) is completely different from that around pixels in the background (Figs. 2 and 3). Since the features of foreground and background differ from each other, it is possible for the HMM recognizer to discriminate between them.

3.2.2. Hidden Markov model-based binarization

One-dimensional hidden Markov models (HMMs) were proposed in the 1960s by Baum et al. [62–65], and in recent decades the HMM has become the predominant approach to automatic speech recognition [66]. In this paper we focus on the application of the HMM to image binarization. An HMM consists of a finite set of states, each associated with a probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities, and in a particular state an outcome or observation can be generated according to the associated probability distribution. To define an HMM, the following elements are needed: the number of states of the model, N; the number of observation symbols in the alphabet, M; and a set of state transition probabilities

a_ij = p{q_{t+1} = j | q_t = i},  1 ≤ i, j ≤ N    (9)

where q_t denotes the current state.
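The first-stage mean threshold of Eq. (1) and the four directional feature vectors of Eqs. (3)–(6) can be sketched as follows. This is our own NumPy reading of those equations (function names and the direction-step table are ours, not the authors' code); it assumes P(i, j) indexes row offset i and column offset j from the central pixel, as in Table 1.

```python
import numpy as np

def mean_threshold(img):
    """Eq. (1): histogram mean used as the coarse first-stage threshold."""
    h = np.bincount(img.ravel(), minlength=256)
    i = np.arange(256)
    return (i * h).sum() / h.sum()

# Direction steps for V1..V4: horizontal, main diagonal, vertical, anti-diagonal.
DIRS = [(0, 1), (1, 1), (1, 0), (1, -1)]  # (row step, col step)

def feature_vectors(img, r, c, N=7):
    """Eqs. (3)-(6) for the pixel at (r, c): four N-element vectors.
    Element i != 3 is the mean of the |i - 3| pixels between the centre and
    offset i - 3 along the direction, minus the centre pixel; element 3 is
    the centre pixel value itself (before the Eq. (8) enhancement)."""
    half = N // 2
    centre = float(img[r, c])
    vectors = []
    for dr, dc in DIRS:
        v = np.empty(N)
        for i in range(N):
            d = i - half                 # signed offset, -3..3 for N = 7
            if d == 0:
                v[i] = centre            # v(3) = P(0,0)
                continue
            step = 1 if d > 0 else -1
            offsets = range(step, d + step, step)  # j = 1..d or -1..d
            vals = [img[r + dr * j, c + dc * j] for j in offsets]
            v[i] = np.mean(vals) - centre
        vectors.append(v)
    return vectors
```

Only pixels darker than `mean_threshold(img)` would be handed to this feature extractor; the rest are labeled background in the first stage.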


Table 1. Demonstration of feature extraction (the N = 7 window; P(i, j) denotes the pixel at row offset i and column offset j from the central pixel P(0, 0))

P(−3,−3) P(−3,−2) P(−3,−1) P(−3,0) P(−3,1) P(−3,2) P(−3,3)
P(−2,−3) P(−2,−2) P(−2,−1) P(−2,0) P(−2,1) P(−2,2) P(−2,3)
P(−1,−3) P(−1,−2) P(−1,−1) P(−1,0) P(−1,1) P(−1,2) P(−1,3)
P(0,−3)  P(0,−2)  P(0,−1)  P(0,0)  P(0,1)  P(0,2)  P(0,3)
P(1,−3)  P(1,−2)  P(1,−1)  P(1,0)  P(1,1)  P(1,2)  P(1,3)
P(2,−3)  P(2,−2)  P(2,−1)  P(2,0)  P(2,1)  P(2,2)  P(2,3)
P(3,−3)  P(3,−2)  P(3,−1)  P(3,0)  P(3,1)  P(3,2)  P(3,3)

Fig. 2. Pixel values in a noisy background.

Fig. 3. Pixel values in a dark background.


Fig. 4. Pixel values in one segment of a stroke of a character.

A probability distribution for each of the states, B = {b_j(k)}, where

b_j(k) = p{o_t = v_k | q_t = j},  1 ≤ j ≤ N, 1 ≤ k ≤ M    (10)

And the initial state distribution π = {π_i}, where

π_i = P[q_1 = S_i],  1 ≤ i ≤ N    (11)

An HMM recognizer works in two phases, a training phase and an evaluation phase.

3.2.3. The training phase

Generally, the training problem is to adjust the HMM parameters λ (presented in Eqs. (9)–(11)) so that the given set of observations O (called the training set) is represented by the model in the best way for the intended application. The Baum–Welch algorithm [63] can be derived using simple "occurrence counting" arguments, or by using calculus to maximize, over the estimated parameters λ̄, the auxiliary quantity [64]

Q(λ, λ̄) = Σ_q p{q | O, λ} log[p{O, q, λ̄}]    (12)
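One Baum–Welch re-estimation pass for a discrete-observation HMM can be sketched compactly with NumPy. This is our illustration of the standard forward–backward re-estimation that maximizes the auxiliary quantity of Eq. (12); the variable names are ours and the code is not the authors' implementation.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch re-estimation over a single observation sequence.
    A: (N,N) transitions, B: (N,M) emissions, pi: (N,) initial distribution;
    obs: integer symbols. Returns updated (A, B, pi) and p(O | lambda)
    under the *input* parameters."""
    obs = np.asarray(obs)
    N, M = B.shape
    T = len(obs)

    # Forward pass: alpha[t, i] = p(o_1..o_t, q_t = i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    likelihood = alpha[-1].sum()

    # Backward pass: beta[t, i] = p(o_{t+1}..o_T | q_t = i)
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # Posteriors: gamma[t, i] and xi[t, i, j]
    gamma = alpha * beta / likelihood
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood

    # Re-estimation ("occurrence counting" in expectation)
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((N, M))
    for k in range(M):
        B_new[:, k] = gamma[obs == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, pi_new, likelihood
```

Iterating this step exhibits the guaranteed property mentioned below: the likelihood of the training sequence never decreases between iterations.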

A special feature of the algorithm is its guaranteed convergence. The Baum–Welch algorithm, also known as the forward–backward algorithm, is one of the most successful training algorithms for HMMs and is used in our algorithm.

In the previous section we explained how the features of a pixel are computed using Eqs. (3)–(6). We scanned images with various degrading sources from newspapers and magazines to set up a database. Twenty images from the database were randomly selected, and from these images 40,000 pixels were chosen, half from strokes and half from various backgrounds. Each pixel is assigned four feature vectors, so 160,000 vectors are generated to form a code book. The K-means algorithm [67] is used to segment this vector space into 10 partitions represented by a set of cluster centers. The four feature vectors of each pixel are then quantized into one observation sequence with four elements according to their distances to the 10 cluster centers. All of the observation sequences are input into the HMM to estimate its parameters with the Baum–Welch algorithm.

3.2.4. The evaluation phase

In this phase, given the HMM model λ = (A, B, π) and a sequence of observations O = o_1, o_2, …, o_t, our aim is to find p{O | λ}. The forward algorithm [64], described below, is usually utilized. An auxiliary variable α_t(i), called the forward variable, is defined as the probability of the partial observation sequence o_1, o_2, …, o_t terminating at state i. Mathematically,

α_t(i) = p{o_1, o_2, …, o_t, q_t = i | λ}    (13)

It is then easy to see that the following recursive relationship holds:

α_{t+1}(j) = b_j(o_{t+1}) Σ_{i=1}^{N} α_t(i) a_ij,  1 ≤ j ≤ N, 1 ≤ t ≤ T − 1    (14)

where

α_1(j) = π_j b_j(o_1),  1 ≤ j ≤ N    (15)

Using this recursion one can calculate

α_T(j),  1 ≤ j ≤ N    (16)
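The forward recursion of Eqs. (13)–(16), together with the final summation of Eq. (17), can be sketched as follows. This is our illustration, not the authors' code; the brute-force path enumeration is included only to check that the recursion computes the same probability.

```python
import numpy as np
from itertools import product

def forward_probability(A, B, pi, obs):
    """p(O | lambda) via the forward recursion, Eqs. (13)-(17)."""
    alpha = pi * B[:, obs[0]]          # Eq. (15): alpha_1(j) = pi_j b_j(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # Eq. (14)
    return alpha.sum()                 # Eq. (17): sum_i alpha_T(i)

def brute_force_probability(A, B, pi, obs):
    """Same quantity by summing over every state path (checking only)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for path in product(range(N), repeat=T):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total
```

The recursion costs O(N²T) instead of the O(Nᵀ) of the path enumeration, which is why the forward algorithm is the standard evaluation tool.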

and the required probability is then given by

p{O | λ} = Σ_{i=1}^{N} α_T(i)    (17)

For a conventional HMM, the observation sequence is input to the HMM to calculate the probability of the sequence belonging to a class. In this application the length of the sequence is only 4 and the number of observation symbols is 10, so the observation sequences range from [0 0 0 0] to [9 9 9 9]. At the same time there are only two classes to be identified, background and foreground. It is therefore possible to implement a criterion to speed up the whole process. All of the observation sequences from [0 0 0 0] to [9 9 9 9] are input into the trained HMM engine and the classification results are saved in a reference set, with 0 representing the background and 1 the foreground. The size of the reference set is only 10 Kbits. In the recognition stage, after feature extraction, the feature vectors are quantized into observation symbols by comparing their distances to the centers of the 10 clusters determined in the training stage. Instead of calculating the probability of each observation sequence, the recognition result is simply looked up in the reference set at negligible computational cost.

Fig. 6. The histogram of a gray historical document image with low contrast and signal-dependent noise.
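The lookup-table speedup described above can be sketched as follows: every one of the 10⁴ possible length-4 observation sequences is scored once against a foreground HMM and a background HMM, and only the winning label is stored. The function names and the toy model parameters in the test are ours; the real models would be the ones trained with Baum–Welch.

```python
import numpy as np
from itertools import product

def forward_prob(A, B, pi, obs):
    """p(O | lambda) via the forward recursion (Eqs. (13)-(17))."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def build_reference_set(fg, bg, n_symbols=10, seq_len=4):
    """Classify every possible observation sequence once.
    fg and bg are (A, B, pi) triples for the foreground and background
    HMMs. Returns a dict mapping each sequence tuple to 1 (foreground)
    or 0 (background) -- the 'reference set' of the text."""
    ref = {}
    for seq in product(range(n_symbols), repeat=seq_len):
        ref[seq] = int(forward_prob(*fg, seq) > forward_prob(*bg, seq))
    return ref
```

At recognition time a pixel's four quantized symbols index straight into `ref`, so no per-pixel forward recursion is needed.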

4. Simulation results

To test the performance of the proposed binarization technique, a comparative study was carried out. In our experiments the Otsu global thresholding [3], local adaptive thresholding [68] and Kittler [5] algorithms were used as benchmarks. Our simulation results show that the HMM-based segmentation method provides satisfactory results on all of the tested images, while the others fail to produce acceptable results for some of the images in our database. To test the robustness of the proposed algorithm in the presence of noise sources, we conducted the same comparative studies with the other three binarization techniques, as described below.

4.1. Database used

To test the utility of the proposed algorithm, three kinds of noisy images were tested. The first image (Fig. 5) is a historical document with blurred strokes, complex signal-dependent noise and low contrast; its histogram is shown in Fig. 6. The second image has an inhomogeneous and complex background, as shown in Fig. 7; from its histogram, shown in Fig. 8, we observe the lack of bimodality that is a fundamental requirement for global thresholding algorithms.

Fig. 5. An original historical document image with low contrast and signal-dependent noise.

Fig. 7. An original document image with inhomogeneous background.

Fig. 8. The histogram of a gray document image with inhomogeneous background.

Fig. 9. An original document image under bad illuminating condition.

Fig. 10. The histogram of a gray document image under bad illuminating condition.

Fig. 11. Binary document image extracted with the Otsu algorithm from the original image of Fig. 5.

The picture in Fig. 9 is an image taken by a camera, with varying illumination in different parts of the image. The distribution of its histogram, shown in Fig. 10, is clearly not bimodal.

4.2. Comparison of the binarization results

For the image shown in Fig. 5, the Kittler algorithm yielded an unacceptable result. The binary images obtained with the Otsu and local thresholding algorithms are shown in Figs. 11 and 12; they are too noisy for any character recognition scheme. From Fig. 13 we can see that the binary image from the HMM-based binarization algorithm is much cleaner. The binarization results for the image in Fig. 7, which has a complex background, are shown in Figs. 14–17. We can observe that the Kittler algorithm and the HMM-based binarization perform equally well, while the others yield unsatisfactory results. The binarization results for the image in Fig. 9, which has

Fig. 12. Binary document image extracted with the local thresholding algorithm from the original image of Fig. 5.

non-uniform illumination, are shown in Figs. 18–21. We can clearly observe that the proposed algorithm performs better than the others.

S. Huang et al. / Pattern Recognition 41 (2008) 2890 – 2900

2897

Fig. 13. Binary document image extracted with the HMM-based thresholding algorithm from the original image of Fig. 5.

Fig. 15. Binary document image extracted with Local thresholding from the original image of Fig. 7.

Fig. 14. Binary document image extracted with the Otsu algorithm from the original image of Fig. 7.

4.3. Quantitative study

A handful of binarization performance criteria [69–71] have been proposed, among which OCR-based criteria are the most widely accepted. To further demonstrate the efficiency of the proposed binarization method, we processed 495 images from the ICDAR database [72] in our test. It should be mentioned that all of the methods used here work on gray images, so the color images were converted to gray-level images before binarization. After binarization the binary images were sent to three commercial OCR packages: Readiris 11 Professional, SimpleOCR and ABBYY 8.0. The recognition results can be regarded as a reliable criterion for evaluating the performance of the different binarization methods. There are in total 11,407 machine-printed characters in the tested images. The OCR test results are reported in Table 2.

Fig. 16. Binary document image extracted with the Kittler algorithm from the original image of Fig. 7.

5. Conclusion

In this paper an HMM-based binarization technique was presented. Tests conducted on a large number of noisy images yielded satisfactory results. A comparative study of the proposed technique with three different binarization schemes was also carried out: the characters extracted by all of the binarization techniques were applied to a commercial OCR


Fig. 17. Binary document image extracted with the HMM-based thresholding algorithm from the original image of Fig. 7.

Fig. 18. Binary document image extracted with the Otsu algorithm from the original image of Fig. 9.

Fig. 19. Binary document image extracted with the local thresholding algorithm from the original image of Fig. 9.

Fig. 20. Binary document image extracted with the Kittler algorithm from the original image of Fig. 9.

Fig. 21. Binary document image extracted with the HMM-based thresholding algorithm from the original image of Fig. 9.

Table 2. OCR results from different binarized images (recognition rate)

OCR engine   HMM-based binarization   Kittler [5]   Local [68]   Otsu [3]
Readiris     71.6%                    21.2%         44.2%        51.0%
SimpleOCR    63.6%                    18.7%         38.3%        46.3%
ABBYY        75.9%                    23.5%         41.8%        52.8%

engine. The results show that the proposed method outperforms the reference methods. This can be attributed to the fact that local features around every pixel are used for binarization. As can be seen, the characteristics of a neighborhood around a background pixel, illustrated in Figs. 2 and 3, are completely different from those around stroke pixels, shown in Fig. 4, which makes it easy to distinguish background pixels from character pixels. Our quantitative study with three different independent OCR engines also shows that the recognition rates obtained using the proposed technique are much higher than those of the other conventional methods.


References [1] J.M.S. Prewitt, M.L. Mendelsohn, The analysis of cell images, Ann. New York Acad. Sci. 128 (1966) 1035–1053. [2] W. Doyle, Operation useful for similarity-invariant pattern recognition, J. Assoc. Comput. Mach. 9 (1962) 259–267. [3] N. Otsu, A threshold selection method from gray-level histogram, IEEE Trans. Syst. Man Cybern. 9 (1979) 62–66. [4] J. Kittler, J. Illingworth, On threshold selection using illustering criteria, IEEE Syst. Man Cybern. 15 (1985) 652–655. [5] J. Kittler, J. Illingworth, Minimum error thresholding, Pattern Recognition 19 (1986) 41–47. [6] L. Hertz, R.W. Schafer, Multilevel thresholding using edge matching, CVGIP 44 (1988) 279–295. [7] A. Pikaz, A. Averbuch, Digital image thresholding based on topological stable state, Pattern Recognition 29 (1996) 829–843. [8] A.D. Brink, Grey-level thresholding of images using a correlation criterion, Pattern Recognition Lett. 9 (1989) 335–341. [9] F. Morii, An image thresholding method using a minimum weighted squared-distortion criterion, Pattern Recognition 28 (1995) 1063–1071. [10] L.K. Huang, M.J.J. Wang, Image thresholding by minimizing the measures of fuzziness, Pattern Recognition 28 (1995) 41–51. [11] M. Sezgin, B. Sankur, Survey over image thresholding techniques and quantitative performance evaluation, J. Electron. Imaging 13 (1) (2004) 146–165. [12] H. Yan, J. Wu, Character and line extraction from color map images using a multi-layer neural network, Pattern Recognition Lett. 15 (1994) 97–103. [13] S.U. Le, S.Y. Chung, R.H. Park, A comparative performance study of several global thresholding techniques for segmentation, Graph. Models Image Process. 52 (1990) 171–190. [14] J.S. Weszka, A. Rosenfeld, Threshold evaluation techniques, IEEE Trans. Syst. Man Cybern. 8 (1978) 627–629. [15] P.W. Palumbo, P. Swaminathan, S.N. Srihari, Document image binarization: evaluation of algorithms, Proc. SPIE 697 (1986) 278–286. [16] P.K. Sahoo, S. Soltani, A.K.C. Wong, Y. 
Chen, A survey of thresholding techniques, Comput. Graph. Image Process. 41 (1988) 233–260. [17] C.A. Glasbey, An analysis of histogram-based thresholding algorithms, Graph. Models Image Process. 55 (1993) 532–537. [18] O.D. Trier, A.K. Jain, Goal-directed evaluation of binarization methods, IEEE Trans. Pattern Anal. Mach. Intell. 17 (1995) 1191–1201. [19] B. Sankur, A.T. Abak, U. Baris, Assessment of thresholding algorithms for document processing, in: ICIP99, 1999, pp. 580–584. [20] J.R. Parker, Gray level thresholding in badly illuminated images, IEEE Trans. Pattern Anal. Mach. Intell. 13 (8) (1991) 813–819. [21] S.D. Yanowitz, A.M. Bruckstein, A new method for image segmentation, Comput. Graph. Image Process. 46 (1989) 82–95. [22] T. Pun, A new method for gray-level picture threshold using the entropy of the histogram, EUFL4SIP: Signal Process. 2 (3) (1980) 223–237. [23] M.I. Sezan, A peak detection algorithm and its application to histogrambased image data reduction, CVGIP 29 (1985) 47–59. [24] T. Pun, Entropic thresholding, a new approach, Comput. Graph. Image Process. 16 (1981) 210–239. [25] P. Sahoo, C. Wilkins, J. Yeager, Threshold selection using Renyi’s entropy, Pattern Recognition 30 (1997) 71–84. [26] N.R. Pal, S.K. Pal, Entropic thresholding, EURASIP: Signal Process. 16 (2) (1989) 97–108. [27] J.N. Kapur, P.K. Sahoo, A.K.C. Wong, A new method for gray-level picture thresholding using the entropy of the histogram, Graph. Models Image Process. 29 (1985) 273–285. [28] A.G. Shanbag, Utilization of information measure as a means of image thresholding, CVGIP 56 (1994) 414–419. [29] A. Beghdadi, A.L.P. Negrate, V. DeLesegno, Entropic thresholding using a block source model, CVGIP CMIP 57 (1995) 197–205. [30] A.D. Brink, N.E. Pendock, minimum cross entropy threshold selection, Pattern Recognition 29 (1996) 179–188. [31] C.H. Li, C.K. Lee, Minimum cross-entropy thresholding, Pattern Recognition 26 (1993) 617–625.

2899

[32] S.C. Sahasrabudhe, K.S.D. Gupta, A valley-seeking threshold selection technique, in: A. Rosenfeld, L. Shapiro (Eds.), Computer Vision and Image Processing, Academic Press, New York, 1992, pp. 55–65.
[33] A. Rosenfeld, P. De la Torre, Histogram concavity analysis as an aid in threshold selection, IEEE Trans. Syst. Man Cybern. 13 (1983) 231–235.
[34] M.J. Carlotto, Histogram analysis using a scale-space approach, IEEE Trans. Pattern Anal. Mach. Intell. 9 (1987) 121–129.
[35] J.-C. Olivo, Automatic threshold selection using the wavelet transform, CVGIP 56 (1994) 205–218.
[36] J. Weszka, A. Rosenfeld, Histogram modification for threshold selection, IEEE Trans. Syst. Man Cybern. 9 (1979) 38–52.
[37] G. Sapiro, V. Caselles, Histogram modification via partial differential equations, in: Proceedings of ICIP 95, 1995, pp. 632–635.
[38] L.A. Fletcher, R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images, IEEE Trans. Pattern Anal. Mach. Intell. 10 (6) (1988) 910–918.
[39] Y. Yang, H. Yan, An adaptive logical method for binarization of degraded document images, Pattern Recognition 33 (2000) 787–807.
[40] T.W. Ridler, S. Calvard, Picture thresholding using an iterative selection method, IEEE Trans. Syst. Man Cybern. 8 (1978) 630–632.
[41] C.K. Leung, F.K. Lam, Performance analysis of a class of iterative image thresholding algorithms, Pattern Recognition 29 (9) (1996) 1523–1530.
[42] H.J. Trussell, Comments on picture thresholding using an iterative selection method, IEEE Trans. Syst. Man Cybern. 9 (1979) 311.
[43] M.K. Yanni, E. Horne, A new approach to dynamic thresholding, in: EUSIPCO'94: 9th European Conference on Signal Processing, vol. 1, Wiley, NY, 1994, pp. 34–44.
[44] D.E. Lloyd, Automatic target classification using moment invariant of image shapes, Technical Report, RAE IDN AW126, Farnborough, UK, December 1985.
[45] S. Cho, R. Haralick, S. Yi, Improvement of Kittler and Illingworth's minimum error thresholding, Pattern Recognition 22 (1989) 609–617.
[46] C.V. Jawahar, P.K. Biswas, A.K. Ray, Investigations on fuzzy thresholding based on fuzzy clustering, Pattern Recognition 30 (1997) 1605–1613.
[47] E. Giuliano, O. Paitra, L. Stringa, Electronic character reading system, U.S. Patent 4,047,15, 6 September 1977.
[48] S.K. Pal, A. Rosenfeld, Image enhancement and thresholding by optimization of fuzzy compactness, Pattern Recognition Lett. 7 (1988) 77–86.
[49] A. Rosenfeld, The fuzzy geometry of image subsets, Pattern Recognition Lett. 2 (1984) 311–317.
[50] W.H. Tsai, Moment-preserving thresholding: a new approach, Graph. Models Image Process. 19 (1985) 377–393.
[51] S.C. Cheng, W.H. Tsai, A neural network approach of the moment-preserving technique and its application to thresholding, IEEE Trans. Comput. C-42 (1993) 501–507.
[52] E.J. Delp, O.R. Mitchell, Moment-preserving quantization, IEEE Trans. Commun. 39 (1991) 1549–1558.
[53] C.A. Murthy, S.K. Pal, Fuzzy thresholding: a mathematical framework, bound functions and weighted moving average technique, Pattern Recognition Lett. 11 (1990) 197–206.
[54] R. Yager, On the measure of fuzziness and negation. Part I: membership in the unit interval, Int. J. Gen. Syst. 5 (1979) 221–229.
[55] K. Ramar, S. Arunigam, S.N. Sivanandam, L. Ganesan, D. Manimegalai, Quantitative fuzzy measures for threshold selection, Pattern Recognition Lett. 21 (2000) 1–7.
[56] X. Fernandez, Implicit model oriented optimal thresholding using Kolmogorov–Smirnov similarity measure, in: ICPR 2000: International Conference on Pattern Recognition, Barcelona, 2000, pp. 466–469.
[57] C.K. Leung, F.K. Lam, Maximum segmented image information thresholding, Graph. Models Image Process. 60 (1998) 57–76.
[58] D. Shen, H.H.S. Ip, A Hopfield neural network for adaptive image segmentation: an active surface paradigm, Pattern Recognition Lett. 18 (1997) 37–48.
[59] W. Niblack, An Introduction to Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1986, pp. 115–116.
[60] J. Sauvola, M. Pietikäinen, Adaptive document image binarization, Pattern Recognition 33 (2000) 225–236.
[61] J.M. White, G.D. Rohrer, Image segmentation for optical character recognition and other applications requiring character image extraction, IBM J. Res. Dev. 27 (4) (1983) 400–411.
[62] L.E. Baum, An inequality and associated maximization technique in statistical estimation for probabilistic functions of finite state Markov chains, in: Inequalities III, Academic Press, New York, 1972, pp. 1–8.
[63] L.E. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, Ann. Math. Stat. 37 (1966) 1554–1563.
[64] L.E. Baum, J.A. Eagon, An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology, Bull. Am. Math. Soc. 73 (1967) 360–363.
[65] L.E. Baum, T. Petrie, G. Soules, N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Stat. 41 (1) (1970) 164–171.
[66] R. Cole, L. Hirschman, L. Atlas, M. Beckman, et al., The challenge of spoken language systems: research directions for the nineties, IEEE Trans. Speech Audio Process. 3 (1995) 1–21.
[67] J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, University of California Press, Berkeley, CA, 1967, pp. 281–297.
[68] A.E. Savakis, Adaptive document image thresholding using foreground and background clustering, in: Proceedings of International Conference on Image Processing (ICIP98), 1998, pp. 438–441.
[69] W.A. Yasnoff, J.K. Mui, J.W. Bacus, Error measures for scene segmentation, Pattern Recognition 9 (1977) 217–231.
[70] M.D. Levine, A.M. Nazif, Dynamic measurement of computer generated image segmentations, IEEE Trans. Pattern Anal. Mach. Intell. 7 (1985) 155–164.
[71] Y.J. Zhang, A survey on evaluation methods for image segmentation, Pattern Recognition 29 (1996) 1335–1346.
[72] S.M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, ICDAR 2003 robust reading competitions, in: Proceedings of the ICDAR, 2003, pp. 682–687.

About the Author—SONGTAO HUANG received his B.Sc. degree in electrical and computer engineering from Wuhan University, Wuhan, China, in 1993. He completed his M.Sc. and Ph.D. at the University of Windsor in 2003 and 2007, respectively. His research areas are signal and image processing, high dynamic range imagers, and pattern recognition. He now works in the vision group at Sensata Technologies as a vision algorithm architect.

About the Author—MAJID AHMADI (S'75-M'77-SM'84-F'02) received the B.Sc. degree in electrical engineering from Arya Mehr University, Tehran, Iran, in 1971 and the Ph.D. degree in electrical engineering from Imperial College, London University, London, UK, in 1977. He has been with the Department of Electrical and Computer Engineering, University of Windsor, Windsor, Ontario, Canada, since 1980, currently as University Professor and Director of the Research Center for Integrated Microsystems. His research interests include digital signal processing; machine vision; pattern recognition; neural network architectures, applications, and VLSI implementation; computer arithmetic; and MEMS. He has co-authored the book Digital Filtering in 1 and 2 Dimensions: Design and Applications (New York: Plenum, 1989) and has published over 400 articles in these areas. He is the Regional Editor for the Journal of Circuits, Systems and Computers and Associate Editor for the Journal of Pattern Recognition and the International Journal of Computers and Electrical Engineering. Dr. Ahmadi was the IEEE-CAS representative on the Neural Network Council and the Chair of the IEEE Circuits and Systems Neural Systems Applications Technical Committee (2000). He received an Honorable Mention award from the Editorial Board of the Journal of Pattern Recognition in 1992 and the Distinctive Contributed Paper award from the Multiple-Valued Logic Conference Technical Committee and the IEEE Computer Society in 2000. He is a Fellow of the IET (UK).

About the Author—MAHER A. SID-AHMED was born in Alexandria, Egypt, in March 1945. He received the B.A.Sc. degree in electrical engineering from the University of Alexandria, Egypt, in 1968 and the Ph.D. degree in digital signal processing from the University of Windsor, Windsor, Ontario, Canada, in 1974. After graduation he worked at Alberta Government Telephones and the University of Alexandria, Egypt. He joined the University of Windsor in 1978 as a faculty member and is presently the department head for the ECE program. Dr. Sid-Ahmed holds four US patents, has published a book on image processing with McGraw-Hill, and has authored or co-authored over 100 papers. His areas of interest include image processing, robotic vision, pattern recognition, and improved-definition television.