Neurocomputing 212 (2016) 96–106
Contents lists available at ScienceDirect
Neurocomputing journal homepage: www.elsevier.com/locate/neucom
Dynamic character grouping based on four consistency constraints in topographic maps

Pengfei Xu a, Qiguang Miao b,⁎, Ruyi Liu b, Xiaojiang Chen a, Xunli Fan a

a School of Information Science and Technology, Northwest University, 710127, China
b School of Computer, Xidian University, Xi’an, Shaanxi 710071, China
Article info

Abstract
Article history: Received 19 November 2015 Received in revised form 9 January 2016 Accepted 31 January 2016 Available online 30 June 2016
In optical character recognition, text strings must be extracted from images first. Since only complete text strings can accurately express the meanings of the words, the extracted individual characters should be grouped into text strings before recognition. Topographic maps contain many text strings, and these texts consist of multi-colored, multi-sized and multi-oriented characters, which the existing methods cannot group effectively. In this paper, a dynamic character grouping method is proposed to group the characters into text strings based on four consistency constraints: color, size, spacing and direction. The characters in the same word have similar colors, sizes and distances between them, and their centers lie on curved lines with limited bending, while the characters in different words do not. Based on these features of the characters, the background pixels around the characters are expanded to link the characters into text strings. In this method, due to the introduction of the color consistency constraint, the characters with different colors can be grouped well, and the improved direction consistency constraint allows curved character strings to be handled more accurately. The experimental results show that this method can group the characters more efficiently, especially in the case in which the beginning or end characters of words are close to the characters of other words.
© 2016 Elsevier B.V. All rights reserved.
Keywords: Character grouping Consistency constraint Color information Character size Character spacing Text direction Character expandability Topographic maps
1. Introduction

Text recognition techniques have been widely used in both academic research and commercial software development. In classic text recognition systems, the first step is to analyze the layout of the image to locate and order the text blocks. Next, each of the identified text blocks containing text lines is processed for text recognition [1,2]. These previous methods can work well on extracting text lines from homogeneous texts or in some specific cases, such as straight text lines or multi-oriented but similar-sized characters. Most text processing methods do not take character grouping into consideration, because they aim to extract or organize single characters. Some of them do group the characters into text strings, but these texts lie in the horizontal direction, so a dilation operation can be used to achieve this task [3]. In topographic maps, there are many texts, which are important information for understanding the topographic maps and
⁎ Corresponding author.
E-mail addresses: [email protected] (P. Xu), [email protected] (Q. Miao), [email protected] (X. Chen), [email protected] (X. Fan).
http://dx.doi.org/10.1016/j.neucom.2016.01.118
0925-2312/© 2016 Elsevier B.V. All rights reserved.
their special geographic attributes. For these texts, the traditional text processing methods can be applied to extract, separate or recognize them. These methods are limited to processing single characters, because they do not consider character grouping. But in topographic maps, only whole text strings can express the accurate meanings of the geographic elements. Besides, recognizing individual characters separately fails to take advantage of the word context, which can utilize a dictionary to help recognize the grouped characters. Thus character grouping is very helpful for text understanding and recognition [4,5]. However, there are few works on character grouping, especially for the texts in topographic maps, because most texts in topographic maps lie on horizontal or straight text lines, and the background is so simple that a dilation operation can achieve character grouping. In some other topographic maps, however, the distribution of characters is very complex. In the map shown in Fig. 1(a), for example, there are many texts with multiple colors, sizes and orientations. In addition, there are broken and touching characters in the segmented maps, as shown in Fig. 1(c). All these facts adversely affect character grouping; with most previous methods it is possible that some characters are mistakenly grouped into other text strings or left out, especially when some text strings lie on curved lines.
Fig. 1. Texts in topographic maps.
In order to solve these problems, various character features in topographic maps, such as character distribution, color, size, and direction, are analyzed carefully. Combining the relative merits of the previous methods, this paper proposes a dynamic character grouping method to group the characters into text strings based on four consistency constraints: color, size, spacing and direction. According to these features of the characters, the background pixels around the characters are expanded to link the characters into text strings. This method can deal with characters with more complex distributions, especially the beginning or end characters of words.
2. Related works Text recognition from topographic maps containing nonhomogeneous texts is a difficult task, while recognizing individual characters separately fails to take advantage of the whole word context, and the recognition results cannot represent the meaning of the words. Therefore researchers proposed character grouping methods to group characters into text strings, then these strings can be recognized to represent the meaning of the words more accurately [3,6–9]. In 1988, Fletcher and Kasturi [10] used Hough transform to group characters into text strings. This method is robust to the text font style, size, and orientation, but cannot be applied on curved strings, because the Hough transformation only detects straight lines. Based on the constraints that the characters lie on a straight line and their centroids are separated by about 1.4 times their average width, Li [11,12] proposed a method to assemble characters into word boxes to form the possible aligned strings. Lu [13] grouped the characters together just like “brushing” them in the horizontal and vertical directions. Caprioli grouped characters into strings based on the size and spacing of the characters [14] in Italian topographic maps. In 1999, Goto proposed a method called Extended Linear Segment Linking, which is able to
extract text strings in arbitrary orientations and on curved lines [15]. This method works effectively on touching characters, but requires that the sizes of the characters be similar. Generally, the gaps between words are larger than those within words, so Cao [16] applied a dilation operation to the character images, and the isolated characters are clustered into individual words [17]. A bottom-up approach was proposed by Pal [18], but it cannot work on curved text strings. In 2007, Pouderoux put forward the idea of undirected graphs for character grouping based on the character sizes only [19]. In 2008, Roy proposed a method based on the foreground and background information of the characters to extract individual text strings from multi-oriented and curved text documents [20]. In 2009, he presented another method to segment English multi-oriented touching strings into individual characters by using convex hull information [21]. These methods can deal with curved strings, but the directions of the strings were detected in only four directions. In 2004, a method for separating and recognizing touching/overlapping characters was proposed by Velázquez [22]. In this method, OCR was applied to define the coordinates, size and orientation of the character strings, and four straight lines or curves were extrapolated to separate the attached symbols. In 2011, Pezeshk grouped the individual characters into their respective strings using pyramid decomposition with Gaussian kernels [23–25], but this method cannot distinguish different text strings which are close to each other. According to the analysis above, most researchers have focused on the study of text separation and recognition, while the grouping of characters into text strings has not been studied in depth. Chiang has done much work on grouping characters [26] and text recognition [27], and a conditional dilation algorithm was presented for grouping characters into text strings [26].
Compared with other methods, Chiang's method can get better results. But there are still some problems, for example, the color information of characters is not considered, and the string curvature condition
Fig. 2. The framework of the proposed method.
Fig. 3. One neighbor character of α1 or α 2 .
Fig. 4. Neighbor characters of α1 and α 2.
Fig. 5. The results obtained by different methods from artificial images A.
is not well designed. In order to solve these problems, this paper proposes a method to group characters into text strings based on the consistency constraints of character color, size, spacing and direction. The remainder of this paper is organized as follows: Section 3 describes the proposed method in detail, Section 4 gives the experimental results and the analysis of the performance, and Section 5 gives the concluding remarks.
3. Dynamic character grouping based on four consistency constraints

3.1. The overview of the proposed method

This section presents a new method for grouping characters into text strings. Based on the facts that the characters in one text
Fig. 6. The results obtained by different methods from artificial images B.
string have similar color, size and spacing, and that the centers of these characters lie on a curved line whose curvature is within a numerical range, four consistency constraints (color, size, spacing and direction) are used to group the characters into their own text strings. In this method, the color information, which is ignored by the existing methods, is used as an additional constraint because some characters are presented in different colors, and the direction constraint is designed more carefully. The framework of this method is shown in Fig. 2, and its basic idea is stated as follows. Once the texts are extracted from the topographic map, we have a binary image in which the connected components in the foreground are characters or parts of characters. The proposed method performs multiple iterations to expand the background pixels, and finally the characters are connected into text strings via the expanded background pixels. In each iteration, the method tests every background pixel P(i) to determine the number Nb of characters that P(i) connects to. If Nb = 0 or Nb > 2, nothing is done. If Nb = 1, P(i) is expanded and marked as a foreground pixel. Else if Nb = 2, which means P(i) connects to two characters, the method needs to test the number Nc of characters near these two characters. If Nc ≤ 2, the consistency constraints are tested to determine whether P(i) is expanded. After each iteration, the iteration counter I is incremented by 1; when I > T, the method stops, where T is a threshold used to control the number of iterations. We describe the four consistency constraints in the remainder of this section.

3.2. The color and size consistency constraints

When Nb = 2, assume that α1 and α2 are the two characters that the background pixel P(i) connects to. Then, based on the distances between characters, we try to find the neighbor character α1,N of α1 and α2,N of α2. In this way, at least two and no more than four characters, which may belong to the same text string, can be obtained. There are then two cases when we check whether α1 and α2 belong to the same text string. In one case, if Nc = 0, which means there are no neighbor characters of α1 and α2, the color and size consistency constraints
Fig. 7. The results obtained from topographic maps M1.
are tested to determine whether P(i) is expanded. In topographic maps, some text strings are represented by different colors, so this information can be used to distinguish different text strings. For a character α, its color feature is defined as:

Cα = M(αR, αG, αB)    (1)

where Cα is the main color feature of the character α, obtained from its color histogram: the average color value of the peak regions in the color histogram of the whole character is used as its main color feature. The color difference between α1 and α2 is measured by the Mahalanobis distance:

Dα1,α2 = sqrt( (Cα1 − Cα2)^T S^(−1) (Cα1 − Cα2) )    (2)

where S^(−1) is the inverse of the covariance matrix of the samples. The color consistency constraint requires that Dα1,α2 be less than or equal to the threshold Tc:

Dα1,α2 ≤ Tc    (3)

where Tc is the average color difference in the area around the current characters α1 and α2:

Tc = ( Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} D_{pi,pj} ) / ( N(N−1)/2 )    (4)

where N is the number of pixels in the area and D_{pi,pj} is the color difference between the pixels pi and pj. If no color information is available, all the characters are treated as having the same color.

Fig. 8. The results obtained from topographic maps M2.

Further, the characters are tested by the size consistency constraint. In the binary images obtained after text extraction, there are a variety of connected components, which are either characters or non-characters. A connected component is taken as a single character if its size satisfies:

Min_S ≤ Min(α_box_Height, α_box_Width),  Max(α_box_Height, α_box_Width) ≤ Max_S    (5)

where Min_S and Max_S are the threshold parameters for the minimum and maximum size of a single character. For a character, its size is the larger of the height and the width of its bounding box, and the size ratio between two characters α1 and α2 is evaluated by Eq. (6):

sizeRatio_{α1,α2} = Max(size(α1), size(α2)) / Min(size(α1), size(α2))    (6)

size(α) = Max(α_box_Height, α_box_Width)    (7)

where α_box_Height and α_box_Width are the height and width of the character's bounding box respectively. Because the sizes of the characters in one text string must be similar, the size ratio of the characters must be smaller than a threshold Ts. Considering the size variation among characters such as the English letters 'f' and 'a', we always choose Ts = 3. So if there are no neighbor characters of α1 and α2, and α1 and α2 satisfy the color and size consistency constraints, P(i) is expanded.
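The color and size consistency tests above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the mean color stands in for the histogram-peak color feature of Eq. (1), the inverse covariance matrix is supplied by the caller, and all function names are hypothetical.

```python
import numpy as np

def main_color(pixels):
    """Mean RGB of a character's pixels; a stand-in for the
    histogram-peak color feature C_alpha of Eq. (1)."""
    return np.asarray(pixels, dtype=float).mean(axis=0)

def color_consistent(c1, c2, S_inv, Tc):
    """Eqs. (2)-(3): the Mahalanobis distance between the two
    color features must not exceed the local threshold Tc."""
    d = c1 - c2
    dist = float(np.sqrt(d @ S_inv @ d))
    return dist <= Tc

def char_size(box_h, box_w):
    """Eq. (7): the size of a character is its larger bounding-box side."""
    return max(box_h, box_w)

def size_consistent(size1, size2, Ts=3.0):
    """Eq. (6): the ratio of the larger to the smaller size must
    stay below the threshold Ts (Ts = 3 in the paper)."""
    return max(size1, size2) / min(size1, size2) <= Ts
```

In practice Tc would be computed from the pixels around α1 and α2 as in Eq. (4), rather than passed as a fixed constant.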
3.3. The spacing and direction consistency constraints

In the other cases, if Nc = 1 or 2, which means there are one or two neighbor characters of α1 and α2, then the spacing and direction consistency constraints are used to determine whether P(i) is expanded, in addition to the color and size consistency constraints. Generally, the characters in one string have similar character spacing, and the distance dαi,αj between the characters αi and αj is obtained by:

dαi,αj = sqrt( (xαi − xαj)² + (yαi − yαj)² )    (8)

When there is one neighbor character of α1 or α2, as shown in Fig. 3, the difference of the character spacing is:

Dα1,α2,αP,N = |dα1,α2 − dαP,αP,N|,  (P = 1 or 2)    (9)

On the other hand, although the sizes of the characters in one text string meet the size consistency constraint, they are not strictly equal, so the centers of these characters lie on a curved line rather than on a straight line, and curved text grouping is handled in the same way. So if Dα1,α2,αP,N ≤ Ts, and the angle θ formed at the character centers satisfies:

|θ − 180°| ≤ Td    (10)

then these three characters satisfy the spacing and direction consistency constraints. Here the curvature parameter β is utilized to control Td: the centers of consecutive characters should form an angle close to 180°, so the allowed angle range is

180° − β ≤ θ ≤ 180° + β,  i.e., Td = β    (11)

According to the spacing and the sizes of the characters, or on the basis of empirical data, Ts is usually set to 10 and the curvature parameter β is generally set to 50°.

When α1 and α2 have their own neighbor characters, we need to check the spacing and direction consistency constraints of these four characters, as shown in Fig. 4. If

( Σ_{P=1}^{2} Dα1,α2,αP,N ) / 2 ≤ Ts    (12)

and

(|θ1 − 180°| ≤ Td) | (|θ2 − 180°| ≤ Td)    (13)

then these four characters satisfy the spacing and direction consistency constraints. If the characters α1, α2 and their neighbor characters satisfy the four consistency constraints, P(i) is expanded, and α1 and α2 are grouped into the same text string.
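The spacing and direction checks of Eqs. (8)-(10) can be sketched as follows, treating each character as its centroid. The default thresholds follow the values quoted above (Ts = 10, β = 50°), but the helper names and the code itself are only an illustrative sketch, not the authors' implementation.

```python
import math

def spacing(c1, c2):
    """Eq. (8): Euclidean distance between two character centroids."""
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1])

def spacing_consistent(d_pair, d_neighbor, Ts=10.0):
    """Eq. (9): the spacing difference |d(a1,a2) - d(aP, aP_N)|
    must not exceed the threshold Ts."""
    return abs(d_pair - d_neighbor) <= Ts

def angle_at(b, a, c):
    """Angle (degrees) at vertex a formed by the centers b-a-c; for
    characters on a gently curved baseline it is close to 180 degrees."""
    v1 = (b[0] - a[0], b[1] - a[1])
    v2 = (c[0] - a[0], c[1] - a[1])
    cosang = (v1[0] * v2[0] + v1[1] * v2[1]) / (
        math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

def direction_consistent(theta, beta=50.0):
    """Eqs. (10)-(11): |theta - 180| <= Td with the tolerance Td
    given by the curvature parameter beta."""
    return abs(theta - 180.0) <= beta
```

Three collinear centers give an angle of exactly 180°, so a straight text line always passes the direction test; a right-angle bend (90°) fails it for any β below 90°.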
Fig. 9. The results obtained from topographic maps M3.
4. Experiments and analysis

In this section, several experiments are conducted on artificial images and topographic maps to verify the accuracy of the proposed method, and three previous methods (Cao's method in 2002 [16], Pezeshk's method in 2011 [23] and Chiang's method in 2014 [26]) are chosen for comparison. First, we created two artificial images which contain multi-oriented, multi-sized, and curved strings to test our method. In these images, the characters are separated and unbroken, and all the results obtained by the four methods are shown in Figs. 5 and 6. Further, in order to verify the accuracy of the proposed method in applications, four topographic maps are chosen as test images. Characters are separated based on color information and
morphological features [10], and all the grouped results are shown in Figs. 7–10. In these topographic maps, the distribution of characters is much more complex, so it is more difficult to group these characters into text strings accurately. As observed from all the results, Cao's method can complete character grouping effectively for most characters. In this method, morphological erosion and dilation operations with square structuring elements are performed, and the characters of the text strings then form connected components. This method works well on the condition that the gaps within strings are smaller than the gaps between strings. But the distribution of the characters in topographic maps is not always like this, so the characters in different strings are easily linked by mistake, as shown in Figs. 5(b)–10(b).
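Dilation-based grouping in the style of Cao's method can be sketched with standard morphology tools. This is a schematic reconstruction, assuming a binary character mask and a hand-chosen square structuring element, not the original implementation.

```python
import numpy as np
from scipy import ndimage

def dilation_grouping(binary, size=5):
    """Dilate the character mask with a square structuring element so
    that characters whose gaps are smaller than the element merge into
    one connected component per word; returns the component labels."""
    structure = np.ones((size, size), dtype=bool)
    merged = ndimage.binary_dilation(binary, structure=structure)
    labels, n = ndimage.label(merged)
    return labels, n
```

As the text notes, this only works when intra-word gaps are consistently smaller than inter-word gaps; a single element size cannot separate strings whose spacing violates that assumption.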
Fig. 10. The results obtained from topographic maps M4.
Table 1
The accuracy of the proposed method and the existing methods.

Method                 All the text strings   Text strings grouped correctly   Accuracy of character grouping (%)
Cao's method           118                    69                               58.47
Pezeshk's method       118                    65                               55.08
Chiang's method        118                    72                               61.02
The proposed method    118                    114                              96.61
Based on the features of the character spacing, Pezeshk proposed a method to group the individual characters into their respective strings using pyramid decomposition with Gaussian kernels [24]. In this method, successive subsampling and smoothing are performed to merge adjacent characters into blobs, and the reduced image is then expanded back to its original dimensions. This method only needs to estimate the gaps between characters, without any prior knowledge about the orientation of the strings or the character sizes. But when the text strings are close to each other, they will be connected mistakenly, as shown in Figs. 5(c)–10(c).
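The pyramid-based merging idea can be sketched as repeated Gaussian smoothing and 2x subsampling followed by expansion back to the original size. The level count, sigma, and threshold here are illustrative assumptions, not Pezeshk's parameters.

```python
import numpy as np
from scipy import ndimage

def pyramid_grouping(binary, levels=2, sigma=1.0, thresh=0.1):
    """Smooth and 2x-subsample the character mask `levels` times so
    that nearby characters blur together into blobs, then expand the
    reduced image back to the original size and re-threshold it."""
    img = binary.astype(float)
    shape = img.shape
    for _ in range(levels):
        img = ndimage.gaussian_filter(img, sigma)[::2, ::2]  # reduce
    img = ndimage.zoom(
        img, (shape[0] / img.shape[0], shape[1] / img.shape[1]), order=1)
    return img > thresh
```

The number of levels plays the role of the estimated character gap: more levels merge larger gaps, which is exactly why closely spaced but distinct strings end up connected.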
Chiang's method can deal with multi-oriented, multi-sized and curved text strings well. In the grouped results shown in Figs. 5(d)–10(d), the red pixels are the expanded background pixels which connect to only one character, and each green pixel connects to two characters which will be grouped. However, Chiang's method cannot deal well with the beginning or end characters of two adjacent texts, due to the disadvantages of its string curvature condition. Besides, his method cannot distinguish texts with various colors because the color information is not used, as shown in Fig. 8(d). Compared with the traditional methods, the proposed method handles the color and direction information better. It can distinguish texts with different colors due to the introduction of the color consistency constraint, as shown in Fig. 8(e). The direction consistency constraint is designed more carefully, so the characters can be grouped more accurately; especially for the beginning or end characters of different text strings, fewer of them are grouped into the wrong text string by mistake or left ungrouped. By counting the number of all the texts in these images and the numbers of the texts grouped correctly by the different methods, the accuracy comparison between the proposed method and the existing methods is given in Table 1. From the experimental data, it can be seen that the proposed method achieves the best results among these methods. Most of the characters are grouped into text strings accurately, except several characters in special cases.
5. Conclusion

It is well known that it is challenging to design a perfect method to group all the characters accurately in topographic maps. This paper proposes an improved method to group characters into text strings based on the designed consistency constraints of character color, size, spacing and direction. In this method, color features are introduced and the direction consistency constraint is designed more carefully, so the method can handle text strings with complex distributions in topographic maps. However, there is still much work to do. The proposed method cannot deal with the characters in strings with large character spacing, so new methods should be studied further.
Acknowledgments The work was jointly supported by the National Natural Science Foundations of China under Grant nos. 61472302, 61272280, U1404620, 41271447, 61373177, 61502387, and 61272195; The Program for New Century Excellent Talents in University under Grant no. NCET-12-0919; The Fundamental Research Funds for the Central Universities under Grant nos. K5051203020, K50513100006, K5051303018, JB150313, and BDY081422; Natural Science Foundation of Shaanxi Province, under Grant no. 2014JM8310 and 2016JQ6029; The Creative Project of the Science and Technology State of Xi’an under Grant no. CXY1441(1); The State Key Laboratory of Geo-information Engineering under grant no. SKLGIE2014-M-4-4; The Science Research Project of the Education Department of Shanxi Province under Grant nos. 15JK1748 and 15JK1734; The Science Foundations of Northwest University under Grant nos. 14NW25, 14NW27, 14NW28.
References [1] Y.Y. Chiang, C.A. Knoblock, Recognizing text in raster maps, GeoInformatica (2014) 1–27. [2] M. Gong, J. Liu, H. Li, et al., A multiobjective sparse feature learning model for deep neural networks, Neural Netw. Learn. Syst. IEEE Trans. 26 (12) (2015)
3263–3277. [3] Y.Y. Chiang, C.A. Adviser-Knoblock, Harvesting geographic features from heterogeneous raster maps, University of Southern California, 2010. [4] Q. Miao, P. Xu, T. Liu, et al., Linear feature separation from topographic maps using energy density and the shear transform, Image Process. IEEE Trans. 22 (4) (2013) 1548–1558. [5] RBoost: Label noise-robust boosting algorithm based on a nonconvex loss function and the numerically stable base learners. [J]. IEEE Transactions on Neural Networks & Learning Systems, 2015, 1. [6] S. Adam, J.M. Ogier, C. Cariou, et al., Symbol and character recognition: application to engineering drawings, Int. J. Doc. Anal. Recognit. 3 (2) (2000) 89–101. [7] A. Pezeshk, R.L. Tutwiler, Extended character defect model for recognition of text from maps//Image Analysis & Interpretation (SSIAI), in: Proceedings of the 2010 IEEE Southwest Symposium on IEEE, 2010, pp. 85–88. [8] Q. Miao, C. Shi, P. Xu, et al., A novel algorithm of image fusion using shearlets, Opt. Commun. 284 (6) (2011) 1540–1547. [9] M. Gong, M. Zhang, Y. Yuan, Unsupervised band selection based on evolutionary multiobjective optimization for hyperspectral images, Geosci. Remote Sens. IEEE Trans. 54 (1) (2016) 544–557. [10] L.A. Fletcher, R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images, IEEE Trans. Pattern Anal. Mach. Intell. 10 (6) (1988) 910–918. [11] L. Li, G. Nagy, A. Samal, et al., Cooperative text and line-art extraction from a topographic map, in: Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999, pp. 467–470. [12] L. Li, G. Nagy, A. Samal, et al., Integrated text and line-art extraction from a topographic map, Int. J. Doc. Anal. Recognit. 2 (4) (2000) 177–185. [13] Z. Lu, Detection of text regions from digital engineering drawings, IEEE Trans. Pattern Anal. Mach. Intell. 20 (4) (1998) 431–439. [14] M. Caprioli, P. 
Gamba, Detecting and grouping words in topographic maps by means of perceptual concepts, in: Proceedings of the European Signal Processing Conference, 2000, pp. 889–892. [15] H. Goto, H. Aso, Extracting curved text lines using local linearity of the text line, Int. J. Doc. Anal. Recognit. 2 (2–3) (1999) 111–119. [16] R. Cao, C.L. Tan, Text/graphics Separation in Maps, Graphics Recognition Algorithms and Applications, Springer, Berlin Heidelberg 2002, pp. 167–177. [17] P.P. Roy, E. Vazquez, J. Lladós, et al., A System to Segment Text and Symbols from Color Maps, Graphics Recognition. Recent Advances and New Opportunities, Springer, Berlin Heidelberg 2008, pp. 245–256. [18] U. Pal, S. Sinha, B.B. Chaudhuri, Multi-oriented English Text Line Identification//Image Analysis, Springer, Berlin Heidelberg 2003, pp. 1146–1153. [19] J. Pouderoux, J.C. Gonzato, A. Pereira, et al., Toponym recognition in scanned color topographic maps, in: Proceedings of the Ninth International Conference on Document Analysis and Recognition, 2007, vol. 1, pp. 531–535. [20] P.P. Roy, U. Pal, J. Lladós, et al., Multi-oriented english text line extraction using background and foreground information//Document Analysis Systems, 2008, DAS'08, in: The Eighth IAPR International Workshop on IEEE, 2008, pp. 315–322. [21] P.P. Roy, U. Pal, J. Lladós, et al., Multi-oriented and multi-sized touching character segmentation using dynamic programming//document analysis and recognition, 2009, ICDAR'09, in: Proceedings of the 10th International Conference on IEEE, 2009, pp. 11–15. [22] A. Velázquez, S. Levachkine, Text/Graphics Separation and Recognition in Raster-scanned Color Cartographic Maps//Graphics Recognition. Recent Advances and Perspectives, Springer, Berlin Heidelberg 2004, pp. 63–74. [23] A. Pezeshk, R.L. Tutwiler, Automatic feature extraction and text recognition from scanned topographic maps, Geosci. Remote Sens. IEEE Trans. 49 (12) (2011) 5047–5063. [24] P. Burt, E. 
Adelson, The Laplacian pyramid as a compact image code, Commun. IEEE Trans. 31 (4) (1983) 532–540. [25] A. Pezeshk, R.L. Tutwiler, Improved multi angled parallelism for separation of text from intersecting linear features in scanned topographic maps, in: Proceedings of the 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2010, pp. 1078-1081. [26] Y.Y. Chiang, C.A. Knoblock, Recognition of multi-oriented, multi-sized, and curved text, in: Proceedings of the 2011 International Conference on Document Analysis and Recognition (ICDAR), 2011, pp. 1399–1403. [27] Y.Y. Chiang, C.A. Knoblock, An approach for recognizing text labels in raster maps//Pattern Recognition (ICPR), in: Proceedings of the 2010 20th International Conference on IEEE, 2010, pp. 3199–3202.
Pengfei Xu is a lecturer at the School of Information Science and Technology, Northwest University, China. His main research interests include image processing and pattern recognition.
Qiguang Miao is Professor at the School of Computer Science and Technology, Xidian University. His research interests include image processing and multiscale geometric representations for images.
Xiaojiang Chen is Professor at the School of Information Science and Technology, Northwest University, China. His research interests include image processing and the mobile internet.
Ruyi Liu received the B.S. degree from Shaanxi Normal University, Shaanxi, China, in 2012. She is currently working toward the Ph.D. degree in the School of Computer Science and Technology, Xidian University, Shaanxi, China. Her current interests include image classification and segmentation, pattern recognition, and computer vision methods with applications in remote sensing.
Xunli Fan received his M.S. and Ph.D. degrees from Northwestern Polytechnical University in 1996 and 2000, respectively. He is currently an associate professor at Northwest University, China. His current research interests focus on network QoS, wireless sensor networks, and image processing.