Image segmentation incorporating double-mask via graph cuts

ARTICLE IN PRESS

JID: CAEE

[m3Gsc;May 10, 2016;15:15]

Computers and Electrical Engineering 000 (2016) 1–9

Contents lists available at ScienceDirect

Computers and Electrical Engineering journal homepage: www.elsevier.com/locate/compeleceng

Wencong Wang a,b, Zhenbo Li a,∗, Jun Yue c, Daoliang Li a

a College of Information and Electronic Engineering, China Agricultural University, Beijing 100083, P.R. China
b Patent Examination Cooperation Jiangsu Center of The Patent Office, SIPO, Suzhou 215000, P.R. China
c College of Information and Electronic Engineering, LuDong University, Yantai 264025, P.R. China

Article info

Article history: Received 21 June 2014; Revised 18 November 2015; Accepted 3 March 2016; Available online xxx

Keywords: Image segmentation; Mahalanobis distance; Graph cuts; Underwater scene

Abstract

This paper describes a novel strategy for applying double-masking in image segmentation based on graph cuts. We provide a method for imposing seed information automatically at object regions where image labels are difficult to determine during the segmentation of complex underwater scenes. Pre-segmentation was performed on the Cr component in YCrCb space using the Mahalanobis distance. The region obtained from the Cr pre-segmentation binary image was used as the object mask for the graph model of the original image. The minimum enclosing rectangle of the object mask reduced the area over which the graph model was computed, and this bounding box provided the graph model of the original image with the background mask. Our approach was easily realized and did not require specialized hardware, prior knowledge of underwater conditions, or scene structure. Experimental results demonstrated the robustness and accuracy of the proposed method. © 2016 Published by Elsevier Ltd.

1. Introduction

Many segmentation methods aim to separate an image into meaningful objects using low-level features such as intensity, color, edge, and texture. However, scene complexity remains the primary constraining factor in image segmentation. Underwater imaging is especially challenging because of the physical properties of such environments. Compared with common images, underwater images continue to pose a challenge in image processing and computer vision. Problems in underwater images include limited visibility range, noise, occlusion, low contrast, non-uniform light, blurring, and diminished color. Traditional general-purpose segmentation methods often fail under these conditions. Therefore, underwater image segmentation is considered to be a classic computer vision problem.

In recent years, studies on interactive image segmentation have attracted significant attention. The ultimate goal is to extract an object with as few user interactions as possible. Graph cuts are considered an effective image segmentation method that incorporates prior knowledge [1]. The popular method that we use in this study is based on graph cuts [2]. This method minimizes an energy function composed of a data term (computed using color likelihoods of foreground and background) and a boundary term (modulated by the contrast in the image). The study by Greig et al. [3] was the first to combine graph cuts with this type of problem. Their study proposed the creation of a graph according to the MAP-MRF energy function, and the construction of a one-to-one correspondence between MRF configurations and graph

Reviews processed and approved for publication by the Editor-in-Chief.
∗ Corresponding author: Zhenbo Li.

http://dx.doi.org/10.1016/j.compeleceng.2016.03.003 0045-7906/© 2016 Published by Elsevier Ltd.

Please cite this article as: W. Wang et al., Image segmentation incorporating double-mask via graph cuts, Computers and Electrical Engineering (2016), http://dx.doi.org/10.1016/j.compeleceng.2016.03.003


Fig. 1. (a) Original image; (b) image with some constraints, in which blue represents object and red represents background based on user input; (c) generated image segmentation using graph cuts.

cuts. Therefore, the configuration with minimum energy can be found by the min-cut of the graph. The min-cut can be calculated by a max-flow algorithm using Ford and Fulkerson’s theory [4]. Unfortunately, because of its relatively slow operation speed and restricted applicability, the work of Greig et al. did not obtain the recognition it deserved at the time. Graph cuts received increased attention in the vision domain after the work of Boykov et al. [5]. Since then, varied methods based on graph cuts have been developed, and these approaches have been used widely in medical image, video, and natural image segmentation. This method requires the establishment of an energy function that reaches a minimal value when the image is segmented as expected. For graph cut segmentation, the energy function is constructed based on regional and boundary information, and can achieve a globally optimal result [6].

However, segmenting a target in a complex underwater scene can be difficult when the graph cuts approach is used alone. Fig. 1(a) shows an original image. In Fig. 1(b), the user constrains some pixels to be object and background using the mouse. Fig. 1(c) shows the contours generated by graph cut segmentation. Many unnecessary contours appear in the result shown in Fig. 1(c), even though some reasonable constraints are set in Fig. 1(b). These figures show that removing interference from the scene is necessary.

Several researchers have developed algorithms that address this problem, and these algorithms can be categorized into three approaches: pixel-, contour-, and shape-based. T. Saitoh’s [7] mathematical morphology method and K. S. Tan et al.’s [8] ideas on clustering were applied in pixel-based segmentation. Contour-based segmentation included geometrical [9] or statistical active shape models and active contour algorithms [10]. L. Massoptier and S. Casciaro developed a graph-cut method initialized by an adaptive threshold [11].
Shape prior-based graph-cut algorithms have also been investigated considerably. These algorithms incorporate the shape information of the object into the energy function to improve the segmentation result [12–15]. The energy of the prior shape is combined into the energy function. For shape prior-based graph cuts, establishing the shape template is highly important. Many potential segmented objects can be adopted as the training set to build the shape template. However, capturing the training set places an additional burden on image processing, and a shape model is not easy to define. Thus, a shape model is the direction of our future work. In this study, we propose to impose object and background prior constraints by double-masking from pre-segmentation and to apply this prior information in graph cuts.

The rest of the paper is organized as follows. In Section 2, the background for graph cuts based on the energy function and our segmentation strategy incorporating double-mask in graph cuts are presented. In Section 3, we first describe our experimental platform and data and then show the segmentation results and experimental analysis. Some comparison methods are also provided. The conclusion and future works are given in Section 4.

With the increasing interest in exploiting aquarium fish and the demand for aquaculture development, increasing attention has focused on acquiring and comprehending exact information about underwater scenes. The method in this study can be applied widely in the appreciation of aquarium fish and in posture recognition in aquaculture.

2. Methodology

2.1. Graph cuts

Many segmentation problems can be formulated in terms of energy minimization, which can be handled as a maximum-flow problem in graphs. Therefore, graph cut segmentation achieves an optimal solution by minimizing an energy function via the max-flow/min-cut algorithm.
An undirected graph can be denoted as $G = \langle V, E \rangle$, where $V$ is a set of vertices and $E$ is a set of graph edges that connect neighboring vertices. The vertex set $V$ includes neighborhood nodes that correspond to the pixels and two terminal nodes, $s$ (source) and $t$ (sink). This type of graph is also called an s-t graph; in image segmentation, the $s$ node usually represents the object and the $t$ node denotes the background. Two types of edges exist in this graph: n-links and t-links. Each edge $e$ is assigned a non-negative weight denoted as $w_e$, which is also named the “cost.” A cut is a subset of the edges $E$, denoted as $C$ with $C \subset E$. The cost of the cut, $|C|$, is the sum of the weights on the edges in $C$, which is expressed as follows.

$$|C| = \sum_{e \in C} w_e \tag{1}$$
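To make Eq. (1) and the role of the minimum cut concrete, the following is a minimal sketch in Python. It assumes a tiny hand-made s-t graph with two pixel nodes (the node names `p`, `q` and all weights are illustrative and not taken from the paper), and it uses a plain Edmonds-Karp augmenting-path search rather than the Boykov-Kolmogorov max-flow algorithm; both return a minimum cut, and only the running time differs.

```python
from collections import deque


def min_cut(capacity, s, t):
    """Return (cut_cost, source_side) for a minimum s-t cut.

    capacity: dict mapping node -> {neighbor: non-negative edge weight}.
    """
    # Residual graph: copy the weights and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(s, {})
    residual.setdefault(t, {})

    def reachable_from_source():
        # Breadth-first search over edges that still have residual capacity.
        parents = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = reachable_from_source()
        if t not in parents:
            break  # no augmenting path is left, so the flow is maximal
        # Walk back from t to s to recover the augmenting path.
        path = []
        v = t
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck

    # Nodes still reachable from s form the subset S; the rest form T.
    source_side = set(parents)
    # Eq. (1): the cut cost is the sum of the weights of edges leaving S.
    cut_cost = sum(
        w
        for u, nbrs in capacity.items() if u in source_side
        for v, w in nbrs.items() if v not in source_side
    )
    return cut_cost, source_side


# Illustrative graph: t-links to s and t, one n-link between p and q.
graph = {
    "s": {"p": 9, "q": 2},
    "p": {"t": 1, "q": 3},
    "q": {"t": 8, "p": 3},
}
cost, object_nodes = min_cut(graph, "s", "t")
print(cost, sorted(object_nodes))
```

In this toy graph, pixel `p` is strongly tied to the source and pixel `q` to the sink, so the cheapest cut severs the weak t-links and the single n-link between them.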


Fig. 2. Illustration of s-t graph. The pixels correspond to the neighbor nodes in the graph (except s and t nodes). The solid lines in the graph are n-links whereas the dotted lines are t-links [18].

The max-flow/min-cut algorithm developed by Boykov and Kolmogorov can be used to obtain the minimum cut for the s-t graph. The graph is divided by this cut, and the nodes are separated into two disjoint subsets $S$ and $T$, where $s \in S$, $t \in T$, and $S \cup T = V$. The two subsets correspond to the foreground and background in image segmentation. Fig. 2 depicts this type of graph. Graph cut segmentation achieves an optimal solution by minimizing an energy function via the max-flow/min-cut algorithm [2]. The energy function is defined in the following equation.

$$E(L) = \alpha R(L) + B(L), \tag{2}$$

where $L = \{l_1, l_2, l_3, \ldots, l_p, \ldots, l_{|P|}\}$ represents a binary vector whose component $l_p$ specifies the assignment of background or foreground to pixel $p$ in the set of data elements $P$ of an image $I$, and $\alpha$ is a non-negative coefficient that specifies the relative importance of $R(L)$ versus $B(L)$. $R(L)$ is called the region term, which incorporates regional information into the segmentation, and $B(L)$ is called the boundary term, which incorporates the boundary constraint into the segmentation. When $\alpha$ is set to 0, regional information is ignored and only the boundary information is considered. The combination of the energy function with our method is introduced in Section 2.4.

2.2. Pre-segmentation approach

A color image in YCrCb representation comprises multiple homogeneous regions with different ranges for each channel. The pixels that belong to a homogeneous region have Y, Cr, and Cb values within the ranges of that region for each channel. Zhang et al. extracted the Y, Cr, and Cb components in the YCrCb color space and applied iterative threshold segmentation and gradient sharpening [17]. In our experiments, the Cyprinus carpio object stood out remarkably when pre-segmentation was performed on the Cr component in YCrCb space based on the Mahalanobis distance. Fig. 3 shows the binary segmentation images generated by the proposed pre-segmentation technique in the Y, Cr, and Cb channels, respectively. More specifically, the Mahalanobis distance is usually written as

$$d_M(x, \mu) = \sqrt{(x - \mu)^T A^{-1} (x - \mu)}, \tag{3}$$

where $d_M(x, \mu)$ is the Mahalanobis distance between the Cr vector and the average vector of the total color space, $A^{-1}$ is the inverse covariance matrix of a specific pattern, and $(x - \mu)^T$ is the transpose of $(x - \mu)$. This high-quality distance metric is capable of identifying and discriminating between relevant and irrelevant features. Fig. 3(c) shows that the potential object area can be detected using the above method.

2.3. Double-mask approach

The binary image of the Cr component, which is shown in Fig. 4(a), exhibits many small white noise spots. These small spots were removed to obtain an accurate mask region. The minimum enclosing rectangle of the object can then be calculated, as shown in Fig. 4(b). In our approach, the first mask was the rectangular region, which reduced the computational cost of the graph cuts. The red bounding box provided the graph model of the original image with background seeds. The white region was considered the second mask, which provided object seeds to the graph model of the original image. Pixel coordinates were used to transfer seeds between the original and mask images. The complete double-mask information is shown in Fig. 4(b).


Fig. 3. Segmentation results for the three components. Only a small, almost invisible grey area appears in (d) because the object is poorly represented in the Cb component.
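The pre-segmentation step built on Eq. (3) can be illustrated with a short Python sketch. Several choices here are assumptions made only for illustration, since the paper does not spell them out: the RGB to YCrCb conversion uses the common BT.601 full-range coefficients, the feature vector is the pair (Cr, Cb), and a pixel is marked as a candidate object when its distance from the mean exceeds a hypothetical threshold `tau`.

```python
import math


def rgb_to_crcb(r, g, b):
    """Convert one RGB pixel to (Cr, Cb) with BT.601 full-range coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128.0
    cb = (b - y) * 0.564 + 128.0
    return cr, cb


def mean_and_covariance(samples):
    """Sample mean and 2x2 sample covariance of a list of 2-D vectors."""
    n = len(samples)
    mu = [sum(s[k] for s in samples) / n for k in range(2)]
    cov = [
        [
            sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(2)
        ]
        for i in range(2)
    ]
    return mu, cov


def mahalanobis_2d(x, mu, cov):
    """Eq. (3) for 2-D vectors: sqrt((x - mu)^T A^{-1} (x - mu))."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # must be non-zero for the inverse to exist
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


def presegment(features, mu, cov, tau):
    """Binary mask: True where the distance from mu exceeds tau (assumed rule)."""
    return [mahalanobis_2d(x, mu, cov) > tau for x in features]


# Illustrative values only: mean at the origin, variances 4 and 1.
mask = presegment([(0, 0), (2, 0), (0, 3), (6, 0)],
                  mu=[0.0, 0.0], cov=[[4.0, 0.0], [0.0, 1.0]], tau=2.0)
print(mask)
```

Because the distance is scaled by the covariance, the same absolute deviation counts for more along a low-variance channel than along a high-variance one, which is what lets the metric separate the object from a noisy background.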

Fig. 4. (a) Cr component pre-segmentation and (b) double-mask image.
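The double-mask construction of Section 2.3 can be sketched as follows in Python. This is a simplified illustration under stated assumptions rather than the authors' implementation: small spots are removed by discarding 4-connected components below a hypothetical `min_area`, the rectangle is axis-aligned, and background seeds are taken to be the rectangle's border pixels that are not part of the object mask.

```python
from collections import deque


def remove_small_spots(mask, min_area):
    """Keep only 4-connected white components with at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    cleaned = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one component starting at (y, x).
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_area:
                for cy, cx in component:
                    cleaned[cy][cx] = 1
    return cleaned


def bounding_box(mask):
    """Minimum enclosing axis-aligned rectangle (top, left, bottom, right).

    Assumes the mask contains at least one white pixel.
    """
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)


def double_mask_seeds(mask, min_area=3):
    """Return the rectangle and a dict of seed labels in original coordinates."""
    cleaned = remove_small_spots(mask, min_area)
    top, left, bottom, right = bounding_box(cleaned)
    seeds = {}
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if cleaned[y][x]:
                seeds[(y, x)] = "obj"   # second mask: object seeds
            elif y in (top, bottom) or x in (left, right):
                seeds[(y, x)] = "bkg"   # first mask: rectangle border
    return (top, left, bottom, right), seeds


# Illustrative binary image: one noise spot at (0, 0) and one plus-shaped object.
binary = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box, seeds = double_mask_seeds(binary)
print(box, sum(1 for v in seeds.values() if v == "obj"))
```

Since the seeds are keyed by pixel coordinates, they can be applied directly to the original image, and the graph only needs nodes for pixels inside the rectangle.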

2.4. Graph cuts implementation

The energy function was explained in Section 2.1. More specifically, Eq. (2) can be written as

$$E(L) = \alpha \sum_{p \in P} R_p(l_p) + \sum_{\{p,q\} \in N} B \cdot \delta(l_p, l_q), \tag{4}$$

where $N$ is the set of pairs of neighboring pixels and $l_p$ represents the label assigned to pixel $p$. $R_p(l_p)$ is the penalty for assigning label $l_p$ to pixel $p$. The weight of $R_p(l_p)$ can be obtained by comparing the intensity of pixel $p$ with the given histograms of the object and background. The weights of the t-links are defined in the following equations:

$$R_p(1) = -\ln \Pr(I_p \mid obj), \tag{5}$$

$$R_p(0) = -\ln \Pr(I_p \mid bkg). \tag{6}$$
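As a hedged illustration of Eqs. (5) and (6), the Python sketch below estimates the two likelihoods from intensity histograms of seed pixels and converts them into t-link penalties. The bin count, the smoothing constant `eps` (added to avoid taking the logarithm of zero), and the sample intensities are assumptions for this example and do not come from the paper.

```python
import math


def normalized_histogram(intensities, bins=4, max_value=256, eps=1e-6):
    """Smoothed probability of each intensity bin, estimated from seed pixels."""
    counts = [0] * bins
    for v in intensities:
        counts[v * bins // max_value] += 1
    total = len(intensities)
    return [(c + eps) / (total + eps * bins) for c in counts]


def region_penalties(intensity, hist_obj, hist_bkg, bins=4, max_value=256):
    """Return (R_p(1), R_p(0)) as in Eqs. (5) and (6)."""
    b = intensity * bins // max_value
    r_obj = -math.log(hist_obj[b])  # penalty for labeling the pixel as object
    r_bkg = -math.log(hist_bkg[b])  # penalty for labeling the pixel as background
    return r_obj, r_bkg


# Illustrative seeds: bright object pixels, dark background pixels.
hist_obj = normalized_histogram([200, 210, 220, 230])
hist_bkg = normalized_histogram([10, 20, 30, 40])

bright = region_penalties(215, hist_obj, hist_bkg)
dark = region_penalties(25, hist_obj, hist_bkg)
print(bright[0] < bright[1], dark[0] > dark[1])
```

A bright pixel resembles the object seeds, so labeling it as object is cheap and labeling it as background is expensive; the dark pixel behaves the opposite way.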

Eqs. (5) and (6) show that when $\Pr(I_p \mid obj)$ is larger than $\Pr(I_p \mid bkg)$, $R_p(1)$ is smaller than $R_p(0)$. Thus, when a pixel is more likely to be the object, the penalty for grouping that pixel into the object is smaller, which reduces the energy in Eq. (2). When all pixels have been separated correctly into two subsets, the regional term is minimized. The sum $\sum_{\{p,q\} \in N} B \cdot \delta(l_p, l_q)$ in Eq. (4) is the boundary term, which is defined by the following equations [16].

$$\delta(l_p, l_q) = \begin{cases} 1 & \text{if } l_p \neq l_q \\ 0 & \text{if } l_p = l_q \end{cases} \tag{7}$$


Fig. 5. Structure of the proposed method. Blocks shown: YCrCb space, Cr component, double-mask, graph cuts, segmentation.

Fig. 6. Configuration sketch of data collection platform.



$$B \propto \exp\left(-\frac{(I_p - I_q)^2}{2\sigma^2}\right) \tag{8}$$
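Eqs. (4), (7), and (8) can be combined into a small Python sketch that evaluates the energy of a given labeling. The one-dimensional four-pixel image, the region costs, and the values of `alpha` and `sigma` are illustrative assumptions; the proportionality in Eq. (8) is taken with constant 1.

```python
import math


def boundary_weight(ip, iq, sigma):
    """Eq. (8) with proportionality constant 1."""
    return math.exp(-((ip - iq) ** 2) / (2.0 * sigma ** 2))


def energy(labels, region_cost, image, neighbors, alpha, sigma):
    """Eq. (4): alpha * region term + boundary term.

    labels: dict pixel -> 0 (background) or 1 (object).
    region_cost[p]: tuple (R_p(0), R_p(1)).
    neighbors: iterable of pixel pairs (p, q).
    """
    region = sum(region_cost[p][labels[p]] for p in labels)
    # Eq. (7): a pair contributes only when its two labels differ.
    boundary = sum(
        boundary_weight(image[p], image[q], sigma)
        for p, q in neighbors
        if labels[p] != labels[q]
    )
    return alpha * region + boundary


# Illustrative 1-D image: two dark pixels followed by two bright pixels.
image = [10, 12, 200, 205]
neighbors = [(0, 1), (1, 2), (2, 3)]
region_cost = [(0.1, 2.0), (0.1, 2.0), (2.0, 0.1), (2.0, 0.1)]

good = {0: 0, 1: 0, 2: 1, 3: 1}  # boundary placed at the strong edge
bad = {0: 0, 1: 1, 2: 1, 3: 1}   # boundary placed between similar pixels
e_good = energy(good, region_cost, image, neighbors, alpha=1.0, sigma=10.0)
e_bad = energy(bad, region_cost, image, neighbors, alpha=1.0, sigma=10.0)
print(e_good < e_bad)
```

Cutting between the two similar dark pixels costs almost the full weight of 1, whereas cutting across the strong intensity edge costs almost nothing, so the labeling that follows the edge has the lower energy.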

The boundary constraint can be interpreted as a penalty on the labels $l_p$ and $l_q$ assigned to neighboring pixels. When neighboring pixels have the same label, the penalty is 0, which means that the boundary term only sums the penalty along the segmentation boundary. $\sigma$ can be viewed as camera noise. The penalty is high only when the intensities of two neighboring pixels are highly similar; otherwise, the penalty is low. Thus, when the energy function attains its minimum value, the cut is more likely to occur at the object boundary. In this way, the minimum energy problem is converted into a graph cuts problem.

Based on the discussion in the previous sections, we proposed incorporating seed information selectively according to the needs of the graph construction. We can increase the number of determined labels in $L = \{l_1, l_2, l_3, \ldots, l_p, \ldots, l_{|P|}\}$ to adjust the input of the energy function in Eq. (2). The graph model was constructed in a specific area of the original image, which corresponded to the red bounding box through coordinate transfer. In our method, the double-mask provided object and background seeds with the white mask and the red bounding box, respectively.

In this study, we proposed a new strategy that achieved Cyprinus carpio image segmentation by incorporating double-mask via graph cuts. The first contribution of the study is a pre-segmentation strategy that combines YCrCb color space segmentation with the Mahalanobis distance. This approach used the correct object region as a mask. Pre-segmentation supplied important data sources for graph cuts. As the second contribution, we proposed the double-mask method to decrease the computational area of the graph model significantly. Fig. 5 shows the structure of our proposed method.

3. Experiments

3.1. Platform and data

Underwater image data were captured using a data collection platform based on computer vision.
The platform comprised the following main modules: a water tank, a lamp, an experimental desk, an aerator, a CCD color camera (DH-HV3151UC COMOS), and a PC. The entire device is shown in Fig. 6. The water tank was placed on the experimental desk to keep the CCD camera at an appropriate height. The water quality was maintained by the aerator. Videos (750 frames per video) were captured by the CCD camera and saved on the PC in AVI format. Cyprinus carpio images were captured using our program, and the resolution of these images is 872 × 436. Some images were selected arbitrarily to test our proposed method.


Fig. 7. (a), (e), and (h) are selected samples of Cyprinus carpio in different postures.

Table 1. Statistical results of OS.

| Image   | Otsu  | PCNN  | YUV    | Our method |
|---------|-------|-------|--------|------------|
| a1      | 3.12% | 1.32% | 75.73% | 88.16%     |
| a2      | 1.28% | 0.96% | 71.01% | 87.19%     |
| a3      | 3.27% | 0.8%  | 69.33% | 86.61%     |
| a4      | 0.35% | 0.68% | 54.11% | 83.26%     |
| Average | 2.01% | 1.02% | 67.55% | 86.31%     |
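The overlap score (OS) reported in Table 1 is defined in Section 3.3 as the ratio of the intersection to the union of the ground-truth region and the segmented region. A minimal Python sketch of that computation on binary masks is given below; the two small masks are made up for illustration, and returning 1.0 when both masks are empty is an assumption of this sketch.

```python
def overlap_score(ground_truth, result):
    """OS = |GT intersect R| / |GT union R| for two binary masks of equal size."""
    intersection = 0
    union = 0
    for gt_row, r_row in zip(ground_truth, result):
        for gt, r in zip(gt_row, r_row):
            if gt and r:
                intersection += 1
            if gt or r:
                union += 1
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return intersection / union


# Illustrative masks: 3 shared pixels out of 5 pixels covered by either mask.
gt_mask = [[1, 1, 0],
           [1, 1, 0]]
seg_mask = [[0, 1, 1],
            [1, 1, 0]]
print(overlap_score(gt_mask, seg_mask))
```

The score is 1 only when the two regions coincide exactly, and it drops both when the method misses object pixels and when it includes background pixels.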

3.2. Results and analysis

Images were selected arbitrarily for our pre-segmentation approach. Fig. 7 shows the accurate results of double-mask detection. In these images, the detection windows reduced the size of the graph model to be computed. The double-mask information was transferred to the original image, and additional prior constraints were incorporated into the energy function.

3.3. Method comparison

We used the pixel-level segmentation overlap score, OS, to quantify accuracy. The quality of the segmented region with respect to the ground-truth object segmentation was measured as in Eq. (9):

$$OS = \frac{|GT \cap R|}{|GT \cup R|}, \tag{9}$$

where GT denotes the ground-truth object region, segmented manually by humans, that is associated with the majority of the pixels in region R, and R is our segmentation region. A higher OS indicates a more effective and accurate output. In this section, we analyzed the segmentation results of Otsu, PCNN, and YUV and present their results on the collected images (Fig. 8). The quantitative results for these images are shown in Table 1. The accuracy of the Otsu and PCNN segmentations was very low. The YUV color space revealed a remarkable feature of the object. However, Fig. 8(d1–d4) shows that the eyes and some regions were not segmented because of the scale. These problems were addressed by the proposed method based on graph cuts. Table 1 shows that the average accuracy rates of Fig. 8(b–d) were 2.01%, 1.02%, and 67.55%, respectively. Although these methods could segment the object region, the entire Cyprinus carpio was not segmented precisely. The average accuracy rate of our method was 86.31% in Fig. 8(e). Four videos (750 frames per video) were selected to count


Fig. 8. Several results with different segmentation methods.

the execution time. The average execution time was approximately 2.17 s per image with masks and approximately 6.31 s per image without masks. Therefore, Cyprinus carpio object segmentation incorporating double-mask via graph cuts was a novel and effective approach.

4. Conclusion and future works

A novel and effective image segmentation method based on graph cuts was proposed to extract the Cyprinus carpio target from a complex underwater scene. Experimental results showed that traditional threshold-based image segmentation was unsuitable for segmenting the Cyprinus carpio target directly. Good segmentation results might not be achieved because of non-uniform light, blurring, and diminished color in the underwater environment. The remarkable response of Cyprinus carpio in the Cr component of the YCrCb color space was an important cue in our experiments. The first step in our double-mask method was pre-segmentation. Double-mask formation based on graph cuts effectively improved the performance of Cyprinus carpio segmentation. The experimental data were insufficient to fully support the work in this study, which weakens the demonstrated comparative advantage of the proposed method. The amount of data from the target mask can be reduced further to improve the efficiency of the algorithm. In future work, we intend to utilize other methods to study several Cyprinus carpio features and to extend our method to other kinds of Cyprinus carpio. Experimental platforms will be established to collect depth information on the species. We will also continue searching for more effective and general segmentation approaches to improve our experiments.

Acknowledgments

This research is financially supported by the Chinese Universities Scientific Fund (2013QJ052), the National Science and Technology Support Program (2011BAD21B01 & 2012BAD35B07), the National Natural Science Foundation of China


(61100115, 61472172, 61471133), the Science and Technology Development Plan of Shandong Province (2015GGX101019), the Opening Foundation of the Engineering Research Center of Digital Media Technology, Ministry of Education (2015AA0002), and the Natural Science Foundation of Shandong Province (ZR2012FM008).

References

[1] Vicente S, Kolmogorov V, Rother C. Graph cut based image segmentation with connectivity priors. In: Computer vision and pattern recognition, CVPR; 2008.
[2] Boykov Y, Jolly M. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In: International conference on computer vision, vol. 1; 2001. p. 105–12.
[3] Greig DM, Porteous BT, Seheult AH. Exact maximum a posteriori estimation for binary images. J R Stat Soc, Series B (Methodological) 1989;51(2):271–9.
[4] Ford LR, Fulkerson DR. Flows in networks. Princeton: Princeton University Press; 1962.
[5] Boykov Y, Veksler O, Zabih R. Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 2001;23(11):1222–39.
[6] Boykov Y, Funka-Lea G. Graph cuts and efficient N-D image segmentation. Int J Comput Vis 2006;70(2):109–31.
[7] Saitoh T, Tamura Y, Kaneko T. Automatic segmentation of liver region based on extracted blood vessels. Syst Comput Japan 2004;35(5):633–41.
[8] Tan KS, Mat Isa NA, Lim WH. Color image segmentation using adaptive unsupervised clustering approach. Appl Soft Comput 2013;13:2017–36.
[9] Sojar V, Stanisavljev D, Hribernik M, Glui M, Kreuh D, Velkavrh U, Fius T. Liver surgery training and planning in 3D virtual space. In: International Congress Series, 1268(06); 2004. p. 390–4.
[10] Kass M, Witkin A, Terzopoulos D. Snakes: active contour models. Int J Comput Vis 1987;1:321–31.
[11] Massoptier L, Casciaro S. Fully automatic liver segmentation through graph-cut technique. In: Proceedings of the 29th annual international conference of the IEEE EMBS cité internationale; 2007. p. 23–6.
[12] Freedman D, Zhang T. Interactive graph cut based segmentation with shape priors. In: IEEE computer society conference on CVPR, vol. 1; 2005. p. 755–62.
[13] Wang H, Zhang H. Adaptive shape prior in graph cut segmentation. In: IEEE international conference on ICIP; 2010. p. 2029–3032.
[14] Zhou J, Ye M, Zhang X. Graph cut segmentation with automatic editing for Industrial images. In: Inter conference on ICICIP; 2010. p. 633–7.
[15] Wang H, Zhang H, Ray N. Adaptive shape prior in graph cut image segmentation. Pattern Recognit 2013;46:1409–14.
[16] Boykov Y, Funka-Lea G. Graph cuts and efficient N-D image segmentation. Int J Comput Vis 2006;70(2):109–31.
[17] Zhang C, Feng X, Li L, et al. Identification of cotton contaminants using neighborhood gradient based on YCbCr color space. In: Proceedings of the 2nd international conference on signal processing systems, ICSPS, vol. 3; 2010. p. 733–8.
[18] Yi F, Moon I. Image segmentation: a survey of graph-cut methods. In: Proceedings of the 2012 International Conference on Systems and Informatics (ICSAI 2012); 2012. p. 1936–41.


Wencong Wang received his M.E. from the College of Electrical and Information Engineering, China Agricultural University in 2014. His research interest is the analysis of animal behavior based on computer vision. He is currently a patent examiner at the Patent Examination Cooperation Jiangsu Center of The Patent Office, SIPO.

Dr. Zhenbo Li received his Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, China in 2007. He is an associate professor at the College of Information and Electrical Engineering, China Agricultural University, China. His research interests are computer vision, computer graphics, and information management in agriculture.

Prof. Jun Yue received her Ph.D. from the College of Economics & Management, China Agricultural University, China in 2007. She is a professor in the College of Information and Electrical Engineering, LuDong University, China. Her research interests are cross-media retrieval and the semantic web.

Prof. Daoliang Li received his Ph.D. from the Engineering College, China Agricultural University, China in 1999. He is a professor at the College of Information and Electrical Engineering, China Agricultural University, China. His research interest is information processing in agriculture.
