
Segmentation of Breast Ultrasound Image with Semantic Classification of Superpixels
Qinghua Huang, Yonghao Huang, Yaozhong Luo, Feiniu Yuan, Xuelong Li

PII: S1361-8415(20)30024-4
DOI: https://doi.org/10.1016/j.media.2020.101657
Reference: MEDIMA 101657
To appear in: Medical Image Analysis
Received date: 14 March 2019
Revised date: 18 January 2020
Accepted date: 22 January 2020

Please cite this article as: Qinghua Huang, Yonghao Huang, Yaozhong Luo, Feiniu Yuan, Xuelong Li, Segmentation of Breast Ultrasound Image with Semantic Classification of Superpixels, Medical Image Analysis (2020), doi: https://doi.org/10.1016/j.media.2020.101657

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier B.V.

   

Highlights
- A novel segmentation method for breast ultrasound is proposed.
- The breast lesion is segmented by semantic classification of superpixels.
- A KNN-based reclassification is adopted to refine the segmentation.
- Experimental results show the advantages of the proposed method.


Segmentation of Breast Ultrasound Image with Semantic Classification of Superpixels

Qinghua Huang1,5*, Yonghao Huang2, Yaozhong Luo2, Feiniu Yuan3, and Xuelong Li4,5

1 School of Mechanical Engineering, and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China.
2 School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510006, China.
3 College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China.
4 School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China.
5 Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China.
*Contact: [email protected]

Abstract: Breast cancer is a major threat to women. Ultrasound imaging has been applied extensively in the diagnosis of breast cancer, and segmentation of the breast ultrasound (BUS) image is a crucial step for further analysis. However, due to the poor image quality, BUS image segmentation remains a very challenging task. In this paper, we propose a novel method that segments the breast tumor via semantic classification and merging of image patches. The proposed method first selects two diagonal points to crop a region of interest (ROI) on the original image. Then, histogram equalization, bilateral filtering and pyramid mean shift filtering are adopted to enhance the image. The cropped image is divided into many superpixels using simple linear iterative clustering (SLIC). Next, features are extracted from the superpixels and a bag-of-words model is created. An initial classification is obtained by a back propagation neural network (BPNN). To refine the preliminary result, a k-nearest neighbor (KNN) based reclassification is applied to produce the final result. To verify the proposed method, we collected a BUS dataset containing 320 cases. The segmentation results of our method were compared with the corresponding results obtained by five existing approaches. The experimental results show that our method achieved competitive results compared to conventional methods in terms of TP and FP, and produced good approximations to the hand-labelled tumor contours with comprehensive consideration of all metrics (F1-score = 89.87% ± 4.05%, average radial error = 9.95% ± 4.42%).

Keywords: image segmentation, semantic classification, breast tumor, ultrasound.

1. Introduction

Breast cancer has become one of the main causes of cancer death among women in the United States in 2018 (Siegel et al., 2018). However, the molecular etiology of breast cancer is not yet fully understood. Therefore, early detection and treatment are the essential measures to reduce mortality (Huang et al., 2017, Qi et al., 2019). Due to its noninvasiveness, speed and low cost, medical ultrasound (US) imaging is regarded as one of the most common modalities for breast tumor imaging (Huang et al., 2019, Sahiner et al., 2007). However, routine US examination relies on the radiologist's experience and skills. To reduce the subjectivity of the diagnosis, computer-aided diagnosis (CAD) has become more and more useful (Huang et al., 2018, Xiao et al., 2002). A CAD system makes use of intelligent computing technology to automatically output diagnostic results as a reference for doctors, which both improves the diagnostic accuracy and reduces the workload of the doctors (Huang et al., 2015). In general, a typical CAD system for BUS examination consists of four steps, i.e. image preprocessing, segmentation of the breast lesion, image feature extraction and classification (Huang et al., 2019b). As a very critical step, image segmentation separates the tumor region from the background and is decisive for the subsequent diagnosis. Like low-dose computed tomography (Chen et al., 2019), BUS image segmentation remains a challenging issue (Peng et al., 2016, Xian et al., 2016) due to the poor quality of US images, e.g. high speckle noise (Wells and Halliwell, 1981), low contrast, low signal-to-noise ratio (SNR), intensity inhomogeneity and blurry boundaries (Xiao et al., 2002, Xian et al., 2018).

In the past decade, a large number of segmentation methods for BUS images have been developed, and they can be grouped into six main categories: graph-based methods (Chang et al., 2015, Huang et al., 2012, Huang et al., 2014, Huang et al., 2015, Luo et al., 2017, Zhou et al., 2014, Zhang et al., 2010), deformable model methods (Gao et al., 2012, Li et al., 2010), learning-based methods (Kumar et al., 2018, Moon et al., 2014, Shan et al., 2012), thresholding-based methods, region growing methods and watershed methods (Zhang and Zhang, 2011, Xian et al., 2014).

Deformable models are among the most commonly used methods for BUS images. The active contour model (ACM), also known as "Snakes", is one kind of deformable model. Gao et al. (2012) proposed an improved edge-based active contour model in a variational level set formulation for semi-automatically capturing breast tumor boundaries in US images. In the conventional ACM, the topology of the closed curves does not change in the course of approaching the manual contour of the objective region. Moraru et al. (2014) presented an effective image energy function for segmentation by considering the image features, first-order textural features and four masks. The method combined the metric of the original ACM with statistical texture information. Although it can obtain accurate contours for some BUS images, it requires manually set initial contours, which introduce much uncertainty into the segmentation output. To address the invariability of the topology, curve evolution theory and the level set method were introduced. Li et al. (2010) presented distance regularized level set evolution (DRLSE) and applied it to medical image segmentation.

Learning-based methods (both supervised and unsupervised) view image segmentation as a classification problem, in which image features are extracted to optimize a classifier that discriminates different pixels. Xian et al. (2012) used a maximum a posteriori (MAP) probability framework with a Markov random field (MRF) model to combine spatial prior knowledge with frequency-domain constraints for BUS images. As an unsupervised example, Moon et al. (2014) adopted fuzzy C-means (FCM) clustering to detect hypoechogenic regions and quantified seven features to predict the tumor. Recently, supervised learning methods have become more commonly used. Deep convolutional neural networks have been widely applied to various tasks of medical image analysis (López-Linares et al., 2018, Zhao et al., 2018). For BUS image segmentation, Kumar et al. (2018) used a convolutional neural network (CNN) with an improved U-net architecture to segment suspicious breast masses automatically. Nevertheless, their results are fairly close to those of the DRLSE. In contrast, thresholding-based, region growing and watershed methods are classical methods and are usually combined with other methods to obtain good results. For example, Zhang and Zhang (2011) employed an extended fuzzy watershed method and developed a fully automatic algorithm for BUS image segmentation.

In the past few years, many graph-based segmentation methods have been applied to BUS images. In graph-based segmentation, the image is represented as a weighted, connected, undirected graph. In 2010, Zhang et al. (2010) used a discriminative graph-cut approach, training a classifier to distinguish the tumor from the rest of the image to obtain the segmentation result. Zhou et al. (2014) proposed a segmentation method based on mean shift and graph cuts (MSGC) for BUS images, which is rapid as well as efficient. Huang et al. (2014) put forward a parameter-automatically optimized robust graph-based (PAORGB) segmentation method. Nevertheless, the PAORGB produces over-segmentation or under-segmentation when applied to more complex BUS images. Subsequently, Luo et al. (2017) introduced multi-objective optimization functions and combined region and edge information into the RGB segmentation to overcome the problems of over-segmentation and under-segmentation.


The above-mentioned methods take into account the low-level visual information of the image itself, e.g. brightness, edges, texture, contrast, outline, etc. Segmentation depending on such low-level image features often fails when the boundaries are discontinuous or not sufficiently clear. This is due to the fact that most conventional segmentation methods (e.g. Snakes and level sets) aim to find edges to locate and extract an object without understanding the object. Although some learning-based methods can recognize different regions by classifying or clustering the pixels, they make use of low-level image features and cannot really understand the different objects in the image. In contrast, deep learning based methods (e.g. U-Net) can extract high-level semantic features within the network model and have achieved better results than conventional methods. Nevertheless, the learned features in a deep learning model are not interpretable, and training the network requires large amounts of medical image data, which are often unavailable. Let us imagine the mental process of a doctor diagnosing a BUS image. Without a very large training set, the doctor can easily find the location of lesions, make evaluations on some high-level semantic features, e.g. the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) Lexicon, and finally make a judgment on the tumor type. It can therefore be concluded that high-level semantics are more helpful in judging the tumor region and tumor type. Inspired by the use of local receptive fields in the CNN model, we propose a novel segmentation method with semantic classification of superpixels for BUS images. A superpixel can be regarded as a local receptive field. The classification procedure distinguishes the superpixels based on the semantics in the local receptive field, understands which superpixels belong to the lesion, and finally guides the segmentation in the right direction. Furthermore, the locations of neighboring subregions are taken into account and the k-nearest neighbor (KNN) method (Duda and Hart, 1973) is applied to help refine the segmentation results. Our main contributions are summarized as follows:
(1) Unlike traditional segmentation methods, which mainly rely on low-level image features (e.g. edge, gradient, gray-level, etc.), we propose a novel segmentation method that uses image-semantics based classification for BUS images.
(2) Inspired by the fact that the CNN understands an image using local receptive fields, we split the whole image into a number of superpixels, in each of which high-level semantic features (i.e. visual words) with good interpretability are extracted. The classification of the superpixels provides good guidance for segmentation of the tumor region.
(3) Taking advantage of the neighboring relationships of the segmented tumor regions, we put forward a reclassification step to refine the segmentation. It is easy to implement but effective.


This paper is organized as follows. Section 2 describes the proposed method in detail. Section 3 presents the experiments and the results, including comparisons among different BUS segmentation methods. The final section provides some discussion and draws the conclusions.

2. Methods

In this paper, the SLIC (Achanta et al., 2012) is first used to generate superpixels from a given image. In each superpixel, low-level image features including gray, texture and local binary pattern (LBP) descriptors are computed, and the semantic information (i.e. the bag of visual words) is extracted. The BPNN (Rumelhart et al., 1986) is utilized to classify the superpixels and obtain the initial result. Then, a KNN-based approach is designed to refine the segmentation result. The flowchart of the proposed method is shown in Figure 1. In the remainder of this section, we elaborate each step of the proposed method.

Figure 1. Flowchart of the proposed method for BUS image segmentation with semantic classification of superpixels: original BUS image → crop ROI → bilateral filtering → histogram equalization → mean shift filtering → generate superpixels using SLIC → extract features (gray histogram, GLCM and co-occurrence matching of LBP) → Kmeans and a bag-of-words model → BPNN classification for initial results → KNN reclassification and postprocessing → final result.

2.1. Preprocessing

2.1.1. Cropping of the tumor
Due to high speckle noise, low contrast and inhomogeneity, BUS image segmentation remains a challenging task. To achieve good segmentation results, data preprocessing is necessary. In an original BUS image, the tumor region often occupies only a small portion of the whole image, and the segmentation method would be unable to locate the tumor. Therefore, we first asked the doctor to crop a region of interest (ROI) in which the tumor occupies the main portion. In practice, the operator determines two diagonal points that define a relatively small rectangle on the original BUS image. The rectangle should contain the tumor region, and the tumor region should account for about 30%-50% of the whole ROI. This significantly reduces the interference of irrelevant areas and makes the generation of superpixels more efficient.

2.1.2. Bilateral filtering
Due to speckle noise, the BUS images should be despeckled. The bilateral filter (Elad, 2002) is a nonlinear filter that is easy to implement. It not only reduces the noise but also preserves the edges. Therefore, the bilateral filter is used for speckle reduction in this study.

2.1.3. Histogram equalization
Low contrast often occurs in BUS images due to different imaging machines or the varied characteristics of tumor tissues. To improve the contrast of BUS images, histogram equalization (Gonzalez and Woods, 2002) is conducted to enhance the filtered ROI. Histogram equalization maps the gray levels of the original image from a relatively concentrated distribution to an approximately uniform distribution. After some preliminary experiments, we found that the classical histogram equalization method could sufficiently enhance the contrast while preserving image details.

2.1.4. Pyramid mean shift filtering
Due to the nonhomogeneity of the intensities in different BUS images, a pyramid mean shift filter (Comaniciu and Meer, 1999) is adopted for further preprocessing. Mean shift filtering, a nonparametric method based on the gradient ascent of the probability density, finds the peaks of the color-space distribution through window scanning. It improves the homogeneity of BUS images and further decreases the speckle noise. Figure 2 shows an example of the image preprocessing in this study.

Figure 2. An example of BUS image preprocessing: the cropped ROI, the result after bilateral filtering, after histogram equalization, and after pyramid mean shift filtering.
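The following sketch illustrates, under stated assumptions, what the preprocessing chain of Section 2.1 could look like with OpenCV. The function name and the filter parameters are placeholders for illustration only, not the values used in the paper; OpenCV's pyramid mean shift filter expects a 3-channel 8-bit image, hence the conversion.

# Illustrative sketch of the Section 2.1 preprocessing chain (assumed parameters).
import cv2
import numpy as np

def preprocess_roi(roi_gray: np.ndarray) -> np.ndarray:
    # 1) Bilateral filtering: despeckle while preserving edges.
    despeckled = cv2.bilateralFilter(roi_gray, d=9, sigmaColor=75, sigmaSpace=75)
    # 2) Histogram equalization: spread the concentrated gray-level distribution.
    equalized = cv2.equalizeHist(despeckled)
    # 3) Pyramid mean shift filtering: improve intensity homogeneity.
    #    OpenCV requires a 3-channel 8-bit image, so convert to BGR and back.
    color = cv2.cvtColor(equalized, cv2.COLOR_GRAY2BGR)
    shifted = cv2.pyrMeanShiftFiltering(color, sp=10, sr=20)
    return cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY)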

2.2. Simple Linear Iterative Clustering

After cropping the ROI, the SLIC is adopted to generate the superpixels. The BUS images are grayscale images. In SLIC, a three-dimensional (3D) space [l, x, y] is used to represent a pixel in the ROI, where l refers to the brightness of the image and [x, y] stands for the pixel's spatial position. The only parameter of SLIC is the desired number of superpixels. Given the desired number M, all N pixels are divided uniformly into M regions of N/M pixels. In order to generate similarly sized superpixels, the grid interval is set to S = \sqrt{N/M}. Calculating the gradient in a 3×3 neighborhood and adjusting the centers to the lowest gradient position avoids placing the center of a superpixel on an edge. The SLIC method is initialized by choosing M superpixel centers [l_i, x_i, y_i], where i = 1, ..., M. Let S stand for the initial interval between two adjacent cluster centers. Because the ideal area of a superpixel is approximately S^2, it is natural that the pixels belonging to the i-th cluster are supposed to lie in a 2S × 2S region around the superpixel center. To evaluate the distance between the cluster center and other pixels, a distance measure is defined as follows:

d_c = \sqrt{(l_j - l_i)^2}    (1)

d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}    (2)

d_{total} = \sqrt{d_c^2 + (d_s / S)^2 m^2}    (3)

where d_{total} is the combination of the light-space distance d_c and the XY-spatial distance d_s. The parameter m controls the relative importance between gray similarity and spatial proximity. To generate regular superpixels, m is set to a large value in our method. An example of generating superpixels from the filtered image is shown in Figure 3. The general procedure of SLIC can be found in Achanta et al. (2012).

Figure 3. An example of generating the superpixels from the BUS image: (a) the ROI cropped from the original image, and (b) the superpixels generated by the SLIC.
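A minimal sketch of the superpixel generation step, assuming scikit-image (version 0.19 or later for the channel_axis argument) rather than the authors' own implementation. M = 120 matches the value chosen in Section 3.4; the compactness value is an assumption that plays the role of the weight m (larger values give more regular superpixels).

# Illustrative sketch of SLIC superpixel generation (Section 2.2).
import numpy as np
from skimage.segmentation import slic

def generate_superpixels(roi_gray: np.ndarray, n_superpixels: int = 120) -> np.ndarray:
    # channel_axis=None tells SLIC the input is a single-channel image, so the
    # clustering is done in [l, x, y] space; compactness trades gray similarity
    # against spatial proximity, analogous to the parameter m in Eq. (3).
    labels = slic(roi_gray, n_segments=n_superpixels, compactness=0.3,
                  channel_axis=None, start_label=0)
    return labels  # one integer superpixel label per pixel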


2.3. Feature Extraction

Because the semantic information is learned from low-level features, we extract several commonly used image features for every superpixel. We choose three types of features, i.e. the gray histogram, the gray level co-occurrence matrix (GLCM) (Haralick et al., 1973) and co-occurrence matching of local binary patterns (LBP) (Yuan et al., 2019). In total, 598-dimensional features are extracted for each superpixel.

2.3.1. Features Based on Gray Histogram
There are shadows in the tumor region, so its gray values may be smaller than those of other tissues in the BUS image. The gray histogram describes the gray-scale distribution of the image and is therefore useful for discriminating the tumor. It is defined as:

h(i) = S(i) / N, \quad i = 0, 1, \ldots, L - 1    (4)

where S(i) is the number of pixels of the i-th gray level, N refers to the number of all pixels and L stands for the number of gray levels. In this paper, we used six first-order statistics (Materka and Strzelecki, 1998) to represent the histogram. More specifically, the gray mean is the average of the gray levels, the gray variance reflects the degree of dispersion, kurtosis denotes the peakedness, skewness measures the symmetry of the histogram, energy indicates the uniformity of the gray distribution and entropy shows the heterogeneity of the image texture. Formal definitions of the six features based on the gray histogram are given in Table 1.

Table 1. Features based on gray histogram
Statistic        Equation
Gray mean        \mu = \sum_{i=0}^{L-1} i \, h(i)
Gray variance    \sigma^2 = \sum_{i=0}^{L-1} (i - \mu)^2 h(i)
Skewness         \mu_3 = \sigma^{-3} \sum_{i=0}^{L-1} (i - \mu)^3 h(i)
Kurtosis         \mu_4 = \sigma^{-4} \sum_{i=0}^{L-1} (i - \mu)^4 h(i) - 3
Energy           E = \sum_{i=0}^{L-1} [h(i)]^2
Entropy          H = -\sum_{i=0}^{L-1} h(i) \log_2 [h(i)]
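A short sketch of how the six first-order statistics of Table 1 can be computed with NumPy from the normalized gray histogram of one superpixel; it is a minimal interpretation of the standard definitions, not the authors' code, and the helper name is hypothetical.

# Illustrative computation of the six gray-histogram statistics (Table 1).
import numpy as np

def histogram_features(pixels: np.ndarray, levels: int = 256) -> np.ndarray:
    counts, _ = np.histogram(pixels, bins=levels, range=(0, levels))
    h = counts / pixels.size                      # h(i) = S(i) / N, Eq. (4)
    i = np.arange(levels)
    mean = np.sum(i * h)
    var = np.sum((i - mean) ** 2 * h)
    sigma = np.sqrt(var) + 1e-12                  # guard against division by zero
    skew = np.sum((i - mean) ** 3 * h) / sigma ** 3
    kurt = np.sum((i - mean) ** 4 * h) / sigma ** 4 - 3.0
    energy = np.sum(h ** 2)
    entropy = -np.sum(h[h > 0] * np.log2(h[h > 0]))
    return np.array([mean, var, skew, kurt, energy, entropy])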

2.3.2. Features Based on Gray Level Co-occurrence Matrix
Image textures are conventionally used for image classification and object recognition. The gray level co-occurrence matrix is a second-order histogram describing image textures; it can be represented as a symmetric matrix that quantifies the joint distribution of pairwise pixels. Different spacings and different orientation angles produce different values of the GLCM. In this paper, we set a spacing of 1 pixel, used four orientation angles (0°, 45°, 90°, 135°) and computed 21 statistics. The final feature is the average of each GLCM statistic across the four orientation angles. If the pixel values are highly correlated, the probabilities concentrate around the diagonal. Table 2 lists the features based on the GLCM.

Table 2. Features based on GLCM. The 21 statistics computed from the normalized co-occurrence matrix, whose formulas follow the standard definitions for GLCM texture analysis (Haralick et al., 1973): maximum probability, dissimilarity, contrast, homogeneity, inverse, energy, entropy, mean, correlation, sum SD, sum entropy, difference SD, difference entropy, independent marginal entropy, relevant information 1, relevant information 2, joint marginal entropy, marginal entropy, difference average, sum average, and standard deviation (SD).

In Table 2, P_{ij} denotes the element at the i-th row and j-th column of the matrix, and p_x, p_y, p_{x+y} and p_{x-y} are four marginal probability distributions. They are defined as follows:

p_x(i) = \sum_{j} P_{ij}    (5)

p_y(j) = \sum_{i} P_{ij}    (6)

p_{x+y}(k) = \sum_{i} \sum_{j} P_{ij}, \quad i + j = k    (7)

p_{x-y}(k) = \sum_{i} \sum_{j} P_{ij}, \quad |i - j| = k    (8)

Among these features based on the GLCM, some measure the texture of the image (e.g. homogeneity, inverse, energy and entropy) and the others describe the complexity of gray-scale changes (e.g. dissimilarity and contrast). From each superpixel, a 21-dimensional GLCM feature vector is thus extracted.

2.3.3. Features Based on Local Binary Patterns
In addition to the GLCM features, we use another texture descriptor, an improved local binary pattern (LBP), for the subsequent semantics extraction. The basic LBP (Ojala et al., 1996) compares a center pixel with its adjacent pixels in a 3×3 rectangular block. If an adjacent pixel is smaller than the center pixel, the value of the corresponding bit is 0; otherwise it is 1. An eight-bit binary code is therefore obtained. Ojala et al. (2002) designed three mapping patterns, including the uniform mapping pattern (U2), the rotation invariant mapping pattern (RI) and the rotation invariant uniform mapping pattern (RIU2). U2 is adopted in this paper. If an LBP code contains no more than two 0/1 or 1/0 transitions, it is a U2 (uniform) pattern and keeps its own code, whereas all non-uniform patterns are assigned one shared code. The LBP with the U2 pattern is computed as follows:

LBP_{P,R}^{u2} = \begin{cases} \sum_{i=0}^{P-1} s(g_i - g_c)\, 2^i, & \text{if } U(LBP_{P,R}) \le 2 \\ \text{a single shared non-uniform label}, & \text{otherwise} \end{cases}    (9)

where

U(LBP_{P,R}) = |s(g_{P-1} - g_c) - s(g_0 - g_c)| + \sum_{i=1}^{P-1} |s(g_i - g_c) - s(g_{i-1} - g_c)|    (10)

and

s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}    (11)

where g_i (i = 0, 1, ..., 7) corresponds to the gray value of the i-th neighbor in the 3×3 rectangular block and g_c is the gray value of the central pixel. In this paper, a 59-bin histogram is used to count the different U2 pattern codes. Nevertheless, the LBP only describes the textures within a very small block and cannot capture higher-order textures within a relatively larger neighborhood. Inspired by the fact that the 0-0, 1-1, 0-1 and 1-0 matching of two bits from a pair of LBP codes can represent the fluctuation of pixel values in local regions, Yuan et al. (2019) proposed two matching measures on the basis of the LBP, i.e. similarity matching based LBP (SMLBP) and dissimilarity matching based LBP (DMLBP). They applied larger local neighborhoods of the LBP to determine offsets for computation of code co-occurrence. Based on the obtained LBP code with radius R and sampling number P, the similarity matching measure s_i and the dissimilarity matching measure d_i are computed by matching the bits of the center code c_o and its i-th neighbor code c_i on the enlarged LBP map: the 0-0 and 1-1 matches contribute to s_i, whereas the 0-1 and 1-0 mismatches contribute to d_i, and a small positive constant ε prevents the denominator of the normalization from being zero; see Eqs. (12) and (13) of Yuan et al. (2019) for the exact formulations. Furthermore, the SMLBP and DMLBP codes are obtained by aggregating these measures over the neighborhood, where the parameters m_s and m_d are the means of the similarity and dissimilarity matching measures, and R_s, R_d, P_s, P_d correspond to the radii and sampling numbers of SMLBP and DMLBP, respectively (Eqs. (14) and (15) of Yuan et al., 2019). Each value of SMLBP and DMLBP ranges from 0 to 255, so we use two 256-bin histograms to represent them. The SMLBP and DMLBP describe the similarity and dissimilarity information on the spatial distribution of the LBP codes, respectively, implying higher-order textures and higher semantics that are more appropriate for medical US images. Consequently, the LBP, SMLBP and DMLBP histograms are concatenated into a 571-dimensional feature vector (i.e. 256 + 256 + 59 = 571). For more detailed technical information, please refer to Yuan et al. (2019). Eventually, we obtain a 598-dimensional feature vector for each superpixel (i.e. 571 + 6 + 21 = 598).
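The sketch below shows, with scikit-image, how a subset of the per-superpixel texture features of Section 2.3 could be computed: a few representative GLCM statistics and a 59-bin uniform (U2) LBP histogram. It is an illustration under assumed parameters, not the authors' implementation; the SMLBP and DMLBP descriptors of Yuan et al. (2019) are not available in scikit-image and are omitted.

# Illustrative texture-feature extraction for one superpixel patch (uint8).
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

def texture_features(patch: np.ndarray) -> np.ndarray:
    # GLCM with a spacing of 1 pixel and four orientations, averaged over angles.
    glcm = graycomatrix(patch, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p).mean()
                  for p in ("contrast", "dissimilarity", "homogeneity",
                            "energy", "correlation")]
    # Uniform (U2) LBP: 59 distinct codes for P = 8, R = 1.
    lbp = local_binary_pattern(patch, P=8, R=1, method="nri_uniform")
    lbp_hist, _ = np.histogram(lbp, bins=59, range=(0, 59), density=True)
    return np.concatenate([np.asarray(glcm_feats), lbp_hist])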

2.4. Initial Segmentation and KNN-based Reclassification

In this study, we aim to learn the semantics of every superpixel and subsequently merge those labelled as tumor into the segmented tumor region. To this end, the bag-of-words (BoW) model (Csurka et al., 2004) is employed to represent the image semantics. With the clustered words and the coded semantic feature vector, the BPNN, using the sigmoid function as the activation function, is applied to classify the superpixels into two categories: the tumor region and the normal tissue region. According to the classification results, the superpixels with the same class label are merged together and the initial segmentation result is achieved. To avoid the effect of a small number of misclassified superpixels, a KNN-based reclassification method is then used to refine the segmentation result.

2.4.1. Initial Segmentation
The aim of creating the BoW model is to extract the semantic features of the generated superpixels. In this paper, we used Kmeans to divide the sample points, each of which is represented as a 598-dimensional vector, into K clusters (K = 250 in this study). The length of the BoW is then K and hence there are K visual words. The frequency of the visual words is expressed as a K-bin histogram. The BoW model is shown in Figure 4.

Figure 4. The BoW model adopted to extract the semantic features of the superpixels: image patches are generated, a 598-dimensional feature vector is computed for each patch, and the visual words (K = 250) are created by clustering these vectors with Kmeans.
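A minimal sketch of the bag-of-words step using scikit-learn's KMeans: the vocabulary is built from the pooled 598-dimensional descriptors and a superpixel is then encoded as a K-bin word-frequency histogram. The helper names are hypothetical, and how descriptors are grouped per superpixel is an assumption of this sketch.

# Illustrative BoW vocabulary construction and encoding (Section 2.4.1).
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors: np.ndarray, k: int = 250) -> KMeans:
    # descriptors: (num_samples, 598) array of low-level feature vectors.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)

def bow_histogram(vocab: KMeans, descriptors: np.ndarray) -> np.ndarray:
    # Assign each descriptor belonging to one superpixel to its nearest visual
    # word and count word frequencies to obtain the K-dimensional semantic feature.
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)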

Using the BoW, a superpixel is represented as a K-dimensional vector, which can be used for semantic classification. Due to its strong data-fitting ability, the BPNN is used as the learning model. In this step, we classify the superpixels into two categories: patches in the tumor region and patches in the normal tissue region. The input is the K-dimensional semantic feature vector of a superpixel, and the output is the probability of each of the two categories. The structure of the BPNN is shown in Figure 5. According to the classification results, the initial segmentation is obtained.

Figure 5. The structure of the 3-layer BPNN used in this study: an input layer (N = 250) receiving the 250-dimensional semantic feature, a hidden layer (N = 100), and an output layer (N = 2) giving the probabilities of the two categories.
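The sketch below illustrates the semantic classification step with a 250-100-2 network; scikit-learn's MLPClassifier with logistic (sigmoid) activation stands in for the back-propagation network, which is an implementation choice of this sketch rather than the paper's code.

# Illustrative superpixel classifier (Section 2.4.1).
from sklearn.neural_network import MLPClassifier

def train_superpixel_classifier(bow_vectors, labels):
    # bow_vectors: (num_superpixels, 250); labels: 1 = tumor, 0 = normal tissue.
    clf = MLPClassifier(hidden_layer_sizes=(100,), activation="logistic",
                        max_iter=1000, random_state=0)
    clf.fit(bow_vectors, labels)
    return clf

# Usage: predictions = clf.predict(bow_vectors)
# Superpixels predicted as tumor (1) are merged into the initial tumor region.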

2.4.2. KNN-based reclassification
The semantics-based classification identifies the superpixels that belong to the tumor region. However, the initial result may be coarse because there are often some misclassified superpixels. Figure 6 shows several examples of misclassifications. Having observed the misclassified superpixels, we found that some superpixels belonging to the tumor region are very similar to some located in normal tissues, and they often occur on the margin of the tumor. Moreover, the superpixels in the training set were collected with different imaging settings, which often makes the appearances of the tumor regions and normal tissues inconsistent across different BUS images. Therefore, the classification of the superpixels should be further refined. It is observed that the misclassified superpixels are often surrounded by correctly classified ones. Taking into account the neighboring information of each superpixel, we therefore employed the KNN to reclassify all superpixels.

Figure 6. Three examples of misclassifications after the initial classification (columns: the original BUS image, the label, and the result of the initial segmentation).

Let us consider every superpixel and its neighbors. There are three cases, as follows. First, the central superpixel has the same label as all of its neighbors. Second, the central superpixel has a different label from all of its neighbors. Third, the label of the central superpixel differs from those of only part of its neighbors. As illustrated in Figure 7, sc denotes the central superpixel and sn1 ... sn6 denote its neighbors.

Figure 7. An illustration of a central superpixel sc and its neighbors sn1-sn6.

For the first case, the label of sc remains unchanged. For the second case, the initial classification Cinit is changed to the category of its neighbors. For the third case, we adopt a voting strategy, i.e. the KNN method is used to assign the label of the majority of the neighbors to the central superpixel. We compute the Euclidean distance between the central superpixel and each of its neighbors. If the distance is smaller than a threshold (threshold = 100 in this study), the corresponding superpixel is used as a training sample, and K is the number of the selected superpixels. After the KNN-based reclassification, the initial segmentation result is refined and the final segmentation result of the tumor region is obtained.
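A minimal sketch of the reclassification rule described above. It assumes a precomputed adjacency structure for the superpixels, and which feature space is used for the Euclidean distance is an assumption of this sketch; the voting step is a simple majority among neighbors within the distance threshold.

# Illustrative KNN-based reclassification of superpixel labels (Section 2.4.2).
import numpy as np

def knn_reclassify(features, labels, neighbors, dist_threshold=100.0):
    # features: (n, d) per-superpixel vectors; labels: (n,) initial classes;
    # neighbors: dict mapping a superpixel index to its adjacent superpixel indices.
    refined = labels.copy()
    for c, nbrs in neighbors.items():
        nbr_labels = labels[list(nbrs)]
        if np.all(nbr_labels == labels[c]):
            continue                                 # case 1: keep the label
        if np.all(nbr_labels != labels[c]):
            refined[c] = nbr_labels[0]               # case 2: adopt the neighbors' label
            continue
        # case 3: majority vote among neighbors within the distance threshold
        dists = np.linalg.norm(features[list(nbrs)] - features[c], axis=1)
        voters = nbr_labels[dists < dist_threshold]
        if voters.size:
            refined[c] = np.bincount(voters).argmax()
    return refined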

2.5. Postprocessing
Having segmented the ROI by the aforementioned steps, we turn it into a binary image. The predicted tumor region is set to the foreground and the remaining region is set to the background. Thereafter, an edge detection method (Suzuki and Be, 1985) is adopted to find all connected components. The largest connected component is extracted as the final result, since there is only one tumor in the ROI. Figure 8 shows an example of postprocessing. In this example, there are three candidate tumor regions, S1, S2 and S3, before postprocessing; the largest one, S1, is selected as the final output.

Figure 8. An example of the postprocessing, in which the incorrectly predicted tumor regions are removed (from left to right: the original image, the result before postprocessing with regions S1, S2 and S3, the result after postprocessing, and the segmentation result).
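A sketch of the postprocessing step with OpenCV, whose findContours implements the border-following algorithm of Suzuki and Be (1985) cited above. The two-value return signature assumed here matches OpenCV 4.x (and 2.x); the function name is a placeholder.

# Illustrative postprocessing: keep the largest connected component (Section 2.5).
import cv2
import numpy as np

def keep_largest_component(binary_mask: np.ndarray) -> np.ndarray:
    mask = (binary_mask > 0).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return np.zeros_like(mask)
    largest = max(contours, key=cv2.contourArea)
    result = np.zeros_like(mask)
    cv2.drawContours(result, [largest], -1, color=255, thickness=cv2.FILLED)
    return result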

2.6. The Proposed Segmentation Method

The general procedure of the proposed method is summarized as follows:
Step 1: Manually crop the ROI that contains the tumor from the original BUS image.
Step 2: Conduct image preprocessing, including bilateral filtering, histogram equalization and pyramid mean shift filtering.
Step 3: Use the SLIC algorithm to generate the superpixels (i.e. the image patches) in the ROI.
Step 4: Extract gray-level and texture features from each superpixel, including the 6 gray histogram statistics, the 21 GLCM statistics, and the LBP based features (i.e. the LBP, SMLBP and DMLBP).
Step 5: Employ the BoW model to extract semantic features: 250 visual words are clustered using the Kmeans method, and a superpixel is represented by a 250-dimensional vector.


Step 6: Use the BPNN for semantic classification. The superpixels with the same label are merged together and the initial segmentation result is achieved.
Step 7: To eliminate the disturbance of misclassification, execute the KNN-based reclassification to refine the initial result.
Step 8: Perform the morphological postprocessing to output the segmented tumor region.

2.7. Experimental Methods

We implemented the proposed method in Python with OpenCV 2.4.10 and JetBrains PyCharm 2017, and ran it on a computer with a 3.40 GHz CPU and 8.0 GB RAM. Experiments were conducted on our own dataset to validate the proposed method. Our work was approved by the Human Subject Ethics Committee of South China University of Technology, and the 320 clinical BUS images, with the subjects' consent forms, were provided by the Cancer Center of Sun Yat-sen University. The BUS images were acquired with an HDI 5000 SonoCT System (Philips Medical Systems) with an L12-5 50 mm Broadband Linear Array at an imaging frequency of 7.1 MHz. We normalized the size of all BUS images to 128×128, and the tumor regions were manually delineated by a well-trained radiologist with over ten years of experience. The dataset is composed of 160 BUS images with benign tumors and 160 BUS images with malignant tumors. The age of the patients is 46.6 ± 14.2 years. To demonstrate the effectiveness of the proposed method, we compared it with five methods, i.e. MOORGB (Luo et al., 2017), PAORGB (Huang et al., 2014), DRLSE (Li et al., 2010), MSGC (Zhou et al., 2014) and FCN (Long et al., 2015). MOORGB and PAORGB are two improved versions of RGB (Huang et al., 2012). DRLSE is an edge-based method. MSGC is a method based on mean shift and graph cuts. FCN is a classical deep learning model for semantic segmentation. In this study, 5-fold cross validation was employed to evaluate the generalization performance of the FCN. In each fold, 256 BUS images were used as the training set and the remaining 64 images were used as the testing set. No data augmentation was adopted. While training the FCN, the weights of the convolutional layers were initialized by the Glorot uniform distribution and the biases were initialized to zero. The binary cross-entropy function was used as the loss function. As for the optimizer, the Adam optimization method was adopted with β1 = 0.9, β2 = 0.999, ε = 10^-8 and learning rate = 10^-4. The batch size was 2 and the number of epochs was 40. We used Keras to implement the FCN. It took about one hour to train the models on an Intel Xeon W-2125 @ 4.00 GHz and an NVIDIA GeForce GTX 1050Ti (4 GB).
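For reference, a hedged sketch of the FCN training configuration reported above, expressed with the Keras API bundled in TensorFlow. The build_fcn constructor is a placeholder, since the exact fully convolutional architecture is not specified here; only the stated hyperparameters (Adam with β1 = 0.9, β2 = 0.999, ε = 1e-8, learning rate 1e-4, binary cross-entropy, batch size 2, 40 epochs) are taken from the paper.

# Illustrative FCN training configuration (Section 2.7); build_fcn is hypothetical.
from tensorflow.keras.optimizers import Adam

def compile_and_train(build_fcn, x_train, y_train):
    model = build_fcn(input_shape=(128, 128, 1))   # placeholder architecture constructor
    model.compile(optimizer=Adam(learning_rate=1e-4, beta_1=0.9,
                                 beta_2=0.999, epsilon=1e-8),
                  loss="binary_crossentropy")
    model.fit(x_train, y_train, batch_size=2, epochs=40)
    return model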


To make a quantitative assessment of the segmentation performance, four criteria were adopted, i.e. F1-score, averaged radial error (ARE), true positive (TP) and false positive (FP) (Udupa et al., 2002). The ARE is a quantitative measure for evaluating how well the object boundary is approximated. It describes the difference between the contour of the tumor region delineated by the radiologist and that of the predicted tumor region. It is computed as:

ARE = \frac{1}{n} \sum_{i=1}^{n} \frac{|C_r(i) - C_s(i)|}{|C_r(i) - C_o|} \times 100\%    (16)

where C_o is the center of the manually delineated tumor region, C_r(i) refers to the point where the contour manually delineated by the radiologist intersects the i-th ray from C_o, and C_s(i) stands for the point where the contour of the predicted tumor region intersects the i-th ray. In this study, n is set to 36. The computation of the ARE is illustrated in Figure 9.

Figure 9. An illustration for computation of the ARE.

TP and FP are two region-based metrics, and the F1-score is a balancing metric between TP and FP. They are defined as

TP = \frac{|A_m \cap A_n|}{|A_m|}    (17)

FP = \frac{|A_n| - |A_m \cap A_n|}{|A_m|}    (18)

F1 = \frac{2 |A_m \cap A_n|}{|A_m| + |A_n|}    (19)

where A_m corresponds to the area of the tumor region delineated by the radiologist, A_n the area of the predicted tumor region, and A_m ∩ A_n the overlapping region between them. TP represents the proportion of the area of the correctly segmented tumor region in the area of the hand-labelled tumor region. FP denotes the ratio of the amount of tissue falsely identified by the proposed method to the area of the hand-labelled region. With respect to the definitions above, a larger F1-score and TP, together with a smaller FP and ARE, reflect better performance of the segmentation method. All codes, data and results can be found at: http://www.wisemed.cn/Public/images/11.zip.
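A short sketch of the region-based metrics of Eqs. (17)-(19) computed from binary masks, assuming the standard set formulation reconstructed above; the ARE of Eq. (16) needs radial contour sampling and is omitted. The helper name is hypothetical.

# Illustrative computation of TP, FP and F1 from binary masks.
import numpy as np

def region_metrics(manual_mask: np.ndarray, predicted_mask: np.ndarray):
    am = manual_mask.astype(bool)        # radiologist's tumor region Am
    an = predicted_mask.astype(bool)     # predicted tumor region An
    overlap = np.logical_and(am, an).sum()
    tp = overlap / am.sum()                          # Eq. (17)
    fp = np.logical_and(an, ~am).sum() / am.sum()    # Eq. (18)
    f1 = 2.0 * overlap / (am.sum() + an.sum())       # Eq. (19)
    return tp, fp, f1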

3. Results

3.1. Effectiveness of features
First, we evaluated how the features based on the gray histogram, GLCM and LBP affect the accuracy of the semantic classification. In this experiment, the dataset consists of 30741 superpixels generated from the 320 BUS images. Ten-fold cross validation was performed and the results are shown in Table 3. It can be seen that the combined feature produces a better result than any single feature.

Table 3. The results of 10-fold cross validation.
Features                                            Training accuracy (%)   Testing accuracy (%)
Features based on gray histogram                    86.61±0.20              85.62±1.43
Features based on GLCM                              86.10±0.02              85.86±1.48
Features based on LBP                               74.95±0.32              73.40±2.22
Features based on gray histogram, GLCM, and LBP     87.58±0.21              86.54±1.36

3.2. Qualitative segmentation results
Due to space limitations, we present the segmentation results of five tumors (two benign and three malignant). As shown in Figure 10, the first three rows are malignant tumors and the last two are benign tumors. Our method achieved competitive performance compared with the other five methods. Although it has slight flaws, the contour obtained by the proposed method is the closest to the contour delineated by the radiologist. Under-segmentation happens in the results obtained by the PAORGB and MSGC, and the contours generated by the PAORGB are relatively rough. Most of the contours obtained by the MSGC lie within the manual contours. The DRLSE shows over-segmentation, as its contours often contain the hand-labelled contours. To some extent, the MOORGB suppresses the under-segmentation and over-segmentation. Because we used the original BUS data to train the FCN without any data augmentation, the FCN may suffer from overfitting. In contrast, the contours of the proposed method are the closest to the hand-labelled tumor contours.


Figure 10. Segmentation results of five tumors. (a) The ROI of the BUS image with the contour delineated by the radiologist, (b) the result of the proposed method, (c) the result of MOORGB, (d) the result of PAORGB, (e) the result of DRLSE, (f) the result of MSGC and (g) the result of FCN.

3.3. Quantitative results
For the quantitative evaluations, we randomly selected 100 images as the testing set because it takes too long to run the MOORGB and PAORGB on all images. Tables 4, 5 and 6 show the quantitative results on the testing set. Table 4 shows the segmentation results on the testing BUS images with benign tumors. Table 5 lists the results on the testing BUS images with malignant tumors. Table 6 shows the overall results on the testing dataset. From Tables 4 and 5, the performance of all the methods is better on the BUS images with benign tumors than on those with malignant tumors. As shown in Table 6, the DRLSE achieved the highest TP but also the highest FP, which is caused by serious over-segmentation. The MSGC obtained the lowest FP but also the lowest TP, indicating under-segmentation; nevertheless, it is the fastest method (0.12 s) among all the methods. The MOORGB alleviates the problems of over-segmentation and under-segmentation compared to the PAORGB. In comparison, our method achieved a high TP and the highest F1-score (and the lowest ARE) among all the methods, implying good contour approximation performance.

Table 4. Quantitative segmentation results of 50 testing BUS images with benign tumors
Methods            F1-score (%)    ARE (%)        TP (%)         FP (%)
Proposed method    90.20±4.00      9.66±4.36      91.24±5.73     11.13±7.96
MOORGB             89.83±4.69      11.09±4.40     85.61±8.67     4.65±5.96
PAORGB             78.73±13.87     16.47±5.90     72.03±17.65    9.10±21.55
DRLSE              87.56±3.99      11.37±4.21     93.49±4.98     20.23±8.51
MSGC               81.09±7.16      15.76±4.25     71.70±10.24    4.37±5.42

Table 5. Quantitative segmentation results of 50 testing BUS images with malignant tumors
Methods            F1-score (%)    ARE (%)        TP (%)         FP (%)
Proposed method    89.53±4.12      10.33±4.75     91.58±5.24     13.31±10.54
MOORGB             89.52±5.56      10.41±3.43     87.00±8.08     7.44±11.56
PAORGB             78.99±11.15     19.12±6.03     72.61±16.66    9.52±17.28
DRLSE              84.56±5.04      15.84±3.85     94.47±5.58     29.69±15.45
MSGC               81.34±5.30      15.52±5.75     71.37±7.42     3.74±4.09

Table 6. Quantitative segmentation results of all the 100 testing BUS images
Methods            F1-score (%)    ARE (%)        TP (%)         FP (%)         Time (s)
Proposed method    89.87±4.05      9.95±4.42      91.41±5.04     12.22±9.42     30
MOORGB             89.67±5.12      10.75±3.08     86.31±8.37     6.04±9.26      50.54
PAORGB             78.86±12.52     17.80±6.65     72.32±17.08    9.31±19.44     720
DRLSE              86.06±4.77      13.60±4.10     93.98±5.28     24.96±13.29    5.93
MSGC               81.22±6.27      15.64±5.18     71.53±8.90     4.05±4.78      0.12

Note that our method achieved the best results only in the metrics of F1-score and ARE, and was comparable to the DRLSE in the metric of TP. However, considering only a single metric (e.g. TP or FP) cannot validate the comprehensive performance. For example, although the DRLSE achieved a good TP, its contours are often located outside the true tumor regions, resulting in the poorest FP. In contrast, our method does not require an initial contour and obtained a relatively low FP. Likewise, the MSGC performed well on the metric of FP, but actually suffers from under-segmentation, resulting in the poorest TP. In general, the DRLSE sacrificed FP to improve TP and the MSGC did the opposite. In contrast, our method achieved the best trade-off on all the metrics, and actually obtained the closest boundaries to the hand-labelled tumor regions. Figure 11 shows two examples in which the proposed method is compared to the DRLSE and MSGC. These examples illustrate that a single metric cannot reflect the segmentation performance, and a comprehensive evaluation based on the F1-score is necessary.

Figure 11. Two examples for comparing the proposed method with the MSGC and DRLSE methods. (a) Our method: ARE is 13.53%, F1-score is 92.26%, TP is 90.17% and FP is 5.29%; MSGC: ARE is 15.75%, F1-score is 79.90%, TP is 69.50% and FP is 4.47%. The MSGC achieved a better FP but a worse TP. (b) Our method: ARE is 6.26%, F1-score is 92.54%, TP is 95.95% and FP is 11.41%; DRLSE: ARE is 12.06%, F1-score is 83.75%, TP is 97% and FP is 34.63%. The DRLSE achieved a better TP but a worse FP.

Table 7. Quantitative segmentation results of 5-fold validation.
Methods                    F1-score (%)    TP (%)         FP (%)
Proposed method            86.27±0.86      86.05±0.97     11.41±1.55
Proposed method + Snakes   89.06±1.21      91.36±1.33     13.84±1.99
FCN                        89.12±2.81      94.01±0.87     17.75±6.93

In addition to the methods above, we conducted a 5-fold validation and compared the proposed method with the FCN. Note that the "true contour" of a tumor was delineated by the radiologist and always appears smooth. However, our method detects the tumor contour using superpixels, which are very sensitive to local gradients and often show irregular boundaries, making the extracted tumor contours rather wrinkled. To make the contour smoother, Snakes (Kass et al., 1988) was therefore applied after the initial contour was obtained by the proposed method. Table 7 shows the quantitative results and Figure 12 shows the results of one example obtained by the three methods.


Figure 12. Segmentation results of a tumor using the proposed method and the FCN. (a) The ROI of the BUS image with the contour delineated by the radiologist, (b) the result of the proposed method, (c) the result of the proposed method + Snakes, and (d) the result of the FCN.

As shown in Figure 12, the contour generated by the FCN is slightly smoother than that of the proposed method. This is because the hand-labelled contours were the training labels of the deep learning model, so the FCN could learn the smooth contour style drawn by hand. In contrast, our method learns only the class labels of the superpixels instead of the hand-labelled contours, and hence cannot generate a smooth tumor contour. Thanks to the Snakes, the proposed method + Snakes makes the contour smoother and closer to the hand-labelled contour. The results in Table 7 also support the above analysis. In terms of the F1-score, which is often deemed an overall evaluation measure, the FCN is slightly better than our method without the Snakes, but is almost the same as our method with the Snakes. However, the true boundaries of the tumors may exhibit wrinkles, corners, bumps and burrs, and therefore the hand-labelled contours may not be the exactly true tumor boundaries. From the results, we can conclude that the deep learning model (e.g. the FCN) is more capable of learning the hand-labelled contours, while our method may be more capable of detecting the real tumor boundaries based on the semantic classification of superpixels. We also compared the segmentation results with and without the KNN-based reclassification. The quantitative results are shown in Table 8. It is observed that the proposed method with the reclassification is superior to that without it.

Table 8. Quantitative results of the comparison experiment with and without the KNN reclassification.

Methods        F1-score (%)    TP (%)         FP (%)
With KNN       89.87±4.05      91.41±5.04     12.22±9.42
Without KNN    89.61±4.29      91.05±6.11     12.33±9.64


3.4. Parameter analysis
In our method, the number of superpixels M generated from an ROI is a crucial parameter, and different values of M may lead to different results. To study the effect of M, we tested several values. Figure 13 shows the segmentation results with different values of M, and the quantitative results are shown in Table 9.

Figure 13. The segmentation results with different values of M: (a) the ROI with the hand-labelled contour, and (b)-(h) the results for M = 50, 120, 150, 200, 250, 300 and 350, respectively.

Table 9. Quantitative segmentation results of 100 BUS images with different values of M.
M      F1-score (%)   TP (%)        FP (%)
50     87.79±5.40     90.58±5.66    16.43±13.91
120    89.87±4.05     91.41±5.04    12.22±9.42
150    87.86±5.19     90.17±5.94    15.73±15.53
200    87.13±4.89     90.13±6.16    17.20±12.65
250    85.90±5.49     90.61±6.39    21.02±15.26
300    84.80±5.26     89.88±6.23    22.81±15.46
350    84.80±5.25     89.87±6.23    22.80±15.42


As shown in Figure 13 and Table 9, different values of M lead to different results. In general, M = 120 achieved the best result, so we set M to 120 in the experiments. Please note that the segmentation performance is highly dependent on the size of the superpixels: the larger M is, the smaller the superpixels become. If a superpixel is either too large or too small, the extracted features cannot appropriately describe it; a too large superpixel may contain both tumor and normal tissue, while a too small one may correspond to noise.

3.5. Analysis of ROI cropping
The selection of the ROI may affect the segmentation results. To analyze the effect of cropping the ROI, we chose 71 BUS images to form a new testing dataset and cropped different ROIs. In this experiment, the cropping ratio indicates the percentage of the tumor region in the cropped ROI, and we set five cropping ratios, i.e. 30%, 35%, 40%, 45% and 50%. Figure 14 shows the qualitative results and Figure 15 provides a quantitative evaluation.

Figure 14. Segmentation results with different ROI sizes: (a) the ROIs with the manual contours, and (b) the results of the proposed method. The cropping ratios from top to bottom are 30%, 35%, 40%, 45% and 50%.



Figure 15. The results (ARE, F1-score, TP and FP) for different ratios of the tumor region to the ROI (30%-50%).

According to Figure 15, the TP deteriorates while the FP improves as the cropping ratio increases. In fact, the TP heavily relies on the superpixels on the tumor boundary, and a larger ROI helps the proposed method classify more marginal superpixels as tumor region. For the FP, it is obvious that a smaller ROI leads to a smaller FP. Therefore, it would be difficult to explain the results based only on TP, FP or even the F1-score. As mentioned above, the ARE measures the contour approximation performance. According to the ARE, the best segmentation results were achieved when the cropping ratio is about 40%. The qualitative results shown in Figure 14 also support this conclusion.

4. Discussion and Conclusions

Segmentation of BUS images is a necessary but very challenging step in a CAD system. In this paper, we propose a novel method based on the semantic classification of superpixels. It is a new type of classification-based image segmentation algorithm. Three types of features and the BoW model are used to extract the semantics of superpixels, and the superpixels labelled as tumor are merged together to output the segmented tumor region. Training the classifier on a specifically collected training set is necessary and is decisive for the final segmentation output. Therefore, the classifier trained in this study cannot be directly applied to different types of medical images, or even to US images collected from different machines, and the results heavily rely on the classification accuracy. In addition, the KNN-based reclassification is helpful in polishing the initial classification/segmentation results. The main drawback of this study is that the operator must select an ROI containing the tumor tissue, so our method is not fully automated. Nevertheless, cropping the ROI is easy to perform and convenient for doctors.

It is worth noting that the proposed method makes use of the labels of tumor and normal tissue, and hence can be regarded as a supervised method. As is well known, a supervised method often outperforms an unsupervised one. The experimental results have demonstrated that our method outperforms the traditional unsupervised methods in approximating the hand-labelled tumor contours, which validates this conclusion. However, our method was not superior to the deep learning method. A deep learning model has a strong capability of learning the tumor region if the "true contour" is available. Because both the deep learning model and the proposed method are supervised, we compared them. According to the experimental results, the FCN outperforms our method even without data augmentation. However, the FCN was trained with the hand-labelled contours (i.e. strong supervision) while our method was trained with the class labels of the superpixels (i.e. relatively weak supervision). Interestingly, when we used the proposed method + Snakes, the performance is almost the same as that of the FCN. This implies that the key difference between our method and the deep learning model is the smoothness of the extracted contour, which is actually the inherent style of the hand-labelled contour. Nevertheless, the hand-labelled contour may be quite different from the true boundaries of the tumor. Our method, taking advantage of both traditional unsupervised segmentation methods and supervised classification methods, may have better potential to approximate the real tumor boundaries. In our future work, we will keep improving the supervised learning of the tumor region from two aspects, i.e. the classification and the reclassification of superpixels.

As another issue, we also think that segmentation is an intermediate step towards the final diagnosis. The doctors' understanding of BUS images is actually an iterative process, including segmentation, recognition and feedback learning, in which segmentation and recognition influence each other. Our method is designed based on this process, i.e. the classification of tumor or normal tissue can help the doctor delineate the boundary of the tumor region, and the incorrectly labelled superpixels should be re-recognized to obtain the final segmentation result. In this study, we did not use the labels of the tumors (i.e. benign or malignant) because there is little evidence that the segmentation output is highly related to the tumor label. However, like doctors trying to extract the tumor region in their minds, the tumor label may influence the parameter setting of the proposed method so as to make the tumor contour or texture similar to those with the same tumor label. Consequently, a closed loop may be formed in which the CAD output is fed back to the segmentation and the segmentation result acts as the input of the CAD procedure for classification of tumors. This would be another research topic in our future work.

ACKNOWLEDGMENTS


This work was partially supported by Shaanxi Science Foundation for distinguished young scholars (no. 2019JC-13), National Natural Science Foundation of China (No. 61571193), Natural Science Foundation of Guangdong Province, China (No. 2017A030312006), and Science and Technology Program of Guangzhou (No. 201704020134).

REFERENCES Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S., 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2274–2282. Chang, H., Chen, Z., Huang, Q., Shi, J., Li, X., 2015. Graph-based learning for segmentation of 3D ultrasound images. Neurocomputing 151, 632–644.

Chen, G., Xiang, D., Zhang, B., Tian, H., Chen, X., 2019. Automatic Pathological Lung Segmentation in Low-dose CT Image using Eigenspace Sparse Shape Composition. IEEE Trans. Med. Imaging. doi:10.1109/TMI.2018.2890510. Comaniciu, D., Meer, P., 1999. Mean shift analysis and applications, in: The Proceedings of the Seventh IEEE International Conference On Computer Vision, ICCV 1999, pp. 1197–1203. Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C., 2004. Visual categorization with bags of keypoints, in: Workshop on Statistical Learning in Computer Vision, ECCV. Prague, pp. 1–2. Duda, R.O., Hart, P.E., 1973. Pattern classification and scene analysis. A Wiley-Interscience Publ. New York Wiley, 1973. Elad, M., 2002. On the origin of the bilateral filter and ways to improve it. IEEE Trans. image Process. 11, 1141–1151. Gao, L., Liu, X., Chen, W., 2012. Phase-and gvf-based level set segmentation of ultrasonic breast tumors. J. Appl. Math. 2012. Gonzalez, R.C., Woods, R.E., 2002. Digital image processing, Prentice Hall, Second edition. Haralick, R.M., Shanmugam, K., Dinstein, I., 1973. Textural features for image classification. IEEE Trans. Syst. Man. Cybern. 3, 610–621. Huang, Q.-H., Lee, S.-Y., Liu, L.-Z., Lu, M.-H., Jin, L.-W., Li, A.-H., 2012. A robust graph-based segmentation method for breast tumors in ultrasound images. Ultrasonics 52, 266–275. Huang, Q., Bai, X., Li, Y., Jin, L., Li, X., 2014. Optimized graph-based segmentation for ultrasound images. Neurocomputing 129, 216–224. Huang, Q., Chen, Y., Liu, L., Tao, D., Li, X., 2019a. On Combining Biclustering Mining and AdaBoost for Breast Tumor Classification. IEEE Trans. Knowl. Data Eng. doi: 10.1109/TKDE.2019.2891622. Huang, Q., Hu, B., Zhang, F., 2019b. Evolutionary Optimized Fuzzy Reasoning with Mined Diagnostic Patterns for Classification of Breast Tumors in Ultrasound. Inf. Sci. (Ny)

Huang, Q., Huang, X., Liu, L., Lin, Y., Long, X., Li, X., 2018. A case-oriented web-based training system for breast cancer diagnosis. Comput. Methods Programs Biomed. 156, 73–83.


Huang, Q., Luo, Y., Zhang, Q., 2017. Breast ultrasound image segmentation: a survey. Int. J. Comput. Assist. Radiol. Surg. 12, 493–507. Huang, Q., Yang, F., Liu, L., Li, X., 2015. Automatic segmentation of breast lesions for interaction in ultrasonic computeraided diagnosis. Inf. Sci. (Ny). 314, 293–310. Kass, M., Witkin, A., Terzopoulos, D., 1988. Snakes: Active contour models. Int. J. Comput. Vis. 1, 321–331.

Kumar, V., Webb, J.M., Gregory, A., Denis, M., Meixner, D.D., Bayat, M., Whaley, D.H., Fatemi, M., Alizad, A., 2018. Automated and real-time segmentation of suspicious breast masses using convolutional neural network. PLoS One 13, e0195816. Li, C., Xu, C., Gui, C., Fox, M.D., 2010. Distance regularized level set evolution and its application to image segmentation. IEEE Trans. image Process. 19, 3243. Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3431–3440.

López-Linares, K., Aranjuelo, N., Kabongo, L., Maclair, G., Lete, N., Ceresa, M., García-Familiar, A., Macía, I., Ballester, M.A.G., 2018. Fully automatic detection and segmentation of abdominal aortic thrombus in post-operative CTA images using deep convolutional neural networks. Med. Image Anal. 202–214. Luo, Y., Liu, L., Huang, Q., Li, X., 2017. A novel segmentation approach combining region-and edge-based information for ultrasound images. Biomed Res. Int. 2017. Materka, A., Strzelecki, M., 1998. Texture analysis methods–a review. Tech. Univ. lodz, Inst. Electron. COST B11 report, Brussels 9–11. Moon, W.K., Lo, C.M., Chen, R.T., Shen, Y.W., Chang, J.M., Huang, C.S., Chen, J.H., Hsu, W.W., Chang, R.F., 2014. Tumor detection in automated breast ultrasound images using quantitative tissue clustering. Med. Phys. 41, 42901. Moraru, L., Moldovanu, S., Biswas, A., 2014. Optimization of breast lesion segmentation in texture feature space approach. Med. Eng. Phys. 36, 129–135. Ojala, T., Pietikäinen, M., Harwood, D., 1996. A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. 29, 51–59. Ojala, T., Pietikainen, M., Maenpaa, T., 2002. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24, 971–987. Peng, J., Shen, J., Li, X., 2016. High-order energies for stereo segmentation. IEEE Trans. Cybern. 46, 1616–1627. Qi, X., Zhang, L., Chen, Y., Pi, Y., Chen, Y., Lv, Q., Yi, Z., 2019. Automated diagnosis of breast ultrasonography images using deep neural networks. Med. Image Anal. 52, 185–198. Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323, 533. Sahiner, B., Chan, H.P., Roubidoux, M.A., Hadjiiski, L.M., Helvie, M.A., Paramagul, C., Bailey, J., Nees, A. V, Blane, C., 2007. Malignant and benign breast masses on 3D US volumetric images: effect of computer-aided diagnosis on radiologist accuracy. Radiology 242, 716.


Shan, J., Cheng, H.D., Wang, Y., 2012. Completely automated segmentation approach for breast ultrasound images using multiple-domain features. Ultrasound Med. Biol. 38, 262–275. Siegel, R.L., Miller, K.D., Jemal, A., 2018. Cancer statistics, 2018. Ca A Cancer J. Clin. 60, 277–300. Suzuki, S., Be, K., 1985. Topological structural analysis of digitized binary images by border following. Comput. Vis. Graph. Image Process. 30, 32–46. Udupa, J.K., Lablanc, V.R., Schmidt, H., Imielinska, C., Jin, Y., 2002. Methodology for evaluating image-segmentation algorithms. Proc. SPIE - Int. Soc. Opt. Eng. 4684, 266–277. Wells, P.N.T., Halliwell, M., 1981. Speckle in ultrasonic imaging. Ultrasonics 19, 225–229. Xian, M., Cheng, H.-D., Zhang, Y., 2014. A fully automatic breast ultrasound image segmentation approach based on neutro-connectedness, in: 2014 22nd International Conference on Pattern Recognition. IEEE, pp. 2495–2500. Xian, M., Huang, J., Zhang, Y., Tang, X., 2012. Multiple-domain knowledge based MRF model for tumor segmentation in breast ultrasound images, in: ICIP. pp. 2021–2024. Xian, M., Zhang, Y., Cheng, H.-D., Xu, F., Ding, J., 2016. Neutro-connectedness cut. IEEE Trans. image Process. 25, 4691–4703. Xian, M., Zhang, Y., Cheng, H.-D., Xu, F., Zhang, B., Ding, J., 2018. Automatic breast ultrasound image segmentation: A survey. Pattern Recognit. 79, 340–355.

Xiao, G., Brady, M., Noble, J.A., Zhang, Y., 2002. Segmentation of ultrasound B-mode images with intensity inhomogeneity correction. IEEE Trans. Med. Imaging 21, 48–57. Yuan, F., Shi, J., Xia, X., Huang, Q., Li, X., 2019. Co-occurrence matching of local binary patterns for improving visual adaption and its application to smoke recognition. IET Comput. Vis. 13, 178–187. https://doi.org/10.1049/ietcvi.2018.5164 Zhang, J., Zhou, S.K., Brunke, S., Lowery, C., Comaniciu, D., 2010. Database-guided breast tumor detection and segmentation in 2D ultrasound images, in: Medical Imaging 2010: Computer-Aided Diagnosis. International Society for Optics and Photonics, p. 762405. Zhang, L., Zhang, M., 2011. A fully automatic image segmentation using an extended fuzzy set, in: Computer Science for Environmental Engineering and EcoInformatics. Springer, pp. 412–417. Zhao, X., Wu, Y., Song, G., Li, Z., Zhang, Y., Fan, Y., 2018. A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med. Image Anal. 43, 98–111.

Zhou, Z., Wu, W., Wu, S., Tsui, P.-H., Lin, C.-C., Zhang, L., Wang, T., 2014. Semi-automatic breast ultrasound image segmentation based on mean shift and graph cuts. Ultrason. Imaging 36, 256–276.
