Highlights

• A generative adversarial network is leveraged to automatically segment breast lesions.
• Semi-supervised learning is exploited to make full use of unannotated ultrasound images.
• A dual-attentive-fusion block is designed to enhance the evaluator's discrimination.
• The proposed model shows superior performance on multi-site datasets compared with state-of-the-art methods.

Semi-supervised Segmentation of Lesion from Breast Ultrasound Images with Attentional Generative Adversarial Network

Luyi Hana, Yunzhi Huanga,b,*, Haoran Douc, Shuai Wangd, Sahar Ahamadd, Honghao Luoe, Qi Liua, Jingfan Fanf, Jiang Zhanga

a. College of Electrical Engineering, Sichuan University, Chengdu 610065, China
b. Department of Biomedical Engineering, Sichuan University, Chengdu 610065, China
c. National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University, Shenzhen 518060, China
d. Department of Radiology and Biomedical Research Imaging Center, University of North Carolina, Chapel Hill, USA
e. Department of Ultrasound, West China Hospital of Sichuan University, Chengdu 610041, China
f. Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China

*Corresponding author: Yunzhi Huang ([email protected])

Abstract:

Background and Objective: Automatic segmentation of breast lesions from ultrasound images is a crucial module for computer-aided diagnostic systems in clinical practice. Large-scale breast ultrasound (BUS) images remain unannotated and need to be effectively exploited to improve segmentation quality. To address this, a semi-supervised segmentation network is proposed based on generative adversarial networks (GAN).

Methods: In this paper, a semi-supervised learning model, denoted as BUS-GAN, consisting of a segmentation base network (BUS-S) and an evaluation base network (BUS-E), is proposed. The BUS-S network can densely extract multi-scale features in order to accommodate the individual variance of breast lesions, thereby enhancing the robustness of segmentation. Besides, the BUS-E network adopts a dual-attentive-fusion block that applies two independent spatial attention paths to the predicted segmentation map and leverages the corresponding original image to distill geometrical-level and intensity-level information, respectively, so as to enlarge the difference between the lesion region and the background, thus improving the discriminative ability of the BUS-E network. Then, through adversarial training, the BUS-GAN model can achieve higher segmentation quality because the BUS-E network guides the BUS-S network to generate more accurate segmentation maps whose distribution is closer to that of the ground truth.

Results: The counterpart semi-supervised segmentation methods and the proposed BUS-GAN model were trained with 2000 in-house images, including 100 annotated images and 1900 unannotated images, and tested on data from two different sites, comprising 800 in-house images and 163 public images. The results validate that the proposed BUS-GAN model can achieve higher segmentation accuracy on both the in-house testing dataset and the public dataset than state-of-the-art semi-supervised segmentation methods.

Conclusions: The developed BUS-GAN model can effectively utilize unannotated breast ultrasound images to improve segmentation quality. In the future, the proposed segmentation method can serve as a module of an automatic breast ultrasound diagnosis system, thus relieving the burden of the tedious image annotation process and alleviating the subjective influence of physicians' experience in clinical practice. Our code will be made available at https://github.com/fiy2W/BUS-GAN.

Key Words: Ultrasound image, Breast lesion, Image segmentation, Semi-supervised learning, Generative adversarial networks, Attention mechanism

1. Introduction

Breast cancer is the most common cancer in women. According to 2019 statistics, an estimated 268,600 women will be diagnosed with this disease and approximately 41,760 women may die of it in the United States alone [1]. Moreover, breast cancer is among the leading causes of cancer mortality in young females [2]. Early detection of breast lesions can effectively reduce mortality. Ultrasound (US) scanning is currently employed for the early detection of breast cancer because of its low cost and real-time, high-resolution imaging without ionizing radiation [3-5]. With the increase in the number of patients diagnosed with breast cancer, a large number of US images need to be evaluated by physicians. Thus, an automatic lesion diagnostic workflow based on breast ultrasound (BUS) images is essential to relieve the burden on physicians.

Segmenting lesions from BUS images is the preliminary step in developing a reliable and automatic diagnostic system [6-9]. It allows computing quantitative descriptors, e.g., lesion shape, size, and echo pattern, which assist the subsequent clinical analysis. Furthermore, the annotated regions can be fed into a classifier to determine the category of the lesions. Hence, accurate BUS segmentation contributes significantly to classifying benign and malignant lesions and to studying breast cancer MRI radiomics. Referring to Fig. 1, accurately segmenting lesions from BUS images in an automatic manner is a challenging task in practice, as a consequence of (1) ambiguous anatomical boundaries in ultrasound images due to speckle noise, low contrast, and low signal-to-noise ratio (SNR); and (2) large inter-subject variability of breast structures.

Fig. 1. Clinical examples of breast ultrasound (BUS) images. Green curves correspond to the ground truth annotated by physicians, and red arrows point out the missing/ambiguous boundaries.

Recently, deep-learning-based methods have triumphed in segmentation tasks for medical images [10-19]. Most segmentation frameworks extract multi-scale features in a fully supervised manner based on typical segmentation networks, e.g., FCN-16s [20] and U-Net [17]. Despite their success, fully supervised learning-based methods require physicians to annotate large-scale training data, which is rather time-consuming and labor-intensive. To alleviate the annotation burden on physicians, many researchers have focused on implementing the segmentation task in a semi-supervised learning fashion [21-28], in which only limited annotated images are required and a large amount of unannotated data can be utilized. For instance, Bai et al. [25] trained an FCN-like segmentation network in a semi-supervised manner, in which probability maps predicted from the unannotated data were directly taken as additional ground truth for updating the model parameters. Feng [23] improved the training strategy based on Bai et al.'s work: only part of the reasonable segmentation maps predicted from the unannotated samples were progressively combined with the annotated samples to improve the training procedure. None of the aforementioned methods can ensure that only good segmentation results of the unannotated data are imported into the training network.

The quality of the unannotated samples plays a vital role in determining the final segmentation accuracy. To effectively evaluate the quality of predicted segmentation maps, several researchers [21, 22, 29-33] further adopted the generative adversarial network (GAN), which consists of a generator and a discriminator, to handle the segmentation task. For example, Lahiri et al. [30] deployed unannotated images to generate fake samples to increase the amount of training data. However, the model lacks stability due to insufficient supervision of the discriminator: once the discriminator fails to recognize the quality of segmentation, the biased segmentation maps may greatly affect the performance of the entire adversarial training. Note that a good discriminator contributes significantly to guiding the segmentation task toward a desirable performance. Hence, several recent studies have focused on enhancing the performance of the discriminator [21, 22, 29, 32]. Zhang et al. [21] designed an adversarial loss function that uses the unannotated data to better control the training process. Nie et al. [22] introduced the concept of a confidence map to supervise the learning of unannotated data; in their work, the discriminator exports a confidence map of the segmentation result, and only the segmentation regions with high confidence are further exploited to guide the learning process. But for BUS images, the confidence map may be influenced by the presence of speckle noise, and an unstable confidence map may affect the entire learning procedure and lead to undesirable segmentation results on the unannotated data. Therefore, it is challenging to implement effective segmentation of breast lesions with limited annotated data and a large amount of unannotated data. To this end, a GAN-based segmentation model is proposed for automatic BUS image segmentation.

Specifically, in the evaluator, a dual-attentive-fusion block is designed to better extract the representative features of the lesion region and the background separately, which makes it more effective in determining the quality of the segmentation maps predicted by the generator. Moreover, with a semi-supervised training strategy, the proposed framework can efficiently leverage the unannotated data to increase the amount of training data, thus enhancing the generalization capability of the segmentation network. The major contributions can be summarized as follows:

(1) A semi-supervised deep-learning-based segmentation model is proposed to make full use of unannotated BUS images to improve segmentation performance. The proposed method is beneficial in clinical applications with an increasing number of BUS images and reduces the cost in terms of human effort and finances, since training only requires limited manually delineated data.

(2) A dual-attentive-fusion (DAF) block is incorporated into the evaluator of the segmentation model, named BUS-GAN, to constrain the predicted segmentation map to be closer to the ground truth, and in turn to increase the performance of the segmentation network through adversarial training. The proposed DAF block can not only leverage intensity-level and geometrical-level information but also extract representative features from the lesion region and the background separately with two independent spatial attention paths, thus enhancing the discriminative ability of the evaluator.

(3) The proposed BUS-GAN model is tested on multi-site breast ultrasound datasets and achieves the best segmentation accuracy compared with state-of-the-art semi-supervised segmentation methods.

2. Method

Fig. 2 shows the schematic illustration of the developed semi-supervised segmentation network for BUS images. The proposed dual-attentive GAN-based framework is composed of a segmentation network (BUS-S) and an evaluation network (BUS-E).

The BUS-S network is first trained with the annotated data in a supervised learning manner. Then, the unannotated data are fed into the pre-trained BUS-S network to generate the corresponding segmentation probability maps. Subsequently, the BUS-E network evaluates the segmentation quality of its input, which consists of the ground truth from the annotated data and the predicted segmentation probability maps from both the annotated and unannotated data. The evaluation result corresponds to a binary score, where 0 and 1 refer to poor and good segmentation, respectively. Based on the scores generated by the BUS-E network, the proposed framework can further perform adversarial learning between BUS-S and BUS-E. Through adversarial training, the BUS-GAN can enhance the evaluation of segmentation quality, thus guiding the BUS-S network to generate more accurate segmentation maps. The details of the segmentation model are given in the subsequent sections.

Fig. 2. Overview of the dual-attentive GAN-based semi-supervised segmentation network.

2.1. The Architecture of BUS-GAN

For the GAN-based architecture, the performance of both the segmentation and evaluation networks can greatly affect the overall segmentation accuracy. In the BUS-GAN network, the segmentation network BUS-S is designed to identify the lesion region from BUS images. Considering the low quality of BUS images and the large variability of appearance and intensity distribution among breast lesions, the proposed network densely extracts features at multiple scales.

Meanwhile, the evaluation network BUS-E is designed to effectively separate poor segmentation maps from good ones by integrating a dual-attentive-fusion (DAF) block. Through adversarial training, the segmentation maps generated by BUS-S progressively approach the distribution of the annotated ground truth. The details of BUS-S and BUS-E are described as follows.

Fig. 3. The architecture of the BUS-S network and BUS-E network.

2.1.1. Segmentation Network—BUS-S

The BUS-S network is modified from DeepLab [34] and customized for BUS segmentation. Due to the ambiguous boundaries and abundant speckle noise in BUS images, the BUS-S network is constructed as a densely connected network, in which better high-level features can be generated by effectively incorporating low-level information, thus facilitating the identification of the lesion region from BUS images. To alleviate the vanishing gradient problem in the deep segmentation model, three dense blocks [35] are first utilized to generate feature maps by combining high-level and low-level features. Moreover, considering the large inter-subject variation in breast lesion size, an atrous spatial pyramid pooling (ASPP) module [34, 36, 37] is adopted to capture multi-scale features from the input feature maps for a more accurate representation of breast lesions.

In each dense block, the output of each layer is iteratively concatenated with the outputs of all preceding layers in a feed-forward manner, and each convolutional layer (Conv) in the dense block is followed by a batch normalization layer (BN) and a leaky rectified linear unit (Leaky-ReLU). Specifically, an atrous convolutional layer with a kernel size of 3×3 and a rate of 2 is adopted in the last dense block to increase the receptive field, so as to accommodate the variable size of breast lesions. Moreover, for several BUS images, the lesion region may vanish in the down-sampled feature maps, as in case 4 of Fig. 1. To ensure that the small high-level feature maps still contain features from the lesion region, the stride of the last atrous convolutional layer in the deepest dense block is set to 1.

In the ASPP structure, four convolutional layers are arranged in parallel and their outputs concatenated, including one convolutional layer with a kernel size of 1×1 and three atrous convolutional layers with a kernel size of 3×3 and rates of 6, 12, and 18, respectively. Moreover, an image pooling layer is added to extract image-level features and encode the global context information. In the image pooling layer, global average pooling (GAP) is first applied to the input feature map, and an up-sampling layer then restores it to the desired spatial dimension.

Referring to Fig. 3, the proposed BUS-S network takes BUS images as input and outputs the corresponding segmentation probability map. A convolutional layer with a kernel size of 7×7 and a stride of 2 first extracts low-level features from the input image. Then, three dense blocks and the ASPP block extract high-level features. Following the ASPP block, a convolutional layer with a kernel size of 3×3, also followed by a batch normalization layer and a Leaky-ReLU function, is utilized to merge the multi-scale features. Finally, a convolutional layer with a kernel size of 1×1 reduces the segmentation probability map to a single channel, and the final up-sampling layer enlarges its input by a factor of 8 to recover the probability map to the same size as the input BUS image.
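To make the layout above concrete, the following is a minimal tf.keras sketch of the Conv-BN-Leaky-ReLU unit, a dense block, the ASPP module, and the overall BUS-S layout. The function names, filter counts, growth rate, number of layers per dense block, and placement of the pooling layers are illustrative assumptions; only the overall structure (7×7 stride-2 stem, three dense blocks with the deepest one dilated at stride 1, ASPP with rates 6/12/18 plus image pooling, a 3×3 merge, a 1×1 projection to one channel, and 8× up-sampling) follows the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_lrelu(x, filters, kernel_size, dilation_rate=1, strides=1):
    """Conv -> BatchNorm -> Leaky-ReLU, the basic unit used throughout the sketch."""
    x = layers.Conv2D(filters, kernel_size, strides=strides, padding="same",
                      dilation_rate=dilation_rate, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def dense_block(x, growth_rate=32, num_layers=4, dilation_rate=1):
    """Dense block: each layer's output is concatenated with all earlier feature maps."""
    for _ in range(num_layers):
        y = conv_bn_lrelu(x, growth_rate, 3, dilation_rate=dilation_rate)
        x = layers.Concatenate()([x, y])
    return x

def aspp(x, filters=256):
    """ASPP: a 1x1 branch, three dilated 3x3 branches (rates 6/12/18),
    and an image-pooling branch, concatenated and merged."""
    b1 = conv_bn_lrelu(x, filters, 1)
    b2 = conv_bn_lrelu(x, filters, 3, dilation_rate=6)
    b3 = conv_bn_lrelu(x, filters, 3, dilation_rate=12)
    b4 = conv_bn_lrelu(x, filters, 3, dilation_rate=18)
    gap = layers.Lambda(lambda t: tf.reduce_mean(t, axis=[1, 2], keepdims=True))(x)
    gap = conv_bn_lrelu(gap, filters, 1)
    gap = layers.Lambda(lambda t: tf.image.resize(t[0], tf.shape(t[1])[1:3]))([gap, x])
    merged = layers.Concatenate()([b1, b2, b3, b4, gap])
    return conv_bn_lrelu(merged, filters, 3)  # 3x3 conv merges the multi-scale features

def build_bus_s(input_shape=(224, 224, 1)):
    """Overall BUS-S layout; total down-sampling factor is 8, matching the 8x up-sampling."""
    inp = layers.Input(input_shape)
    x = conv_bn_lrelu(inp, 64, 7, strides=2)                 # 7x7, stride-2 stem
    x = dense_block(x)
    x = layers.AveragePooling2D()(x)                         # down-sampling between blocks (assumption)
    x = dense_block(x)
    x = layers.AveragePooling2D()(x)
    x = dense_block(x, dilation_rate=2)                      # deepest block: atrous conv, stride kept at 1
    x = aspp(x)
    x = layers.Conv2D(1, 1, activation="sigmoid")(x)         # 1x1 projection to a single channel
    x = layers.UpSampling2D(8, interpolation="bilinear")(x)  # recover the input resolution
    return tf.keras.Model(inp, x, name="bus_s")
```

Calling `build_bus_s()(tf.zeros([1, 224, 224, 1]))` returns a 224×224 single-channel probability map, matching the input size as described above.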

2.1.2. Evaluation Network—BUS-E

The proposed BUS-E network is designed to effectively supervise the distribution of the predicted segmentation maps by guiding the predictions closer to the annotated ground truth. To focus on the effective lesion region, the BUS-E network first employs a DAF block (detailed in the following subsection and Fig. 4) to construct hybrid features by incorporating both intensity-level and geometric-level information. The BUS-E network requires two inputs: the BUS image and the corresponding lesion segmentation. For the annotated data, the lesion segmentation is obtained from two sources, one being the ground truth annotated by the physicians and the other being the segmentation probability map generated by the BUS-S network. For the unannotated data, the lesion segmentation comprises only the segmentation probability map generated by the BUS-S network. Subsequently, two dense blocks are utilized to generate high-level feature maps, which are then flattened and passed to a final dense layer acting as a classifier. According to the final score, BUS-E evaluates whether the input segmentation result corresponds to the input BUS image. Note that the output of the BUS-E network is a binary score, i.e., 1 for good quality and 0 for poor quality.

2.1.3. Dual-attentive-fusion Block

An effective input that encodes informative features can enhance the discernibility of the evaluation network. When the segmented images (segmentation probability maps or ground truth annotations) are directly concatenated with the original BUS images, the difference between good and poor segmentations may not be significant enough for the BUS-E network to converge to a desirable status. With the designed DAF block, the difference between good and poor segmentations can be enhanced by effectively integrating the geometric-level and intensity-level relationships between the original image and the segmented images. Specifically, three inputs are merged in the DAF block: the original BUS image, the segmented image, and the inverse of the segmented image. Therefore, not only is the lesion region utilized to provide informative features, but the background is also leveraged as a constraint for evaluating the segmentation performance. Referring to Fig. 4, a dense block is first utilized to extract features from the original BUS image. Then, two separate paths (path 1 and path 2) are designated to extract the features of the lesion region and those of the background, respectively. In each path, a convolutional layer with a kernel size of 3×3 followed by a sigmoid function is employed to generate the attention map for the segmented image and its inverse, respectively. Subsequently, the "Eval feature" generated by the dense block is multiplied element-wise by each attention map to generate the attentive feature and the inverse attentive feature. A convolutional layer with a kernel size of 1×1 is then employed to refine each feature map. Finally, the features of the two paths are concatenated to generate the dual attentive feature map.

Fig. 4. Illustration of the dual-attentive-fusion block.
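Reusing the conv_bn_lrelu and dense_block helpers from the BUS-S sketch above, a possible realization of the DAF block (Fig. 4) and the surrounding BUS-E evaluator is sketched below. The channel counts, the amount of pooling before the classifier, and the depth of the dense blocks are assumptions for illustration; the two independent attention paths, the element-wise weighting of the "Eval feature", the 1×1 refinement, and the final concatenation follow the description above.

```python
def daf_block(image, seg_map, filters=64):
    """Dual-attentive-fusion block (sketch): two independent spatial attention paths
    weight the image-derived 'Eval feature' by the segmented map and by its inverse."""
    inv_seg = layers.Lambda(lambda s: 1.0 - s)(seg_map)                # inverse of the segmented image
    eval_feat = dense_block(image, growth_rate=filters, num_layers=2)  # features from the original BUS image

    def attention_path(mask):
        # 3x3 conv + sigmoid turns the mask into a spatial attention map.
        att = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(mask)
        weighted = layers.Lambda(lambda t: t[0] * t[1])([eval_feat, att])  # element-wise weighting
        return layers.Conv2D(filters, 1, padding="same")(weighted)         # 1x1 refinement

    lesion_feat = attention_path(seg_map)        # path 1: lesion region
    background_feat = attention_path(inv_seg)    # path 2: background
    return layers.Concatenate()([lesion_feat, background_feat])  # dual attentive feature map

def build_bus_e(image_shape=(224, 224, 1)):
    """Evaluator sketch: DAF block, two dense blocks, flatten, and a dense classifier
    that outputs a single quality score in [0, 1]."""
    image = layers.Input(image_shape)
    seg_map = layers.Input(image_shape)          # probability map or ground-truth mask
    x = daf_block(image, seg_map)
    x = dense_block(x)
    x = layers.AveragePooling2D(4)(x)            # down-sampling factors are assumptions
    x = dense_block(x)
    x = layers.AveragePooling2D(4)(x)
    x = layers.Flatten()(x)
    score = layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model([image, seg_map], score, name="bus_e")
```

The sigmoid output plays the role of the binary quality score described in Section 2.1.2, with values near 1 indicating good segmentation and values near 0 indicating poor segmentation.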

2.2. Learning Strategy of the BUS-GAN

To define the loss functions, the symbols utilized are first listed. The annotated BUS image and the unannotated BUS image are represented by $X_a$ and $X_u$, respectively, and are of the same size. The BUS-S network and the BUS-E network are denoted as $\mathrm{Seg}(\cdot)$ and $\mathrm{Eva}(\cdot)$, respectively. The annotated ground truth determined by the physicians is denoted as $Y_a$; it has the same spatial size as the image and encompasses two channels, where the first channel contains only the background and the second channel contains only the breast lesion region. $\hat{Y}$ represents the predicted segmentation probability maps.

2.2.1. Loss function of the BUS-S Network

The BUS-S network is trained by minimizing the following loss function:

$$\mathcal{L}_S(X_a, X_u, Y_a; \theta_S) = \mathcal{L}_{seg}(X_a, Y_a; \theta_S) + \lambda\, \mathcal{L}_{adv}(X_a, X_u; \theta_S) \tag{1}$$

where $\mathcal{L}_{seg}$ and $\mathcal{L}_{adv}$ represent the supervised segmentation loss and the adversarial loss, respectively, and $\lambda$ refers to the weight of the adversarial learning, which was set to 0.02.

The loss function $\mathcal{L}_{seg}$ determines whether the segmentation probability map generated from the input annotated data is close to the ground truth, and is defined as:

$$\mathcal{L}_{seg}(X_a, Y_a; \theta_S) = \mathcal{L}_{dice}\big(\hat{Y}_a, Y_a\big) + \mathcal{L}_{bce}\big(\hat{Y}_a, Y_a\big) \tag{2}$$

where $\theta_S$ denotes the parameters of the BUS-S network and $\hat{Y}_a = \mathrm{Seg}(X_a)$. The dice loss is defined as $\mathcal{L}_{dice}(\hat{Y}, Y) = 1 - \frac{2\sum_i \hat{Y}_i Y_i}{\sum_i \hat{Y}_i + \sum_i Y_i}$, and the binary cross-entropy loss is denoted as $\mathcal{L}_{bce}(\hat{Y}, Y) = -\sum_i \big(Y_i \log \hat{Y}_i + (1 - Y_i)\log(1 - \hat{Y}_i)\big)$, where $i$ indexes the pixels. In particular, the dice loss and the binary cross-entropy loss simultaneously control the training on the annotated data.

The adversarial loss $\mathcal{L}_{adv}$ is designed to encourage the segmentation probability maps predicted from either the annotated or unannotated data to appear like the ground truth annotation, and is expressed as:

$$\mathcal{L}_{adv}(X_a, X_u; \theta_S) = \mathcal{L}_{bce}\big(\mathrm{Eva}(X_a, \mathrm{Seg}(X_a)), 1\big) + \mathcal{L}_{bce}\big(\mathrm{Eva}(X_u, \mathrm{Seg}(X_u)), 1\big) \tag{3}$$

where $\theta_S$ denotes the parameters of the BUS-S network and $\mathcal{L}_{bce}$ refers to the binary cross-entropy loss function, the same as that in Eq. (2).

2.2.2. Loss function of the BUS-E Network

To distinguish the generated segmentation probability maps from the ground truth, the proposed evaluation network is trained by minimizing the following loss function:

$$\mathcal{L}_E(X_a, X_u, Y_a; \theta_E) = \mathcal{L}_{bce}\big(\mathrm{Eva}(X_a, Y_a), 1\big) + \lambda_a\, \mathcal{L}_{bce}\big(\mathrm{Eva}(X_a, \mathrm{Seg}(X_a)), 0\big) + \lambda_u\, \mathcal{L}_{bce}\big(\mathrm{Eva}(X_u, \mathrm{Seg}(X_u)), 0\big) \tag{4}$$

where $\theta_E$ represents the parameters of the BUS-E network, $\mathcal{L}_{bce}$ refers to the binary cross-entropy loss function, the same as that in Eq. (2), and $\lambda_a$ and $\lambda_u$ correspond to the loss coefficients of the annotated samples and the unannotated samples, respectively. The training procedure performed well with the chosen setting of $\lambda_a$ and $\lambda_u$. In particular, the first term in Eq. (4) only works with the annotated data, due to the lack of ground truth for the unannotated data.
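A compact tf.keras sketch of Eqs. (1)-(4) and of one alternating training step is given below. The function names are illustrative; for simplicity the masks are treated as single-channel lesion probability maps (the paper uses a two-channel background/lesion encoding), the Dice term follows the standard soft-Dice form, and the values of lam_a and lam_u are placeholders, since only the adversarial weight λ = 0.02 is stated above.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def dice_loss(y_true, y_pred, eps=1e-6):
    """Soft Dice loss, averaged over the batch (standard form, assumed here)."""
    inter = tf.reduce_sum(y_true * y_pred, axis=[1, 2, 3])
    sums = tf.reduce_sum(y_true, axis=[1, 2, 3]) + tf.reduce_sum(y_pred, axis=[1, 2, 3])
    return tf.reduce_mean(1.0 - (2.0 * inter + eps) / (sums + eps))

def seg_loss(y_true, y_pred):
    """Supervised loss on annotated data, Eq. (2): Dice + binary cross-entropy."""
    return dice_loss(y_true, y_pred) + bce(y_true, y_pred)

def bus_s_loss(y_true, y_pred_a, score_a, score_u, lam=0.02):
    """Total BUS-S loss, Eq. (1): supervised term plus the adversarial term of Eq. (3),
    which pushes the evaluator scores of predicted maps toward 1 ('good')."""
    adv = bce(tf.ones_like(score_a), score_a) + bce(tf.ones_like(score_u), score_u)
    return seg_loss(y_true, y_pred_a) + lam * adv

def bus_e_loss(score_gt, score_pred_a, score_pred_u, lam_a=1.0, lam_u=1.0):
    """BUS-E loss, Eq. (4): ground-truth pairs scored as 1, predicted maps as 0.
    lam_a / lam_u are placeholders for the unspecified loss coefficients."""
    return (bce(tf.ones_like(score_gt), score_gt)
            + lam_a * bce(tf.zeros_like(score_pred_a), score_pred_a)
            + lam_u * bce(tf.zeros_like(score_pred_u), score_pred_u))

def train_step(x_a, y_a, x_u, bus_s, bus_e, opt_s, opt_e):
    """One alternating update (sketch): the evaluator first, then the segmenter."""
    with tf.GradientTape() as tape_e:
        pred_a, pred_u = bus_s(x_a, training=True), bus_s(x_u, training=True)
        loss_e = bus_e_loss(bus_e([x_a, y_a], training=True),
                            bus_e([x_a, pred_a], training=True),
                            bus_e([x_u, pred_u], training=True))
    opt_e.apply_gradients(zip(tape_e.gradient(loss_e, bus_e.trainable_variables),
                              bus_e.trainable_variables))
    with tf.GradientTape() as tape_s:
        pred_a, pred_u = bus_s(x_a, training=True), bus_s(x_u, training=True)
        loss_s = bus_s_loss(y_a, pred_a,
                            bus_e([x_a, pred_a], training=False),
                            bus_e([x_u, pred_u], training=False))
    opt_s.apply_gradients(zip(tape_s.gradient(loss_s, bus_s.trainable_variables),
                              bus_s.trainable_variables))
    return loss_s, loss_e
```

In an actual run the optimizers would be Adam instances, as stated in Section 3.2, and the supervised pre-training of BUS-S described in Section 2 would simply minimize seg_loss alone before the adversarial updates begin.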

3. Experimental Results

3.1. Datasets

The dataset comprises 2D BUS images of female patients diagnosed with breast lesions. Only one case from each volunteer participant was used to generate the dataset, and the BUS image with the maximum cross-section of the breast lesion was selected for each case because it is the easiest image for physicians to diagnose [38]. In total, 2963 breast lesion ultrasound images were involved in this research, of which 2800 are in-house data collected from West China Hospital of Sichuan University and 163 come from the public dataset provided by [39]. Notably, human subject ethical approval was obtained from the relevant committee at West China Hospital of Sichuan University before collecting the ultrasound images, and each subject provided written consent prior to the research. A Philips IU22 ultrasound scanner (Philips Medical System, Bothell, WA) with a 5- to 12-MHz linear probe was utilized to collect the data.

The 2000 in-house images were used as the training dataset for BUS-S and BUS-E, while the remaining 800 in-house images and the 163 publicly available images [39] were used as the testing dataset. The detailed diagnoses for the in-house dataset are listed in Table I. In particular, the 2000 training images contained 100 annotated images and 1900 unannotated images. For image preprocessing in the training phase, random horizontal flipping, random cropping, and intensity normalization were applied to the collected images to generate images with a uniform size of 224×224; only intensity normalization was performed in the testing phase.
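The training-time preprocessing described above could be implemented along the following lines; the exact normalization formula is not given in the text, so per-image standardization is assumed, and preprocess_train/preprocess_test are hypothetical helper names.

```python
import tensorflow as tf

def preprocess_train(image, mask, size=224):
    """Training-time preprocessing: random horizontal flip, random crop to
    224x224, and intensity normalization. Assumes single-channel image and
    mask tensors that are at least `size` pixels in each dimension."""
    pair = tf.concat([image, mask], axis=-1)                 # flip/crop image and mask together
    pair = tf.image.random_flip_left_right(pair)
    pair = tf.image.random_crop(pair, size=[size, size, 2])
    image, mask = pair[..., :1], pair[..., 1:]
    image = tf.image.per_image_standardization(image)        # zero-mean/unit-variance (assumption)
    return image, mask

def preprocess_test(image, mask):
    """Test-time preprocessing: intensity normalization only."""
    return tf.image.per_image_standardization(image), mask
```

Such functions would typically be mapped over a tf.data.Dataset of image/mask pairs before batching.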

Table I. Diagnosis for the in-house dataset (number of images).

| Category | Biopsy Results | Training (Anno.) | Training (Unan.) | Testing |
|---|---|---|---|---|
| Benign | Fibroadenomas | 17 | 340 | 155 |
| Benign | Adenosis of Breast | 14 | 255 | 117 |
| Benign | Adenosis with Fibroadenoma Formation of Breast | 16 | 311 | 87 |
| Benign | Sclerosing Adenosis | 1 | 22 | 10 |
| Benign | Intraductal Papillary Neoplasms | 1 | 12 | 6 |
| Benign | Granulomatous Mastitis | 1 | 24 | 11 |
| Malignant | Invasive Ductal Carcinoma | 47 | 867 | 384 |
| Malignant | Ductal Carcinoma in Situ | 2 | 46 | 21 |
| Malignant | Invasive Lobular Carcinoma | 1 | 23 | 9 |
|  | Total | 100 | 1900 | 800 |

Benign total: 1400 images; malignant total: 1400 images; overall total: 2800 images.

3.2. Implementation Details and Performance Evaluation

The proposed method was implemented with the TensorFlow [40] library. All experiments were conducted on a workstation equipped with a 2.40 GHz Intel Xeon E5-2630 CPU and an NVIDIA GeForce GTX 1080Ti GPU. To train the segmentation network, an Adam optimizer was employed to minimize the loss functions defined in Section 2.2. The initial learning rate was set to 1e-2 and then decreased by a factor of 5 every 1000 epochs; relative to this real-time learning rate, separate scalings were applied to the ASPP structure and to the losses in Eq. (1) and Eq. (2). The training stage was stopped after 5000 epochs, and the batch size was set to 10.

The results annotated by the physicians were regarded as the ground truth, and five metrics were utilized to evaluate the quality of the segmentation probability maps: the dice similarity coefficient (DSC) [41-43], Jaccard index (JI) [42, 44], precision [42, 44], Hausdorff distance (HD) [42, 43, 45], and average distance (AvgD) [42, 43]. DSC, JI, and precision were employed to examine the overlap between the two compared regions, while HD and AvgD were exploited to measure the Euclidean distance between a computer-identified lesion boundary and the boundary determined by the physicians. Higher DSC, JI, and precision, along with lower HD and AvgD, correspond to higher similarity between the two compared regions.
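For reference, the five metrics can be computed from binary masks and boundary point sets roughly as follows; the symmetric forms used for HD and AvgD are common conventions and are assumptions rather than the paper's exact definitions, and region_metrics/boundary_metrics are illustrative helper names.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def region_metrics(pred, gt):
    """DSC, Jaccard index, and precision for binary masks (no zero-division guard)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    dsc = 2.0 * tp / (pred.sum() + gt.sum())
    ji = tp / np.logical_or(pred, gt).sum()
    precision = tp / pred.sum()
    return dsc, ji, precision

def boundary_metrics(pred_pts, gt_pts):
    """Symmetric Hausdorff distance and average boundary distance between
    two (N, 2) arrays of boundary pixel coordinates."""
    hd = max(directed_hausdorff(pred_pts, gt_pts)[0],
             directed_hausdorff(gt_pts, pred_pts)[0])
    d_pg = np.min(np.linalg.norm(pred_pts[:, None] - gt_pts[None], axis=-1), axis=1)
    d_gp = np.min(np.linalg.norm(gt_pts[:, None] - pred_pts[None], axis=-1), axis=1)
    avgd = (d_pg.mean() + d_gp.mean()) / 2.0
    return hd, avgd
```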

3.3. Quantitative and Qualitative Analysis

3.3.1 Tendency of segmentation accuracy with increased unannotated data

To gain insight into the tendency of the segmentation performance of the proposed model when increasing the proportion of unannotated images, five experiments were conducted with the BUS-GAN model. Each experiment used a fixed set of 100 annotated images but a different number of unannotated images (100, 500, 1000, 1500, or 1900) during the training stage. Table II shows the results of the five experiments tested on both the in-house dataset and the publicly available dataset. As shown in Table II, the average values of the area overlap metrics (DSC, JI, and Precision) increase and the average values of the boundary error indicators (HD and AvgD) decrease as the number of unannotated images grows. The results indicate that the proposed BUS-GAN model can better learn the real distribution of the inputs and estimate the relationship between the inputs and outputs by gradually increasing the unannotated data in the training phase. Meanwhile, the improvement of each quantitative indicator in Table II slows down and converges once the amount of unannotated data exceeds 1000, which is consistent with the logarithmic relationship between performance and the amount of training data used for representation learning reported in [46].

Table II. Testing results of the BUS-GAN model trained with a fixed number of annotated images and a variable number of unannotated images. The segmentation results on the testing data are evaluated in terms of area overlap and boundary error and reported as mean (standard deviation). (Dataset A: in-house testing dataset, Dataset B: public testing dataset)

| #Anno. | #Unanno. | DSC A (%) | DSC B (%) | JI A (%) | JI B (%) | Precision A (%) | Precision B (%) | HD A (pixels) | HD B (pixels) | AvgD A (pixels) | AvgD B (pixels) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 100 | 84.31 (8.482) | 79.08 (11.25) | 74.01 (11.18) | 67.68 (15.17) | 83.85 (12.83) | 78.44 (16.52) | 25.87 (18.49) | 33.19 (34.42) | 6.105 (5.182) | 9.264 (13.28) |
| 100 | 500 | 85.37 (7.726) | 79.25 (10.86) | 75.75 (10.90) | 67.67 (15.21) | 84.65 (11.33) | 78.38 (16.23) | 24.60 (17.87) | 32.96 (33.82) | 5.940 (5.068) | 9.201 (13.39) |
| 100 | 1000 | 86.50 (6.710) | 79.53 (10.80) | 76.51 (9.837) | 67.89 (15.03) | 85.63 (11.46) | 78.42 (16.02) | 24.43 (16.16) | 32.65 (32.53) | 5.872 (5.641) | 9.126 (12.91) |
| 100 | 1500 | 86.76 (6.383) | 79.75 (10.35) | 76.80 (9.023) | 68.00 (14.87) | 86.02 (11.24) | 78.50 (15.76) | 24.37 (16.54) | 32.23 (31.74) | 5.806 (5.650) | 9.098 (12.44) |
| 100 | 1900 | 87.12 (5.650) | 79.82 (10.38) | 77.62 (8.642) | 68.03 (14.98) | 86.38 (10.79) | 78.58 (15.95) | 23.61 (15.76) | 32.15 (31.85) | 5.364 (4.902) | 9.089 (12.55) |

3.3.2 Comparison with state-of-the-art semi-supervised learning methods

To validate that the BUS-GAN can accurately segment the lesion region, a quantitative comparison was conducted between the proposed model and three recent semi-supervised learning-based models, including Feng [23], ASDNet [22], and DAN [21]. The three compared models were trained on the same in-house BUS images (100 annotated + 1900 unannotated) with their respective optimal parameters. Table III shows the quantitative results of area overlap and boundary distance with the five evaluation metrics (DSC, JI, Precision, HD, and AvgD). Among the listed methods, the model proposed by Feng [23] has the lowest area overlap (DSC, JI, and Precision) and the largest boundary distance error (HD and AvgD). ASDNet [22] and DAN [21] perform rather similarly in terms of all the metrics. In contrast, the proposed BUS-GAN model achieves the highest average values accompanied by the lowest standard deviations on the area overlap metrics (DSC, JI, and Precision), and obtains the lowest average values along with the lowest standard deviations on the boundary error metrics (HD and AvgD), on both the in-house and public testing datasets. Moreover, the BUS-GAN achieves higher segmentation accuracy with statistically significant differences (p < 0.05, paired t-test) on the five quantitative metrics for both testing datasets.

Table III. Comparison between the proposed BUS-GAN and other semi-supervised learning methods with five evaluation metrics (DSC, JI, Precision, HD, and AvgD), reported as mean (standard deviation). (Dataset A: in-house testing dataset, Dataset B: public testing dataset)

| Method | DSC A (%) | DSC B (%) | JI A (%) | JI B (%) | Precision A (%) | Precision B (%) | HD A (pixels) | HD B (pixels) | AvgD A (pixels) | AvgD B (pixels) |
|---|---|---|---|---|---|---|---|---|---|---|
| Feng [23] | 83.06 (8.620) | 76.37 (15.64) | 72.89 (11.86) | 65.72 (19.94) | 82.85 (13.49) | 76.88 (17.39) | 28.79 (18.99) | 34.98 (32.79) | 7.150 (6.252) | 9.328 (14.68) |
| ASDNet [22] | 85.00 (7.544) | 77.50 (14.67) | 74.78 (11.71) | 66.89 (16.32) | 84.08 (12.56) | 77.22 (16.88) | 26.03 (18.48) | 33.68 (33.84) | 6.729 (5.519) | 9.749 (14.59) |
| DAN [21] | 85.20 (7.331) | 77.62 (11.77) | 75.42 (10.51) | 67.13 (17.62) | 84.28 (11.52) | 77.54 (16.43) | 25.51 (18.33) | 34.06 (33.52) | 5.894 (6.123) | 9.857 (14.84) |
| Proposed | 87.12 (5.650) | 79.82 (10.38) | 77.62 (8.642) | 68.03 (14.98) | 86.38 (10.79) | 78.58 (15.95) | 23.61 (15.76) | 32.15 (31.85) | 5.364 (4.902) | 9.089 (12.55) |

Fig. 5 illustrates several representative contours estimated by the segmentation models listed in Table III. The estimated contours and the ground truth contours are marked with red and green curves, respectively. Compared with the other methods listed in Table III, the BUS-GAN model yields higher-quality segmentations for the testing cases in Fig. 5. In particular, the boundary estimated by BUS-GAN is closer to the ground truth boundary. Moreover, the BUS-GAN can effectively reduce the number of falsely included background pixels, particularly pixels with intensities similar to the lesion (see case 4 in Fig. 5). Therefore, compared with state-of-the-art semi-supervised learning models, the proposed BUS-GAN model appears to be a better choice for automatically estimating the contour of breast lesions.

Fig. 5. Sample cases resulting from different methods. From the first to the fourth column, each column exhibits the predicted contours from "Feng [23]", "ASDNet [22]", "DAN [21]", and the proposed BUS-GAN, respectively. Red curves correspond to the contours from each method, and green curves refer to the contours marked by physicians. Cases 1-4 are derived from the in-house testing dataset and Case 5 from the public dataset.

3.3.3 Ablation Study

Taking the BUS-S network as a baseline, we first conduct an ablation study to demonstrate the contribution of three elements of this study: the GAN architecture (referred to as "+BUS-E"), semi-supervised learning (referred to as "+Semi-"), and the DAF block (referred to as "+Attention"). Table IV and Fig. 6 display the performance of the proposed model equipped with different combinations of the three elements. Note that only the 100 annotated images were employed as the training dataset under the settings without semi-supervised learning (Semi-), whereas the additional 1900 unannotated images were included in the training dataset under the settings with Semi-.

Table IV. Quantitative analysis of segmentation performance for different settings of the proposed BUS-GAN, reported as mean (standard deviation). (Dataset A: in-house testing dataset, Dataset B: public testing dataset)

| Setting | DSC A (%) | DSC B (%) | JI A (%) | JI B (%) | Precision A (%) | Precision B (%) | HD A (pixels) | HD B (pixels) | AvgD A (pixels) | AvgD B (pixels) |
|---|---|---|---|---|---|---|---|---|---|---|
| Backbone (BUS-S) | 83.11 (9.094) | 77.81 (13.36) | 72.44 (11.90) | 67.54 (15.26) | 79.55 (13.33) | 77.94 (16.68) | 29.32 (19.52) | 34.25 (34.52) | 6.894 (5.914) | 9.989 (14.56) |
| BUS-S + BUS-E | 83.45 (9.102) | 78.97 (12.02) | 72.67 (12.08) | 67.67 (15.65) | 83.34 (13.48) | 77.34 (17.41) | 28.92 (19.02) | 33.92 (35.36) | 6.687 (6.256) | 9.613 (15.25) |
| BUS-S + Attention | 83.88 (9.010) | 78.94 (11.10) | 73.01 (10.80) | 67.49 (15.35) | 84.55 (13.73) | 78.55 (16.39) | 28.32 (18.88) | 33.32 (34.08) | 6.254 (5.953) | 9.258 (13.93) |
| BUS-S + BUS-E + Semi- | 85.75 (8.255) | 79.32 (11.55) | 75.87 (11.37) | 67.98 (15.17) | 82.72 (12.73) | 78.32 (16.71) | 26.28 (18.65) | 33.01 (33.65) | 5.977 (5.914) | 9.123 (13.94) |
| BUS-S + BUS-E + Semi- + Attention (BUS-GAN) | 87.12 (5.650) | 79.82 (10.38) | 77.62 (8.642) | 68.03 (14.98) | 86.38 (10.79) | 78.58 (15.95) | 23.61 (15.76) | 32.15 (31.85) | 5.364 (4.902) | 9.089 (12.55) |

Fig. 6. Sample cases resulting from experiments that add the different elements listed in Table IV. From the first to the fifth column, each column exhibits the predicted contours from the experiments "Backbone (BUS-S)", "BUS-S + BUS-E", "BUS-S + Attention", "BUS-S + BUS-E + Semi-", and "BUS-S + BUS-E + Semi- + Attention", respectively. Cases 1 and 2 are derived from the in-house testing dataset and Case 3 from the public dataset.

Table IV illustrates that the segmentation accuracy can be progressively improved by adding the three listed elements. In Table IV, the first three rows correspond to experiments performed with only the 100 annotated images, and the last two experiments were implemented with both the annotated and unannotated data. For the first three experiments, even with limited annotated data, adding either the element "+BUS-E" or "+Attention" to the segmentation backbone (BUS-S) yields an average improvement in area overlap and a reduction in boundary error. For the experiments with additional unannotated data, further segmentation improvements can be observed compared with the first three experiments. Compared with the "+BUS-E" experiment, the fourth experiment (adding "+BUS-E" and "+Semi-") further improves the performance in terms of area overlap and boundary error. Moreover, further adding the attention block to the semi-supervised GAN (the fifth experiment) again increases the area overlap and decreases the boundary error. That is, the complete BUS-GAN, which is equipped with all three elements, achieves the lowest standard deviations on each metric and produces the highest average DSC (87.12% for in-house data and 79.82% for public data), JI (77.62% for in-house data and 68.03% for public data), and precision (86.38% for in-house data and 78.58% for public data), together with the lowest HD (23.61 for in-house data and 32.15 for public data) and AvgD (5.364 for in-house data and 9.089 for public data). Fig. 6 also shows that the complete BUS-GAN obtains the best contours, with the minimum distance from the ground truth contours.

3.3.4 Effectiveness of the segmentation backbone

A reliable segmentation baseline is the preliminary step in constructing the semi-supervised GAN-based model. To demonstrate that the proposed segmentation backbone can achieve reliable segmentation accuracy on BUS images, a comparison was performed between the BUS-S network and three prevalent segmentation models: FCN-16s [20], U-Net [17], and DeepLab [34]. Note that all models here were implemented in a fully supervised learning manner based on the 100 annotated images. According to the quantitative results shown in Table V, the proposed baseline BUS-S obtains the highest area overlap in terms of DSC and JI with the least boundary error among all the listed methods on both the in-house and publicly available testing datasets. Although the precision of BUS-S is around 1% lower than that of U-Net, the BUS-S backbone achieves better segmentation performance when the other four metrics are considered. Fig. 7 exhibits experimental cases from the different segmentation methods, in which the boundary predicted by BUS-S is closer to the annotated boundary than those of the other listed models. Therefore, the backbone BUS-S provides the complete BUS-GAN model with a good segmentation baseline.

Table V. Accuracy comparison between the backbone BUS-S and other fully supervised learning-based methods with five evaluation metrics (DSC, JI, Precision, HD, and AvgD), reported as mean (standard deviation). (Dataset A: in-house testing dataset, Dataset B: public testing dataset)

| Method | DSC A (%) | DSC B (%) | JI A (%) | JI B (%) | Precision A (%) | Precision B (%) | HD A (pixels) | HD B (pixels) | AvgD A (pixels) | AvgD B (pixels) |
|---|---|---|---|---|---|---|---|---|---|---|
| FCN-16s | 79.33 (13.66) | 70.86 (16.92) | 67.56 (17.24) | 60.44 (19.25) | 78.90 (15.73) | 71.85 (19.71) | 33.10 (26.65) | 37.43 (35.63) | 8.992 (12.06) | 11.21 (15.78) |
| U-Net | 80.84 (12.35) | 75.40 (15.74) | 69.22 (16.11) | 63.63 (18.36) | 80.29 (18.04) | 79.01 (18.04) | 31.85 (25.35) | 34.34 (33.05) | 8.430 (11.38) | 10.93 (14.32) |
| DeepLab | 81.91 (11.21) | 77.44 (13.08) | 70.98 (12.80) | 67.26 (15.80) | 79.20 (13.74) | 77.58 (16.08) | 31.59 (21.06) | 35.03 (34.37) | 8.034 (9.684) | 10.06 (13.84) |
| BUS-S (backbone) | 83.11 (9.094) | 77.81 (13.36) | 72.44 (11.90) | 67.54 (15.26) | 79.55 (13.33) | 77.94 (16.68) | 29.32 (19.52) | 34.25 (34.52) | 6.894 (5.914) | 9.989 (14.56) |

Fig. 7. Sample cases resulting from different fully supervised learning-based methods. Each column corresponds to the predicted contours from "FCN", "U-Net", "DeepLab", and "BUS-S", respectively (1st to 4th column). Cases 1 and 2 are sample cases from the in-house testing images and Case 3 is a sample case from the public dataset.

3.3.5 Impact of the Dual-Attentive-Fusion Block

To demonstrate that the designed DAF block has a positive effect on the prediction of the BUS-E network and further influences the segmentation quality of the whole BUS-GAN model, three types of input are compared here: (1) experimental case 1, named "Seg-only", in which only the segmented results were taken as the input of the BUS-E network; (2) experimental case 2, denoted "Concatenation", in which the segmented results were directly concatenated with the original image before being given as input to BUS-E; and (3) experimental case 3, denoted "No-inverse", in which the inverse path (path 2 in Fig. 4) was removed and only the segmented results and the original images were utilized in the DAF block. Besides, the DAF module is compared with two recent dual attentive networks, the dual attention network (DANet) [47] and the convolutional block attention module (CBAM) [48]. Both DANet and CBAM adopted an input identical to that of the DAF block for a fair comparison.

The results in the first three rows and the last row of Table VI illustrate the quantitative comparison of the different inputs for the BUS-E network; the segmentation quality gradually improves as more constraints are introduced on the inputs of the BUS-E network. Compared with the other inputs of the BUS-E network, the DAF block helps the BUS-GAN achieve the best performance on the two different testing datasets, with the highest average DSC, JI, and precision and the lowest average HD and AvgD. In contrast, the worst-performing input listed in Table VI is the "Seg-only" case. Note that the precision of the "No-inverse" case is lower than that of the proposed method, which means that the "No-inverse" input may introduce more false positive pixels than the DAF block, thus reducing the segmentation accuracy. The last three rows correspond to the results of the different dual attention methods. The proposed DAF block achieves higher area overlap in terms of DSC, JI, and precision and smaller boundary errors than DANet and CBAM on both testing datasets, to different degrees. Overall, the proposed DAF block is validated to facilitate the network in achieving better segmentation performance.

Table VI. Comparison of the results from different inputs to the proposed DAF block and from different dual attention methods with five evaluation metrics (DSC, JI, Precision, HD, and AvgD), reported as mean (standard deviation). (Dataset A: in-house testing dataset, Dataset B: public testing dataset)

| Method | DSC A (%) | DSC B (%) | JI A (%) | JI B (%) | Precision A (%) | Precision B (%) | HD A (pixels) | HD B (pixels) | AvgD A (pixels) | AvgD B (pixels) |
|---|---|---|---|---|---|---|---|---|---|---|
| Seg-only | 83.35 (8.758) | 78.06 (12.81) | 72.33 (11.94) | 67.61 (14.85) | 82.35 (13.27) | 78.00 (16.33) | 28.20 (18.17) | 33.82 (33.43) | 6.634 (5.120) | 9.635 (14.08) |
| Concatenation | 85.75 (8.255) | 79.32 (11.55) | 75.87 (11.37) | 67.98 (14.17) | 82.72 (12.73) | 78.32 (16.71) | 26.28 (18.65) | 33.01 (33.65) | 5.977 (5.914) | 9.123 (13.94) |
| No-inverse | 86.32 (8.012) | 79.57 (10.99) | 76.71 (11.14) | 68.01 (15.12) | 83.40 (11.90) | 78.34 (16.08) | 25.66 (18.81) | 33.24 (32.31) | 5.869 (5.622) | 9.201 (13.34) |
| DANet [47] | 86.07 (7.908) | 79.43 (10.58) | 76.18 (11.10) | 68.00 (14.83) | 84.31 (12.18) | 78.27 (15.73) | 24.30 (18.34) | 32.42 (32.08) | 5.722 (5.233) | 9.184 (12.78) |
| CBAM [48] | 86.36 (8.492) | 79.69 (10.65) | 76.72 (10.16) | 68.01 (15.01) | 84.02 (11.22) | 78.39 (16.32) | 25.87 (18.76) | 32.59 (31.35) | 6.191 (6.486) | 9.137 (12.86) |
| Proposed (DAF) | 87.12 (5.650) | 79.82 (10.38) | 77.62 (8.642) | 68.03 (14.98) | 86.38 (10.79) | 78.58 (15.95) | 23.61 (15.76) | 32.15 (31.85) | 5.364 (4.902) | 9.089 (12.55) |

Fig. 8 displays several predicted segmentation probability heatmaps from the methods listed in Table VI. In the heatmaps, a higher absolute value corresponds to higher certainty: darker red corresponds to higher certainty of the lesion region, and darker blue refers to higher certainty of the background region. The green areas represent high uncertainty, meaning that the lesion region cannot easily be distinguished from the background there. As shown in Fig. 8, the "Seg-only" input, which lacks the original images, leads the BUS-GAN to generate segmentation probability heatmaps with high uncertainty (green) on the background. In contrast, with the other inputs listed in Table VI, the segmentation probability heatmaps show higher certainty on the background region. In particular, with the input produced by the DAF module, the segmentation probability heatmaps exhibit fewer green uncertainty areas than the other three types of input, which suggests that the proposed DAF block can reduce the disturbance from the background and enhance the confidence of the lesion region. Meanwhile, neither DANet nor CBAM achieves high certainty on the lesion region and background, due to the insufficient usage of the information contained in the lesion region and the background. Overall, the segmentation probability heatmaps obtained with the DAF block show higher certainty in both the background region and the lesion region than the other methods. Therefore, the proposed DAF block provides the BUS-E network with a better capability to differentiate the lesion region from the background than "No-Inverse", "Concatenation", and "Seg-only".

Fig. 8. Samples of the segmentation probability heatmaps resulting from different inputs of the BUS-E network. From the third column onward, each column represents the segmentation probability maps from "Seg-only", "Concatenation", "No-Inverse", "DANet", "CBAM", and the proposed method, respectively. Cases 1-3 correspond to results from the in-house testing dataset and Case 4 to a result from the public dataset.

4. Discussion

Accurate segmentation of lesions from BUS images is of crucial significance for computer-aided diagnostic procedures. Traditional methods usually depend on annotated data to achieve automatic segmentation of BUS images, while large-scale unannotated BUS images remain to be fully exploited for an automatic segmentation system. In this study, we propose a learning-based model trained in a semi-supervised fashion, so that the information encoded in unannotated data can also be fully utilized to improve the accuracy and robustness of lesion segmentation.

Due to the low quality of BUS images and the large variability of breast lesions, it is challenging to improve the lesion segmentation accuracy of BUS images with a small amount of annotated data and a relatively large amount of unannotated data. As shown in Table III and Fig. 5, the proposed BUS-GAN model can leverage unannotated data to achieve better segmentation accuracy than recently proposed semi-supervised models [21-23], which suggests that the proposed BUS-GAN is more suitable for lesion segmentation from BUS images. The ablation study then investigates each contributing element of the BUS-GAN. Based on the segmentation backbone, the element "+BUS-E" brings improvements in segmentation accuracy (see Table IV and Fig. 6), because the distribution of the generated segmentation probability maps is drawn closer to that of the annotated ground truth by introducing adversarial training. Adding the elements "+BUS-E" and "+Semi-" further improves the segmentation accuracy, which suggests that the BUS-GAN architecture can effectively exploit the unannotated BUS images by making the segmentations predicted from unannotated data follow a distribution similar to the annotated ground truth, thereby facilitating the segmentation task. Also, the element "+Attention" further boosts the segmentation accuracy on top of the experiment equipped with "+BUS-E" and "+Semi-", because the proposed DAF block provides corrected features for the BUS-E network by suppressing the background and enhancing the foreground, thus increasing the effective data for subsequent adversarial training.

In the GAN-based architecture, a good generator needs to be effective on both annotated and unannotated data. The generator first needs to ensure good segmentation accuracy on the limited annotated data, and then help the BUS-GAN achieve better segmentation quality on the unannotated BUS images. As shown in Table V and Fig. 7, compared with prevalent fully supervised segmentation methods, e.g., FCN-16s, U-Net, and DeepLab, the baseline BUS-S network achieves desirable segmentation accuracy and provides a reliable backbone for the proposed BUS-GAN model. This can be attributed to the improved blocks deployed in the BUS-S network: (1) the atrous convolutional layer [34, 36]; (2) ASPP [34, 36, 37]; and (3) the dense block [35]. With the ASPP block and the atrous convolutional layer, BUS-S can extract multi-scale features that accommodate the large appearance variability of breast lesions. Moreover, with the atrous convolutional layer, BUS-S can perceive a larger field of view with less down-sampling; with larger feature maps, small lesions effectively remain in the high-level feature maps. Additionally, the dense block [35] in the BUS-S network recalls lower-level features to help generate better higher-level features and ensures that the gradient propagates to the deeper layers. Therefore, the dense block, atrous convolutional layer, and ASPP block facilitate the proposed model in extracting better multi-scale features and invariant higher-level features.

In the GAN-based architecture, a good discriminator can act as an experienced physician and provide good guidance to the generator network (BUS-S). The discriminative ability of the evaluator can be credited to effective feature representation, and different inputs of the BUS-E network provide different information for the evaluation network. Referring to Table VI, the DAF block ensures better segmentation accuracy than the other listed inputs. With "Seg-only", the BUS-E network can only learn features from the segmentation maps; without the information of the original BUS images, the decisions of the evaluator may be unstable and uncertain. In the "Concatenation" input, the BUS image and the corresponding segmentation result are directly combined along the channel dimension, namely, the US image and the corresponding segmentation result share the same filters for feature extraction. However, because the grayscale distribution and texture features of the US image usually differ from those of the corresponding segmentation map, using the same filters may leave the final segmentation quality of "Concatenation" unsatisfactory. Clinically, the intensity distributions of the breast lesion and the background in collected BUS images vary among individuals and devices. In several clinical cases where the intensity difference between lesion and background is not evident, the "No-Inverse" input may lead the BUS-E network to misidentify background as lesion due to the lack of background information. In the dual-attentive-fusion block, both the background region and the lesion region are taken into consideration, and two separate paths are employed to enhance the features of the lesion and the background, respectively. With the inverse path (path 2 in Fig. 4), the contrast between background and lesion can be enhanced, thus improving the precision of segmentation by taking the background information into account. Therefore, the proposed attentive input helps the BUS-GAN achieve better segmentation performance than "No-Inverse", "Concatenation", and "Seg-only".

Unlike the dual attentive designs in DANet [47] and CBAM [48], which leverage a channel attention module to implicitly distinguish the lesion from the background and a spatial module to learn the spatial relationships of feature maps, the proposed DAF block explicitly decouples the feature representation of the lesion region from that of the background using two independent spatial attention paths (see Fig. 4), so as to make full use of the information contained in the background and the lesion region and to enlarge the contrast between them, thus facilitating the identification of the lesion region. Besides, considering the boundary ambiguity and abundant speckle noise inherent in BUS images, the proposed evaluator employs a simplified binary score as the evaluation value rather than evaluating the whole image, which makes the evaluation more stable. Extensive experiments demonstrate that the BUS-GAN model can produce promising segmentation maps. Notably, the overlap metrics (DSC, JI, and Precision) on the public dataset are about 8% lower than those on the in-house testing dataset, which is caused by the difference between data domains: the appearance of ultrasound images usually varies across acquisition devices, causing a performance drop in automated ultrasound image segmentation [49]. Nevertheless, the BUS-GAN achieves the highest overlap area with the least boundary error compared with the state-of-the-art methods, indicating the effectiveness and robustness of the BUS-GAN on multi-site datasets. Accurate segmentation can not only relieve physicians from the tedious annotation task but also help improve the robustness of an automatic diagnostic system for breast lesions. By effectively increasing the utilization of unannotated BUS images, the proposed BUS-GAN model can relieve physicians from tedious annotation and provide the automatic diagnostic system with well-segmented breast lesions. In the future, we will continue to collect more BUS images and apply the BUS-GAN model to more unannotated BUS images.

5. Conclusion

In this study, based on the GAN architecture, we proposed a semi-supervised learning model, BUS-GAN. With the BUS-GAN model, unannotated data can be fully leveraged to improve the quality of breast lesion segmentation from ultrasound images. Besides, the proposed segmentation backbone provides the BUS-GAN model with stable and robust segmentation by densely extracting multi-scale features, and the attentive fusion block embedded into the BUS-GAN further improves segmentation quality. Therefore, the proposed segmentation scheme can automatically aid physicians in localizing and diagnosing breast lesions from US scans, thus reducing the influence of an evaluator's experience and subjective viewpoint. Furthermore, the proposed scheme can easily be extended to the segmentation of large amounts of unannotated BUS images collected with other equipment.

CONFLICT OF INTEREST

The authors have no conflicts to disclose.

Acknowledgment

The authors would like to thank the physicians in the Department of Ultrasound, West China Hospital of Sichuan University, for their helpful contribution in labeling the boundary of each collected breast US image.

References

[1] R. L. Siegel, K. D. Miller, and A. Jemal, "Cancer statistics, 2019," CA: a cancer journal for clinicians, vol. 69, pp. 7-34, 2019.
[2] W. Chen, R. Zheng, P. D. Baade, S. Zhang, H. Zeng, F. Bray, et al., "Cancer statistics in China, 2015," CA: a cancer journal for clinicians, vol. 66, pp. 115-132, 2016.
[3] W. H. Kim, W. K. Moon, S. J. Kim, A. Yi, B. La Yun, N. Cho, et al., "Ultrasonographic assessment of breast density," Breast cancer research and treatment, vol. 138, pp. 851-859, 2013.
[4] R. J. Hooley, K. L. Greenberg, R. M. Stackhouse, J. L. Geisel, R. S. Butler, and L. E. Philpotts, "Screening US in patients with mammographically dense breasts: initial experience with Connecticut Public Act 09-41," Radiology, vol. 265, pp. 59-69, 2012.
[5] Y.-L. Huang, D.-R. Chen, and Y.-K. Liu, "Breast cancer diagnosis using image retrieval for different ultrasonic systems," in 2004 International Conference on Image Processing (ICIP'04), 2004, pp. 2957-2960.
[6] D.-R. Chen, R.-F. Chang, W.-J. Kuo, M.-C. Chen, and Y.-L. Huang, "Diagnosis of breast tumors with sonographic texture analysis using wavelet transform and neural networks," Ultrasound in medicine & biology, vol. 28, pp. 1301-1310, 2002.
[7] R.-F. Chang, W.-J. Wu, W. K. Moon, and D.-R. Chen, "Automatic ultrasound segmentation and morphology based diagnosis of solid breast tumors," Breast cancer research and treatment, vol. 89, p. 179, 2005.
[8] J. Levman, E. Warner, P. Causer, and A. Martel, "Semi-automatic region-of-interest segmentation based computer-aided diagnosis of mass lesions from dynamic contrast-enhanced magnetic resonance imaging based breast cancer screening," Journal of digital imaging, vol. 27, pp. 670-678, 2014.
[9] Y. Huang, L. Han, H. Dou, H. Luo, Z. Yuan, Q. Liu, et al., "Two-stage CNNs for computerized BI-RADS categorization in breast ultrasound images," Biomedical engineering online, vol. 18, p. 8, 2019.
[10] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, et al., "A survey on deep learning in medical image analysis," Medical image analysis, vol. 42, pp. 60-88, 2017.
[11] T. Brosch, L. Y. Tang, Y. Yoo, D. K. Li, A. Traboulsee, and R. Tam, "Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation," IEEE transactions on medical imaging, vol. 35, pp. 1229-1239, 2016.
[12] C. Wang, X. Yan, M. Smith, K. Kochhar, M. Rubin, S. M. Warren, et al., "A unified framework for automatic wound segmentation and analysis with deep convolutional neural networks," in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015, pp. 2415-2418.
[13] W. Zhang, R. Li, H. Deng, L. Wang, W. Lin, S. Ji, et al., "Deep convolutional neural networks for multi-modality isointense infant brain image segmentation," NeuroImage, vol. 108, pp. 214-224, 2015.
[14] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, et al., "Brain tumor segmentation with deep neural networks," Medical image analysis, vol. 35, pp. 18-31, 2017.
[15] H. Fu, Y. Xu, S. Lin, D. W. K. Wong, and J. Liu, "DeepVessel: Retinal vessel segmentation via deep learning and conditional random field," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016, pp. 132-139.
[16] K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool, "Deep retinal image understanding," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016, pp. 140-148.
[17] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234-241.
[18] M. U. Dalmış, G. Litjens, K. Holland, A. Setio, R. Mann, N. Karssemeijer, et al., "Using deep learning to segment breast and fibroglandular tissue in MRI volumes," Medical physics, vol. 44, pp. 533-546, 2017.
[19] M. H. Yap, M. Goyal, F. M. Osman, R. Martí, E. Denton, A. Juette, et al., "Breast ultrasound lesions recognition: end-to-end deep learning approaches," Journal of Medical Imaging, vol. 6, p. 011007, 2018.
[20] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[21] Y. Zhang, L. Yang, J. Chen, M. Fredericksen, D. P. Hughes, and D. Z. Chen, "Deep adversarial networks for biomedical image segmentation utilizing unannotated images," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017, pp. 408-416.
[22] D. Nie, Y. Gao, L. Wang, and D. Shen, "ASDNet: Attention based semi-supervised deep networks for medical image segmentation."

segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018, pp. 370-378. [23] Z. Feng, D. Nie, L. Wang, and D. Shen, "Semi-supervised learning for pelvic MR image segmentation based on multi-task residual fully convolutional networks," in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), 2018, pp. 885-888. [24] C. Baur, S. Albarqouni, and N. Navab, "Semi-supervised deep learning for fully convolutional networks," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017, pp. 311-319. [25] W. Bai, O. Oktay, M. Sinclair, H. Suzuki, M. Rajchl, G. Tarroni, et al., "Semi-supervised learning for network-based cardiac MR image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017, pp. 253-260. [26] D. Mahapatra, "Semi-supervised learning and graph cuts for consensus based medical image segmentation," Pattern recognition, vol. 63, pp. 700-709, 2017. [27] L. Gu, Y. Zheng, R. Bise, I. Sato, N. Imanishi, and S. Aiso, "Semi-supervised learning for biomedical image segmentation via forest oriented super pixels (voxels)," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017, pp. 702-710. [28] J. Xing, Z. Li, B. Wang, B. Yu, F. G. Zanjani, A. Zheng, et al., "Automated Segmentation of Lesions in Ultrasound Using Semi-pixel-wise Cycle Generative Adversarial Nets," arXiv preprint arXiv:1905.01902, 2019. [29] J. Son, S. J. Park, and K.-H. Jung, "Retinal vessel segmentation in fundoscopic images with generative adversarial networks," arXiv preprint arXiv:1706.09318, 2017. [30] A. Lahiri, K. Ayush, P. Kumar Biswas, and P. Mitra, "Generative adversarial learning for reducing manual annotation in semantic segmentation on large scale miscroscopy images: Automated vessel segmentation in retinal fundus image as test case," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 42-48. [31] D. Yang, D. Xu, S. K. Zhou, B. Georgescu, M. Chen, S. Grbic, et al., "Automatic liver segmentation using an adversarial image-to-image network," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2017, pp. 507-515. [32] Y. Xue, T. Xu, H. Zhang, L. R. Long, and X. Huang, "Segan: Adversarial network with multi-scale l 1 loss for medical image segmentation," Neuroinformatics, vol. 16, pp. 383-392, 2018. [33] D. Jin, Z. Xu, Y. Tang, A. P. Harrison, and D. J. Mollura, "CT-realistic lung nodule simulation from 3D conditional generative adversarial networks for robust lung segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018, pp. 732-740. [34] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017. [35] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700-4708. [36] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs," IEEE transactions on pattern analysis and machine intelligence, vol. 40, pp. 834-848, 2017. [37] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. 
Yuille, "Semantic image segmentation with deep convolutional nets and fully connected crfs," arXiv preprint arXiv:1412.7062, 2014. [38] P. B. Gordon, F. A. Gagnon, and L. Lanzkowsky, "Solid breast masses diagnosed as fibroadenoma at fine-needle aspiration biopsy: acceptable rates of growth at long-term follow-up," Radiology, vol. 229, pp. 233-238, 2003. [39] M. H. Yap, G. Pons, J. Martí, S. Ganau, M. Sentís, R. Zwiggelaar, et al., "Automated breast ultrasound lesions detection using convolutional neural networks," IEEE journal of biomedical and health informatics, vol. 22, pp. 1218-1226, 2017. [40] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, et al., "Tensorflow: A system for large-scale machine

learning," in 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16), 2016, pp. 265-283. [41] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, pp. 297-302, 1945. [42] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, B. Zhang, and J. Ding, "Automatic breast ultrasound image segmentation: A survey," Pattern Recognition, vol. 79, pp. 340-355, 2018. [43] Y. Guo, A. Şengür, and J.-W. Tian, "A novel breast ultrasound image segmentation algorithm based on neutrosophic similarity score and level set," Computer methods and programs in biomedicine, vol. 123, pp. 43-53, 2016. [44] Q. Huang, Y. Luo, and Q. Zhang, "Breast ultrasound image segmentation: a survey," International journal of computer assisted radiology and surgery, vol. 12, pp. 493-507, 2017. [45] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, "Comparing images using the Hausdorff distance," IEEE Transactions on pattern analysis and machine intelligence, vol. 15, pp. 850-863, 1993. [46] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, "Revisiting unreasonable effectiveness of data in deep learning era," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 843-852. [47] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, et al., "Dual attention network for scene segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3146-3154. [48] S. Woo, J. Park, J.-Y. Lee, and I. So Kweon, "Cbam: Convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3-19. [49] X. Yang, H. Dou, R. Li, X. Wang, C. Bian, S. Li, et al., "Generalizing deep models for ultrasound image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2018, pp. 497-505.