Multi-Scale Fully Convolutional Network for Gland Segmentation Using Three-Class Classification
Huijun Ding (a), Zhanpeng Pan (a), Qian Cen (a), Yang Li (b), Shifeng Chen (c,*)

(a) Guangdong Provincial Key Laboratory of Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China
(b) Pediatric Orthopaedic Department, Anhui Provincial Children's Hospital, Anhui 230002, China
(c) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518060, China
Abstract

Automated precise segmentation of glands from histological images plays an important role in glandular morphology analysis, which is a crucial criterion for cancer grading and treatment planning. However, the task is non-trivial due to the diverse shapes of the glands under different histological grades and the presence of tightly connected glands. In this paper, a novel multi-scale fully convolutional network with three-class classification (TCC-MSFCN) is proposed to achieve gland segmentation. The multi-scale structure can extract different receptive field features corresponding to multi-size objects. However, max-pooling in a convolutional neural network causes a loss of global information. To compensate for this loss, a special branch called the high-resolution branch is designed in our framework. Besides, for effectively separating close glands, a three-class classification that additionally considers edge pixels is applied instead of the conventional binary classification. Finally, the proposed method is evaluated on the Warwick-QU dataset and the CRAG dataset with three reliable evaluation metrics, which are applied to our method and other popular methods. Experimental results show that the proposed method achieves state-of-the-art performance. Discussion and conclusion are presented afterwards.

Keywords: histological image, segmentation, multi-scale, fully convolutional network, dilated convolution

* Corresponding author
Email addresses: [email protected] (Huijun Ding), [email protected] (Zhanpeng Pan), [email protected] (Qian Cen), [email protected] (Yang Li), [email protected] (Shifeng Chen)
1. Introduction

As an important histological structure present in most organ systems, glands are primarily responsible for secreting proteins and carbohydrates [1]. At the same time, malignant tumors arising from glandular epithelium, also known as adenocarcinomas, are the most prevalent form of cancer, occurring in organs such as the prostate, breast, lung, and colon [2]. Cancer ranks at the forefront of high-risk diseases with a high mortality rate. Correct cancer grading can guide a professional doctor in specifying an appropriate treatment plan. The morphology of glands, including shape, structure, and gland formation, is crucial for cancer grading [3]. This morphology is better analyzed in digital images, since computer-aided techniques can greatly increase the efficiency of this labor-intensive work. With the development of science and technology, stained slides can be saved as digital images, making the above analysis operational [4][5][6]. Furthermore, using computer-aided techniques to extract quantitative morphological features from histology images has proven workable for cancer grading [7][8]. As a prerequisite for extracting features, accurate segmentation of glands is particularly important.

In general, manual segmentation of glands is more accurate than automatic segmentation. However, it is time-consuming and expensive to manually segment glands from large digitized images. Therefore, the study of automatic segmentation methods is meaningful and valuable. Although existing computer-aided techniques can segment glands quickly, accurate gland segmentation remains a challenge. Some reasons are listed below. 1) The higher the degree of malignant gland differentiation is, the more irregular the shapes of the glands are. 2) The sizes of glands are diverse. 3) Some glands are too close to each other to be segmented into separate individuals. Different forms of glands are shown in Fig. 1.
Figure 1: Gland Hematoxylin and Eosin (H&E) images. Each gland is marked by the ground truth label in green color. (a)-(d) are different histological grades: benign healthy, benign adenomatous, malignant poorly differentiated, and malignant moderately differentiated, respectively. Red arrows point to the small gap between glands.
Traditional image processing methods such as the region growing method [9] and the graphical model [10] are not suitable for the segmentation of differently differentiated glands. These methods are often used for the segmentation of a single gland type. Meanwhile, traditional machine learning methods such as K-means clustering [11], support vector machines [12], and random forests [13] are suitable for different gland types. However, those machine learning methods depend on various hand-crafted features. With the development of machine learning, many deep-learning methods have been applied to image segmentation problems, including medical image segmentation, and achieve good performances [14][15][16]. Among them, the convolutional neural network (CNN) [17] utilizes a sliding window (convolution kernel) to learn features automatically and is able to map low-level features to high-level ones. As a derivation of the CNN, the fully convolutional network (FCN) [18], which consists only of convolutional layers, has gradually become the mainstream architecture for image segmentation tasks [19][20]. The FCN is an end-to-end learning model which achieves good performance in semantic segmentation [18][21].

Considering the glands with different forms and sizes, a novel multi-scale fully convolutional network (MSFCN) is proposed to extract features with different scales from different convolution stages.
Figure 2: The flow chart of the TCC-MSFCN. The left side represents the training process and the right side represents the inference. The glands are labeled by different colors at last.
Unlike the linear structure of the FCN, the multi-scale structure can extract different receptive field features corresponding to multi-size objects. On the other hand, considering that some tight glands are hard to separate, a three-class classification (TCC) method is proposed to separate each gland accurately. In contrast to the commonly used binary classification (BC) method, TCC additionally considers the pixels of the edge between tight glands. It is thus able to divide tight glands into separate parts instead of misjudging them as a whole. The flow chart of the whole system is exhibited in Fig. 2. Firstly, the ground truths of the training set are processed to obtain the three-class labels. Secondly, the label information and training data are used to train the MSFCN. Finally, the images of the test data are fed into the MSFCN, followed by a post-processing method to generate the segmented gland mask. The whole system is denoted by TCC-MSFCN, and a detailed description is presented in Section 3. The proposed method is mainly verified on the Warwick-QU dataset released from the MICCAI 2015 Gland Segmentation (GlaS) Contest (1) [1]. It achieves state-of-the-art performance compared to the top-ranked methods in the MICCAI 2015 GlaS contest and some other popular methods. In addition, another dataset named Colorectal Adenocarcinoma Gland (CRAG) (2) is used to verify that the proposed method is not data dependent.

(1) http://www.warwick.ac.uk/bialab/GlaScontest
(2) https://warwick.ac.uk/fac/sci/dcs/research/tia/data/mildnet
The structure of this paper is arranged as follows. Related work is reviewed in Section 2. In Section 3, the proposed method is introduced in detail, followed by the evaluation metrics and experiment schedule described in Section 4. In Section 5, the experimental results are presented and discussed. Finally, the conclusion is given in Section 6.
2. Related Work

The related studies of automatic gland segmentation reported in recent years are reviewed in this section. Initially, gland segmentation methods were designed to explicitly identify substructures such as nuclei and lumen. A pixel-level region growing method is proposed by Wu et al. [9] to separate the nuclei, which are segmented preliminarily by thresholding the image. Afterward, the nuclei region is obtained by an expanding process from seed points in large empty regions. Farjam et al. [11] distinguish stroma and lumen using k-means clustering of local texture features. Unlike the above methods, Gunduz-Demir et al. [10] propose a graph-based method that decomposes the image into a set of disks which are represented by vertices of a graph. In particular, the graph connectivity is used to determine seeds for region growing. Based on local color statistics, Cohen et al. [13] utilize a random forest to distinguish different components of gland images such as nuclei, lumen, cytoplasm, and stroma. The crypts are segmented by an active contour approach, and the lumen candidates initializing the active contour are detected by a pixel-level classification method.

All the above methods perform well on benign gland images. However, if the segmentation region deviates from their spatial assumptions, those methods usually work unsatisfactorily, especially for malignant glands with a high degree of differentiation. Nguyen et al. [12] use the normal cut method to divide the graph of nuclei and lumen into different components, where each component corresponds to a gland. This method is applicable to different shapes of glands because it can segment glands without lumen or with multiple lumina. Moreover, a Bayesian inference based method is proposed by Sirinukunwattana et al. [22]. This method considers the location of adjacent nuclei on the epithelial boundary and prior knowledge of the spatial connection. Each candidate region is treated as a polygon composed of a random number of vertices. Those vertices are located by reversible-jump Markov chain Monte Carlo. After removing some false detections, this method can obtain better contours than the previous methods [23].

In the past, different datasets and evaluation metrics were utilized for the gland segmentation task. Therefore, it is difficult to compare the performance of gland segmentation methods fairly. In order to have a fair evaluation, the Warwick-QU dataset containing the information of different histological grades and several
reliable evaluation metrics were released in the GlaS contest of MICCAI 2015 [1].

In the GlaS contest, some teams achieved impressive performances. The champion team, CUMedVision (The Chinese University of Hong Kong), presents an FCN-based deep contour-aware network (DCAN) [24]. The DCAN has a dual-task decoder including gland segmentation and contour detection. It fuses the two outputs of the decoder to achieve gland instance segmentation [25]. For the same reason, the second-placed team, ExB (ExB Research and Development) [1], proposes two-path CNNs containing a series of convolution layers in order to extract different features. One path classifies the pixels of gland and background; the other classifies the pixels of boundaries and non-boundaries. All paths are connected by two fully connected layers.

Since the GlaS challenge, deep-learning methods have become an active area for gland segmentation. The frameworks are normally based on FCNs or patches, and have become more complicated in recent years. Xu et al. [26] propose a multi-channel neural network to complete different tasks including foreground segmentation, edge detection, and object detection. A special fusion network is used to fuse the multi-channel outputs and generate the instance segmentation of glands. Raza et al. [27] propose step-structure networks that consist of different groups.
Those groups contain a series of convolution blocks, and each output feature map of a block is merged into another block in the next group. Manivannan et al. [23] combine FCN features and hand-crafted features to predict the structure of patches extracted from the original image. The hand-crafted features in [23] are zoom-out features. They are extracted from patches of different sizes of the gland image and then concatenated together. This method makes use of the information of local fine structure and performs well on the Warwick-QU dataset. Yan et al. [28] propose a shape-preserving loss for obtaining good gland shapes and separating close glands. This loss function consists of two parts, the shape similarity loss [29] and a weighted pixel-wise cross-entropy loss. The deep model proposed by Yan et al. is similar to DCAN, which has a dual-task structure outputting gland segmentation and contour detection. Graham et al. [30] propose a minimal information loss dilated network (MILD-Net) which not only preserves high-resolution information through a minimal information loss unit, but also adds dilated convolution and atrous spatial pyramid pooling to enlarge the receptive field. At the end of the network, random transformation sampling is used to obtain a measure of uncertainty, which can improve the detection accuracy.
In addition, some researchers make efforts to reduce the workload of annotation. Zhang et al. [31] use additional unannotated images to train a deep adversarial network. This work proves that unannotated data can be used to train the model in a weakly supervised manner. For the same purpose, Yang et al. [32] propose an active learning method which uses less annotated training data to approximate the performance of using all annotated training data. Following the work of Yang et al. [32], Xu et al. [33] quantize the network to reduce its size, which improves the speed of inference. This method achieves a 1% improvement compared with the work of Yang [32] and reduces memory usage by up to 6.4 times.

In conclusion, methods such as region growing, graph-based methods, and traditional machine learning methods rely on detecting components, such as nuclei and lumen, to seed a candidate region for region growing or contour search. These methods are often applied to glands with a single degree of differentiation, and thus they are not robust enough. In contrast to those methods, the supervised deep-learning methods with some strategy for separating tight glands are suitable for different types of glands and achieve good performances.

3. Method Description

The proposed method is described in three parts, namely the TCC labeling, the details of the network structure, and the post-processing.
3.1. TCC labeling

The gland segmentation task needs to separate or label each gland from the background. Fig. 2 shows the flow chart of the gland segmentation task, which can be viewed as an instance segmentation task [25] simplified into two steps: semantic segmentation and post-processing. Each gland is labeled by a different color in the post-processing shown by the green dashed box in Fig. 2. In the semantic segmentation step, each pixel should be assigned a unique label, belonging to foreground or background. The output of the gland segmentation task is thus a mask image containing labels of different categories. In general, binary classification (BC) is studied and utilized. It refers to the classification of background and gland objects in the gland segmentation task. However, in gland Hematoxylin and Eosin (H&E) images, one situation requires extra attention: some glands are too close to each other, and the edge of each gland needs to be accurately located. The tight glands might be misjudged as a whole after the BC process, which is the so-called connected problem [26]. Once the connected problem occurs, inaccurate segmentation of gland individuals will seriously affect computer-aided diagnosis, because the edge shape of the gland is the key to cancer diagnosis and grading. In addition, it also affects the accurate calculation of the number of glands, which may adversely affect subsequent computer-aided techniques or other related applications.
In order to solve this problem, a careful observation of the H&E images is conducted. We find that the lumen of the gland is surrounded by deeply stained nuclei, as noted by the yellow arrows in Fig. 3 (a). There is a significant difference between the pixels of the edge and the lumen. Therefore, the edge pixels are treated as the third class for gland segmentation in this paper. The BC labels are converted to TCC labels in the proposed method, and the TCC labels are shown in Fig. 3 (b). A similar TCC attempt has been used for the segmentation of lung cancer nuclei in [34], and a better performance is achieved compared to the BC method. In addition, the top two teams in the MICCAI 2015 GlaS contest, CUMedVision and ExB, also consider the edge pixels by using multiple paths or multiple networks to complete two binary classifications, namely "background & lumen" and "background & edge". In summary, the TCC method is reasonable and applicable for solving the connected problem. Unlike the above related work, the TCC method is implemented using only a single-path end-to-end fully convolutional network in our method.
To carry out the proposed TCC method, three categories, namely background, gland lumen, and gland edge, are considered for the label information, and the proposed MSFCN is trained with the three-class labels as well. In order to get the three-class label images, a procedure containing four steps is conducted. Firstly, the ground truth images are binarized. Secondly, morphological dilation and erosion are applied to the binary images with a disk filter (radius = 5). Afterward, the edges of the glands are obtained by subtracting the eroded image from the dilated image. Finally, the edge image and the eroded image are combined into the three-class label image. Here the background, gland, and edge are marked as 0, 1, and 2, respectively. The label processing is shown in Fig. 4.
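The four-step conversion can be expressed with standard morphological operators. The following is a minimal sketch assuming a scikit-image implementation; the function name make_tcc_label and the per-image workflow are our illustration, while the disk radius of 5 follows the text above.

```python
import numpy as np
from skimage.morphology import disk, binary_dilation, binary_erosion

def make_tcc_label(ground_truth, radius=5):
    """Convert a ground-truth gland mask into a three-class label map:
    0 = background, 1 = gland interior (eroded), 2 = gland edge."""
    binary = ground_truth > 0                    # step 1: binarize
    selem = disk(radius)                         # disk filter, radius = 5
    dilated = binary_dilation(binary, selem)     # step 2: dilation
    eroded = binary_erosion(binary, selem)       #         and erosion
    edge = dilated & ~eroded                     # step 3: edge = dilated - eroded
    label = np.zeros(ground_truth.shape, dtype=np.uint8)
    label[eroded] = 1                            # step 4: combine interior and edge
    label[edge] = 2
    return label
```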
3.2. Network Structure

Image segmentation has reached an end-to-end mode since 2015, when the FCN [18] was proposed. Such methods perform well on natural image segmentation and output a dense probability map which represents the class probability for each pixel. The proposed MSFCN also follows the end-to-end fully convolutional structure. The network structure of the MSFCN includes the backbone network, the multi-scale architecture, and the high-resolution branch. Fig. 5 shows the architecture of the proposed MSFCN. More details as well as the loss function are given in the following subsections.
Figure 3: Display of gland edge and TCC labels. (a) and (b) are the edge of the gland and the TCC labels, respectively. The yellow arrows in (a) point to the dark color of the nucleus and different colors are used in (b) to represent the background, gland lumen, and gland edge.

Figure 4: The edge class creating process of training labels for TCC. The black dashed arrows represent the combining process of the eroded image and edge image into a three class label image.
3.2.1. Backbone Network

The backbone network is the encoder of the MSFCN, shown in the upper part of Fig. 5. At the beginning, the input image passes through two 3×3 conventional convolution layers. Afterward, two kinds of bottleneck blocks are used in the tail of the MSFCN. One is the general bottleneck block and the other is the dilated bottleneck block, shown by the olive green and pink blocks, respectively, in the upper part of Fig. 5. The residual bottleneck can economize memory resources and make the training process faster [35]. It is defined as:

y = f(x, {W_i}) + W_s x,    (1)
Figure 5: Architectural details of the proposed MSFCN. Different color represents different kinds of layers and blocks.
y = f(x, {W_i}) + x,    (2)

where x and y represent the input and output of the residual bottleneck, respectively, and W_i and W_s represent the weights. The function f = W_3 σ(W_2 σ(W_1 σ(x))) indicates the residual mapping to be learnt, in which σ denotes the ReLU activation function. For a residual bottleneck block, Eq. (1) describes the first residual bottleneck of the block, in which W_s projects x to the same number of channels as f. Eq. (2) describes the remaining residual bottlenecks. The details of the residual bottleneck are shown by the blue box in the olive green block of Fig. 5. There are three convolution layers in the residual bottleneck, including two 1×1 convolution layers and one 3×3 convolution layer.

The general bottleneck block has a dense convolution kernel whose dilation factor λ is 1. There are 3 general bottleneck blocks used in the backbone network, and every general bottleneck block has 3 residual bottlenecks. Unlike the general bottleneck block, the dilated bottleneck block utilizes the dilated convolution (DC). The DC is a special convolution that inflates the kernel by inserting spaces, which can enlarge the receptive field without adding extra parameters. Besides, the DC is only used in the 3×3 convolution layer of the dilated residual bottleneck, with dilation factor λ = 2 in this paper. Two dilated bottleneck blocks are used in the backbone network. The number of dilated residual bottlenecks affects the receptive field of the last two multi-scale features, which in turn affects the segmentation of big glands. We tested different numbers of dilated residual bottlenecks, including 6, 8, 10, and 12, and the configuration with 10 dilated residual bottlenecks achieves the best performance.
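As a concrete illustration of Eqs. (1)-(2) and of the block description above, the following PyTorch sketch shows a pre-activation residual bottleneck with an optional dilation factor. It is an assumed reading of the architecture (the class name, the exact layer ordering, and the bias-free convolutions are our choices), not the authors' released code.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Pre-activation residual bottleneck: 1x1 -> 3x3 -> 1x1 convolutions.

    `mu` is the intermediate width and `dilation` the factor lambda
    (1 for the general block, 2 for the dilated block)."""
    def __init__(self, in_ch, mu, dilation=1):
        super().__init__()
        out_ch = 4 * mu
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, mu, kernel_size=1, bias=False),
            nn.BatchNorm2d(mu), nn.ReLU(inplace=True),
            nn.Conv2d(mu, mu, kernel_size=3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(mu), nn.ReLU(inplace=True),
            nn.Conv2d(mu, out_ch, kernel_size=1, bias=False),
        )
        # W_s in Eq. (1): only needed when the channel count changes,
        # i.e. for the first bottleneck of a block.
        self.proj = (nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
                     if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        return self.body(x) + self.proj(x)
```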
It should be noted that all the convolution layers are followed by batch normalization and a ReLU activation function. Additionally, the down-sampling factor is set to 8 (three max-pooling layers) in the MSFCN. The role of the fusion layers is to fuse the multi-scale and HRB feature maps, shown by the red block at the right side of Fig. 5. Finally, a 3×3 and a 1×1 convolution layer are used to reduce the dimension, and the softmax layer is used for pixel-wise classification.

3.2.2. Multi-Scale Architecture

Multi-scale object detection has always been a problem worthy of constant exploration [36]. It is difficult to balance large and small objects. Unfortunately,
gland segmentation has exactly this problem: the glands have various shapes and sizes. From the perspective of a convolutional network, different receptive field features (RFFs) are important for multi-size gland segmentation. Therefore, different RFFs need to be extracted. To address this problem, the DCAN [24] attempts to extract different RFFs in the last three convolution layers of the network. However, DCAN does not obtain low-level features, resulting in a lack of low RFFs. Unlike DCAN, the MSFCN is designed as a multi-scale structure considering different RFFs. The details of the MSFCN are shown in Fig. 5. Different RFFs are extracted at different stages of convolution. The feature maps reach the up-convolution layer, named UP-Conv in Fig. 5, along the bright green arrows. They are upsampled to the same size as the input image by the UP-Conv layers. Subsequently they are sent to the concatenation layer for feature fusion. The UP-Conv layer is a more effective up-sampling method than the upsampling strategy used in DCAN. It consists of bilinear interpolation and one convolution layer instead of the commonly used deconvolution [37], because deconvolution can cause checkerboard artifacts [38]. A total of 6 UP-Conv layers are employed in the MSFCN.
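A minimal sketch of such an UP-Conv layer, assuming a PyTorch implementation, is given below; the kernel size, output width, and module name are our assumptions, while the bilinear-interpolation-plus-convolution design follows the description above.

```python
import torch.nn as nn
import torch.nn.functional as F

class UpConv(nn.Module):
    """UP-Conv: bilinear interpolation to a target size followed by one
    convolution, used instead of deconvolution to avoid checkerboard artifacts."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)

    def forward(self, x, size):
        x = F.interpolate(x, size=size, mode='bilinear', align_corners=False)
        return self.conv(x)

# Each of the six multi-scale feature maps is brought back to the input
# resolution and the results are concatenated for fusion, e.g.:
# fused = torch.cat([up(f, (480, 480)) for up, f in zip(up_convs, features)], dim=1)
```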
3.2.3. High-Resolution Branch

In general, CNNs use max-pooling layers after a series of convolution layers, aiming to increase the size of the receptive field, reduce computing resources, and improve classification accuracy. However, the max-pooling operation is unfriendly to pixel-wise tasks because it causes the feature map to lose global information [21]. Therefore, only a small number of max-pooling layers (three) are used in the MSFCN. For this problem, U-Net [39], which is a successful deep framework for medical image segmentation and pixel-wise tasks [40], is designed to fuse the corresponding low-level features at each stage of the decoder.

In order to retain more low-level global information, a special branch, named the high-resolution branch (HRB), is proposed, which is shown as the leftmost branch in Fig. 5. The HRB consists of two 1×1 convolution layers. The original image passes through the HRB, and then the HRB features are fused in the concatenation layer. This branch is able to provide low-level global information which is beneficial for locating and shaping the segmented objects [30].
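The sketch below, again under assumed PyTorch conventions, shows the HRB (two 1×1 convolutions on the raw image) and a fusion head that concatenates the HRB output with the upsampled multi-scale maps before the final 3×3 and 1×1 convolutions and the softmax. The channel counts and the absence of an activation between the two 1×1 convolutions are our assumptions.

```python
import torch.nn as nn

class HighResolutionBranch(nn.Module):
    """HRB: two 1x1 convolutions applied directly to the original image,
    preserving full-resolution, low-level global information."""
    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=1)

    def forward(self, image):
        return self.conv2(self.conv1(image))

def fusion_head(total_channels, n_classes=3):
    """Fusion of HRB and multi-scale features (red block in Fig. 5):
    3x3 conv, 1x1 conv, then softmax over the three classes."""
    return nn.Sequential(
        nn.Conv2d(total_channels, n_classes, kernel_size=3, padding=1),
        nn.Conv2d(n_classes, n_classes, kernel_size=1),
        nn.Softmax(dim=1),
    )
```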
3.2.4. Loss Function

The softmax function is employed to classify each pixel. In order to supervise the classification task, the loss function of the network is defined as:

O(θ) = − Σ_{i∈I} log [ p(i, g(i); θ) ],    (3)

where i and θ represent the pixel position in the image space I and the parameters (weights and biases) of the network, respectively. p(i, g(i); θ) denotes the predicted probability of pixel i being assigned to its ground truth g(i) after softmax classification.
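Eq. (3) is the negative log-likelihood of the ground-truth class after softmax, summed over pixels, i.e. a pixel-wise cross-entropy. A minimal PyTorch equivalent is shown below; the choice of summing (rather than averaging) over pixels follows Eq. (3) but is otherwise an assumption.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(reduction='sum')   # softmax + log-likelihood in one call

logits = torch.randn(2, 3, 480, 480)               # raw network output (N, classes, H, W)
labels = torch.randint(0, 3, (2, 480, 480))        # three-class targets: 0, 1, 2
loss = criterion(logits, labels)
```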
3.3. Post-Processing

The MSFCN outputs a three-channel probability map whose channels express background, gland interior, and gland boundary, respectively. As shown in Fig. 2, the probability maps of the gland object and the gland boundary are obtained by splitting the output of the MSFCN.
Table 1: Details of Warwick-QU Dataset

Number of images    Benign   Malignant   Total
Training            37       48          85
Test A              33       27          60
Test B              4        16          20
In order to obtain an accurate gland interior, the edge probability map in the third channel is subtracted from the gland probability map in the second channel. After the subtraction, the segmented glands are obtained by thresholding the probability map with threshold T = 0.8 in this paper. The radius of the disk filter, denoted R, is set to 5 pixels to dilate the gland objects, restoring the glands that were eroded in the pre-processing of edge-class creation. Finally, the segmented images are resized to the original size and each individual gland is labeled with a different integer (the background is 0). It should be noted that all steps are processed automatically.
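The whole post-processing chain is summarized in the following sketch, assuming a NumPy/scikit-image implementation; the function name and the exact ordering of the dilation and resizing steps are our assumptions, while T = 0.8 and R = 5 follow the text.

```python
import numpy as np
from skimage.measure import label
from skimage.morphology import binary_dilation, disk
from skimage.transform import resize

def postprocess(prob_map, original_shape, T=0.8, R=5):
    """prob_map: (3, H, W) softmax output ordered (background, gland, edge)."""
    gland = prob_map[1] - prob_map[2]              # subtract edge from gland interior
    mask = gland > T                               # threshold at T = 0.8
    mask = binary_dilation(mask, disk(R))          # restore the eroded border (R = 5)
    mask = resize(mask.astype(float), original_shape, order=0) > 0.5   # original size
    return label(mask, background=0)               # one integer per gland, 0 = background
```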
4. Experiment

In this section, the details of the evaluation experiments of the proposed method are introduced. There are five subsections in this section, which are the dataset, evaluation metrics, implementation details, evaluation of TCC-MSFCN, and evaluation of HRB & DC, respectively.

4.1. Dataset

The proposed method is implemented on the Warwick-QU dataset released by the MICCAI 2015 GlaS Contest [1]. The dataset has 165 histological images of H&E stained slides with associated intensive ground truths, which were obtained by a Zeiss MIRAX MIDI slide scanner with a resolution of 0.62 µm/pixel. All sample images are extracted from 16 H&E-stained whole-slide images from 16 patients who had different grades of colorectal cancer. The dataset is split into three parts, namely the training set with 85 images and the test sets part A and part B with 60 and 20 images, respectively. The details of the Warwick-QU dataset are shown in Table 1. As shown in the table, benign glands account for more than 50% of the total data in part A, but only 20% in part B. Compared to benign gland images, malignant gland images contain more irregular and bigger glands which are more difficult to segment accurately. Therefore, the segmentation of test set part B is more challenging than that of part A.

Another dataset named CRAG, released by [30], is also used to evaluate the proposed method. This dataset contains 213 H&E colorectal adenocarcinoma image tiles at 20x magnification with full instance-level annotation. The data
is split into train and test sets, which contain 173 and 40 images, respectively.

4.2. Evaluation Metrics

Pixel-wise segmentation tasks are usually evaluated by pixel-level accuracy. In the gland segmentation task, the basic unit is a gland object consisting of a number of connected pixels. Therefore, pixel-level accuracy is not appropriate and is replaced by object-level metrics for gland segmentation evaluation. There are three kinds of object-level evaluation metrics used in the MICCAI 2015 GlaS contest, including the F1 score, the object-level Dice index, and the Hausdorff distance, corresponding to detection accuracy, segmentation performance, and shape similarity, respectively. These metrics are employed to evaluate the performance of
the proposed method running on the test sets.

4.2.1. Detection Accuracy

For the individual gland, F1 score is utilized to calculate the detection accuracy, and is defined as:

F1 = 2 · Precision · Recall / (Precision + Recall),    (4)

Precision = N_TP / (N_TP + N_FP),    (5)

Recall = N_TP / (N_TP + N_FN),    (6)
where N_TP, N_FP, and N_FN denote the numbers of true positives, false positives, and false negatives, respectively. Taking a segmented object as the comparison subject, it is counted as a true positive if it intersects with more than 50% of the area of a ground truth object; otherwise, it is counted as a false positive. Taking a ground truth object as the comparison subject, it is counted as a false negative if it has no intersection, or an intersection of less than 50%, with any segmented object. The range of the F1 score is 0 to 1. The larger the value of the F1 score, the better the performance of the evaluated method.
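The 50% overlap matching rule can be made concrete as follows. This is a simplified reading of the criterion for illustration only, not the official GlaS evaluation code; the function name and the tie-breaking behaviour are our assumptions.

```python
import numpy as np

def detection_f1(gt, seg):
    """gt and seg are integer-labelled instance maps (0 = background)."""
    gt_ids = [g for g in np.unique(gt) if g != 0]
    tp, fp, matched = 0, 0, set()
    for s in np.unique(seg):
        if s == 0:
            continue
        s_mask = seg == s
        hit = next((g for g in gt_ids
                    if (s_mask & (gt == g)).sum() > 0.5 * (gt == g).sum()), None)
        if hit is None:
            fp += 1                                  # covers no ground-truth object enough
        else:
            tp += 1
            matched.add(hit)
    fn = len(gt_ids) - len(matched)                  # ground-truth objects never claimed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```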
4.2.2. Segmentation Performance
The Dice index which is used for segmentation evaluation is defined as follows:

D(G, S) = 2 |G ∩ S| / (|G| + |S|),    (7)

where G denotes a ground truth image, S denotes a segmented image, and | · | denotes the set cardinality. The object-level Dice index is

D_obj(G, S) = (1/2) [ Σ_{i=1}^{n_S} α_i D(G_{iMax}, S_i) + Σ_{j=1}^{n_G} β_j D(G_j, S_{jMax}) ],    (8)

α_i = |S_i| / Σ_{p=1}^{n_S} |S_p|,   β_j = |G_j| / Σ_{q=1}^{n_G} |G_q|,    (9)
where n_S represents the number of non-empty segmented objects and n_G represents the number of non-empty ground truth objects. In addition, S_i and G_{iMax} denote the i-th segmented object and the ground truth object that has maximal intersection with S_i. Similarly, G_j and S_{jMax} denote the j-th ground truth object and the segmented object that has maximal intersection with G_j. Similar to the F1 score, the range of the object-level Dice index is also 0 to 1, and the
bigger value indicates the better performance.

4.2.3. Shape Similarity

The shape similarity plays an important role in gland segmentation, which represents the performance on morphology likelihood. Hausdorff distance is employed to evaluate shape similarity, which is defined as:

H(G, S) = max { sup_{x∈G} inf_{y∈S} ||x − y||, sup_{y∈S} inf_{x∈G} ||x − y|| }.    (10)

Similar to the object-level Dice index, the object-level Hausdorff distance is defined as:

H_obj(G, S) = (1/2) [ Σ_{i=1}^{n_S} α_i H(G_{iMax}, S_i) + Σ_{j=1}^{n_G} β_j H(G_j, S_{jMax}) ].    (11)
Unlike the F1 score and object-level Dice index, a smaller value of the Hausdorff distance indicates better performance. The minimum value of the object-level Hausdorff distance is 0, and the maximum value is unbounded.
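For illustration, Eqs. (8)-(11) can be sketched as below, assuming NumPy/SciPy; this is a simplified reconstruction of the weighted object matching, not the official evaluation script, and the helper names are ours.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(a, b):
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hausdorff(a_pts, b_pts):
    """Eq. (10) for two point sets of shape (N, 2), e.g. object boundary coordinates."""
    return max(directed_hausdorff(a_pts, b_pts)[0],
               directed_hausdorff(b_pts, a_pts)[0])

def object_dice(gt, seg):
    """Eqs. (8)-(9): area-weighted Dice over maximally overlapping object pairs,
    averaged over both matching directions."""
    def one_side(src, dst):
        total = score = 0.0
        for i in np.unique(src):
            if i == 0:
                continue
            s = src == i
            overlaps = [((s & (dst == j)).sum(), j) for j in np.unique(dst) if j != 0]
            area, best = max(overlaps, default=(0, None))
            score += s.sum() * (dice(s, dst == best) if area > 0 else 0.0)
            total += s.sum()
        return score / total if total else 0.0
    return 0.5 * (one_side(seg, gt) + one_side(gt, seg))
```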
4.3. Implementation Details

PyTorch is used to build the proposed MSFCN. In order to improve performance and limit overfitting, some data augmentation methods, including random horizontal flips (flip probability: 0.5) and random rotations (range: −45° to 45°), are employed in the training session to enrich the data volume. After data augmentation, the input images and corresponding ground truths are resized to 480×480. In the training process, the initial learning rate is set to 0.08. The effective "poly" learning rate policy, in which the learning rate is multiplied by (1 − iter/max_iter)^power, is used to plan the learning rate decay. Stochastic Gradient Descent (SGD) is used to optimize the loss function. The weight decay is 0.0005 and the momentum is 0.99. The network is trained without pre-training on an Ubuntu server equipped with eight NVIDIA GeForce 1080Ti GPUs (batch size 16), and the parameters are initialized by the default method in PyTorch.
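The optimizer and poly schedule can be set up in PyTorch as follows. The max_iter and power values are assumptions (the text does not give them); the stated learning rate, momentum, and weight decay follow the paragraph above.

```python
from torch import nn, optim

model = nn.Conv2d(3, 3, kernel_size=1)            # stand-in for the MSFCN
optimizer = optim.SGD(model.parameters(), lr=0.08,
                      momentum=0.99, weight_decay=0.0005)

max_iter, power = 30000, 0.9                       # assumed values, not given in the text
poly = lambda it: (1.0 - it / max_iter) ** power   # "poly" decay factor
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly)

for it in range(max_iter):
    # forward pass, loss.backward(), optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()
```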
4.4. Experiments of MSFCN & TCC

The two main contributions of the proposed method in this paper are the MSFCN and the TCC. In order to evaluate them, the MSFCN is compared with two popular networks, FCN-8s [18] and U-Net [39], and the TCC method is compared with the commonly used BC method. FCN-8s achieved good performance on semantic segmentation among the several models presented in [18], and U-Net is a successful network for medical image segmentation. FCN-8s and U-Net do not contain a multi-scale architecture, so the comparison of the MSFCN with FCN-8s and U-Net can verify the importance of the multi-scale architecture for gland segmentation. Since the classification method is connected with different networks for training purposes, a total of 4 methods are compared, namely TCC-MSFCN, BC-MSFCN, BC-FCN, and BC-UNet. Besides, in the BC method, a strategy which erodes the ground truth objects to expand the gaps can prevent neighbouring gland structures from merging. The eroded labels are produced by eroding the ground truth labels of the training set using a disk filter (radius = 5). Finally, the output masks are dilated by a disk filter (radius = 5) to restore the shape.

4.5. Experiments of HRB & DC

The HRB and DC are proposed to be incorporated in the MSFCN to improve the segmentation performance. An experiment is designed to evaluate the effectiveness of the HRB and DC. Three different situations, MSFCN without HRB and DC (w/o-HD), MSFCN without HRB (w/o-H), and MSFCN without DC (w/o-D), are compared to the complete MSFCN using the same implementation method described in Section 3.
Table 2: The Results of Different Network with BC and TCC

Method       F1 score            Object Dice         Object Hausdorff
             Part A    Part B    Part A    Part B    Part A     Part B
BC-FCN       0.771     0.692     0.796     0.762     109.637    152.135
BC-UNet      0.871     0.775     0.828     0.801     59.001     133.617
BC-MSFCN     0.893     0.841     0.894     0.842     48.374     105.227
TCC-MSFCN    0.914     0.850     0.913     0.858     39.848     93.244
5. Results & Discussion

5.1. The Evaluation of MSFCN & TCC

Following the evaluation procedure described in Section 4.4, three evaluation metrics, namely the F1 score, Object Dice, and Object Hausdorff, are used to obtain the evaluation results of the four methods, namely BC-FCN, BC-UNet, BC-MSFCN, and TCC-MSFCN, shown in Table 2. The TCC-MSFCN shows the best performance, followed by BC-MSFCN, BC-UNet, and BC-FCN. Among the BC methods, the performance of BC-MSFCN is better than that of BC-FCN and BC-UNet. The structure of U-Net is not good enough for gland segmentation because the glands have variable sizes, although its special structure pairing up features at different levels is able to provide low-level features for the decoder. Besides, the FCN achieves the worst result because the linear structure of FCN cannot provide different-scale features. Therefore, the experimental results prove that the multi-scale architecture of the MSFCN has a significant positive effect on segmenting glands with variable sizes. On the other hand, based on the comparison of TCC-MSFCN and BC-MSFCN, the better performance of TCC-MSFCN reflects the effectiveness of the proposed TCC algorithm. Thus the TCC algorithm can indeed achieve the goal of segmenting adjacent glands.

In order to visually compare the segmentation effects, some of the segmentation results are shown in Fig. 6. As highlighted by the dashed boxes in the different pictures, the proposed TCC-MSFCN can successfully segment the close glands, which is difficult for the other methods to achieve. Subsequently, BC-MSFCN achieves good segmentation, but it is still difficult for it to correctly separate the pixels at the edges between two close glands, denoted by the dashed boxes.
Figure 6: The segmented results of different methods. First two rows denote the original images and corresponding ground truths, and the rest rows represent the segmented results of the different methods. The dashed boxes indicate glands that are difficult to separate.
Table 3: The Results of Different Situations of MSFCN

Method      F1 score            Object Dice         Object Hausdorff
            Part A    Part B    Part A    Part B    Part A     Part B
w/o-HD      0.895     0.788     0.885     0.819     55.843     130.978
w/o-D       0.902     0.805     0.886     0.826     48.809     104.671
w/o-H       0.893     0.825     0.896     0.825     43.752     109.761
TCC-MSFCN   0.914     0.850     0.913     0.858     39.848     93.244
Therefore, the segmentation of compact glands by the TCC algorithm is more effective than by the BC algorithm. The results of BC-UNet and BC-FCN are not acceptable, because their segmented glands have a severe connected problem. However, BC-UNet is better than BC-FCN because it fuses low-level features while the FCN-8s network does not. This shows that the low-level global information, which is also used in the MSFCN, is beneficial to the segmentation of glands.

5.2. The Evaluation of HRB & DC

The results of the three evaluation metrics evaluating the HRB and DC are shown in Table 3. From this table, the proposed TCC-MSFCN achieves the best performance compared with the other situations. When the MSFCN is without the HRB or DC, the performance gets worse. The presented results show that the HRB and DC contribute to the MSFCN. To compare them visually, some segmented images are shown in Fig. 7. As highlighted by the dashed boxes, the glands segmented by the proposed TCC-MSFCN achieve better integrity and similarity, especially for large glands. When the MSFCN is without DC or HRB, referring to w/o-D and w/o-H, some pixels of large glands are missing and the shapes of the segmented glands are not similar to the ground truths. The situation becomes worse when the MSFCN is without both HRB and DC, referring to w/o-HD. This proves that the HRB and DC are useful for gland segmentation. Therefore, the proposed MSFCN combining the HRB and DC can achieve better performance.
Figure 7: The segmented results of different situations of our proposed MSFCN.
5.3. The Evaluation of TCC-MSFCN

The evaluation results of a total of 12 methods, including the proposed TCC-MSFCN, are shown in Table 4. For each method, the evaluation scores calculated by the three evaluation metrics introduced in Section 4.2 and its rankings among the 12 methods are presented. The rank sum is calculated by adding the 6 rankings (3 evaluation metrics × 2 test sets) of each method, and a lower value indicates better overall performance. The results of the 12 algorithms are arranged in ascending order of rank sum. Here, all comparison methods are evaluated on the MICCAI 2015 GlaS dataset, and each uses a different strategy for separating tight glands. Table 4 clearly shows that the proposed TCC-MSFCN achieves the smallest rank sum among the 12 evaluated methods.
Table 4: Performance Comparison to Other Methods

                         F1 Score                     Object Dice                  Object Hausdorff
                         Part A        Part B         Part A        Part B         Part A           Part B          Rank
Method                   Score  Rank   Score  Rank    Score  Rank   Score  Rank    Score    Rank    Score    Rank   Sum
TCC-MSFCN                0.914  4      0.850  3       0.913  1      0.858  1       39.848   1       93.244   2      12
Yang et al. [32]         0.921  2      0.855  1       0.904  5      0.858  1       44.736   4       96.976   3      16
MILD-Net [30]            0.914  4      0.844  4       0.913  1      0.836  6       41.540   2       105.890  5      22
Zhang et al. [31]        0.916  3      0.855  1       0.903  6      0.838  5       45.276   5       104.982  4      24
Yan et al. [28]          0.924  1      0.844  4       0.902  7      0.840  4       49.881   8       106.075  6      30
Xu et al. [26]           0.893  9      0.843  6       0.908  4      0.833  7       44.129   3       116.821  7      36
Manivannan et al. [23]   0.892  10     0.801  7       0.887  9      0.853  3       51.175   9       86.987   1      40
MIMO-Net [27]            0.913  6      0.724  8       0.906  4      0.785  9       49.150   7       133.980  8      42
CUMedVision2             0.912  7      0.716  10      0.897  8      0.781  10      45.418   6       160.347  11     52
ExB3                     0.896  8      0.719  9       0.886  10     0.765  11      57.350   11      159.873  10     59
ExB1                     0.891  12     0.703  11      0.882  12     0.786  8       57.413   12      145.575  9      64
ExB2                     0.892  10     0.686  12      0.884  11     0.754  12      54.785   10      187.442  12     67
On the whole, compared with the other 11 methods, our proposed TCC-MSFCN is more competitive in object Dice and object Hausdorff, owing to the proposed multi-scale architecture and HRB. In addition, the TCC method separates close glands from each other, helping to improve the performance of the model.

The other deep-learning methods also show competitive performances. The method of Yang et al. achieves the second smallest rank sum in Table 4. It presents good performance on the F1 score, which indicates good detection accuracy. An ensemble of multiple networks is used and the average is taken for the final prediction. This is why it can get a higher F1 score than ours. However, the lack of high-resolution information makes it uncompetitive on object Dice and object Hausdorff. Besides, adopting multiple networks incurs high model complexity and a complicated training process. The same problem also occurs in other algorithms that use multiple networks, including Xu et al., MIMO-Net, and ExB. The MILD-Net achieves the third place in rank sum. The idea of MILD-Net is similar to that of the proposed MSFCN in that it also considers high-resolution information and edge pixels. Therefore, it achieves good performance on object Dice and object Hausdorff of part A. However, MILD-Net does not consider the multi-scale features which are important for the segmentation of multi-size objects. Yan et al. use the shape-preserving loss to balance the importance between boundary detection
Table 5: Evaluation Results on CRAG

Method      F1 score   Object Dice   Object Hausdorff
MILD-Net    0.825      0.875         160.140
TCC-MSFCN   0.876      0.892         130.038
and gland segmentation, and it achieves the best F1 score of Part A. A possible reason why the method of Yan et al. does not perform as well on the other evaluation indicators is that the network structure does not consider high-resolution information for reshaping the gland. Another notable method, proposed by Manivannan et al. [23], is the structure prediction method, which takes first place in object Hausdorff for test set part B. This excellent performance represents a good shape similarity, especially for malignant glands. This advantage may be due to the fact that hand-crafted features are combined to capture the local structure. However, too much attention to local information and neglect of global information make the method perform less well on the other evaluation indicators. Therefore, the method ranks only sixth in the rank sum, i.e., the overall performance. Zhang et al. [31] use an additional 100 unannotated images to train a generative adversarial network (GAN). The adversarial network can improve the performance of the segmentation network. However, the segmentation network does not consider low-level features and loses high-resolution information. Therefore, its performance is not competitive. In addition, the GAN is prone to collapse, making it difficult to train [41]. CUMedVision2 [24] also considers the edge information and multi-scale features, but it does not work well on part B. The possible reason is that it just concatenates the last three convolution feature maps without considering low-level features, and excessive down-sampling is applied. This finding also proves that the proposed multi-scale architecture is workable and useful. CUMedVision2 and ExB are the top two teams in the 2015 MICCAI GlaS contest. It is reasonable that their performances are not as good as those of the methods proposed in recent years; their results serve as a reference.

The results evaluated on the CRAG dataset are shown in Table 5. MILD-Net also uses this dataset. Here, the evaluation criteria are still the three
Figure 8: The examples of the segmented result using CRAG dataset. Here, each row of the left and right regions represents one example, and a total of six examples are shown.
evaluation metrics introduced in Section 4.2. As shown in Table 5, the proposed TCC-MSFCN achieves a good performance, better than that of MILD-Net. Similar to the results on the Warwick-QU dataset, the proposed method also achieves state-of-the-art performance on the CRAG dataset. This proves that the proposed method is not data dependent and can accomplish the task of gland segmentation well. Some segmented results are shown in Fig. 8.

The proposed method has a shortcoming in that the white lumen is sometimes categorized as the background instead of the gland region, which is shown in the first row of Fig. 9. By analyzing the training set, we found that many white regions are marked as the background, as shown in Fig. 9 (b). The difference between the two kinds of white regions is whether they are surrounded by the nuclei or other stained histology. Therefore, it is challenging to solve this problem fundamentally.
6. Conclusion

In this paper, a novel method, TCC-MSFCN, is proposed for the automatic segmentation of glands. The MSFCN is designed with a multi-scale structure
Figure 9: Examples of a segmented result and training images that have white regions. (a) is the example of the segmented result: the left image is the original image, the middle one is the ground truth, and the right one is the segmented result. (b) shows training images: the first row contains the original images and the second row the corresponding ground truths. The rectangular boxes mark the white regions.
combining the high-resolution branch, dilated convolution, and a residual structure. The MSFCN is suitable for the segmentation of objects with large-scale variations, such as glands. Besides, the proposed TCC is useful for separating close glands. A series of experiments is presented to show the superiority of the proposed method and the effectiveness of its various components, such as TCC, HRB, and DC. The proposed method achieves state-of-the-art performance on the MICCAI 2015 GlaS dataset and the CRAG dataset. As discussed previously, some white regions are easily detected falsely. To address this shortcoming, future work will focus on refining the network architecture to further improve its generality. Moreover, the method will be applied to other histological images for extended application usage.
Declaration of Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment

The authors would like to thank the MICCAI 2015 GlaS contest for providing the Warwick-QU dataset and Lin Yang for answering the queries about the evaluation metrics. This work was supported in part by the National Natural Science Foundation of China U1713203, the Shenzhen Science and Technology Innovation Commission KQJSCX20180330170238897, and the Guangdong Medical Science and Technology Research Fund A2018055.
References
[1] K. Sirinukunwattana, J. P. Pluim, H. Chen, X. Qi, P.-A. Heng, Y. B. Guo, L. Y. Wang, B. J. Matuszewski, E. Bruni, U. Sanchez, A. Böhm, O. Ronneberger, B. B. Cheikh, D. Racoceanu, P. Kainz, M. Pfeiffer, M. Urschler, D. R. Snead, N. M. Rajpoot, Gland segmentation in colon histology images: The glas challenge contest, Medical Image Analysis 35 (2017) 489 –
502. doi:https://doi.org/10.1016/j.media.2016.08.008. [2] W. D. Travis, E. Brambilla, M. Noguchi, A. G. Nicholson, K. Geisinger, Y. Yatabe, C. A. Powell, D. Beer, G. Riely, K. Garg, J. H. M. Austin, V. W. Rusch, F. R. Hirsch, J. Jett, P.-C. Yang, M. Gould, American Thoracic Society, International association for the study of lung can-
cer/american thoracic society/european respiratory society: international multidisciplinary classification of lung adenocarcinoma: executive summary, Proceedings of the American Thoracic Society 8 (5) (2011) 381–385. doi:10.1513/pats.201107-042st. URL https://doi.org/10.1513/pats.201107-042ST
[3] C. C. Compton, Updated protocol for the examination of specimens from patients with carcinomas of the colon and rectum, excluding carcinoid tumors, lymphomas, sarcomas, and tumors of the vermiform appendix: a basis for checklists, Archives of pathology & laboratory medicine 124 (7) (2000) 1016–1025.
[4] M. May, A better lens on disease, Scientific American 302 (5) (2010) 74–77. [5] L. Zhang, H. Kong, C. Ting Chin, S. Liu, X. Fan, T. Wang, S. Chen, Automation-assisted cervical cancer screening in manual liquid-based cytology with hematoxylin and eosin staining, Cytometry. Part A: the journal of the International Society for Analytical Cytology 85 (3) (2014) 214–230.
doi:10.1002/cyto.a.22407. URL https://doi.org/10.1002/cyto.a.22407 [6] J. Niu, C. Li, H. Wu, X. Feng, Q. Su, S. Li, L. Zhang, D. T. W. Yew, E. Y. P. Cho, O. Sha, Propidium iodide (pi) stains nissl bodies and may serve as a quick marker for total neuronal cell count, Acta Histochemica 117 (2)
(2015) 182 – 187. doi:https://doi.org/10.1016/j.acthis.2014.12. 001. [7] M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi, N. M. Rajpoot, B. Yener, Histopathological image analysis: A review, IEEE Reviews in Biomedical Engineering 2 (2009) 147–171. doi:10.1109/RBME.
2009.2034865. [8] S. Hinojosa, K. G. Dhal, M. A. Elaziz, D. Oliva, E. Cuevas, Entropybased imagery segmentation for breast histology using the stochastic fractal search, Neurocomputing 321 (2018) 201 – 215. doi:https://doi.org/10. 1016/j.neucom.2018.09.034.
[9] H. Wu, R. Xu, N. Harpaz, D. Burstein, J. Gil, Segmentation of intestinal gland images with iterative region growing, Journal of microscopy 220 (Pt 3) (2005) 190–204. doi:10.1111/j.1365-2818.2005.01531.x. URL https://doi.org/10.1111/j.1365-2818.2005.01531.x
[10] C. Gunduz-Demir, M. Kandemir, A. B. Tosun, C. Sokmensuer, Automatic 600
segmentation of colon glands using object-graphs, Medical Image Analysis 14 (1) (2010) 1 – 12. doi:https://doi.org/10.1016/j.media.2009.09. 001. [11] R. Farjam, H. Soltanian-Zadeh, K. Jafari-Khouzani, R. Zoroofi, An image analysis approach for automatic malignancy determination of prostate
605
pathological images 72 (2007) 227–40. [12] K. Nguyen, A. Sarkar, A. K. Jain, Prostate cancer grading: Use of graph cut and spatial arrangement of nuclei, IEEE Transactions on Medical Imaging 33 (12) (2014) 2254–2270. doi:10.1109/TMI.2014.2336883. [13] A. Cohen, E. Rivlin, I. Shimshoni, E. Sabo, Memory based active contour
610
algorithm using pixel-level classified images for colon crypt segmentation, Computerized Medical Imaging and Graphics 43 (2015) 150 – 164. doi: https://doi.org/10.1016/j.compmedimag.2014.12.006. [14] S. Hussain, S. M. Anwar, M. Majid, Segmentation of glioma tumors in brain using deep convolutional neural network, Neurocomputing 282 (2018) 248
615
– 261. doi:https://doi.org/10.1016/j.neucom.2017.12.032. [15] B. Lei, S. Huang, R. Li, C. Bian, H. Li, Y.-H. Chou, J.-Z. Cheng, Segmentation of breast anatomy for automated whole breast ultrasound images with boundary regularized convolutional encoderdecoder network, Neurocomputing 321 (2018) 178 – 186. doi:https://doi.org/10.1016/j.neucom.
620
2018.09.043. [16] K. Hu, Z. Zhang, X. Niu, Y. Zhang, C. Cao, F. Xiao, X. Gao, Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function, Neurocomputing 309 (2018) 179 – 191. doi:https://doi.org/10.1016/j.neucom.2018.
625
05.011.
28
[17] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324. doi:10.1109/5.726791. [18] E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for seman630
tic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (4) (2017) 640–651. doi:10.1109/TPAMI.2016.2572683. [19] Y. Zhu, C. Zhang, D. Zhou, X. Wang, X. Bai, W. Liu, Traffic sign detection and recognition using fully convolutional network guided proposals, Neurocomputing 214 (2016) 758 – 766. doi:https://doi.org/10.1016/
635
j.neucom.2016.07.009. [20] J. Mo, L. Zhang, Y. Feng, Exudate-based diabetic macular edema recognition in retinal images using cascaded deep residual networks, Neurocomputing 290 (2018) 161 – 171. doi:https://doi.org/10.1016/j.neucom. 2018.02.035.
640
[21] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848. doi:10.1109/TPAMI.2017. 2699184.
645
[22] K. Sirinukunwattana, D. R. J. Snead, N. M. Rajpoot, A stochastic polygons model for glandular structures in colon histology images, IEEE Transactions on Medical Imaging 34 (11) (2015) 2366–2378. doi:10.1109/TMI. 2015.2433900. [23] S. Manivannan, W. Li, J. Zhang, E. Trucco, S. J. McKenna, Structure
650
prediction for gland segmentation with hand-crafted and deep convolutional features, IEEE Transactions on Medical Imaging 37 (1) (2018) 210–221. doi:10.1109/TMI.2017.2750210.
29
[24] H. Chen, X. Qi, L. Yu, Q. Dou, J. Qin, P.-A. Heng, Dcan: Deep contouraware networks for object instance segmentation from histology images, 655
Medical Image Analysis 36 (2017) 135 – 146. doi:https://doi.org/10. 1016/j.media.2016.11.004. [25] J. Dai, K. He, J. Sun, Instance-aware semantic segmentation via multitask network cascades, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3150–3158. doi:10.1109/CVPR.
660
2016.343. [26] Y. Xu, Y. Li, Y. Wang, M. Liu, Y. Fan, M. Lai, E. I. Chang, Gland instance segmentation using deep multichannel neural networks, IEEE Transactions on Biomedical Engineering 64 (12) (2017) 2901–2912. doi:10.1109/TBME. 2017.2686418.
665
[27] S. E. A. Raza, L. Cheung, D. Epstein, S. Pelengaris, M. Khan, N. M. Rajpoot, Mimonet: Gland segmentation using multi-input-multi-output convolutional neuralnetwork, in: M. Vald´es Hern´ andez, V. Gonz´ alez-Castro (Eds.), Medical Image Understanding and Analysis, Springer International Publishing, Cham, 2017, pp. 698–706.
670
[28] Z. Yan, X. Yang, K.-T. T. Cheng, A deep model with shape-preserving loss for gland instance segmentation, in: A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-L´ opez, G. Fichtinger (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer International Publishing, Cham, 2018, pp. 138–146.
675
[29] Z. Yan, X. Yang, K. Cheng, A skeletal similarity metric for quality evaluation of retinal vessel segmentation, IEEE Transactions on Medical Imaging 37 (4) (2018) 1045–1057. doi:10.1109/TMI.2017.2778748. [30] S. Graham, H. Chen, J. Gamper, Q. Dou, P.-A. Heng, D. Snead, Y. W. Tsang, N. Rajpoot, Mild-net: Minimal information loss dilated network
680
for gland instance segmentation in colon histology images, Medical Image
30
Analysis 52 (2019) 199 – 211. doi:https://doi.org/10.1016/j.media. 2018.12.001. [31] Y. Zhang, L. Yang, J. Chen, M. Fredericksen, D. P. Hughes, D. Z. Chen, Deep adversarial networks for biomedical image segmentation utilizing 685
unannotated images, in: M. Descoteaux, L. Maier-Hein, A. Franz, P. Jannin, D. L. Collins, S. Duchesne (Eds.), Medical Image Computing and Computer-Assisted Intervention
MICCAI 2017, Springer International
Publishing, Cham, 2017, pp. 408–416. [32] L. Yang, Y. Zhang, J. Chen, S. Zhang, D. Z. Chen, Suggestive annotation: 690
A deep active learning framework for biomedical image segmentation, in: M. Descoteaux, L. Maier-Hein, A. Franz, P. Jannin, D. L. Collins, S. Duchesne (Eds.), Medical Image Computing and Computer-Assisted Intervention MICCAI 2017, Springer International Publishing, Cham, 2017, pp. 399–407.
695
[33] X. Xu, Q. Lu, Y. Hu, L. Yang, X. S. Hu, D. Z. Chen, Y. Shi, Quantization of fully convolutional networks for accurate biomedical image segmentation, CoRR abs/1803.04907. arXiv:1803.04907. URL http://arxiv.org/abs/1803.04907 [34] N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane, A. Sethi, A
700
dataset and a technique for generalized nuclear segmentation for computational pathology, IEEE Transactions on Medical Imaging 36 (7) (2017) 1550–1560. doi:10.1109/TMI.2017.2677499. [35] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recogni-
705
tion (CVPR), Vol. 00, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90. URL doi.ieeecomputersociety.org/10.1109/CVPR.2016.90 [36] T.-Y. Lin, P. Doll´ ar, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature Pyramid Networks for Object Detection, arXiv e-prints (2016) arXiv:1612.03144arXiv:1612.03144. 31
710
[37] H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV ’15, IEEE Computer Society, Washington, DC, USA, 2015, pp. 1520–1528. doi:10.1109/ICCV.2015.178. URL http://dx.doi.org/10.1109/ICCV.2015.178
715
[38] P. Zhang, D. Wang, H. Lu, H. Wang, B. Yin, Learning uncertain convolutional features for accurate saliency detection, in: 2017 IEEE International Conference on Computer Vision (ICCV), Vol. 00, 2018, pp. 212–221. doi:10.1109/ICCV.2017.32. URL doi.ieeecomputersociety.org/10.1109/ICCV.2017.32
720
[39] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: N. Navab, J. Hornegger, W. M. Wells, A. F. Frangi (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer International Publishing, Cham, 2015, pp. 234–241.
725
[40] W. Yao, Z. Zeng, C. Lian, H. Tang, Pixel-wise regression using u-net and its application on pansharpening, Neurocomputing 312 (2018) 364 – 371. doi:https://doi.org/10.1016/j.neucom.2018.05.103. [41] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN, arXiv e-prints (2017) arXiv:1701.07875arXiv:1701.07875.
Biography
Huijun Ding received the B.Eng. degree in electronic engineering and information science from the University of Science and Technology of China (USTC), Hefei, in 2006, and the Ph.D. degree from the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), Singapore, in 2011. Afterward, she was a postdoctoral research fellow in the Department of Electronic Engineering of The Chinese University of Hong Kong (CUHK) before joining Shenzhen University, China, in 2013. Her current research interests include speech and image processing, pattern recognition, machine learning, nanomaterial-enabled acoustic devices, etc.
Zhanpeng Pan received the B.E. degree from Guangdong Pharmaceutical University, Guangzhou, in 2016. He is now a graduate student at Shenzhen University. His research interests include medical image processing and machine learning.
Qian Cen received the B.E. degree from Hubei University, Hubei, in 2017. She is now a graduate student at Shenzhen University. Her research interests include medical image processing and machine learning.
Yang Li received the bachelor of clinical medicine degree from Anhui Medical University (AHMU), Hefei, China, in 2004. He is now an Associate Chief Physician in Anhui Provincial Children's Hospital. He specializes in the diagnosis and treatment of children's fractures, and has rich experience in the treatment of congenital diseases in children.

Shifeng Chen received the B.E. degree from the University of Science and Technology of China, Hefei, in 2002, the M.Phil. degree from City University of Hong Kong, Hong Kong, in 2005, and the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, in 2008. He is now an Associate Professor in the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China. His research interests include computer vision and machine learning.