Biomedical Signal Processing and Control 57 (2020) 101828


Multiscale receptive field based on residual network for pancreas segmentation in CT images

Feiyan Li a, Weisheng Li a,∗, Yucheng Shu a, Sheng Qin a, Bin Xiao a, Ziwei Zhan b

a Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, China
b Ucchip Information Technology Co., Ltd., China

∗ Corresponding author. E-mail address: [email protected] (W. Li).
https://doi.org/10.1016/j.bspc.2019.101828

Article info

Article history: Received 20 February 2019; Received in revised form 16 November 2019; Accepted 8 December 2019.

Keywords: Deep neural network; Multiscale convolution; Pancreas segmentation; Residual network

Abstract

Medical image segmentation has made great progress, yet the pancreas remains a challenging abdominal organ to segment because of its high inter-patient anatomical variability in both shape and volume. UNet often suffers from pancreas over-segmentation, under-segmentation and shape inconsistency between the predicted result and the ground truth. We attribute this to UNet's inability to extract sufficiently deep features and rich semantic information to distinguish pancreas regions from the background. From this starting point, we propose three cross-domain information fusion strategies to solve the above three problems. The first strategy, named the skip network, efficiently restrains over-segmentation through cross-domain connections. The second strategy, named the residual network, mainly addresses both under- and over-segmentation by cross-domain connections on a small scale. The third strategy, a multiscale cross-domain information fusion strategy named the multiscale residual network, adds multiscale convolution operations to the second strategy, which learns a more accurate pancreas shape while restraining over- and under-segmentation. We performed experiments on a dataset of 82 abdominal contrast-enhanced three-dimensional computed tomography (3D CT) scans from the National Institutes of Health Clinical Center using 4-fold cross-validation. We report a mean Dice score of 87.57 ± 3.26 %, which outperforms the state-of-the-art methods and represents a 7.87 % improvement over the original UNet. Our method is not only superior to other established methods in accuracy and robustness but also effectively restrains pancreas over-segmentation, under-segmentation and shape inconsistency between the predicted result and the ground truth. Our strategies are well suited to application in clinical medicine.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Segmentation of organs such as the spleen, liver and pancreas [1–3] in abdominal computed tomography (CT) scans is a crucial task in computer-aided diagnosis (CAD), quantitative and qualitative analysis, and surgical assistance. Pancreas segmentation, in particular, is a critical component of CAD systems that perform quantitative imaging analysis of diabetic patients or pancreatic cancer detection. In recent years, with the development of deep learning methods [4–6], the application of convolutional neural networks in medical image analysis [8] and computer-aided diagnosis has become increasingly widespread [1,7]. The network structures of UNet [9], FCN [10] and SegNet [11] are


particularly popular in the field of organ segmentation, and these methods have been effectively applied to the segmentation of large organs such as the heart [12,13], kidneys [14] and lungs [15,16]. However, there is still room for improvement in the segmentation of small organs. First, the highly complex anatomical structure of the pancreas makes shape learning difficult. Second, only a limited amount of labelled medical image data is available, which prevents segmentation from achieving a high degree of accuracy. In addition, the position of the pancreas in the abdominal cavity varies from patient to patient, and the boundary contrast depends on the amount of visceral fat surrounding the pancreas. These factors, among others, make pancreas segmentation very complicated and prone to over-segmentation and under-segmentation. To meet these challenges, our goal is to provide a powerful segmentation method that makes pancreas segmentation more accurate and robust.

Organ segmentation methods are divided into two categories: top-down and bottom-up methods.


In top-down methods, a priori knowledge, such as an atlas or shape model of the organ, is generated and incorporated into the framework via learning-based shape model fitting [13,14,17] or volumetric image registration [18–20]. In bottom-up methods, segmentation is performed by local image similarity grouping and growing [15] or pixel-, superpixel- or supervoxel-based labelling [21]. In general, top-down methods are suitable for organs that can be modelled well by statistical shape models [13,14], whereas bottom-up methods are more effective for highly non-Gaussian-shaped [21] or pathological organs. Most previous work on pancreas segmentation from CT images adopts a top-down approach [22–24]. Since UNet [9] was introduced for biomedical image segmentation, deep neural networks have been favoured for medical image segmentation. Zhou et al. [25] used the FCN-8s pre-training model and proposed a fixed-point model to optimize the pre-training results, obtaining a mean Dice score of 83.18 %, an improvement over the pancreas segmentation work of Roth et al. [26,27] in 2015 and 2016. Pancreas segmentation has continued to improve: Cai et al. [28] raised the mean Dice score to 83.7 % in 2018. Ma et al. [29] proposed integrating a deep neural network with a Bayesian statistical shape model for pancreas segmentation and obtained a mean Dice score of 85.32 %. In comparison, the method of Zhu et al. [30] used a coarse-to-fine framework: a coarse network was trained to obtain a rough segmentation and remove the background region, and the rough region was then passed to a fine network for accurate segmentation, yielding a mean Dice score of 84.59 %. Most of the above work combines deep neural networks with traditional methods or adopts new learning pipelines to obtain more accurate segmentation results. In our view, these pipelines are complicated and cumbersome, and they are not easy to extend or apply in clinical medicine. These challenging areas still require improvement.

Although the accuracy of pancreas segmentation has increased steadily over the years, some difficulties remain, mainly in the following aspects. First, the pancreas is small relative to the entire abdominal scan and occupies only a small part of the image. Second, the pancreas exhibits highly complex anatomical variability. Third, the boundary around the pancreas is blurred. For a pancreas with large deformation and grey-scale information similar to its surroundings, the extraction of deep features is particularly important. This paper proposes to integrate multiscale convolution and residual blocks into UNet [9], which not only increases the availability of multiscale features but also deepens the network, so that deeper and more advanced semantic features can be extracted. In particular, our method effectively solves the problems of pancreas over-segmentation, under-segmentation and shape inconsistency between the predicted result and the ground truth. The experiments below were performed using 4-fold cross-validation on a dataset of 82 abdominal contrast-enhanced 3D CT scans (NIH-82 dataset) from the National Institutes of Health Clinical Center (https://wiki.cancerimagingarchive.net/display/Public/Pancreas-CT) [31,32].
The main contributions of the proposed work are as follows:
• Three cross-domain information fusion strategies are proposed to efficiently solve the problems of pancreas over-segmentation, under-segmentation and shape inconsistency between the predicted result and the ground truth.
• The first strategy, named the skip network, efficiently restrains over-segmentation through cross-domain connections that add residuals directly between the encoder and decoder for the task of pancreas segmentation.
• The second strategy, named the residual network, improves the over- and under-segmentation problems simultaneously by adding

residuals to the continuous convolution blocks of the encoder and decoder separately.
• The last multiscale cross-domain information fusion strategy, named the multiscale residual network (MR Net), not only constrains over- and under-segmentation but also enhances shape learning through multiscale convolution operations.
• We performed a 4-fold cross-validation experiment on the NIH-82 dataset. To show that our method does not overfit, we also experimented with 5 %, 10 %, 20 % and 50 % of the data for training, and the test results remained relatively stable.

The remainder of this paper is organized as follows: Section 1 presents the introduction and a literature survey on pancreas segmentation. Section 2 presents the materials and the proposed method. Section 3 describes the experimental settings, experimental results and discussion. Section 4 presents our conclusions.

2. Materials and methods

2.1. Multiscale convolution module

Convolution kernels of different sizes produce receptive fields of different sizes, as shown in Fig. 1, and the features extracted under different receptive fields naturally differ. The network uses multiple convolution kernels to obtain multiscale information for each pixel. Multiscale methods can obtain accurate segmentation details because large scales implicitly provide spatial information, while smaller scales provide detailed information about the local neighbourhood of each pixel. We construct the multiscale convolution module by combining multiple branches with different kernels and convolution layers. The module extracts feature information at different scales, so the extracted features are more comprehensive and advanced; it is added to the network structure to deepen the extraction of input image features. This approach has a significant effect on images with complex backgrounds whose features are similar to those of the pancreas. The idea follows the multiscale dilated convolution proposed by Liu et al. [33] for object detection. Fig. 2 shows the multiscale convolution module: three convolution kernels at different scales are applied to the same input feature map to obtain three different feature maps, which are concatenated to form the output feature map Yout. Since a 1 × 1 convolution kernel does not need to consider the relationship between a target pixel and its surrounding pixels, it is mainly used to adjust the number of channels: it linearly combines pixels across channels and then applies a nonlinear operation, performing dimension-raising and dimension-reduction functions. To keep the number of channels consistent with the network structure, we use the 1 × 1 convolution to reduce the number of channels. The multiscale convolution module is described by formula (1):

Yout = concat(Yin ∗ K1, Yin ∗ K2, Yin ∗ K3) ∗ K1    (1)

where ∗ denotes convolution, concat denotes channel-wise concatenation, and K1, K2, K3 denote convolution kernels of scales 1 × 1, 3 × 3 and 5 × 5, respectively.
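The paper does not state its implementation framework. As a minimal illustration, the module of Fig. 2 and formula (1) could be sketched in PyTorch as follows; the channel counts, the padding choices and the absence of activation functions are our assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MultiscaleConvBlock(nn.Module):
    """Sketch of the multiscale convolution module (Fig. 2, formula (1)):
    parallel 1x1, 3x3 and 5x5 convolutions over the same input, channel-wise
    concatenation, then a 1x1 convolution to reduce the channel count."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.conv3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)
        # Trailing 1x1 convolution (the final K1 in formula (1)) that fuses
        # the three branches and restores the channel count.
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.cat([self.conv1(x), self.conv3(x), self.conv5(x)], dim=1)
        return self.fuse(y)
```

For example, MultiscaleConvBlock(64, 64) maps a 64-channel feature map to a 64-channel output while mixing the three receptive fields.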

2.2. Residual module

Deep networks achieve high accuracy in image recognition, speech recognition and other tasks, so it is natural to assume that a deeper network is generally better than a shallow one. To further improve the accuracy of a model, the most direct method is to design the network as deep as possible, on the expectation that accuracy increases with depth.

Fig. 1. Different receptive fields with different scale convolution kernels under the same feature map. (1 × 1 convolution kernel, 3 × 3 convolution kernel and 5 × 5 convolution kernel from left to right).

Fig. 2. Schematic diagram of multiscale convolution module.

In fact, He et al. [34] experimentally verified that training and test accuracy degrade rapidly once the network depth increases beyond a certain point: as networks become deeper, they become harder to train. Suppose an existing shallow network has reached its saturation accuracy; adding several identity-mapping layers increases the depth of the network without increasing the minimum error, so the deeper network should not increase the error on the training set. This idea of using identity mappings to pass the output of earlier layers directly to later ones inspired the famous deep residual network, ResNet [34]. The main idea of the residual module is shown in Fig. 3. ResNet, jointly proposed in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, won the ImageNet image classification, detection and localization challenges. Residual networks are easier to optimize and can gain accuracy from considerably increased depth. Their core aim is to mitigate the side effects of increasing depth, so that network performance can be improved by simply making the network deeper.
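As a reference for Fig. 3, a generic identity-shortcut block can be sketched as below; the two-convolution body and ReLU placement follow common ResNet practice and are assumptions, since the text only describes the shortcut idea.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Generic identity-mapping residual block in the spirit of Fig. 3:
    the input is added back to the output of a small convolution stack."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity shortcut: if the stacked convolutions learn nothing
        # useful, the block can still pass x through unchanged.
        return self.relu(self.body(x) + x)
```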

2.3. The proposed network architecture

This section discusses the proposed method for pancreas segmentation from CT scans. It mainly describes three network structures: the skip network, the residual network and the multiscale residual network. The pancreas is a small organ relative to the human body. Given its high anatomical variability, its size, shape and position vary easily, and CT scans of the pancreas are complex. In pancreatic CT scans, there is minimal grey-level difference between the pancreatic region and the background, and the shape of the pancreas is complex and changeable; these are the main reasons pancreas segmentation is difficult.

We found that although UNet has made great achievements in medical image segmentation, some problems remain in pancreas segmentation. First, over-segmentation occurs outside the pancreatic region: because the shape, position and size of the pancreas are changeable and the grey-level difference from the background is minimal, the network over-segments non-pancreatic regions. Second, under-segmentation occurs inside the pancreas in some cases. Because of the partial volume effect in CT, the CT value of structures thinner than the slice thickness is affected by the other tissues within the slice; the measured CT value therefore cannot represent the true CT value, and the resulting grey value cannot truly represent the grey value of the pancreas. Finally, the shape of the pancreas is not easy to learn: because the shape itself is variable, accurate segmentation results cannot be obtained in some cases.

In view of the above three problems, we propose a symmetric encode-decode residual network with multiscale convolution, inspired by multiscale convolution and residual networks. To enhance feature learning, we deepen the basic convolution layer. Our encoder and decoder are based on a continuous convolution block (CCB), as shown in Fig. 4; this block is composed of three consecutive convolution operations. Fig. 5 presents a symmetric encode-decode network structure with double skip connections composed of CCBs. The last convolution layer in each CCB of the encoder is concatenated with the first convolution layer in each CCB of the corresponding decoder, and the first convolution layer in each CCB of the encoder is added to the last convolution layer of each CCB of the corresponding decoder.

Fig. 3. Schematic diagram of residual module.

Fig. 4. The diagram of continuous convolution block.

This connection strengthens the learning of deep features and effectively suppresses over-segmentation. With this network, the over-segmentation problem in pancreas segmentation improves markedly, but some over-segmentation and under-segmentation remains. We consider the scope of the addition operation to be too large, so we narrow it: the addition is applied only between the first and the last convolution layer within each CCB. This step mainly prevents information loss as the convolution layers deepen. The resulting network, shown in Fig. 6, is called the residual network. Based on this setup, we established an effective pancreas segmentation model that improves both under-segmentation and over-segmentation. However, due to the changeable shape of the pancreas, shape learning still needs to be

improved. We address this problem with the multiscale convolution principle, because convolutions at different scales capture feature information of different sizes. For example, the 1 × 1 convolution can effectively capture the boundary information of the pancreas, while the 3 × 3 or 5 × 5 convolutions capture local features under different receptive fields; the region of interest may, for instance, lie close to the head or tail of the pancreas. To verify the validity of this idea, we replace the concatenation operation between CCB1 and CCB9 in Fig. 9 with the multiscale convolution blocks (MCBs) of Fig. 7. In experiments, we find that different MCBs in place of the skip connection between CCB1 of the encoder and CCB9 of the decoder yield different segmentation results. When this skip connection is replaced with MCB4, the pancreas segmentation accuracy drops significantly, as shown in Fig. 8: all evaluation metrics decrease markedly, resulting in serious over-segmentation and under-segmentation. Please refer to Section 3.2 for the specific experimental process. In summary, for the pancreas segmentation task we propose a multiscale symmetric encode-decode residual network, shown in Fig. 9, which improves pancreas segmentation performance, especially with respect to over-segmentation. We finally use MCB3 of Fig. 7 to replace the skip connection between CCB1 of the encoder and CCB9 of the decoder; a sketch of the resulting building block follows.
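To make the building blocks concrete, the sketch below combines a CCB (Fig. 4) with the small-scale residual connection of Fig. 6, where the output of the first convolution is added to the output of the last; the kernel sizes and activations are assumptions, as the text specifies only "three consecutive convolution operations".

```python
import torch
import torch.nn as nn

def conv_relu(in_ch: int, out_ch: int) -> nn.Sequential:
    """One 3x3 convolution followed by ReLU (assumed configuration)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class ResidualCCB(nn.Module):
    """Continuous convolution block (CCB, Fig. 4) with the in-block residual
    of Fig. 6: three consecutive convolutions, and the output of the first
    convolution layer is added to the output of the last one."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = conv_relu(in_ch, out_ch)
        self.conv2 = conv_relu(out_ch, out_ch)
        self.conv3 = conv_relu(out_ch, out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y1 = self.conv1(x)
        y3 = self.conv3(self.conv2(y1))
        return y1 + y3  # small-scale residual inside the block
```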

Fig. 5. The diagram of the skip network.

Fig. 6. Residual network diagram.

Fig. 7. The diagram of different multiscale convolution block. (a) represents only a 1 × 1 convolution operation. (b) represents adding a 3 × 3 convolution operation to (a). (c) represents adding a 5 × 5 convolution operation to (b). (d) represents adding a 7 × 7 convolution operation to (c).

3. Experiments and discussion

3.1. Datasets and experimental settings

Fig. 8. The diagram of changes in segmentation results in different multiscale convolution blocks (MCBs).

Experiments are conducted on the public NIH-82 dataset [31,32,35], which contains 82 abdominal contrast-enhanced 3D CT volumes of size 512 × 512 × D (D ∈ [181, 466]), under 4-fold cross-validation. 4-fold cross-validation is a well-established experimental protocol in pancreas segmentation [25,28,29,36–38]. Slices are cropped to 192 × 256 pixels. The images are then clipped to the [-100, 240] HU window and, considering the common intensity distribution of the pancreas, scaled to the range [0, 1].
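A small sketch of this preprocessing is given below; the HU window, the crop size and the [0, 1] scaling come from the text, while the centre placement of the crop is our assumption.

```python
import numpy as np

def preprocess_slice(ct_slice: np.ndarray,
                     out_h: int = 192, out_w: int = 256) -> np.ndarray:
    """Clip a CT slice to the [-100, 240] HU window, rescale it to [0, 1]
    and centre-crop it to 192 x 256 (crop placement assumed)."""
    x = np.clip(ct_slice.astype(np.float32), -100.0, 240.0)
    x = (x + 100.0) / 340.0  # min-max scaling: (x - min) / (max - min)
    h, w = x.shape
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return x[top:top + out_h, left:left + out_w]
```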

Fig. 9. The diagram of multiscale convolution residual network.

Table 1
The evaluation metrics of pancreas segmentation.

Dice       Dice = 2|Y ∩ Ŷ| / (|Y| + |Ŷ|)
Jaccard    Jaccard = |Y ∩ Ŷ| / |Y ∪ Ŷ|
Precision  Precision = TP / (TP + FP)
Recall     Recall = TP / (TP + FN)
F1-score   F1-score = 2 × Precision × Recall / (Precision + Recall)

Remarks: Y represents the ground truth and Ŷ the predicted segmentation; TP, FP and FN denote true positive, false positive and false negative counts, respectively.
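For reference, the five metrics of Table 1 can be computed on binary masks as in the NumPy sketch below; the small epsilon guarding against empty masks is our addition.

```python
import numpy as np

def segmentation_metrics(y_true: np.ndarray, y_pred: np.ndarray,
                         eps: float = 1e-7):
    """Compute Dice, Jaccard, precision, recall and F1-score (Table 1)."""
    y_true, y_pred = y_true.astype(bool), y_pred.astype(bool)
    tp = np.logical_and(y_true, y_pred).sum()    # true positives
    fp = np.logical_and(~y_true, y_pred).sum()   # false positives
    fn = np.logical_and(y_true, ~y_pred).sum()   # false negatives
    dice = 2.0 * tp / (y_true.sum() + y_pred.sum() + eps)
    jaccard = tp / (np.logical_or(y_true, y_pred).sum() + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2.0 * precision * recall / (precision + recall + eps)
    return dice, jaccard, precision, recall, f1
```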

The device used in the experiments is a 12-core Intel Xeon(R) CPU E5-2603 v4 @ 1.70 GHz with a GeForce GTX TITAN X GPU. In the training process, we set the learning rate to 0.0001, epochs = 100, batch size = 16, and optimizer = Adam. It takes approximately 8 h to train a complete network and 30 s to predict a case. Five metrics are used to evaluate the pancreas segmentation results objectively, as described in Table 1. The evaluation metric Dice is used as the loss function in the training process, defined as follows:

L(Y, Ŷ) = −2 Σ_i y_i ŷ_i / (Σ_i y_i + Σ_i ŷ_i)    (2)

where y_i ∈ Y, ŷ_i ∈ Ŷ, Y is the ground truth, and Ŷ is the prediction.
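A direct translation of formula (2) into a PyTorch loss might look as follows; the epsilon term is an illustrative safeguard, and the commented training configuration only restates the hyperparameters above (MultiscaleResidualNet is a hypothetical class name).

```python
import torch

def dice_loss(y_pred: torch.Tensor, y_true: torch.Tensor,
              eps: float = 1e-7) -> torch.Tensor:
    """Soft Dice loss of formula (2); it is negative and is minimized."""
    intersection = (y_pred * y_true).sum()
    return -2.0 * intersection / (y_true.sum() + y_pred.sum() + eps)

# Training setup reported in the text: Adam optimizer, learning rate 1e-4,
# batch size 16, 100 epochs.
# model = MultiscaleResidualNet()  # hypothetical model class
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```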

3.2. Experiments and results analysis of the multiscale residual network with different multiscale convolution blocks

To verify that replacing the skip connection between CCB1 of the encoder and CCB9 of the decoder with MCB3 (Fig. 7) is the most effective choice for pancreas segmentation, we replace this skip connection with MCB1, MCB2, MCB3 and MCB4 separately. The training process of the multiscale residual network with 5 % training data under the four different convolution scales is shown in Fig. 10. Table 2 shows the average test results of the multiscale residual network for the five evaluation metrics under the four different MCBs. Of note, replacing the skip connection with MCB3 performs better than MCB1, MCB2 and MCB4. The objective test results for each case are presented in Fig. 11, and the subjective evaluation is shown in Fig. 12 for a representative case. As the scale increases, pancreas over-segmentation and under-segmentation are gradually improved, up until MCB4. The last column in Fig. 12 demonstrates that MCB4 does not improve the performance of pancreas segmentation. MCB4 also

Fig. 10. The training process of the multiscale residual network with 5 % training data under four different MCBs. The abscissa represents the number of iterations, dice coef1 to dice coef4 represent the Dice score of MCB1 to MCB4 under different iterations, and loss1 to loss4 represent the loss value of MCB1 to MCB4 under different iterations.

leads to serious over-segmentation and under-segmentation problems, such as at the location marked by the yellow box in Fig. 12.

3.3. Experiments and results analysis between the multiscale residual network and the existing state-of-the-art methods

We conducted experiments on the NIH-82 dataset with 4-fold cross-validation. The training processes of the three proposed network structures and the test results are shown in Fig. 13. The three network structures converge during training, and the test results are robust. The specific average results are shown in Table 3; all comparison methods in Table 3 used 4-fold cross-validation. In terms of segmentation results, the multiscale residual network gave the highest average Dice score of 87.57 % with a smaller standard deviation of 3.26, and a Dice score of 76.67 % in the worst case. The residual network adds residuals within the continuous convolution blocks of the encoder and decoder separately, while the skip network adds residuals between the encoder and decoder directly. All three of our networks outperform the state-of-the-art approaches, and the proposed method is robust in extremely challenging cases. As shown in Table 3, compared with the FCN-based fixed-point model proposed by Zhou et al. [25], our method improves pancreas segmentation by 11.58 percentage points in the most challenging case.

Table 2
The average test results of the multiscale residual network with five evaluation metrics under four different MCBs. Training data: 5 % (Random1). Values are mean ± std [min, max] (%).

Scale  Dice                        Jaccard                     Precision                   Recall                      F1-score
MCB1   78.07±5.32 [62.82, 86.82]   65.79±6.87 [47.06, 78.14]   77.31±6.01 [60.76, 88.35]   81.08±6.47 [64.07, 90.64]   78.09±5.31 [62.85, 86.83]
MCB2   78.92±5.11 [66.86, 87.05]   66.87±6.59 [51.96, 78.10]   80.42±5.67 [66.18, 90.90]   79.46±6.28 [64.09, 89.81]   78.94±5.11 [66.89, 87.06]
MCB3   79.79±4.47 [69.87, 87.50]   67.94±5.84 [54.78, 78.49]   77.57±5.16 [64.13, 86.33]   84.25±5.67 [66.46, 92.24]   79.80±4.47 [69.88, 87.51]
MCB4   76.78±6.25 [58.29, 86.50]   64.12±7.82 [43.21, 77.35]   76.56±4.93 [62.51, 85.11]   79.48±8.99 [58.51, 92.35]   76.81±6.24 [58.32, 86.50]

Fig. 11. Each case experimental test results of the multiscale residual network with 5 % training data under four different MCBs.

Fig. 12. Subjective experimental results of a multiscale residual network with 5 % training data under four different MCBs.

Fig. 13. The training processes and test results of the skip network, residual network and multiscale residual network. (a) shows the training process of the skip network, and (b) the per-case test results of the skip network under the 4-fold cross-validation experiment; (c) and (d) show the same for the residual network, and (e) and (f) for the multiscale residual network. In (a), (c) and (e), the abscissa represents the number of iterations, the ordinates dice coef1 to dice coef4 represent the Dice scores at different iterations under the 4-fold cross-validation experiment, and loss1 to loss4 represent the loss values at different iterations under the 4-fold cross-validation experiment. In (b), (d) and (f), the abscissa represents the case number, and the ordinate represents the different evaluation metrics.

Table 3
Comparison of experimental results between the skip network, residual network, multiscale residual network and existing state-of-the-art methods. Values are mean ± std [min, max] (%).

Method                        Dice                        Jaccard
Zhou et al. [25]              83.18±4.81 [65.10, 91.03]   –
Cai et al. [36]               82.40±6.70 [60.00, 90.10]   70.60±9.00 [42.90, 81.90]
Cai et al. [28]               83.70±5.10 [59.00, 91.00]   72.30±7.04 [41.80, 83.50]
Ma et al. [29]                85.32±4.19 [71.04, 91.47]   74.61±6.19 [-]
DenseUNet [29]                73.39±8.78 [45.60, 86.50]   58.67±10.47 [-]
H.R. Roth [37]                76.80±9.40 [43.70, 89.40]   –
H.R. Roth [38]                81.30±6.30 [50.60, 88.90]   68.80±8.12 [33.90, 80.10]
UNet [9]                      79.70±7.60 [43.40, 89.30]   66.80±9.60 [27.70, 80.70]
Skip network                  86.84±3.63 [76.04, 93.34]   77.75±4.56 [66.68, 87.03]
Residual network              87.04±3.24 [77.83, 93.38]   77.55±4.44 [67.69, 87.50]
Multiscale residual network   87.57±3.26 [76.68, 93.40]   78.77±4.34 [68.94, 90.31]

Compared with the RNN-based pancreas segmentation reported by Cai et al. [28,36], our method is more robust. Ma et al. [29] achieved 85.32 % accuracy by fine-tuning DenseUNet combined with a Bayesian method, which is among the most advanced approaches for pancreas segmentation.

Table 4
Comparison of experimental results of the worst case and best case.

Method                        Worst case Dice (%)   Best case Dice (%)
Skip network                  #50  76.04            #72  93.34
Residual network              #50  77.83            #72  93.38
Multiscale residual network   #50  76.67            #72  93.40

Our method improves the average Dice score by 7.87 percentage points compared with the original biomedical image segmentation network of 2015 [9]. H.R. Roth et al. [37] proposed a general method for medical image segmentation, but its results on the pancreas are not remarkable. The accuracy of the method proposed by H.R. Roth et al. [38] is 6.27 percentage points lower than that of our method; for challenging cases in particular, it is 26.07 percentage points worse, which shows that our method is more robust. For clinical medicine, our method is the better choice. From Table 3, we can see that our modified network structures exhibit the best performance in both the most challenging and the most easily segmented cases. As shown in Table 4, the most challenging case is #50, and the most easily segmented case is #72.

Fig. 14. Segmentation results of the worst and best case under the skip network, residual network and multiscale residual network.

Fig. 14 shows the segmentation results of cases #50 and #72 in the NIH-82 dataset; we select the first, middle and last slices for display. The three proposed network structures perform well in both the most difficult and the easiest cases. The multiscale residual network not only outperforms the other methods in average Dice score but is also optimal in subjective evaluation, consistently delivering segmentation results whose shape is most similar to the ground truth. Fig. 15 shows subjective experimental results for UNet, the skip network, the residual network and the multiscale residual network. Compared with UNet, the skip network improves the over-segmentation problem; please refer to slices #1 145, #13 78, #19 117 and #45 101 in the yellow boxes in Fig. 15. The residual network improves the under-segmentation problem, as seen in slice #9 102 in Fig. 15. The multiscale residual network combines the advantages of the skip network and the residual network and captures the shape of the pancreas outstandingly well, as seen in slice #21 88.

3.4. Experiments and results analysis of the multiscale residual network with 5 %, 10 %, 20 % and 50 % of the data for training

In fact, 4-fold cross-validation already demonstrates the effectiveness of our method, and previous pancreas segmentation work [25,28,29,36–38] on the NIH-82 dataset adopted 4-fold cross-validation. To further prove the effectiveness of our method, we conducted experiments training with 5 %, 10 %, 20 % and 50 % of the data and testing on the remaining data. This experimental protocol draws on work published at MICCAI 2019 that specifically addresses overfitting in segmentation tasks [39]. We conducted three random experiments for each of the 5 %, 10 %, 20 % and 50 % settings, and the reported test results are the averages of the three random experiments. The results are shown in Table 5. First, as the proportion of training data decreases, none of our five evaluation metrics decreases dramatically; even when only 5 % of the data is used for training, our Dice and Jaccard scores are comparable to the results of UNet under 4-fold cross-validation. Second, for each proportion of training data, the training and test sets of the three random experiments differ, yet the test results are quite stable. We therefore observe no obvious overfitting in our method.

4. Conclusion

There is large shape variation in the pancreas and a similar grey scale between the pancreas and its surroundings. These characteristics easily lead to over-segmentation, under-segmentation and inconsistency between the predicted shape and the ground truth. To overcome these three problems, three cross-domain information fusion strategies

Fig. 15. Comparison of subjective experiment results under UNet, skip network, residual network, and multiscale residual network.

Table 5
The average test results of the multiscale residual network with 5 %, 10 %, 20 % and 50 % training data under three random experiments; the last three rows repeat the 4-fold cross-validation results of the skip network, residual network and MR Net for comparison. Values are mean ± std [min, max] (%).

Training data     Random    Dice                        Jaccard                     Precision                   Recall                      F1-score
5 %               Random1   79.79±4.47 [69.87, 87.50]   67.94±5.84 [54.78, 78.49]   77.57±5.16 [64.13, 86.33]   84.25±5.67 [66.46, 92.24]   79.80±4.47 [69.88, 87.51]
                  Random2   79.23±4.65 [67.38, 90.00]   67.24±6.03 [52.70, 82.20]   76.43±5.71 [62.58, 89.25]   84.48±5.39 [68.23, 92.54]   79.26±4.64 [67.42, 90.01]
                  Random3   78.28±5.31 [60.86, 87.52]   66.15±6.55 [45.63, 78.43]   80.48±6.62 [62.59, 91.99]   78.72±7.21 [57.63, 87.99]   78.30±5.30 [60.88, 87.52]
                  Average   79.10±4.84 [60.86, 90.00]   67.11±6.16 [45.63, 82.20]   78.16±6.08 [62.58, 91.99]   82.48±6.67 [57.63, 92.54]   79.12±4.84 [60.88, 90.01]
10 %              Random1   80.82±5.56 [67.27, 89.40]   69.38±7.33 [54.21, 81.23]   82.67±5.01 [70.20, 93.11]   81.13±7.32 [59.19, 91.08]   80.85±5.64 [67.30, 89.41]
                  Random2   81.64±4.70 [70.11, 88.56]   70.59±5.98 [57.47, 79.92]   84.03±5.25 [69.86, 92.82]   82.01±8.03 [61.54, 92.18]   81.67±4.69 [70.15, 88.56]
                  Random3   82.67±4.10 [73.13, 90.14]   71.76±5.61 [59.56, 82.46]   80.64±5.68 [64.82, 91.63]   86.66±5.15 [72.87, 95.02]   82.70±4.09 [73.14, 90.15]
                  Average   81.71±4.90 [67.27, 90.14]   70.57±6.39 [54.21, 82.46]   82.44±5.47 [64.82, 93.11]   83.27±7.33 [59.19, 95.02]   81.74±4.88 [67.30, 90.15]
20 %              Random1   84.34±3.86 [74.09, 92.19]   74.11±5.32 [61.38, 85.70]   83.65±4.91 [67.46, 91.93]   86.68±5.75 [67.01, 93.75]   84.36±3.85 [74.10, 92.19]
                  Random2   84.64±4.25 [74.03, 92.00]   74.51±5.89 [60.58, 85.52]   84.55±4.67 [71.99, 93.42]   86.04±5.48 [71.82, 94.05]   84.66±4.23 [74.03, 92.01]
                  Random3   84.07±4.16 [73.07, 90.96]   73.87±5.58 [59.20, 84.07]   86.11±4.12 [72.04, 93.67]   83.74±5.99 [64.66, 92.36]   84.10±4.14 [73.11, 90.97]
                  Average   84.35±4.08 [73.07, 92.19]   74.16±5.58 [59.20, 85.70]   84.77±4.67 [67.46, 93.67]   85.49±5.85 [64.66, 94.05]   84.37±4.07 [73.11, 92.19]
50 %              Random1   86.43±3.21 [79.52, 91.57]   77.11±4.51 [67.86, 84.75]   83.82±4.16 [73.50, 91.36]   90.60±3.45 [81.13, 95.89]   86.46±3.18 [79.53, 91.58]
                  Random2   86.74±3.02 [80.13, 92.91]   77.49±4.36 [68.74, 86.96]   84.69±3.80 [75.12, 92.23]   90.12±4.38 [80.33, 95.00]   86.76±3.01 [80.14, 92.92]
                  Random3   86.84±2.95 [80.39, 91.11]   77.66±4.24 [68.61, 84.01]   86.67±3.04 [80.16, 91.54]   88.24±4.81 [74.26, 93.32]   86.87±2.92 [80.41, 91.12]
                  Average   86.67±3.04 [79.52, 92.91]   77.42±4.34 [67.86, 86.96]   85.06±3.86 [73.50, 92.23]   89.65±4.34 [74.26, 95.89]   86.69±3.02 [79.53, 92.92]
Skip network                86.84±3.63 [76.04, 93.34]   77.75±4.56 [66.68, 87.03]   86.48±3.83 [76.52, 94.68]   88.48±4.52 [74.84, 94.73]   86.92±3.18 [78.68, 92.98]
Residual network            87.04±3.24 [77.83, 93.38]   77.55±4.44 [67.69, 87.50]   86.38±3.60 [76.49, 93.09]   88.34±4.66 [73.54, 95.42]   86.90±3.14 [79.89, 93.50]
MR Net                      87.57±3.26 [76.68, 93.40]   78.77±4.34 [68.94, 90.31]   86.63±3.70 [75.76, 93.00]   89.55±4.03 [75.96, 94.69]   87.57±2.89 [80.38, 93.22]

were proposed. Compared with UNet, the skip network can extract deeper semantic information and restrain over-segmentation. The residual network works better than the skip network against over-segmentation and, especially, under-segmentation. The multiscale residual network not only solves the over- and under-segmentation problems but also constrains the shape of the pancreas, which makes the predicted result fit the ground truth better. Our experiments show that, for the task of pancreas segmentation, it is more effective to add residuals within the continuous convolution blocks of the encoder and decoder separately than to add residuals between the encoder and decoder directly. We also experimentally verified that adding convolution blocks

of three different scales between the encoder and decoder effectively restrains over-segmentation and under-segmentation and, in particular, improves pancreas shape learning. Our method differs from those of Zhou et al. [25], who modified the training method, and Ma et al. [29], who combined deep learning with traditional ideas. Compared with these advanced methods, our method is easier to use and more robust. We conducted experiments on the public NIH-82 pancreas dataset and report an average Dice score of 87.57 %, which outperforms the state-of-the-art methods. Our method also remains effective when data are limited.

CRediT authorship contribution statement

Feiyan Li: Conceptualization, Formal analysis, Writing - original draft, Validation, Visualization. Weisheng Li: Funding acquisition, Resources, Project administration, Supervision, Writing - review & editing. Yucheng Shu: Funding acquisition, Writing - review & editing. Sheng Qin: Investigation, Data curation. Bin Xiao: Methodology. Ziwei Zhan: Software.

Acknowledgements

This work was supported by the Chongqing Graduate Student Research and Innovation Project [No. CYS19261], the National Natural Science Foundation of China [Nos. 61972060, U1713213, 61906024] and the Natural Science Foundation of Chongqing [No. cstc2016jcyjA0407].

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] O. Gloger, K. Tönnies, R. Laqua, H. Völzke, Fully automated renal tissue volumetry in MR volume data using prior-shape-based segmentation in subject-specific probability maps, IEEE Trans. Biomed. Eng. 62 (10) (2015) 2338–2351.
[2] J.I. Orlando, E. Prokofyeva, M.B. Blaschko, A discriminatively trained fully connected conditional random field model for blood vessel segmentation in fundus images, IEEE Trans. Biomed. Eng. 64 (1) (2017) 16–27.
[3] R.E. Neal, P.A. Garcia, H. Kavnoudias, F. Rosenfeldt, C.A. Mclean, V. Earl, J. Bergman, R.V. Davalos, K.R. Thomson, In vivo irreversible electroporation kidney ablation: experimentally correlated numerical models, IEEE Trans. Biomed. Eng. 62 (2) (2015) 561–569.
[4] A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. 25 (2) (2012) 1097–1105.
[5] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: IEEE Conf. Computer Vision and Pattern Recognition, Columbus, 2014.
[6] D. Shen, L. Wang, Guest editorial special issue on deep learning in medical imaging, IEEE Trans. Biomed. Eng. 65 (9) (2018) 1898–1899.
[7] Z. Yan, X. Yang, K. Cheng, Joint segment-level and pixel-wise losses for deep learning based retinal vessel segmentation, IEEE Trans. Biomed. Eng. 65 (9) (2018) 1912–1923.
[8] G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, J.A.W.M. van der Laak, B. van Ginneken, C.I. Sánchez, A survey on deep learning in medical image analysis, Med. Image Anal. 42 (2017) 60–88.
[9] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Munich, 2015, pp. 234–241.
[10] P.V. Tran, A fully convolutional neural network for cardiac segmentation in short-axis MRI, in: IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, 2016.
[11] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: a deep convolutional encoder-decoder architecture for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (12) (2017) 2481–2495.
[12] Q. Zheng, H. Delingette, N. Duchateau, N. Ayache, 3-D consistent and robust segmentation of cardiac images by deep learning with spatial propagation, IEEE Trans. Med. Imaging 37 (9) (2018) 2137–2148.
[13] Y. Zheng, A. Barbu, B. Georgescu, M. Scheuering, D. Comaniciu, Four-chamber heart modeling and automatic segmentation for 3-D cardiac CT volumes using marginal space learning and steerable features, IEEE Trans. Med. Imaging 27 (11) (2008) 1668–1681.
[14] R. Cuingnet, R. Prevost, D. Lesage, L. Cohen, B. Mory, R. Ardon, Automatic detection and segmentation of kidneys in 3D CT images using random forests, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Nice, 2012, pp. 66–74.
[15] T. Zhao, D. Gao, J. Wang, Z. Tin, Lung segmentation in CT images using a fully convolutional neural network with multi-instance and conditional adversary loss, in: IEEE 15th Int. Symposium on Biomedical Imaging (ISBI 2018), Washington, 2018, pp. 505–509.
[16] A. Mansoor, U. Bagci, Z. Xu, B. Foster, K.N. Olivier, J.M. Elinoff, A.F. Suffredini, J.K. Udupa, D.J. Mollura, A generic approach to pathological lung segmentation, IEEE Trans. Med. Imaging 33 (12) (2014) 2293–2310.
[17] H. Ling, S. Zhou, Y. Zheng, B. Georgescu, M. Suehling, D. Comaniciu, Hierarchical, learning-based automatic liver segmentation, in: IEEE Conf. Computer Vision and Pattern Recognition, Alaska, 2008, pp. 1–8.
[18] R. Wolz, C. Chu, K. Misawa, M. Fujiwara, K. Mori, D. Rueckert, Automated abdominal multi-organ segmentation with subject-specific atlas generation, IEEE Trans. Med. Imaging 32 (9) (2013) 1723–1730.
[19] C. Chu, M. Oda, T. Kitasaka, K. Misawa, M. Fujiwara, Y. Hayashi, Y. Nimura, D. Rueckert, K. Mori, Multi-organ segmentation based on spatially-divided probabilistic atlas from 3D abdominal CT images, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Nagoya, 2013, pp. 165–172.
[20] Z. Wang, K.K. Bhatia, B. Glocker, A. Marvao, T. Dawes, K. Misawa, K. Mori, D. Rueckert, Geodesic patch-based segmentation, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Boston, 2014, pp. 666–673.
[21] A. Lucchi, K. Smith, R. Achanta, G. Knott, P. Fua, Supervoxel-based segmentation of mitochondria in EM image stacks with learned shape features, IEEE Trans. Med. Imaging 31 (2) (2012) 474–486.
[22] R. Wolz, C. Chu, K. Misawa, K. Mori, D. Rueckert, Multi-organ abdominal CT segmentation using hierarchically weighted subject-specific atlases, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Nice, 2012, pp. 10–17.
[23] A. Shimizu, T. Kimoto, H. Kobatake, S. Nawano, K. Shinozaki, Automated pancreas segmentation from three-dimensional contrast-enhanced computed tomography, Int. J. Comput. Assist. Radiol. Surg. 5 (1) (2010) 85–98.
[24] T. Okada, M.G. Linguraru, Y. Yoshida, M. Hori, R.M. Summers, Y. Chen, N. Tomiyama, Y. Sato, Abdominal multi-organ segmentation of CT images based on hierarchical spatial modeling of organ interrelations, Abdom. Imaging (2011) 173–180.
[25] Y. Zhou, L. Xie, W. Shen, Y. Wang, E. Fishman, A. Yuille, A fixed-point model for pancreas segmentation in abdominal CT scans, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Quebec, 2017, pp. 693–701.
[26] H. Roth, L. Lu, A. Farag, H. Shin, J. Liu, E. Turkbey, R. Summers, DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Munich, 2015, pp. 556–564.
[27] H. Roth, L. Lu, A. Farag, A. Sohn, R. Summers, Spatial aggregation of holistically-nested networks for automated pancreas segmentation, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Athens, 2016, pp. 451–459.
[28] J. Cai, L. Le, F. Xing, L. Yang, Pancreas segmentation in CT and MRI images via domain specific network designing and recurrent neural contextual learning, in: IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake, 2018.
[29] J. Ma, F. Lin, S. Wesarg, M. Erdt, A novel Bayesian model incorporating deep neural network and statistical shape model for pancreas segmentation, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Granada, 2018, pp. 480–487.
[30] Z. Zhu, Y. Xia, W. Shen, E.K. Fishman, A.L. Yuille, A 3D coarse-to-fine framework for volumetric medical image segmentation, in: Int. Conf. 3D Vision, Verona, 2018, pp. 682–690.
[31] H.R. Roth, A. Farag, E.B. Turkbey, L. Lu, J. Liu, R.M. Summers, Data From Pancreas-CT, The Cancer Imaging Archive, 2016, http://dx.doi.org/10.7937/K9/TCIA.2016.tNB1kqBU.
[32] K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, L. Tarbox, F. Prior, The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository, J. Digit. Imaging 26 (6) (2013) 1045–1057.
[33] S. Liu, D. Huang, Y. Wang, Receptive field block net for accurate and fast object detection, in: European Conf. Computer Vision, Munich, 2018, pp. 404–419.
[34] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, 2016, pp. 770–778.
[35] H.R. Roth, L. Lu, A. Farag, H. Shin, J. Liu, E.B. Turkbey, R.M. Summers, DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, Munich, 2015, pp. 556–564.
[36] J. Cai, L. Lu, Y. Xie, F. Xing, L. Yang, Improving deep pancreas segmentation in CT and MRI images via recurrent neural contextual learning and direct loss function, in: IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, 2017.
[37] H.R. Roth, H. Oda, X. Zhou, N. Shimizu, Y. Yang, Y. Hayashi, M. Oda, M. Fujiwara, K. Misawa, K. Mori, An application of cascaded 3D fully convolutional networks for medical image segmentation, Comput. Med. Imaging Graph. 66 (2018) 90–99.
[38] H.R. Roth, L. Lu, N. Lay, A.P. Harrison, A. Farag, R.M. Summers, Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation, Med. Image Anal. 45 (2018) 94–107.
[39] Z. Li, K. Kamnitsas, B. Glocker, Overfitting of neural nets under class imbalance: analysis and improvements for segmentation, in: Int. Conf. Medical Image Computing and Computer Assisted Intervention, China, 2019.

Feiyan Li graduated from Chongqing University of Posts and Telecommunications in July 2017. She is an M.S. candidate in computer science and technology at the Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications. Her research focuses on medical image segmentation.

Weisheng Li graduated from the School of Electronics & Mechanical Engineering at Xidian University, Xi'an, China, in July 1997. He received the M.S. degree and Ph.D. degree from the School of Electronics & Mechanical Engineering and the School of Computer Science & Technology at Xidian University, in July 2000 and July 2004, respectively. Currently, he is a Professor at Chongqing University of Posts and Telecommunications and the director of the Chongqing Key Laboratory of Image Cognition. His research focuses on intelligent information processing and pattern recognition.

Yucheng Shu received his M.S. degree and Ph.D. degree from the School of Software Engineering and the School of Computer Science & Technology at Huazhong University of Science & Technology. Currently, he is an assistant professor at Chongqing University of Posts and Telecommunications and the Chongqing Key Laboratory of Image Cognition. His research mainly focuses on computer vision and medical image processing.

Sheng Qin graduated from Chongqing University of Science and Technology in July 2018. He is an M.S. candidate in computer science and technology at the Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications. His research interests include image processing and pattern recognition.

Bin Xiao was born in 1982. He received his B.S. and M.S. degrees in Electrical Engineering from Shaanxi Normal University, Xi'an, China, in 2004 and 2007, and received his Ph.D. degree in computer science from Xidian University, Xi'an, China. He is now working as a professor at Chongqing University of Posts and Telecommunications, Chongqing, China. His research interests include image processing, pattern recognition and digital watermarking.

Ziwei Zhan received his M.S. degree in computer science and technology from the Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications. His research interests include image processing and pattern recognition. He is currently an algorithm engineer at Ucchip Information Technology Co., Ltd.