Accurate segmentation of nuclei in pathological images via sparse reconstruction and deep convolutional networks

Xipeng Pan a, Lingqiao Li a,b, Huihua Yang a,b,*, Zhenbing Liu b, Jinxin Yang a, Lingling Zhao b, Yongxian Fan b

a School of Automation, Beijing University of Posts and Telecommunications, Beijing, China
b School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
* Corresponding author. E-mail address: [email protected]

Abstract
Automated cell segmentation is a critical step for computer-assisted pathology-related image analysis, such as automated grading of breast cancer tissue specimens. However, automated cell segmentation is complicated by (1) the complexity of the data (possibly touching cells, stains, background clutter, and image artifacts) and (2) the variability in size, shape, appearance, and texture of the individual nuclei. Recently, there has been growing interest in the application of "deep learning" strategies to the analysis of natural and pathological images. Histopathology, given its diversity and complexity, represents an excellent use case for deep learning strategies. In this paper, we put forward an automated nuclei segmentation method that works with hematoxylin and eosin (H&E) stained breast cancer histopathology images, which represent regions of whole digital slides. The procedure can be divided into three main stages. Initially, a sparse reconstruction method is employed to roughly remove the background and accentuate the nuclei of the pathological images. Then, a deep convolutional network (DCN), a cascade of multiple convolutional layers, is trained using gradient descent to efficiently segment the cell nuclei from the background. In this stage, input patches and their corresponding labels are randomly sampled from the pathological images and fed to the training network. The size of the sampled patches is flexible, and the proposed method is robust when the number of sampled patches and the number of feature maps vary over a wide range. Finally, morphological operations and some prior knowledge are introduced to improve the segmentation performance and reduce errors. Our method achieves about 92.45% pixel-wise segmentation accuracy and an F1-measure of 0.8393. This is a promising segmentation performance, equaling and sometimes surpassing recently published leading alternative segmentation methods on the same benchmark dataset.

Keywords Nuclei segmentation, Deep convolutional networks, Histopathological images, Sparse reconstruction

1. Introduction
Cell image analysis plays a very important role in pathological diagnosis, and cell segmentation often constitutes the basis and key step of pathological cell image diagnosis. Manual inspection of histopathology images is an extremely tedious and time-consuming process, and the results are subject to intra- and inter-individual variability. At the same time, automated image segmentation for cell analysis is generally a difficult problem due to the large variability (different microscopes, stains, cell types, inhomogeneous cell intensities) and the complexity of the data (possibly touching cells, background clutter, image artifacts such as bright halos or shade-off, and large numbers of cells) [1-3]. In recent years, computer-aided diagnosis (CAD) systems have become a promising technology for ensuring standardized, objective pathology specimen analysis and are of great research and clinical interest [4]. In this work, we propose a novel combined strategy to robustly segment the nuclei regions of breast cancer histopathology images. In the remainder of this section, we review recent methods and present the motivation for developing a novel approach that overcomes the current limitations.

1.1. Related work
Automated segmentation of cell nuclei is now a well-studied topic for which a large number of algorithms have been described in the literature [5-13]. Most of the developed cell and nuclei segmentation techniques revolve around thresholding, watershed segmentation, active contours and pixel-wise clustering/classification, or a combination of the above, supplemented by different pre-processing and post-processing steps and detection/localization schemes. Zhou et al. [5] have used adaptive thresholding and the watershed algorithm for cell nuclei segmentation, followed by a fragment merging method that combines two scoring models based on trend and non-trend features. Using the context information of time-lapse data, the phases of cell nuclei are identified accurately via a Markov model, and experimental results show that the proposed system is effective for nuclei segmentation and phase identification. Chen et al. [6] have employed Otsu's method to segment nuclei from the background and then deployed an improved watershed technique to further separate touching nuclei. Both papers adopt time-lapse fluorescence microscopy images as the study case. Law et al. [7] have proposed a semi-supervised optimization model that determines an efficient segmentation of input images and requires only minimal tuning of model parameters during the initial stage. In [8], a marker-controlled watershed segmentation method has been designed with multiple scales and different markers. The

procedure can be divided into four main steps: 1) pre-processing with color unmixing and morphological operators, 2) marker-controlled watershed segmentation at multiple scales and with different markers, 3) post-processing to reject false regions, and 4) merging of the results from multiple scales. The algorithm proposed in [9] is a combined method: after preprocessing the image, the authors employ the maximally stable extremal regions (MSER) algorithm to separate all foreground objects from the background, and then split clusters of multiple cells through marker-based watershedding. Lu et al. [10] have employed a hybrid morphological reconstruction module to reduce the intensity variation within the nuclei regions and suppress noise in the image; a local region adaptive threshold selection module, based on a local optimal threshold, is used to segment the nuclei. The technique incorporates domain-specific knowledge of skin histopathological images to obtain more accurate segmentation results. Al-Kofahi et al. [11] have presented a robust and accurate novel method for segmenting cell nuclei using a combination of ideas. The image foreground is extracted automatically using a graph-cuts-based binarization. Next, nuclear seed points are detected by multiscale Laplacian-of-Gaussian filtering constrained by distance-map-based adaptive scale selection. These points are used to perform an initial segmentation that is refined using a second graph-cuts-based algorithm incorporating the method of alpha expansions and graph coloring to reduce computational complexity. Ali et al. [12] have proposed an elegant segmentation method that uses boundary- and region-based active contours with a statistical shape model to accurately detect all the specific shapes in the scene; furthermore, they have cast their model in a multiple level set formulation to segment multiple objects under mutual occlusion. Their model is accurate compared with traditional active contours and statistical shape models. Lu et al. [13] have proposed an algorithm that addresses the challenging problem of segmenting each individual cell's nucleus and cytoplasm from a clump of cervical cells deposited on a microscope slide; this method, based on a joint optimization of several level set functions, was demonstrated to perform well on clumps of up to 10 cells. All the methods reviewed above yield good segmentation results under certain circumstances. In general, however, almost all of them have some limitations. For example, the methods of [6] and [8] are sensitive to initialization and local noisy gradients, while [5, 9] are prone to over-segmentation. The presence of staining variations (intra-image and inter-image) and similar backgrounds can pose obstacles to accurate segmentation of the nuclei in H&E-stained histopathological skin epidermis images [10]. Over-segmentation usually happens when a nucleus' chromatin is highly textured

(especially true for large nuclei) or when the nucleus shape is extremely elongated [12]. In [13], the authors have tested the algorithm only on normal-appearing cervical cytology images, not on abnormal cervical cells; moreover, reliable prior knowledge of the nucleus-cytoplasm structure is required, without which the effectiveness of the methodology would likely be severely compromised. Recently, there has been growing interest in the application of "deep learning" strategies to the analysis of natural and pathological images. Histopathology, given its diversity and complexity, represents an excellent use case for deep learning strategies. The main challenge in terms of computational techniques is to analyze all individual cells for accurate diagnosis, since the differentiation of most disease grades depends strongly on cell-level information. To this end, deep convolutional neural networks have been investigated to robustly and accurately detect and segment cells from histopathological images [14-17], which can significantly benefit cell-level analysis for cancer diagnosis [17-19]. Liu et al. [14] have employed maximum-weight independent set selection to choose the heaviest subset from a pool of cell detection candidates generated by different algorithms with various parameters, using a deep convolutional neural network to compute the weights of the graph. Xie et al. [15] have proposed a novel deep voting model for accurate and robust nucleus localization, which extends the convolutional neural network (CNN) model to jointly learn the voting confidence and voting offset by introducing a hybrid non-linear activation function. Su et al. [16] have proposed a novel cell detection and segmentation algorithm: to handle shape variations, inhomogeneous intensity, and cell overlapping, sparse reconstruction with an adaptive dictionary and trivial templates is proposed to detect cells, and in the segmentation stage a stacked denoising autoencoder (sDAE) trained with structural labels is used for cell segmentation. Xu et al. [3] have employed a stacked sparse autoencoder (SSAE) to efficiently detect nuclei in high-resolution histopathological images of breast cancer. Zhang et al. [17] focus on the rank-level fusion of local and holistic features for the image-guided diagnosis of breast cancer and employ content-based image retrieval to discover clinically relevant instances from an image database, which can be used to infer and classify a new image. Cruz-Roa et al. [20] have presented a novel unified approach for learning image representations, visual interpretation and automatic basal-cell carcinoma detection from routine H&E histopathology images; their approach demonstrates that a learned representation is better than a canonical predefined representation. Hatipoglu et al. [21] have introduced a cell classification method for histopathological images that uses spatial information via a CNN. A

coarse-to-fine nucleus segmentation framework has been developed with a multiscale convolutional network (MSCN) and a graph-partitioning-based method [22]. In [23], a new deep convolutional neural network based model has been proposed for the segmentation and classification of epithelial and stromal regions in histopathological images, and it outperforms models based on handcrafted features. Su et al. [24] have applied a fast scanning deep convolutional neural network (fCNN) to pixel-wise region segmentation; the fCNN removes the redundant computations of the original CNN without sacrificing performance. Wan et al. [25] have presented a computer-aided grading method based on multi-level features and cascaded SVM classification to automatically distinguish histopathological breast cancer images of low, intermediate, and high grade. Pixel-, object-, and semantic-level features are extracted to quantitatively characterize morphological patterns and interpretable concepts in the breast cancer tissue images; the semantic-level features are extracted by a CNN, and the performance suggests that the method could be useful in developing a computational diagnostic tool for differentiating breast cancer grades. In [26], a novel architecture combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations; the method achieves state-of-the-art segmentation results on PASCAL VOC 2011-2, NYUDv2, and SIFT Flow. Ronneberger et al. [27] have presented a network and training strategy that relies on data augmentation to use the available annotated samples more efficiently; the architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. Almost all the deep learning networks reviewed above include not only convolutional layers but also subsampling layers. Subsampling is an important strategy in object recognition, where it helps achieve invariance to distortions of the visual image by discarding positional information about image features and details. However, it produces an output representation of much lower resolution than the input image, while many image processing applications require precise positional information; the segmentation of H&E histopathology images of breast tissue is a good example. Therefore, our convolutional networks do not include subsampling.
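As a minimal illustration of this point (our own sketch, not part of any cited method), the following MATLAB fragment traces the feature-map size through three 5×5 valid convolutions; uncommenting the subsampling line shows how quickly 2×2 pooling shrinks the representation:

% Resolution bookkeeping: each 5x5 valid convolution trims a 2-pixel
% border (4 pixels per dimension), so three layers keep the output close
% to the input resolution; 2x2 subsampling would halve it at every layer.
sz = [896 768];           % input image size used in this paper
for k = 1:3
    sz = sz - 4;          % 5x5 valid convolution
    % sz = floor(sz/2);   % uncomment to add 2x2 subsampling per layer
end
disp(sz)                  % prints 884 756 without subsampling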

1.2. Contributions of this work
In this paper, we propose a method to segment the nuclei regions of pathological images. The main contributions of this work are threefold. First, a deep convolutional network, a cascade of multiple convolutional layers without subsampling layers, is employed for cell segmentation. At the same time, a sparse reconstruction

method is employed to accentuate the nuclei regions of the pathological images before network training, and morphological operations and some prior knowledge are introduced as a post-processing step to improve the segmentation performance. Second, input patches and their corresponding labels are randomly sampled from the pathological images and fed to the training network. The size of the sampled patches is flexible, and the proposed method is robust when the number of sampled patches and the number of feature maps vary over a wide range (see Section 3.3 for a detailed description). Finally, extensive experiments and comparisons with recently published models show that our method achieves a promising segmentation performance, equaling and sometimes surpassing the published leading alternative segmentation methods on the same benchmark dataset. In the following sections, we demonstrate that our proposed method can accurately segment the nuclei regions of pathological images. Section 2 presents our methodological contributions in detail, Section 3 describes the experimental setup, validation results and detailed comparisons with previous methods, Section 4 gives some discussion and insights for potential extensions of the method, and Section 5 concludes the paper.

2. Methodology
We propose a combined and efficient strategy to segment the H&E cell images. An overview of the proposed method is shown in Fig. 1. The sparse reconstruction (SR) method is first employed to roughly remove the background and accentuate the nuclei in the H&E breast cell images. Next, a DCN, a cascade of multiple convolutional layers, is trained to efficiently segment the cell nuclei from the background. Finally, a series of morphological operations and some prior knowledge are introduced to improve the segmentation performance and reduce errors.

Fig. 1. An overview of the proposed method (see in the figure files)

2.1. Sparse reconstruction
Sparse reconstruction is employed to obtain an image with a higher signal-to-noise ratio, as shown in Fig. 2. First, we transform the RGB image into a grayscale image and smooth it using an anisotropic diffusion filter (ADF). Second, the K-SVD algorithm is used to train a dictionary on the given gray image; it is a highly effective method that has been successfully applied in several image processing tasks [28-30]. Then, Batch-OMP, an implementation of the Orthogonal Matching Pursuit (OMP) algorithm specifically optimized for the sparse coding of many signals over the same dictionary, computes the sparse representation. Finally, we obtain the denoised image, which serves as the input to the DCN. Although an implementation of K-SVD is available in [28], the algorithm is designed for natural images, so we tune its parameters (i.e., the size of the dictionary to train and the number of training blocks) to suit the complicated pathological images.
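As a minimal sketch of the sparse coding step, assuming a dictionary D (columns are unit-norm atoms, e.g., trained by K-SVD) and a vectorized image patch x, plain OMP can be written in a few lines of MATLAB; the function name and target sparsity T are our illustrative choices, and the Batch-OMP of [28] is an optimized batch variant of the same computation:

% Plain orthogonal matching pursuit: greedily select dictionary atoms and
% reconstruct one patch from its sparse code (illustrative sketch).
function x_hat = omp_reconstruct(D, x, T)
    residual = x;                         % approximation error so far
    support  = zeros(1, T);               % indices of selected atoms
    for t = 1:T
        [~, k] = max(abs(D' * residual)); % atom most correlated w/ residual
        support(t) = k;
        coef = D(:, support(1:t)) \ x;    % least-squares fit on the support
        residual = x - D(:, support(1:t)) * coef;
    end
    x_hat = D(:, support) * coef;         % reconstructed (denoised) patch
end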

Fig. 2. The procedure of sparse reconstruction (see in the figure files)

2.2. Deep convolutional networks
Deep convolutional networks have been applied with great success to high-level computer vision tasks such as object recognition [31]. Recent studies have shown that they can also be used as a general method for low-level image processing problems, such as restoration [32], denoising [33], classification [34] and mitosis detection [35]. In [32], convolutional networks are trained to restore nanoscale brain images, and the performance is better than that of Markov Random Fields (MRFs) or Conditional Random Fields (CRFs). In [33], the authors have demonstrated that convolutional networks can achieve state-of-the-art performance on natural image denoising. In this study, a deep convolutional network is proposed to segment the H&E stained cancer cell images. The segmentation result can be used as the input for other tasks such as classification, tracking, or nucleus feature extraction for diagnosis. A convolutional network is an alternating sequence of linear filtering and nonlinear transformation operations. The input and output layers include one or more images, while intermediate layers contain "hidden" units with images called feature maps that represent the internal computations of the algorithm. The activity of feature map n in layer k is given by

$$I_{n}^{k} = f\left( \sum_{m} I_{m}^{k-1} \ast w_{nm}^{k} + b_{n}^{k} \right) \qquad (1)$$

where the $I_{m}^{k-1}$ are the feature maps that provide input to $I_{n}^{k}$, and $\ast$ denotes the convolution operation. $w_{nm}^{k}$ is a convolution kernel, and $b_{n}^{k}$ is a bias parameter. The function $f$ is the sigmoid, $f(x) = 1/(1 + e^{-x})$. The diagram of a single-layer convolutional architecture is shown in Fig. 3.

Fig. 3. The diagram of single layer convolutional architecture (see in the figure files)
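A minimal MATLAB sketch of Eq. (1), assuming an H×W×M array of input maps and a ka×kb×M×N array of kernels (the function name and array layout are illustrative choices, not part of the published code):

% One convolutional layer as in Eq. (1): each output map is the sigmoid of
% the sum of valid convolutions of all input maps with its kernels plus a
% per-map bias.
function Iout = conv_layer(Iin, W, b)
    [ka, kb, M, N] = size(W);                            % kernel/map counts
    acc = zeros(size(Iin,1)-ka+1, size(Iin,2)-kb+1, N);  % 'valid' size
    for n = 1:N
        for m = 1:M
            acc(:,:,n) = acc(:,:,n) + conv2(Iin(:,:,m), W(:,:,m,n), 'valid');
        end
        acc(:,:,n) = acc(:,:,n) + b(n);
    end
    Iout = 1 ./ (1 + exp(-acc));                         % f(x) = 1/(1+e^-x)
end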

2.2.1. Training workflow of the network
Multiple convolutional layers are connected layer by layer to form a deep training network. We randomly choose patches of the input images, feed them into the network, and compute the errors between the outputs of the network and the corresponding labels. The backpropagation strategy is used to adjust the network weights, and we train the network using gradient descent. After training, the network provides significantly more accurate segmentations on the test set than thresholding. A demonstration of the deep convolutional network is shown in Fig. 4. The size of the convolution kernel is 5×5 and the number of feature maps is 12; both parameters, as well as the number of convolutional layers, are adjustable. If the network has 2 hidden layers and each feature map is obtained through convolution between all feature maps in the previous layer and the corresponding convolution kernels, then the total number of free parameters is 4,225. Such a convolutional network is then trained on H&E histopathology cell images of breast tissue.
Fig. 4. The demonstration of deep convolutional networks (see in the figure files)
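The free-parameter count of 4,225 quoted above can be verified with a one-line MATLAB check (a sketch under the stated architecture: 12 maps per hidden layer, 5×5 kernels, every map connected to all maps of the previous layer):

% Kernels plus biases for the three layers of Fig. 4:
% 1 -> 12 maps, 12 -> 12 maps, 12 -> 1 map, all with 5x5 kernels.
n = 12; k = 5*5;
params = (1*n*k + n) + (n*n*k + n) + (n*1*k + 1)   % 312 + 3612 + 301 = 4225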

2.2.2. Randomly sampling input patches and their corresponding labels
In the training phase, input patches and their corresponding labels are randomly sampled from the pathological images and fed to the training network. Backpropagation and gradient descent are adopted to optimize the parameters of the network (the parameter optimization is described in Section 2.3). In this section, we introduce the algorithm for obtaining a random input patch and the corresponding labels, as shown in Table 1. Initially, due to the effect of multiple valid convolutions, we have to compute the size of the input patch that generates the desired size of the output patch (e.g., as Fig. 4 shows, an output patch of size 6×6 requires an input patch of size 18×18). Then, the geometric center of each patch is drawn randomly, and the coordinates of the random input patch and the corresponding labels are computed. Finally, we extract the random input patch and its corresponding labels from the pathological image. When a sampled patch is close to the image border, its window will include pixels outside the image boundaries; such pixels are synthesized by mirroring the pixels of the actual image across the boundary (see Fig. 4).

Table 1. The procedure of obtaining the training datasets.
Algorithm for obtaining a random input patch and the corresponding labels
Input: pathological image and its label image
Output: random input patch and the corresponding labels
0  Assume that the size of the convolution kernel is a1×a2, the size of the output patch is b1×b2, the number of hidden layers is h, and the size of the pathological image and its labels is w1×w2
1  Compute the size of the input patch, c1×c2:
   c1 = b1 + 2×floor(a1/2)×(h+1)
   c2 = b2 + 2×floor(a2/2)×(h+1)
2  Obtain the geometric center of the input patch, (x, y):
   x = ceil(rand×(w1-c1)) + floor(c1/2)
   y = ceil(rand×(w2-c2)) + floor(c2/2)
3  Compute the coordinates of the input patch and the corresponding labels:
   train_coords = [x-floor(c1/2)+1 : x+floor(c1/2); y-floor(c2/2)+1 : y+floor(c2/2)]
   label_coords = [x-floor(b1/2)+1 : x+floor(b1/2); y-floor(b2/2)+1 : y+floor(b2/2)]
4  Obtain the patch and the corresponding labels:
   input_patch = path_image(train_coords(1,:), train_coords(2,:))
   label_patch = label_image(label_coords(1,:), label_coords(2,:))

In Table 1, floor, ceil and rand are MATLAB built-in functions: floor(x) rounds the elements of x to the nearest integers towards minus infinity, ceil(x) rounds the elements of x to the nearest integers towards infinity, and rand generates uniformly distributed pseudorandom numbers. A runnable version of this procedure is sketched below.
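The following MATLAB function is a minimal runnable transcription of Table 1 for drawing one sample; the function name is illustrative, and the border-mirroring step of Fig. 4 is omitted for brevity (this sketch instead keeps the patch center far enough from the border):

% Draw one random input patch and its aligned label patch (Table 1).
% img, lab: pathological image and its label map; a1,a2: kernel size;
% b1,b2: output patch size; h: number of hidden layers.
function [input_patch, label_patch] = sample_patch(img, lab, a1, a2, b1, b2, h)
    [w1, w2] = size(img);
    % the input patch grows by the valid-convolution margin of every layer
    c1 = b1 + 2*floor(a1/2)*(h+1);
    c2 = b2 + 2*floor(a2/2)*(h+1);
    % random geometric center, kept away from the image border
    x = ceil(rand*(w1 - c1)) + floor(c1/2);
    y = ceil(rand*(w2 - c2)) + floor(c2/2);
    input_patch = img(x-floor(c1/2)+1 : x+floor(c1/2), ...
                      y-floor(c2/2)+1 : y+floor(c2/2));
    label_patch = lab(x-floor(b1/2)+1 : x+floor(b1/2), ...
                      y-floor(b2/2)+1 : y+floor(b2/2));
end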

2.3. Optimization via gradient descent
The convolutional network architecture is highly complex, non-linear and non-convex, which often renders convex optimization techniques unusable in machine learning. However, it is cleverly designed to be fully differentiable with respect to all free parameters of the network. If the cost function chosen to be optimized is similarly differentiable, gradient descent techniques can be utilized to optimize the network. We denote the function of the convolutional network from the input X to the output Y as $Y = F_{w,b}(X)$, where w and b are the free parameters of the network, and $Y^{*}$ is the ground-truth binary label corresponding to Y. The cost function is given by

$$E\left(X, F_{w,b}, Y^{*}\right) = \left\| F_{w,b}(X) - Y^{*} \right\|_{2}^{2} \qquad (2)$$

The learning process of the convolutional network can be formulated as the minimization of this cost function with respect to all free parameters w and b:

$$\arg\min_{w,b} E\left(X, F_{w,b}, Y^{*}\right) \qquad (3)$$

Since Eqs. (1) and (2) are differentiable, the gradients $\nabla_{w} E$ and $\nabla_{b} E$ can be calculated, and gradient descent is used to train the parameters (w, b) that minimize the cost function. The update steps are defined as:

$$w \leftarrow w - \eta_{w} \nabla_{w} E \qquad (4)$$

$$b \leftarrow b - \eta_{b} \nabla_{b} E \qquad (5)$$

where the $\eta$ are sufficiently small values called learning rates, which control the convergence speed. In this paper, the learning rate $\eta$ is set to 0.001 in the last layer and to 0.1 in the other layers [33]. Before learning, each weight in a filter w is initialized to a random value drawn from a normal distribution with zero mean and a variance inversely proportional to the square root of the number of pixels in the filter (e.g., $1/\sqrt{25}$ for a 5×5 filter). Although such details are not critical to the success of the learning procedure, in practice they are found to speed up convergence.
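For illustration, the following MATLAB fragment sketches one update of Eqs. (4)-(5) for the last layer under the cost of Eq. (2), assuming a single input map I_in, a conv2-style valid convolution as in Eq. (1), and output Y = sigmoid(conv2(I_in, w, 'valid') + b); variable names are illustrative, and the error signals of the earlier layers follow by standard backpropagation:

% One gradient-descent step for the output layer (learning rate 0.001).
eta    = 0.001;
delta  = 2 * (Y - Ystar) .* Y .* (1 - Y);  % dE/d(pre-activation), sigmoid'
grad_b = sum(delta(:));                    % bias gradient
% kernel gradient for a conv2 'valid' forward pass; the rot90 calls
% account for the kernel flip of true convolution
grad_w = rot90(conv2(I_in, rot90(delta, 2), 'valid'), 2);
w = w - eta * grad_w;                      % Eq. (4)
b = b - eta * grad_b;                      % Eq. (5)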

2.4. Postprocessing
To improve the segmentation performance and reduce errors due to background clutter, stains, and image artifacts, a series of morphological operations is employed and some prior knowledge is introduced. First, isolated pixels and small noise fragments are eliminated by the morphological clean and opening operations. Second, the area of a nucleus should exceed a prespecified value: if the area of a whole candidate region is lower than this value, the region is regarded as noise. Third, any holes remaining inside the nuclei are filled.
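A minimal MATLAB (Image Processing Toolbox) sketch of this chain, using the parameter values reported later in Section 3.3 (disk radius 5, minimum area 50 pixels); BW is assumed to be the binary segmentation output of the DCN:

% Morphological postprocessing of the binary DCN output.
BW = bwmorph(BW, 'clean');           % remove isolated foreground pixels
BW = imopen(BW, strel('disk', 5));   % opening removes small fragments
BW = bwareaopen(BW, 50);             % drop regions below the area prior
BW = imfill(BW, 'holes');            % fill holes inside nuclei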

3. Experiments
3.1. Dataset description
There are 58 hematoxylin and eosin (H&E) histopathology images of breast tissue from David Rimm's Laboratory at Yale (http://medicine.yale.edu/bbs/molecularcell/people/david_rimm.profile), comprising 32 benign and 26 malignant images.
Fig. 5. H&E histopathology images of breast tissue (see in the figure files)
As shown in Fig. 5, the color histopathology images have 3 channels (RGB). Each channel is an 8-bit, 896×768 grayscale image. For each image, a pixel-level markup is specified within a designated truth window of about 200×200 pixels. The original images and the associated ground truth data are provided by the Bio-image Informatics Center of UCSB, and can be downloaded as part of the UCSB Bio-Segmentation benchmark dataset [36, 37]. Two histopathology images within the truth window and their associated ground truth labeling images are illustrated in Fig. 6.

http://bioimage.ucsb.edu/research/bio-segmentation

Fig. 6. Two histopathology images within the truth window and their associated ground truth labeling images (see in the figure files)

3.2. Performance metrics
Accuracy (ACC), precision (P), recall (R), and F1-measure (F1) are adopted as the performance metrics, which are given as:

$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (6)$$

$$P = \frac{TP}{TP + FP} \qquad (7)$$

$$R = \frac{TP}{TP + FN} \qquad (8)$$

$$F_{1} = \frac{2PR}{P + R} \qquad (9)$$

Accuracy is the number of correctly classified pixels divided by the total number of pixels. Out of the total number of nuclei detected by the proposed method, precision signifies the fraction that are actually present in the ground truth image; it is also called the positive predictive value. Out of the total number of nuclei present in the ground truth image, recall signifies the fraction detected by the proposed method; it is also called sensitivity. The F1-measure is computed as the harmonic mean of precision and recall for the foreground object. Here TP denotes true positives, FP false positives, FN false negatives, and TN true negatives. TP refers to nuclei in the ground truth image that are correctly detected by the proposed method. FP refers to regions that are not nuclei in the ground truth image but are incorrectly detected as nuclei by the proposed method. FN refers to nuclei that are actually present in the ground truth image but are not detected by the proposed method. TN refers to regions that are not nuclei in the ground truth image and are correctly classified as background by the proposed method.
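These metrics follow directly from the binary prediction and ground-truth masks; a minimal MATLAB sketch with illustrative variable names:

% Pixel-wise metrics of Eqs. (6)-(9); pred and gt are logical masks.
TP = nnz( pred &  gt);   FP = nnz( pred & ~gt);
FN = nnz(~pred &  gt);   TN = nnz(~pred & ~gt);
ACC = (TP + TN) / (TP + FP + FN + TN);     % Eq. (6)
P   = TP / (TP + FP);                      % Eq. (7), precision
R   = TP / (TP + FN);                      % Eq. (8), recall
F1  = 2 * P * R / (P + R);                 % Eq. (9)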

3.3. Parameters setting
1) DCN: The DCN is a cascade of three convolutional layers. The size of the convolution kernel is 5×5 and the output size is 6×6, so the size of the input patch is 18×18. We also explore the interaction between the segmentation performance and the number of feature maps in the hidden layer; the result is shown in Fig. 7(a). The performance increases with the number of feature maps and stays stable once the number reaches 12; in fact, a relatively good F1-measure is already obtained with 6 feature maps. In other words, the network is robust when the number of feature maps varies over a wide range. The number of sampled patches per training image needed for a reasonable performance depends on the variation of the data; in our setting, around 2000 samples per training image are observed to be sufficient. The interaction between the performance and the number of sampled patches is shown in Fig. 7(b): a good performance is obtained when the number of sampled patches per training image is equal to or greater than 1500. That is to say, the model is robust for sampling numbers over a wide range.

Fig. 7. (a) F1-measure with different numbers of feature maps in the hidden layer. (b) F1-measure with different numbers of sampled patches per training image. (See in the figure files)
2) Postprocessing: For each candidate nucleus clump, the morphological opening operation uses a disk-shaped structuring element; a radius of 5 pixels is empirically selected in this paper. We filter out a nucleus if the whole nucleus region's area is lower than 50 pixels.

3.4. Experiments results and comparisons
We randomly choose 30 of the 58 images as the training set to optimize the segmentation algorithm; the remaining 28 images form the testing set used to quantitatively evaluate performance. In histopathological H&E images, nuclei are stained dark purple while the cytoplasm and the extracellular connective tissue are stained pink. Based on this, we choose the R channel of the original images as the input of the training network. The gray image and its corresponding SR image are also used as inputs of the training network for comparative experiments. The study was implemented with MATLAB R2013a on a workstation with an Intel(R) Xeon(R) E5-2650 v2 CPU and 32 GB memory. Comparison experiments were carried out between our proposed method and an extensive set of methods and models. On the one hand, three commonly used segmentation methods, OTSU [38], FCM-based [39-41], and Watershed-based [9], were applied to the dataset to produce baseline performances for comparison purposes. On the other hand, we searched the literature for segmentation work using the same UCSB benchmark dataset [36, 37]; six such methods are available in [7, 21, 42-45].

3.4.1. Comparison with some commonly used methods
Otsu's method [6, 38] is used to automatically perform clustering-based image thresholding. Our breast cell images contain two classes of pixels following a bi-modal histogram (cell pixels and background pixels); the method then calculates the optimum threshold separating the two classes so that their combined spread (intra-class variance) is minimal, or equivalently (because the sum of pairwise squared distances is constant) so that their inter-class variance is maximal. Fuzzy C-Means (FCM) is a clustering method that allows one piece of data to belong to two or more clusters and is widely used for image segmentation [39-41]. In [39], Fuzzy C-Means clustering is employed as the segmentation technique to quantify the chromatin pattern, enabling different degrees of chromatin segmentation to be performed on sample images of non-neoplastic squamous cells. The authors of [40, 41] apply FCM or its hybrid version to brain tumor segmentation and area calculation in brain MR images. The comparison results are shown in Table 2; each value in the table is the average over the test images. It can be observed that the proposed method (DCN) outperforms the other methods in terms of most of the metrics. We also observe that the performance using the SR image as input is superior to that with the gray image as input for all four methods, especially for Accuracy, Precision and F1-measure. For our proposed method, the Accuracy, Recall and F1-measure are better than with the R channel image as input. Admittedly, the Precision is slightly lower than with the R channel image as input; the main reason is that SR is based on the gray version of the pathological image, which is much more complicated than the R channel image. In most cases, the F1-measure of our method with SR as input is at least 6% better than the others, except for the DCN with the other two inputs: our method with SR as input is around 1.6% better than the DCN with the R channel image, and 5.7% better than the DCN with the gray image as input. To demonstrate statistical significance, we perform a t-test on the F1-measures obtained by the DCN with SR as input and the DCN with the other two inputs, under the null hypothesis, using a significance level of 0.05. The p-values are 7.96×10⁻³ and 5.99×10⁻⁵ respectively, demonstrating that the F1-measure values achieved by the proposed technique are indeed significantly better than the others. As shown in Figs. 8 and 9, the box plots² of the segmentation performances indicate that our approach beats the other methods on the whole; the overall performance of the proposed method is the best among all the methods. The segmentations of two testing images are shown in Fig. 10; the proposed method is able to extract the nuclei accurately. Further experimental comparisons with the literature using the same UCSB benchmark dataset are carried out in the next subsection.
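Such a paired t-test can be run in MATLAB (Statistics Toolbox) on the per-image F1 scores; the vector names below are illustrative assumptions:

% Paired t-test at the default 0.05 significance level: do the 28
% per-image F1 scores with SR input exceed those with gray-image input?
[h, p] = ttest(f1_sr - f1_gray);   % h = 1 rejects the null hypothesis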

3.4.2. Comparison with the methods using the same benchmark dataset
In this section, we compare the performance of our nuclei segmentation method with other available approaches from the literature.

To validate the proposed semi-supervised multiple-image model (MI) [7], that paper compares MI with four other methods: the classical support vector machine, the k-nearest neighbor, the k-means, and the semi-supervised k-means. The first two are supervised methods, the third is unsupervised, and the fourth is semi-supervised. MI obtains the best performance among these five methods; its accuracy and F-measure are about 89.55% and 0.7733 respectively, taken as the 50% percentile values from the boxplots in that paper. From Table 2, we can see that the accuracy and F1-measure of our proposed method are 2.9% and 6.6% higher than the MI model [7], respectively. It should be noted that for medical diagnosis, even a 1% increase in diagnostic accuracy is significant [42]. Convolutional neural networks have been designed to perform pixel-level classification on the same histopathology images of breast tissue [21]. The networks consist of two convolution layers, two subsampling layers and an output layer, arranged in a feedforward structure: each convolution layer is followed by a subsampling layer, and the last subsampling layer is followed by the output layer. Extensive experiments were carried out, and the overall accuracy (OA), estimated as the ratio between correctly classified test samples and the total number of test samples, is at best 86.88%, which is 5.57% lower than the performance of our proposed method. Hatipoglu et al. [43] have utilized a CNN to classify digital histopathological images. Different from the earlier version [21], this work first extracts Fourier features from the RGB color space of the pathological images; second, an equal number of different cellular and extra-cellular structures in the spatial domain is selected from the images; finally, visual and numerical outputs are presented for comparison in the experimental results section. The best OA is 87.07%, which is slightly better than the result of [21] but 5.38% lower than the performance of our proposed method. Li et al. [42] have addressed the problem of color space selection for digital cancer classification using H&E stained images, and investigate the effectiveness of various color models in breast cancer diagnosis. The authors set up a diagnosis framework consisting of four modules; the basic module is the pre-processing module, and the nuclei segmentation block is realized by a watershed-based algorithm for subsequent morphological and structural analysis. The paper presents the classification performance on normal and breast cancer images (classification accuracy 0.802, AUC 0.6594), but does not report the performance of nuclei segmentation. Li et al. [44] have introduced a blind stain decomposition method for histopathology images, which is capable of addressing the spectral variation in stains. In this work, the authors make use of a saturation-weighted histogram to limit the impact of achromatic colors, and adopt circular thresholding to properly address the periodicity of hue when processing color information. The paper reports that the Dice coefficient of nuclei segmentation using the blind stain decomposition approach over the UCSB dataset is 0.7469. Cheng et al. [45] have proposed a learning-based approach using a bag-of-words model (BWM) and dedicated feature design to deal with cellular segmentation in microscopy images. That paper is based on the following criterion:

$$\mathrm{pxl.score} = \frac{TP}{TP + FN + FP}$$

and the best pixel classification rate on the breast cancer dataset is 69.64%. Dividing the numerator and denominator by TP, and noting that $1/P = 1 + FP/TP$ and $1/R = 1 + FN/TP$, gives the equivalent form in terms of precision (P) and recall (R):

$$\mathrm{pxl.score} = \frac{PR}{P + R - PR}$$

The pixel score of our proposed method is 72.69%, which is 3.05% higher than the BWM model. From the above, we can see that our method leads to a promising segmentation performance, equaling or surpassing recently published leading alternative segmentation methods. As Table 2 implies, the values of recall are higher than precision in almost all the models. In our opinion, this is caused by the imbalanced numbers of nuclei and background pixels; more detail can be found in the Discussions section.

Table 2. The comparison results of the segmentation performance.

Methods                          Input        Accuracy (%)  Precision (%)  Recall (%)  F1-measure
OTSU [38]                        Gray image   75.37         48.09          96.18       0.6412
                                 R channel    86.99         64.86          94.14       0.7680
                                 SR           88.77         71.36          85.08       0.7762
FCM-based [39]                   Gray image   74.92         47.63          96.47       0.6378
                                 R channel    86.60         64.04          94.49       0.7634
                                 SR           88.84         71.63          84.86       0.7768
Watershed-based [9]              Gray image   81.30         55.89          86.80       0.6800
                                 R channel    84.63         61.37          88.57       0.7250
                                 SR           88.56         70.26          87.01       0.7775
Our method (DCN)                 Gray image   88.12         69.54          90.72       0.7814
                                 R channel    91.65         84.89          80.84       0.8234
                                 SR           92.45         82.41          86.04       0.8393
MI [7]                           -            89.55         -              -           0.7733
CNN [21]                         -            86.88         -              -           -
CNN with Fourier features [43]   -            87.07         -              -           -

Fig. 8. Box plots of (a) the accuracy, (b) the precision, (c) the recall of the segmentations and (d) the F1-measure for the nucleus region retrieved by OTSU, FCM-based, Watershed-based and our method (DCN) with gray image, R channel of the RGB image and the SR inputs. (See in the figure files)

Fig. 9. Box plots of (a) the accuracy, (b) the precision, (c) the recall of the segmentations and (d) the F1-measure for the nucleus region retrieved by OTSU, FCM-based, Watershed-based and our method (DCN) with the SR inputs. (See in the figure files)

² In the box plots, the upper, middle, and lower lines of the box denote the 75%, 50%, and 25% percentile values, respectively. The line above the box denotes the 75% percentile value + 1.5×interquartile range, and the line below the box denotes the 25% percentile value − 1.5×interquartile range. The pluses denote outliers.

Fig. 10. Segmentation of two of the testing images obtained by the proposed method (DCN) and others (the green lines depict the segments’ boundaries). (See in the figure files)

4. Discussions
In most previous image segmentation approaches, domain-specific knowledge is extracted by studying the characteristics and structure of the images and is then manually embedded into the behavior of the algorithm. These methods are simple and surprisingly effective in some cases. However, they are limited in most domain-specific applications by the difficulty of encoding complex high-level knowledge into general low-level algorithms. Moreover, in histopathology, distinguishing between cytoplasm and stroma is even more difficult, especially when the inter-image variability in staining is taken into account. Therefore, convolutional networks are utilized to distinguish the two regions. We also show two important properties of convolutional networks as a segmentation method. First, deep convolutional networks encode enough high-level domain-specific knowledge into the final segmentation strategy by learning from the training data. Second, convolutional networks can use an appropriate amount of context information in segmenting, by optimizing the weights of the filters in the networks through the gradient descent learning process. In effect, we transform the nuclei segmentation problem into a pixel-wise classification problem. The number of background pixels is much larger than that of nuclei pixels, with a proportion of about 3:1, which is imbalanced. In our opinion, this increases the probability that regions that are not nuclei in the ground truth image are incorrectly detected as nuclei, and decreases the probability that regions that are nuclei in the ground truth image go undetected; in other words, the value of FP will be relatively large and the value of FN relatively small. Addressing this imbalance is one of the research directions that we are currently considering. At the same time, the UCSB Bio-Segmentation benchmark dataset only gives the nucleus area (pixel level) as ground truth and does not delineate the nuclei individually. Therefore, the experiments of our study focus on pixel-wise segmentation of the nuclei regions. Splitting the nuclei individually is also an issue worth further inquiry, given corresponding ground truth.

5. Conclusions
In this paper, we propose an automatic image processing pipeline that accurately detects and segments nuclei in breast pathological images. First, sparse reconstruction with the K-SVD and Batch-OMP algorithms is employed to enhance the nucleus areas and preliminarily remove the background. Then, the segmentation stage exploits a DCN trained with structural labels to obtain accurate cell nucleus pixels. Finally, morphological operations and some prior knowledge are introduced to improve the segmentation performance and reduce errors. The proposed algorithm is a general approach that can be adapted to many pathological applications. Both qualitative and quantitative experimental results demonstrate the superior performance of our method compared with several state-of-the-art methods.

Acknowledgments
The authors would like to thank Dr. Gelasca et al. for publishing the dataset and the authors of [9, 28] for making their code available. We are grateful for helpful comments from the anonymous reviewers. This research was supported by the National Natural Science Foundation of China (Grants 21365008, 61105004, 61562013 and 61462018).

References [1] E. Meijering, Cell segmentation: 50 years down the road, IEEE Signal Process. Mag. 29 (5) (2012) 140-145.

[2] H. Su, Z. Yin, T. Kanade, S. Huh, Phase contrast image restoration via dictionary representation of diffraction patterns, in: International Conference on Medical Image Computing and Computer-assisted Intervention, Springer-Verlag, 2012, pp. 615-622. [3] J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, A. Madabhushi, Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images, IEEE Trans. Med. Imaging 35 (1) (2015) 119-130. [4] F. Xing, L. Yang, Robust nucleus/cell detection and segmentation in digital pathology and microscopy images: a comprehensive review, IEEE Rev. Biomed. Eng. pp (99) (2016). [5] X. Zhou, F. Li, J. Yan, S.T.C. Wong, A novel cell segmentation method and cell phase identification using markov model, IEEE Trans Inf. Technol. Biomed. 13 (2009) 152-157. [6] X. Chen, X. Zhou, S.T.C. Wong, Automated segmentation, classification, and tracking of cancer cell nuclei in time-lapse microscopy, IEEE Trans. Biomed. Eng. 53 (4) (2006) 762-766. [7] Y. N. Law, H.K. Lee, M. K. Ng, and A. M. Yip, A semisupervised segmentation model for collections of images, IEEE Trans. Image Process. 21 (6) (2012) 2955-2968. [8] M. Veta, P.J.V. Diest, R. Kornegoor, A. Huisman, M.A. Viergever, et al., Automatic nuclei segmentation in H&E stained breast cancer histopathology images. PLoS ONE 8(7): e70221, 2013, doi:10.1371/journal.pone.0070221. [9] F. Buggenthin, C. Marr, M. Schwarzfischer, P.S. Hoppe, O. Hilsenbeck, T. Schroeder, F.J. Theis, An automatic method for robust and fast cell detection in bright field images from high-throughput microscopy, BMC Bioinform. 14 (1) (2013) 297. [10] C. Lu, M. Mahmood, N. Jha, M. Mandal, A robust automatic nuclei segmentation technique for quantitative histopathological image analysis, Analytical and Quantitative Cytology and Histology, 34 (6) (2012) 296-308. [11] Y. Al-Kofahi, W. Lassoued, W. Lee, and B. Roysam, Improved automatic detection and segmentation of cell nuclei in histopathology images, IEEE Trans. Biomed. Eng. 57 (4) (2010) 841-852. [12] S. Ali and A. Madabhushi, An integrated region-, boundary-, shape based active contour for multiple object overlap resolution in histological imagery, IEEE Trans. Med. Imaging 31 (7) (2012) 1448-1460. [13] Z. Lu, G. Carneiro, A. Bradley, An improved joint optimization of multiple level set functions for the segmentation of overlapping cervical cells, IEEE Trans. Image Process. 24 (4) (2015). [14] F. Liu, L. Yang, A novel cell detection method using deep convolutional neural network and maximum-weight independent set, in: International Conference on Medical Image

Computing and Computer-assisted Intervention, Springer International Publishing, 2015, pp. 349-357. [15] Y. Xie, X. Kong, F. Xing, F. Liu, H. Su, L. Yang, Deep voting: a robust approach toward nucleus localization in microscopy images, in: International Conference on Medical Image Computing and Computer-assisted Intervention, Springer International Publishing, 2015, pp. 374-382. [16] H. Su, F. Xing, X. Kong, Y. Xie, S. Zhang, L. Yang, Robust cell detection and segmentation in histopathological images using sparse reconstruction and stacked denoising autoencoders, in: International Conference on Medical Image Computing and Computer-assisted Intervention, Springer International Publishing, 2015, pp. 383-390. [17] X. Zhang, H. Dou, T. Ju, J. Xu, S. Zhang, Fusing heterogeneous features from stacked sparse autoencoder for histopathological image analysis, IEEE J. of Biomedical and Health Informatics. pp (99) (2015). [18] X. Zhang, W. Liu, M. Dundar, S. Badve, S. Zhang, Towards large-scale histopathological image analysis: hashing-based image retrieval, IEEE Trans. Med. Imaging. 34 (2) (2015) 496-506. [19] X. Zhang, F. Xing, H. Su, L. Yang, S. Zhang, High-throughput histopathological image analysis via robust cell segmentation and hashing, Medical Image Analysis. 26 (1) (2015) 306-315. [20] A. Cruz-Roa, J. A. Ovalle, A. Madabhushi, F. González, A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection, in: International Conference on Medical Image Computing and Computer-assisted Intervention, Springer Berlin Heidelberg, 2013, pp. 403-410. [21] N. Hatipoglu, G. Bilgin, Classification of histopathological images using convolutional neural network, in: 2014 IEEE 4th International Conference on Image Processing Theory, Tools and Applications (IPTA), 2014, pp. 1-6. [22] Y. Song, L. Zhang, S. Chen, D. Ni, B. Lei, T. Wang, Accurate segmentation of cervical cytoplasm and nuclei based on multiscale convolutional network and graph partitioning, IEEE Trans. Biomed. Eng. 62 (10) (2015). [23] J. Xu, X. Lou, G. Wang, H. Gilmore, A. Madabhushi, A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images, Neurocomputing 191 (2016) 214-223. [24] H. Su, F. Liu, Y. Xie, F. Xing, S. Meyyappan, and L. Yang, Region segmentation in histopathological breast cancer images using deep convolutional neural network, in: 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), 2015.

[25] T. Wan, J. Cao, J. Chen, Z. Qin, Automated grading of breast cancer histopathology using cascaded ensemble with combination of multi-level image features, Neurocomputing, Available online 7 June 2016. [26] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation (2014), arXiv preprint arXiv:1411.4038. [27] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation (2015), arXiv preprint arXiv:1505.04597. [28] M. Elad, R. Rubinstein, and M. Zibulevsky, Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit, Technical Report - CS, Technion, April 2008. [29] M. Elad and M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Trans. Image Process. 15 (12) (2006) 3736-3745. [30] M. Aharon, M. Elad, and A.M. Bruckstein, The K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. Signal Process. 54 (11) (2006) 4311-4322. [31] Y. LeCun, F.J. Huang, L. Bottou, Learning methods for generic object recognition with invariance to pose and lighting, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2004, pp. II-97-104. [32] V. Jain, J.F. Murray, F. Roth, S.C. Turaga, V.P. Zhigulin, K.L. Briggman, M. Helmstaedter, W. Denk, and H.S. Seung, Supervised learning of image restoration with convolutional networks, in: IEEE International Conference on Computer Vision, 2007, pp. 1-8. [33] V. Jain, H.S. Seung, Natural image denoising with convolutional networks, in: Advances in Neural Information Processing Systems, 2008, pp. 769-776. [34] C. Malon, M. Miller, H.C. Burger, E. Cosatto, and H.P. Graf, Identifying histological elements with convolutional neural networks, in: Proceedings of the 5th International Conference on Soft Computing As Transdisciplinary Science and Technology, 2008, pp. 450-456. [35] H. Wang, A. Cruz-Roa, A. Basavanhally, H. Gilmore, N. Shih, M. Feldman, J. Tomaszewski, F. Gonzalez, and A. Madabhushi, Cascaded ensemble of convolutional neural networks and handcrafted features for mitosis detection, SPIE Medical Imaging, 2014, 9041(2):90410B-90410B-10. [36] E.D. Gelasca, J. Byun, B. Obara, B.S. Manjunath, Evaluation and benchmark for biological image segmentation, in: IEEE International Conference on Image Processing, 2008, pp. 1816-1819.

[37] E.D. Gelasca, B. Obara, D. Fedorov, K. Kvilekval, B.S. Manjunath, A biosegmentation benchmark for evaluation of bioimage analysis methods, BMC Bioinformatics, 10 (368) (2009) 1-12. [38] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Sys., Man. Cyber. 9 (1) (1979) 62-66. [39] J.R. Tang, N.A. Mat Isa, E.S. Ch'ng, A Fuzzy-C-Means-Clustering approach: quantifying chromatin pattern of non-neoplastic cervical squamous cells, PLoS ONE 10(11): e0142830, 2015, doi:10.1371/journal.pone.0142830. [40] J. Selvakumar, A. Lakshmi, T. Arivoli, Brain tumor segmentation and its area calculation in brain MR images using K-Mean clustering and fuzzy C-Mean algorithm, in: IEEE International Conference On Advances In Engineering, 44 (S1) (2012) 186-190. [41] M. Gong, Y. Liang, J. Shi, W. Ma, and J. Ma, Fuzzy C-Means clustering with local information and kernel metric for image segmentation, IEEE Trans. Image Process. 22 (2) (2013) 573-584. [42] X. Li and K.N. Plataniotis, Color model comparative analysis for breast cancer diagnosis using H and E stained images, in: Proc. SPIE Medical Imaging 2015: Digital Pathology, vol. 9420, Feb. 2015. [43] N. Hatipoglu, G. Bilgin, Segmentation of histopathological images with convolutional neural networks using Fourier features, in: Signal Processing and Communications Applications Conference (SIU), 2015, pp. 455-458. [44] X. Li and K.N. Plataniotis, Blind stain decomposition for histopathology images using circular nature of chroma components, in: Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2015. [45] L. Cheng, N. Ye, W. Yu, and A. Cheah, A Bag-of-Words model for cellular image segmentation, in: Advances in Bio-Imaging: From Physics to Signal Understanding Issues, 2012, pp. 209-222.

Vitae

Xipeng Pan is a Ph.D. candidate with School of Automation, Beijing University of Posts and Telecommunications, China. His research interests include machine learning, medical image processing. He is a member of CCF.

Lingqiao Li is a Ph.D. candidate with the School of Automation, Beijing University of Posts and Telecommunications, China. He is currently an assistant researcher at the Guilin University of Electronic Technology, China, and his research interests include machine learning and spectrum analysis.

Huihua Yang received his Ph.D. degree from East China University of Science and Technology, China in 2005. He was a postdoctoral research fellow at Tsinghua University from 2005 to 2007. Currently, he is a professor in the School of Automation, Beijing University of Posts and Telecommunications, China. His research interests include machine learning, spectrum analysis, and optimization. Dr. Yang has published more than 40 papers, serves as a Director of the China Instrument and Control Society (CICS) and Vice Director of the NIR Division of CICS, and is a senior member of CCF and a member of ACM.

Zhenbing Liu received his Ph.D. degree in Computer Science from Huazhong University of Science and Technology, China in 2010. He was a visiting research fellow at Pennsylvania University in 2015. Currently, he is a professor of Guilin University of Electronic Technology, China. His research interests include machine learning and medical image processing. He has published more than 30 papers.

Jinxin Yang is a Master’s candidate with School of Automation, Beijing University of Posts and Telecommunications, China. His research interests include deep learning and segmentation of medical cell image.

Lingling Zhao is a Master’s candidate with School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, China. His research interests include machine learning and segmentation of medical cell image.

Yongxian Fan received his Ph.D. degree from the Department of Automation, Shanghai Jiao Tong University, China in 2013. Currently, he is an associate professor in the School of Computer Science and Information Security, Guilin University of Electronic Technology, China. His research interests include machine learning, pattern recognition and bioinformatics.

Fig. 1. An overview of the proposed method: (a) a training image; (b) a test image; (c) the loss function L(z, y) between the predicted value z and the label y; (d) the label; (e) the trained model; (f) the predicted value; (g) the result of postprocessing; (h) the segmentation result. The pipeline comprises sparse reconstruction (SR), patch sampling, deep convolutional network (DCN) training and testing, and image postprocessing.

Fig. 2. The procedure of sparse reconstruction: the RGB input image is converted to gray and smoothed (rgb2gray + ADF), and the gray image is then denoised by K-SVD + Batch-OMP to give the denoised image.

Fig. 3. The diagram of the single-layer convolutional architecture: input feature maps $I_{m}^{k-1}$ are convolved with kernels $w_{nm}^{k}$, summed with the bias $b_{n}^{k}$, and passed through the sigmoid function to produce $I_{n}^{k}$.

Fig. 4. The demonstration of deep convolutional networks. Input patch size: 1@18×18; Layer 1 (C1): 12@14×14; Layer 2 (C2): 12@10×10; Layer 3 (C3): 1@6×6. C1, C2 and C3 are convolutional layers with 5×5 convolution kernels; pixels outside the image border are synthesized by mirroring.

Fig. 5. H&E histopathology images of breast tissue: a benign and a malignant example (size: 896×768).

Fig. 6. Two histopathology images (benign and malignant) within the truth window and their associated ground truth labeling images.

Fig. 7. (a) F1-measure with different numbers of feature maps in the hidden layer. (b) F1-measure with different numbers of sampled patches per training image.

Fig. 8. Box plots of (a) the accuracy, (b) the precision, (c) the recall of the segmentations and (d) the F1-measure for the nucleus region retrieved by OTSU, FCM-based, Watershed-based and our method (DCN) with gray image, R channel of the RGB image and the SR inputs.

Fig. 9. Box plots of (a) the accuracy, (b) the precision, (c) the recall of the segmentations and (d) the F1-measure for the nucleus region retrieved by OTSU, FCM-based, Watershed-based and our method (DCN) with the SR inputs.

Fig. 10. Segmentation of two of the testing images (benign and malignant) obtained by the ground truth, OTSU, FCM-based, Watershed-based, and the proposed method (DCN); the green lines depict the segments' boundaries.