Neurocomputing 359 (2019) 494–508
Brief papers
Stacked sparse autoencoder and case-based postprocessing method for nucleus detection

Siqi Li a, Huiyan Jiang a,*, Jie Bai b, Ye Liu c, Yu-dong Yao d

a Department of Software College, Northeastern University, Shenyang 110819, China
b Department of Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang 110819, China
c Department of Pathology, Fifth Affiliated Hospital of Sun Yat-sen University, No. 52, Meihua Dong Road, Zhuhai, Guangdong 519000, PR China
d Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA

* Corresponding author. E-mail address: [email protected] (H. Jiang).
Article history: Received 16 July 2018; Revised 25 April 2019; Accepted 2 June 2019; Available online 6 June 2019. Communicated by Dr Jie Wang.

Keywords: Automated nucleus detection; Stacked sparse autoencoder; Case-based postprocessing method; Transfer learning; Coarse-to-fine manner
Abstract

Accurate nucleus detection is of great importance in pathological image analysis and diagnosis, and is a critical prerequisite for tasks such as automated grading of hepatocellular carcinoma (HCC) nuclei. This paper proposes an automated nucleus detection framework based on a stacked sparse autoencoder (SSAE) and a case-based postprocessing method (CPM) in a coarse-to-fine manner. SSAE, an unsupervised learning model, is first trained using image patches of breast cancer. Then, transfer learning and sliding window techniques are applied to other cancers' pathological images (HCC and colon cancer) to extract the high-level features of image patches via the trained SSAE. Subsequently, these high-level features are fed to a logistic regression classifier (LRC) to classify whether each image patch contains a complete nucleus in a coarse detection process. Finally, CPM is developed for refining the coarse detection results; it removes false positive nuclei and locates adhesive or overlapped nuclei effectively. SSAE-CPM achieves an average nucleus detection accuracy of 0.8748 on HCC pathological images and can accurately locate almost all nuclei on pathological images with serious differentiation. In addition, our proposed detection framework is also evaluated on a public dataset of colon cancer, with a mean F1 score of 0.8355. Experimental results demonstrate the performance advantages of our proposed SSAE-CPM detection framework as compared with related work. While our detection framework is trained on the pathological images of breast cancer, it can be easily and effectively applied to nucleus detection tasks on other cancers without re-training. © 2019 Elsevier B.V. All rights reserved.
1. Introduction

As is well known, diagnosis from pathological images remains the "gold standard" in diagnosing a number of diseases, including most cancers [1]. Automated pathological image analysis has recently become a significant research focus in early diagnosis. Computer-aided diagnosis (CAD) of pathological images can objectively provide quantitative evaluations and facilitate the final diagnosis [2–4]. Nucleus detection is considered a critical prerequisite in CAD, providing the counts and locations of nuclei, which are used as quantitative information for accurate diagnosis. Generally, pathologists look for visual cues or features of nuclei (e.g., counts and locations) under a microscope in the diagnosis process, which is time-consuming and operator-subjective. Therefore, it is important to achieve automated,
accurate and robust nucleus detection, aiming for rapid diagnosis on pathological images.

1.1. Related work

Traditionally, there are a number of methods based on local and global information for accurate nucleus/cell detection, such as distance transform [5], Laplacian of Gaussian (LoG) filtering [6], Hough transform (HT) [7], maximally stable extremal region (MSER) detection [8] and radial symmetry based voting [9]. For example, Lin et al. [10] exploited a gradient-weighted distance transform, a variant of the distance transform, to locate nucleus centroids by multiplying the distance map with the normalized gradient magnitude image. However, this method was influenced by the variety of nucleus shape/intensity and the presence of noise. Aiming at the weakness of LoG, Chang et al. [11] constrained LoG-based nucleus marker selection by exploiting the response strength and the blue ratio intensity on hematoxylin and eosin (H&E) stained histopathology
images. HT can be regarded as a feature detection algorithm. Bergmeir et al. [12] first performed mean-shift filtering for noise elimination in cervical images, and thereafter applied a randomized HT to Canny detector-generated edge maps for nucleus detection, which served as the contour initialization for subsequent level set based nuclei segmentation. To enhance the performance of MSER detection, eccentricity was used in [13] to evaluate the MSER-detected blobs and filter out candidates whose eccentricity was larger than a threshold. A simple improvement for enhancing detection performance was to consider an adaptive threshold instead of a fixed value. For the purpose of handling elliptical objects and adapting to geometric perturbation, Yang and Parvin [14] proposed an iterative method, namely the regularized centroid transform, to locate clustered nuclei in fluorescence microscopy images. It moved the boundary points to the local object center along a minimum energy path, which was regularized with a smoothness constraint. As incomplete information may be present at a given focal plane, this method may produce incorrect results.

In addition, there are also some conventional machine learning methods (feature extraction plus classifier design) used for nucleus detection. For instance, Khurd et al. [15] exploited a linear support vector machine (SVM) with RGB-value input features to detect nuclei in H&E prostate images, combined with a non-max suppression heuristic. The lack of other effective features was the general issue of methods based on hand-crafted features. In [16], a general object detection algorithm named Hough forests was introduced to construct discriminative class-specific part appearance codebooks, along with random forests (RF) that were able to cast probabilistic votes. However, objects with high pose variations (e.g., irregular nuclei) were difficult to detect using this method. Finally, Su et al. [17] performed sparse reconstruction based on template matching to locate nuclei in H&E stained lung cancer and brain tumor images. Specifically, they exploited K-selection (a dictionary learning algorithm with a locally constrained sparse representation) [18] to learn a nucleus patch dictionary for testing images and then generated corresponding probability maps.

Recently, with the rapid development of deep learning (DL), more and more DL strategies have been applied to automated nucleus detection on pathological images. For example, Hou et al. [19] proposed a sparse convolutional autoencoder (CAE) for fully unsupervised, simultaneous nucleus detection and feature extraction in histopathology tissue images. The CAE was used to detect and encode nuclei into sparse feature maps that encoded both the location and appearance of nuclei. Xu et al. [20] utilized a stacked sparse autoencoder (SSAE) to learn high-level features from pixel intensities alone in order to identify distinguishing features of nuclei, and a sliding window operation was applied to each image for nuclei detection. However, these two methods were only tested on pathological images without complex nucleus scenarios (adhesion or overlap). Besides, there are some other detection methods developed via convolutional neural networks (CNN). Xue and Ray [21] located cell centers via a sparsely labeled pixel space based on a CNN model and then recovered sparse cell locations from the pixel space using L1 norm optimization.
A minor weakness of this detection framework was the lack of an end-to-end training process. Liu et al. [22] used the Inception (V3) architecture to automatically detect and localize tumors as small as 100 × 100 pixels in gigapixel microscopy images sized 100,000 × 100,000 pixels. Finally, Sirinukunwattana et al. [23] proposed a spatially constrained convolutional neural network (SC-CNN) to predict the probability of a pixel being the center of a nucleus, where SC-CNN was a new variant of CNN that included a parameter estimation layer and a spatially constrained layer for spatial regression.
1.2. Contributions

However, accurate automated nucleus detection remains a challenging task because of two main issues. One is the variability in size, shape, appearance, and texture of nuclei in different pathological images, and the other is the complex nucleus scenarios (adhesion or overlap) in tumor areas. To address these two issues, in this paper, we propose an automated nucleus detection framework based on a stacked sparse autoencoder (SSAE) and a case-based postprocessing method (CPM) in a coarse-to-fine manner. The main contributions of this paper are as follows. (1) Only R channel images (2-dimensional inputs) are used to train the stacked sparse autoencoder plus logistic regression classifier (SSAE + LRC) model instead of RGB images (3-dimensional inputs) [20], where an image enhancement method [24] and Z-score normalization [25] are performed on original images before training to improve the model performance. (2) We apply the SSAE + LRC model trained on breast cancer images to detect nuclei on other cancers' pathological images directly, which is a transfer learning strategy. To detect all nuclei effectively (minimizing false negatives), we expand the number of testing patches 15 times using multi-scale scaling and channel separations. Note that the other channels' testing inputs (G and B) are tested via the same SSAE + LRC model trained on R channel images. (3) As the coarse detection could increase the false positive rate (FPR), CPM (an adaptive postprocessing method) is developed to reduce FPR and is also used to locate the centroids of adhesive or overlapped nuclei.

1.3. Structure of the paper

The remainder of the paper is organized as follows. Section 2 introduces the training data and testing data. A detailed description of the nucleus detection framework based on SSAE and CPM is presented in Section 3. In Section 4, we describe the experiment settings and the training process. Section 5 presents our experimental results and comparisons with other existing detection methods. Finally, discussions and conclusions are summarized in Sections 6 and 7.

2. Dataset

The training images are composed of 143 H&E stained breast histopathology images and the corresponding segmentation results [26],1 which have been scanned into a computer using a high resolution whole slide scanner at 40× optical magnification. A few patients have more than one image associated with them (137 patients vs 143 images). The size of each image is 2000 × 2000 pixels. In our experiments, 26,832 image patches are used to train the SSAE + LRC model, comprising 12,732 patches with nuclei (containing complete nucleus boundaries) and 14,100 patches without nuclei (containing incomplete boundaries or only backgrounds).

1 Dataset URL: http://www.andrewjanowczyk.com/use-case-1-nuclei-segmentation/.

One testing set involves 127 HCC pathological images (76 patients); manual annotations of all nuclei (the centroid of each nucleus) are obtained by a graduate student under the guidance and validation of three pathologists (they mainly determine the centroids of adhesive or overlapped nuclei) in the pathology department of a large hospital in Shenyang, China. There are a number of adhesive or overlapped nucleus clumps on the three pathological grades of HCC images (well differentiation, moderate differentiation and poor differentiation) in this dataset, and this
may give unacceptable detection results. The size of each image is 1280 × 960 pixels. Through discussions with the pathologists, the specific processes of obtaining the experimental data are as follows. Firstly, tissue slices are acquired through paraffin embedding. Then, the slides are cut at 4 μm thickness by a microtome and stained using H&E for 7.5 min. Finally, fast slide scanners are used to generate digital images at 20× magnification. In addition, each image's resolution is 0.35 μm/pixel.

Another testing set involves 100 H&E stained histology images of colorectal adenocarcinomas [23]. All images have a common size of 500 × 500 pixels and are cropped from non-overlapping areas of 10 whole-slide images from 9 patients, at a pixel resolution of 0.55 μm/pixel (20× optical magnification). The whole-slide images are obtained using an Omnyx VL120 scanner. The cropping areas are selected to represent a variety of tissue appearance from both normal and malignant regions of the slides.

3. Detection framework

As shown in Fig. 1, the automated nucleus detection framework can be briefly summarized in four phases. First, all pathological images are enhanced to improve detection accuracy. Second, for the sake of high sensitivity (detecting all possible nuclei), we expand the number of testing patches 15 times using multi-scale scaling and channel separations. Third, the SSAE + LRC model is trained via breast cancer images and then used to detect nuclei on pathological images of other cancers, which gives coarse detection results. Finally, CPM is developed for refining the coarse detection results, which can reduce FPR and locate adhesive or overlapped nuclei.

Fig. 1. The automated nucleus detection framework.

3.1. Image enhancement

Generally, H&E images vary significantly in color. The detection performance could be hampered by color and intensity variations, and these differences arise from a number of factors, including specimen preparation and staining protocol inconsistencies (e.g.,
temperature of solutions), variations in fixation characteristics, interpatient variation, and the scanner used to digitize the slides. In order to minimize this influence, an image enhancement method based on the guided image filter (GIF) and an improved bias field correction [24] is utilized to improve image contrast and quality. Specifically, the image enhancement steps are as follows (a code sketch of the final filtering step is given after the list).

• Firstly, preprocessing including stain normalization and wavelet denoising is performed on the H&E stained pathological images.
• Secondly, a factor of detail discontinuity is introduced into the traditional bias field model to enhance the influence of light in the high-frequency part and to correct the intensity inhomogeneity and detail discontinuity of the image.
• Thirdly, the high dynamic range (HDR) pathological image is generated based on the least squares method.
• Finally, in order to avoid image gradient reversal and reduce computational cost, GIF is used to enhance image details instead of bilateral filtering (BF).
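For illustration, the final GIF step can be sketched as a minimal self-guided filter with detail amplification in Python; the window radius, regularization eps and detail boost below are illustrative assumptions rather than the settings of [24], and the stain normalization, bias field and HDR steps are omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=8, eps=0.01):
    """Edge-preserving smoothing of `src` steered by `guide` (He et al.'s
    guided image filter). Inputs are float arrays scaled to [0, 1]."""
    size = 2 * radius + 1
    mean = lambda x: uniform_filter(x, size=size)   # box filter
    mean_I, mean_p = mean(guide), mean(src)
    var_I = mean(guide * guide) - mean_I ** 2
    cov_Ip = mean(guide * src) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)                      # local linear coefficient
    b = mean_p - a * mean_I
    return mean(a) * guide + mean(b)                # q = mean(a) * I + mean(b)

def enhance_details(img, radius=8, eps=0.01, boost=2.0):
    """Detail enhancement: smooth base layer from the self-guided filter,
    detail layer amplified and added back (the `boost` value is illustrative)."""
    base = guided_filter(img, img, radius, eps)
    detail = img - base
    return np.clip(base + boost * detail, 0.0, 1.0)
```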
Fig. 2 shows a case of image enhancement: Fig. 2(a) is an original H&E pathological image and Fig. 2(b) is the enhanced image.

Fig. 2. A case of image enhancement. (a) An original H&E pathological image. (b) The corresponding enhanced image.

3.2. Data augmentation

As previously mentioned, this paper aims to detect all nuclei in a coarse-to-fine manner, which is regarded as a process of object centroid localization. Noticeably, CPM can only remove false positive patches and locate adhesive or overlapped nuclei; it is unable to address missed detections caused by unsatisfactory coarse detection. Therefore, we augment the testing patches via multi-scale scaling and channel separations to improve the detection sensitivity and then remove the false positive patches from the coarse results through CPM. Specifically, the steps of data augmentation are as follows. First, based on the sliding window process, five scales of patches are extracted centered at each pixel: 25 × 25, 30 × 30, 35 × 35, 40 × 40 and 45 × 45. Then, due to the fixed input size of the trained SSAE + LRC model
(45 × 45), we transform the other scales (25 × 25, 30 × 30, 35 × 35, 40 × 40) into the same scale space via bilinear interpolation. Finally, the RGB patches are separated into R, G and B channels. Fig. 3 shows an example of the data augmentation: the top row represents the different scales of patches centered at one pixel, the second row represents the corresponding scaling results with bilinear interpolation, and the remaining rows are the R, G and B channel patches, respectively. Following this process, one patch thus yields 15 different inputs (the last three rows in Fig. 3) and then acquires 15 labels ($L_i = \{l_1, l_2, \ldots, l_{15}\}$, $i \in 1, \ldots, M \times N$, where $M \times N$ is the image size) via the trained SSAE + LRC model. It needs to be emphasized that if any label in $L_i$ is 1, the label of the patch is 1 (label 1 means that this is a nucleus patch). A sketch of this augmentation and labeling rule is given below.

Fig. 3. The augmented testing data. The top row represents the different scales' image patches centered at one pixel and the second row represents the corresponding scaling results with the bilinear interpolation. The remaining rows are the R, G and B channel patches respectively.
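The augmentation and label rule can be expressed in a minimal Python sketch; border handling and the sliding window loop are omitted, and `predict` is a hypothetical stand-in for the trained SSAE + LRC classifier.

```python
import numpy as np
from PIL import Image

SCALES = (25, 30, 35, 40, 45)   # patch sizes used in the paper
TARGET = 45                      # fixed input size of SSAE + LRC

def augment_patch(rgb_image, cy, cx):
    """Return the 15 single-channel 45x45 inputs for the patch centered at
    (cy, cx): 5 scales x 3 color channels, bilinear-resized. Assumes a uint8
    RGB array and a center far enough from the image border."""
    inputs = []
    for s in SCALES:
        half = s // 2
        patch = rgb_image[cy - half:cy - half + s, cx - half:cx - half + s]
        for ch in range(3):                              # R, G, B channels
            im = Image.fromarray(patch[:, :, ch])
            im = im.resize((TARGET, TARGET), Image.BILINEAR)
            inputs.append(np.asarray(im, dtype=np.float32) / 255.0)
    return inputs                                         # 15 arrays

def patch_label(inputs, predict):
    """OR rule from the paper: the patch is labeled as a nucleus patch (1)
    if any of its 15 augmented inputs is classified as a nucleus."""
    return int(any(predict(x) for x in inputs))
```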
3.3. Stacked sparse autoencoder for coarse detection

The process of training a stacked sparse autoencoder plus a logistic regression classifier (SSAE + LRC) is shown in Fig. 4, where "stacked" means the double-layer sparse autoencoder (SAE); the basic concept of SAE is presented in an appendix in [20]. "Sparse" means that we add a sparsity constraint (e.g., if $\hat{\rho} < 0.05$, the outputs of the activation function are 0, where $\hat{\rho}$ represents the mean activation value of one hidden node) on the hidden layer, which suppresses most hidden nodes so as to learn discriminative representations. For simplicity, the decoder parts of each basic SAE are not shown in Fig. 4.

Fig. 4. The illustration of a stacked sparse autoencoder (SSAE) plus a logistic regression classifier (LRC) for distinguishing the presence or absence of a complete nucleus's boundary in each image patch.

The specific training process is as follows.

Step 1. There are 12,732 patches with nuclei (containing complete nucleus boundaries) and 14,100 patches without nuclei (containing incomplete boundaries or only backgrounds) for training the SSAE + LRC model. According to the experimental results (Tables 2 and 3), we only use the R channel's image patches to determine the SSAE + LRC model; the other channels' (G and B) patches are tested on the same model directly.

Step 2. For each input patch $p_i$, the 45 × 45 image matrix is first reshaped as a column vector $x = [x_1, x_2, \ldots, x_{2025}]^T$. The first SAE is used to learn the first-order representations $h_1$ of each input $x$ via the encoder in an unsupervised feature learning algorithm, and the output layer is effectively a decoder which is trained to reconstruct an approximation $\hat{x}$ from $h_1$. The optimal parameters $W_1$ (weights and offsets) are determined by minimizing the
discrepancy between $x$ and its reconstruction $\hat{x}$ (i.e., $x \approx \hat{x}$) using a back-propagation algorithm.

Step 3. Similar to Step 2, training the second SAE is also to find optimal parameters $W_2$ by minimizing the discrepancy between the input and its reconstruction, which learns the second-order representations $h_2$ of each original input patch $p_i$. Note that the inputs of the second SAE are $h_1$. After training the SSAE, the learned second-order representations of each $p_i$, as well as its label $\{h_2(p_i), y(p_i)\}$ ($y(p_i) \in \{0, 1\}$, where 1 and 0 represent the nucleus and non-nucleus patches, respectively), are fed to the classification layer.

Step 4. The classification task is performed using a LRC,
$$g_{W^{LRC}}(h_2) = \frac{1}{1 + \exp\left(-(W^{LRC})^T h_2\right)} \qquad (1)$$
where $g_{W^{LRC}}(\cdot)$ is the sigmoid function. We define the label as 1 if $g \geq 1/2$, and 0 otherwise. The LRC is trained using $\{h_2(p_i), y(p_i)\}$ by minimizing the cost function defined as follows,
$$L(g(x), y) = -\frac{1}{m}\sum_{i=1}^{m}\left(y_i \log(g(x_i)) + (1 - y_i)\log(1 - g(x_i))\right) \qquad (2)$$
where $g(x_i)$ represents the output probability of each sample via (1), $y_i$ is the corresponding label, and $m$ is the number of training samples. Following the gradient descent based approach [27], the parameters $W^{LRC}$ can be determined. After the training procedure of our SSAE + LRC model is completed, the optimal set of parameters ($\{W_1, W_2, W^{LRC}\}$) is determined using the breast cancer images. Taking advantage of transfer learning and the sliding window technique, each patch extracted from other cancers' pathological images is first represented by the trained SSAE model and its second-order representations are fed to the trained LRC. Finally, each coarse detection result is obtained via the output labels of all selected patches.
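A compact sketch of the greedy layer-wise training (Steps 1–4) is given below in PyTorch with a KL-divergence sparsity penalty; the layer sizes (2025 to 625 to 225) and the constants (target activation 0.05, weight decay 3e-3, sparsity weight 3) follow the settings reported in Section 4.2, while the optimizer choice, learning rate and loop structure are illustrative simplifications rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

RHO, BETA, WD = 0.05, 3.0, 3e-3   # target activation, sparsity weight, weight decay

class SAE(nn.Module):
    """One basic sparse autoencoder: sigmoid encoder plus sigmoid decoder."""
    def __init__(self, n_in, n_hid):
        super().__init__()
        self.enc, self.dec = nn.Linear(n_in, n_hid), nn.Linear(n_hid, n_in)
    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        return h, torch.sigmoid(self.dec(h))

def kl_sparsity(h, rho=RHO):
    """KL divergence between the target rho and the mean hidden activation."""
    rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

def train_sae(sae, batches, epochs=150, lr=1e-3):
    """Steps 2-3: minimize reconstruction error plus the sparsity penalty."""
    opt = torch.optim.Adam(sae.parameters(), lr=lr, weight_decay=WD)
    for _ in range(epochs):
        for x in batches:                      # x: (batch, n_in), values in [0, 1]
            h, x_hat = sae(x)
            loss = F.mse_loss(x_hat, x) + BETA * kl_sparsity(h)
            opt.zero_grad(); loss.backward(); opt.step()

def build_ssae_lrc(batches, labels, epochs=150):
    """Greedy layer-wise SSAE (2025 -> 625 -> 225) plus LRC, Eqs. (1)-(2)."""
    sae1, sae2 = SAE(45 * 45, 625), SAE(625, 225)
    train_sae(sae1, batches)                                    # first-order h1
    h1 = [torch.sigmoid(sae1.enc(x)).detach() for x in batches]
    train_sae(sae2, h1)                                         # second-order h2
    h2 = [torch.sigmoid(sae2.enc(x)).detach() for x in h1]
    lrc = nn.Linear(225, 1)                                     # logistic regression
    opt = torch.optim.Adam(lrc.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in zip(h2, labels):           # y: (batch,) float 0/1 labels
            loss = F.binary_cross_entropy_with_logits(lrc(x).squeeze(1), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return sae1, sae2, lrc
```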
3.4. Case-based postprocessing method for fine detection

Ideally, each coarse detection could locate all nuclei via the testing patch augmentation and the trained SSAE + LRC model. This, however, may increase the FPR (more non-nucleus patches are recognized as nucleus patches). In addition, the coarse detection cannot locate adhesive or overlapped nuclei effectively. To address these issues, we propose a case-based postprocessing method (CPM) for removing false positive patches and determining the centroids of adhesive or overlapped nuclei. The core of CPM is to determine the threshold parameters adaptively instead of using fixed values. In other words, for each case (patient), the related thresholds are calculated from the coarse detection results of his/her own pathological image(s) rather than from all patients. The flowchart of CPM is drawn in Fig. 5; it can be divided into one classification task and two threshold judgement tasks as follows. For one coarse detection result, each connected region is regarded as a nucleus clump, and we calculate its area ($a$), centroid coordinate ($c_1$, $c_2$), major axis length of the minimum enclosing rectangle ($l_{maj}$), minor axis length of the minimum enclosing rectangle ($l_{min}$) and area of the convex hull ($a_c$), respectively. The first threshold judgement task is to distinguish uncertain nucleus clumps from certain nucleus clumps using the infimum of area thresholds below,
$$T^i_{a_{\inf}} = \mu(a)_i - s(a)_i \qquad (3)$$

where $T^i_{a_{\inf}}$ is the infimum of area thresholds for the $i$th case (patient), and $\mu(a)_i$ and $s(a)_i$ are the mean value and the standard deviation of all nucleus clumps' areas for the $i$th case, respectively. Following (3), we define that if a nucleus clump's area is smaller than $T^i_{a_{\inf}}$, the nucleus clump is labeled as an uncertain nucleus clump, and otherwise as a certain nucleus clump. Note that $\mu(a)$ is larger than the corresponding $s(a)$ for all cases in our experiments.

Generally, uncertain nucleus clumps are categorized as positive nuclei (Fig. 6(a)) or false positive nuclei, where false positive nuclei include pure backgrounds (caused by noise, Fig. 6(b)) or incomplete boundaries (Fig. 6(c)). To reduce FPR effectively, we design a classification task via gray, gradient and distance information. First, we crop 11 scales of image patches (from 25 × 25 to 45 × 45 in two-pixel steps) at each centroid coordinate of uncertain nucleus clumps on the original pathological images. Then, the mean value and standard deviation of each patch's gray and gradient values are calculated to construct the statistical feature vectors.

Fig. 5. The flowchart of the case-based postprocessing method (CPM).

Fig. 6. Three examples of uncertain nucleus clumps. (a) A positive nucleus patch; the gray mean value is 79.8414, M = 22.6982, S = 1.3691. (b) A background patch; the gray mean value is 114.9514, M = 33.4786, S = 2.7256. (c) A patch with an incomplete boundary; the gray mean value is 87.6044, M = 21.8738, S = 5.0288.

In addition, as the object boundary is more likely to be
located in regions of high intensity gradient [28], the 50 pixels with the largest gradient magnitudes on each patch (ranking from large to small) are regarded as the boundary points, and we then calculate the distances between them and the corresponding patch's center. Next, we define the following distance statistical features,

$$M = \mu\big(e(bp_1(x, y), c(x, y)), \ldots, e(bp_{50}(x, y), c(x, y))\big) \qquad (4)$$

$$S = s\big(e(bp_1(x, y), c(x, y)), \ldots, e(bp_{50}(x, y), c(x, y))\big) \qquad (5)$$

where $M$ and $S$ are the mean value and standard deviation of the distance vector, respectively, $e(\cdot,\cdot)$ is the Euclidean distance function, $bp_i(x, y)$ ($i = 1, 2, \ldots, 50$) represents the coordinates of boundary points, and $c(x, y)$ is the centroid coordinate of the patch. Notice that (4) and (5) are only calculated on the patches of 45 × 45 pixels. Fig. 6 shows three examples of uncertain nucleus clumps: Fig. 6(a) is a positive nucleus, while Fig. 6(b) and (c) are both false positive nuclei. Yellow and white points show the boundary points and the center of each patch, respectively. We can see that (1) the gray mean value of each patch centered at background points is larger than that of the positive nucleus patches, and (2) $S$ of the positive nuclei is smaller than $S$ of the false positive nuclei. Finally, the gray, gradient and distance features are used to train an SVM classifier to recognize positive or false positive nuclei.

For certain nucleus regions, the second threshold judgement task is to identify them as individual nuclei or non-individual nuclei via the following three thresholds,

$$T^i_{a_{\sup}} = \mu(a)_i + s(a)_i \qquad (6)$$

$$T^i_{l_{maj}/l_{min}} = \mu\!\left(\frac{l_{maj}}{l_{min}}\right)_i + s\!\left(\frac{l_{maj}}{l_{min}}\right)_i \qquad (7)$$

$$T^i_{a_c/a} = \mu\!\left(\frac{a_c}{a}\right)_i + s\!\left(\frac{a_c}{a}\right)_i \qquad (8)$$

where $T^i_{a_{\sup}}$ is the supremum of area thresholds, $T^i_{l_{maj}/l_{min}}$ represents a ratio threshold calculated from the ratio of $l_{maj}$ to $l_{min}$, and $T^i_{a_c/a}$ is another ratio threshold computed from the ratio of $a_c$ to $a$. Based on (6), (7) and (8), if $a_i$, $(l_{maj}/l_{min})_i$
and $(a_c/a)_i$ of a nucleus clump are smaller than $T^i_{a_{\sup}}$, $T^i_{l_{maj}/l_{min}}$ and $T^i_{a_c/a}$ simultaneously, the nucleus clump is labeled as an individual nucleus. Otherwise, we further handle the non-individual nuclei using the following concave point detection method.

In our study, the intention of concave point detection is to predict the nucleus number of each non-individual nucleus clump via the number of concave points on its boundary. However, it is cumbersome to express an analytic formula for each boundary contour due to the curve irregularity, so we cannot obtain the concave points via the derivative method. To address this issue, a line approximation method is introduced to find the concave points, as described in the following. First, the direction of the long axis of each nucleus clump is adjusted to the horizontal direction and a circumscribed rectangle of the contour is built through four horizontal and vertical lines (this gives four initial stationary points). Then, we slide two lines (horizontal and vertical) across the rectangle and calculate the number of intersection points between the contour curve and the horizontal and vertical lines. For each line, if the number of intersection points is odd, there is a stationary point among these intersection points. Next, we compute the slopes $k_i = (y_i - y_0)/(x_i - x_0)$ between each intersection point $(x_0, y_0)$ and its nearest adjacent points on both sides, $(x_l, y_l)$ and $(x_r, y_r)$, where $i$ is chosen as $l$ (left) or $r$ (right). The intersection point is considered a stationary point if the signs of $k_l$ and $k_r$ are different. Finally, the number of concave points is approximated as $N_{concave} = \mathrm{floor}(N_{stationary}/2)$, where $\mathrm{floor}(\cdot)$ is the Gauss rounding function and $N_{stationary}$ is the number of all stationary points except for the four initial stationary points. If $N_{concave}$ is smaller than 2, the nucleus clump is also regarded as an individual nucleus and we can locate its centroid directly. On the contrary, if $N_{concave}$ is equal to or larger than 2, we determine the nucleus number of the clump as $N_{nucleus} = \mathrm{floor}(N_{concave}/2)$, where $N_{nucleus}$ represents the number of individual nuclei inside the nucleus clump. Thus, a stepwise erosion operation with different disks is performed on the nucleus clump and stopped when the number of connected regions within the clump is $N_{nucleus}$. Following this process, we can locate the centroids of adhesive or overlapped nuclei effectively. An example illustrating this method for determining concave points is shown in Fig. 7.

Fig. 7. The concave point determination based on the line approximation method. Yellow points are the ordinary intersection points (not stationary points). Blue points are the convex points. Red points represent the determined concave points; they all belong to the stationary points. The dotted lines denote the examples of the horizontal and vertical lines across the circumscribed rectangle.
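As a minimal sketch of the case-adaptive thresholds (Eqs. (3) and (6)–(8)), the following assumes scikit-image's connected-component labeling and regionprops; the SVM stage, concave point counting and stepwise erosion are omitted, and the guard against a degenerate minor axis is an implementation detail of ours, not part of the paper.

```python
import numpy as np
from skimage.measure import label, regionprops

def cpm_thresholds(binary_mask):
    """Per-case adaptive thresholds of CPM: mean +/- std of clump area,
    major/minor axis ratio, and convex-hull-area ratio (Eqs. (3), (6)-(8))."""
    props = regionprops(label(binary_mask))
    area = np.array([p.area for p in props], dtype=float)
    ratio = np.array([p.major_axis_length / max(p.minor_axis_length, 1e-6)
                      for p in props])
    convex = np.array([p.convex_area / p.area for p in props])
    return {
        "a_inf": area.mean() - area.std(),    # Eq. (3): uncertain vs certain clumps
        "a_sup": area.mean() + area.std(),    # Eq. (6)
        "axis":  ratio.mean() + ratio.std(),  # Eq. (7)
        "conv":  convex.mean() + convex.std() # Eq. (8)
    }

def is_individual(prop, t):
    """A certain clump is an individual nucleus only if all three statistics
    fall below their case-specific thresholds simultaneously."""
    axis_ratio = prop.major_axis_length / max(prop.minor_axis_length, 1e-6)
    return (prop.area < t["a_sup"]
            and axis_ratio < t["axis"]
            and prop.convex_area / prop.area < t["conv"])
```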
4. Experiments

4.1. Performance metric

In this paper, five measures are used to evaluate the detection performance of different models:

$$ACC = \frac{TP + TN}{TP + FN + TN + FP}, \quad P = \frac{TP}{TP + FP}, \quad R(TPR) = \frac{TP}{TP + FN},$$
$$TNR = \frac{TN}{TN + FP}, \quad F_\beta = (1 + \beta^2) \times \frac{P \times R}{(\beta^2 \times P) + R}$$

where $TP$ and $FN$ are the numbers of nucleus patches which are correctly and incorrectly detected, respectively, and $TN$ and $FP$ are the numbers of non-nucleus patches which are correctly and incorrectly detected, respectively. $ACC$ is the overall classification accuracy, $P$ is the precision, $R$ ($TPR$) is the recall (true positive rate), $TNR$ represents the true negative rate, and $F_\beta$ represents the F-score, the weighted average of $P$ and $R$. In our experiments, we choose $\beta = 1$.
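These measures translate directly into code; the trivial helper below assumes none of the denominators is zero.

```python
def detection_metrics(tp, fn, tn, fp, beta=1.0):
    """The five measures of Section 4.1; beta = 1 gives the F1 score."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    p   = tp / (tp + fp)
    r   = tp / (tp + fn)              # recall / true positive rate
    tnr = tn / (tn + fp)
    f   = (1 + beta**2) * p * r / (beta**2 * p + r)
    return {"ACC": acc, "P": p, "R(TPR)": r, "TNR": tnr, "F": f}
```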
4.2. Parameter setting for our SSAE + LRC model

The parameters of the SSAE + LRC model are determined via the grid-search technique and 10 cross-validation experiments, where each experiment is randomly composed of 20,000 training patches and 6832 validation patches. Note that each patch has been tested at least once. The optimal parameters are presented in Table 1.

Table 1. Parameters of our SSAE + LRC model.

Parameters                                 Value
Input size                                 45 × 45
Image channel                              R
Normalization method                       Z-score
Average activation of the hidden units     0.05
Weight decay parameter                     3e-3
Weight of sparsity penalty term            3
Number of first layer's hidden units       625
Number of second layer's hidden units      225
Iterations                                 150

4.3. Training our SSAE + LRC model

As shown in Fig. 4, the training strategy of SSAE + LRC follows the greedy layer-wise approach in a sequential manner. First, the first SAE is applied to the pixel intensities of input patches to learn the parameters $W_1$ and obtain the first-order representations $h_1$. Second, $h_1$ is fed into the second SAE to learn the parameters $W_2$ and acquire the second-order representations $h_2$. Finally, the LRC is trained using $h_2$ and the corresponding labels, which determines the parameters $W^{LRC}$.

Our experiments first train and test the SSAE + LRC model using R, G, B, haematoxylin (H) and gray input images. Table 2 shows the average ACC, P, R, TNR and F1 of SSAE + LRC models trained by different channels' inputs. It can be seen that the R channel performs best in terms of all performance metrics. Then, to consider the influences of image noise and data normalization, a Gaussian convolution filter and a median filter are applied to the original RGB images respectively and we extract their R channel images. Subsequently, all R channel images (extracted from the images denoised by [24], the Gaussian convolution filter and the median filter) are normalized via standard linear normalization ([0, 1]) and Z-score normalization [25]. The average ACC, P, R, TNR and F1 of SSAE + LRC models with different combinations of image denoising and data normalization methods are presented in Table 3. Collectively, the combination of the enhanced
Table 2. The average ACC, P, R(TPR), TNR and F1 (± std) with different channels' inputs.

Inputs   ACC               P                 R(TPR)            TNR               F1
R        0.9268 ± 0.0212   0.8938 ± 0.0301   0.9272 ± 0.0215   0.9266 ± 0.0234   0.9102 ± 0.0249
G        0.8984 ± 0.0285   0.8457 ± 0.0382   0.9125 ± 0.0291   0.8890 ± 0.0284   0.8778 ± 0.0323
B        0.9009 ± 0.0260   0.8564 ± 0.0345   0.9037 ± 0.0302   0.8990 ± 0.0263   0.8794 ± 0.0319
H        0.9148 ± 0.0181   0.8809 ± 0.0287   0.9100 ± 0.0259   0.9181 ± 0.0261   0.8952 ± 0.0269
Gray     0.9016 ± 0.0230   0.8675 ± 0.0324   0.8891 ± 0.0297   0.9095 ± 0.0286   0.8782 ± 0.0301

Table 3. The average ACC, P, R(TPR), TNR and F1 (± std) with different combinations of image denoising and data normalization methods.

Data preprocessing    ACC               P                 R(TPR)            TNR               F1
[24] + Linear         0.9245 ± 0.0198   0.8946 ± 0.0235   0.9222 ± 0.0139   0.9268 ± 0.0196   0.9082 ± 0.0184
[24] + Z-score        0.9522 ± 0.0124   0.9632 ± 0.0109   0.9288 ± 0.0212   0.9757 ± 0.0114   0.9456 ± 0.0154
Gaussian + Linear     0.8892 ± 0.0248   0.8397 ± 0.0321   0.8935 ± 0.0268   0.8863 ± 0.0227   0.8658 ± 0.0299
Gaussian + Z-score    0.9278 ± 0.0176   0.8938 ± 0.0263   0.9299 ± 0.0188   0.9256 ± 0.0253   0.9115 ± 0.0208
Median + Linear       0.8954 ± 0.0237   0.8491 ± 0.0387   0.8979 ± 0.0209   0.8937 ± 0.0219   0.8728 ± 0.0283
Median + Z-score      0.9049 ± 0.0206   0.8418 ± 0.0275   0.9227 ± 0.0201   0.8871 ± 0.0239   0.8804 ± 0.0223
R channel image and Z-score normalization ([24] + Z-score) is superior to other combinations in terms of most evaluation metrics and contributes to improving the performance of the model. In conclusion, following Tables 2 and 3, the parameters with the highest performance metrics are chosen as the optimal parameter set for our SSAE + LRC model, which is compared with other methods and used for nucleus detection on other cancers' pathological images in the following sections.

5. Results and comparisons

5.1. Comparisons with other models

Performance results of our SSAE + LRC model and comparisons are presented in this section. There are eight other models for comparison: single SAE, three-layer SAE (TSAE), SSAE [20], two-layer encoders, two-layer CNN, five-layer CNN, eight-layer CNN and deep belief network (DBN). Our contrast experiments test these models on image patches of breast cancer without CPM via 10 cross-validation experiments, where each experiment is randomly composed of 20,000 training patches and 6832 validation patches. Note that each patch has been tested at least once.

(1) Single SAE + LRC: A single SAE is used to learn the representations of input patches, which are then fed into the LRC. The number of hidden units is 625.

(2) TSAE + LRC: TSAE is a neural network consisting of three SAE layers and its training process is similar to SSAE in a sequential manner. The numbers of the three layers' hidden units are 625, 225 and 100, respectively. The LRC is trained using the representations learned by the third SAE.

(3) SSAE + SMC [20]: Different from our SSAE model, the SSAE in [20] is trained via original RGB image patches and the numbers of the two layers' hidden units are 625 and 225.

(4) Two-layer encoders + LRC: Two-layer encoders are used for learning the representations of input patches in an end-to-end form rather than through the autoencoder process. The numbers of the two layers' hidden units are 625 and 225.

(5) Two-layer CNN + LRC: The implementation of CNN is based on an available toolbox in Matlab. Specifically, the two-layer CNN is composed of two convolutional layers, two max-pooling layers, one fully connected layer, and an output layer. The kernel sizes of the first and second convolutional layers are 12 × 12 and 8 × 8. The operational form of convolution is 'valid' and the number of each layer's kernels is 128. The size of max-pooling is 2 × 2.
(6) Five-layer CNN + LRC: The five-layer CNN is composed of five convolutional layers, two max-pooling layers, one fully connected layer, and an output layer. The first two convolutional layers are each followed by a max-pooling layer, and their kernel size is 5 × 5. Then, three consecutive convolutional layers with a kernel size of 3 × 3 are used for further feature extraction. The operational form of convolution is 'valid' and the number of each layer's kernels is 128.

(7) Eight-layer CNN + LRC: The eight-layer CNN is composed of eight convolutional layers, two max-pooling layers, one fully connected layer, and an output layer. The first two convolutional layers are each followed by a max-pooling layer, and their kernel size is 5 × 5. Then, six alternating 'valid' and 'same' convolutional layers with a kernel size of 3 × 3 are used for further feature extraction. The number of each layer's kernels is also 128.

(8) DBN + LRC: DBN is implemented using the same available toolbox. Our designed DBN is composed of two restricted Boltzmann machines (RBM) and the number of nodes in each of the two hidden layers is set to 100.

To sum up, the aim of comparing with single SAE and TSAE is to show the efficiency of shallow (single layer) and deep (three layers) architectures. The comparison with SSAE + SMC in [20] is to demonstrate the performance variation of SSAE models when using different inputs. The goal of comparing with two-layer encoders is to show the importance of the AE's training form. The intention of comparing with two-layer CNN and DBN is to present the feature extraction abilities of different DL models, where CNN and DBN are both designed as two-layer architectures. The experiments with regard to the five-layer CNN and eight-layer CNN are performed to demonstrate the influence of different numbers of CNN layers on feature extraction for nucleus and non-nucleus patches.

The average ACC, P, R, TNR and F1 are presented in Table 4. We can see that our SSAE + LRC model performs best in terms of most evaluation indexes (ACC, P, TNR and F1) and the SSAE in [20] is slightly better than ours in R (TPR). In addition, Fig. 8 shows the precision-recall (P-R) curves and receiver operating characteristic (ROC) curves to assess the nucleus detection performance of these models: Fig. 8(a) shows the P-R curves and Fig. 8(b) presents the ROC curves. Similarly, our SSAE + LRC model achieves the best comprehensive performance. Finally, Table 5 presents the number of parameters in different models. It can be seen that, under the condition of the same model size, our SSAE is better than two-layer encoders and deeper CNNs in identifying nuclei or non-nuclei.
Table 4. The average ACC, P, R(TPR), TNR and F1 (± std) with different models.

Models              ACC               P                 R(TPR)            TNR               F1
Single SAE          0.8665 ± 0.0203   0.8076 ± 0.0237   0.8745 ± 0.0241   0.8612 ± 0.0279   0.8397 ± 0.0239
TSAE                0.8881 ± 0.0236   0.9000 ± 0.0259   0.8416 ± 0.0267   0.9346 ± 0.0213   0.8699 ± 0.0262
[20]                0.9268 ± 0.0163   0.9036 ± 0.0243   0.9333 ± 0.0196   0.9004 ± 0.0228   0.9182 ± 0.0219
Two-layer encoders  0.8536 ± 0.0216   0.8653 ± 0.0209   0.8376 ± 0.0223   0.8696 ± 0.0217   0.8512 ± 0.0198
Two-layer CNN       0.9324 ± 0.0164   0.9446 ± 0.0135   0.9018 ± 0.0206   0.9630 ± 0.0147   0.9227 ± 0.0173
Five-layer CNN      0.9362 ± 0.0179   0.9510 ± 0.0139   0.9198 ± 0.0232   0.9562 ± 0.0163   0.9351 ± 0.0191
Eight-layer CNN     0.9298 ± 0.0153   0.9568 ± 0.0128   0.9002 ± 0.0187   0.9564 ± 0.0124   0.9276 ± 0.0169
DBN                 0.9411 ± 0.0139   0.9485 ± 0.0126   0.9171 ± 0.0197   0.9651 ± 0.0128   0.9326 ± 0.0163
Our SSAE            0.9522 ± 0.0124   0.9632 ± 0.0109   0.9288 ± 0.0212   0.9757 ± 0.0114   0.9456 ± 0.0154
Fig. 8. The P-R curves and ROC curves on the detection accuracies of our SSAE model compared to single SAE, TSAE, SSAE in [20], two-layer encoders, two-layer CNN, five-layer CNN, eight-layer CNN and DBN. (a) The P-R curves. (b) The ROC curves.
Table 5. The number of parameters in different models.

Models              Number of parameters
Single SAE          ≈ 1267K
TSAE                ≈ 1429K
[20]                ≈ 2521K
Two-layer encoders  ≈ 1357K
Two-layer CNN       ≈ 1067K
Five-layer CNN      ≈ 861K
Eight-layer CNN     ≈ 1034K
DBN                 ≈ 213K
Our SSAE            ≈ 1357K
5.2. Nucleus detection on HCC pathological images

Transfer learning in [34] uses the parameters of models trained on natural images to extract medical images' features or to initialize the parameters of a particular model. Motivated by [34], our strategy of transfer learning for nucleus detection is to determine the parameters of the model using one kind of cancer's pathological images (breast cancer) and then test the SSAE + LRC model on other pathological images (HCC and colon cancer) directly. Different from the fine-tuning method, there are three important reasons behind our transfer learning strategy: (1) H&E pathological images are imaged by different digital devices, but the image enhancement method (Section 3.1) used in our experiments can alleviate the influences of color differences and make the color distribution of each image approximately uniform (e.g., nucleus regions are darker than others). (2) Although the training and testing data come from different tissue samples, the nuclei are similar in shape and appearance. (3) As is well known, it is impractical to use a complete dataset to train a deep learning model because the acquisition of labels for pathological images is expensive. Our ultimate goal in this study is to train an automatic detection model that can be used for testing H&E pathological images of different tissue samples without fine-tuning, and then implement an online learning function.

A sliding window scheme is utilized for selecting candidate patches with a one-pixel step size. In other words, our model identifies the presence or absence of a nucleus within every individual image patch centered at each pixel. As described in Section 3.2, the testing patches are expanded 15 times by multi-scale scaling and channel separations. The label of each testing patch is then determined via the set of 15 labels. Note that if any label in the set is 1, the patch is identified as a nucleus patch.

Besides the models in Table 4, the following nucleus detection methods are selected for comparison: color deconvolution (CD) [29], expectation-maximization (EM) [30] and structural regression CNN (SR-CNN) [31]. We implement these methods and determine the parameters based on the processes described in the literature (e.g., the learning and dropout rates in SR-CNN are set to 0.0005 and 0.2). Note that SR-CNN is trained and tested using HCC pathological images. The trained models (via breast cancer) in Table 4 are tested on HCC pathological images directly. Subsequently, CPM acts on the coarse detection results generated by SSAE [20], single SAE, TSAE, CNN, DBN and our SSAE models. In addition, we also apply two end-to-end detection methods to our nucleus detection task: one is you only look once v3 (YOLOv3) [32] and the other is the single shot multibox detector (SSD) [33]. The initial weights of these two networks are pre-trained using
natural images and fine-tuned via the breast cancer data; then we use them to detect nuclei on HCC pathological images directly. The final detection results of CD, EM, SR-CNN, YOLOv3, SSD, SSAE [20]-CPM, single SAE-CPM, TSAE-CPM, two-layer encoders-CPM, two-layer CNN-CPM, five-layer CNN-CPM, eight-layer CNN-CPM, DBN-CPM, our SSAE-CPM and the ground truth on a magnified region of interest (ROI) are shown in Fig. 9. Fig. 9(a)–(n) are the detection results using these methods, respectively, and Fig. 9(o) is the ground truth obtained by pathologists. These results suggest that our trained SSAE model performs well in learning features beneficial for identifying nuclei or non-nuclei and that the proposed CPM can locate adhesive and overlapped nuclei accurately.

Fig. 9. The nucleus detection results on HCC pathological images using different methods. (a) CD [29]. (b) EM [30]. (c) SR-CNN [31]. (d) YOLOv3 [32]. (e) SSD [33]. (f) SSAE [20]-CPM. (g) Single SAE-CPM. (h) TSAE-CPM. (i) Two-layer encoders-CPM. (j) Two-layer CNN-CPM. (k) Five-layer CNN-CPM. (l) Eight-layer CNN-CPM. (m) DBN-CPM. (n) Our SSAE-CPM. (o) Ground truth.

The average ACC, P, R(TPR), TNR and F1 of the detection results using these 14 methods are presented in Table 6. Here the correctly detected nucleus patches (TP) are defined as those instances in which the distance between the center of each detected nucleus and the nearest nucleus annotated by the pathologists is less than or equal to 6 pixels. We can see that SSAE-CPM outperforms the other 13 nucleus detection methods. Finally, the detection results of the proposed SSAE-CPM for two whole HCC pathological images are illustrated in Fig. 10. The two cases show that almost all nuclei are detected using SSAE-CPM.

To further test the robustness of our SSAE-CPM, we perform nucleus detection on HCC pathological images with Gaussian noise. Fig. 11 shows the nucleus detection results on an original image and its noise images.
Table 6. The average ACC, P, R(TPR), TNR and F1 of nucleus detection on HCC pathological images with different models.

Models                    ACC      P        R(TPR)   TNR      F1
CD [29]                   0.6771   0.6729   0.6891   0.6650   0.6809
EM [30]                   0.7167   0.7214   0.7059   0.7274   0.7136
SR-CNN [31]               0.7885   0.8023   0.7658   0.8112   0.7744
YOLOv3 [32]               0.6089   0.7544   0.5405   0.7184   0.6298
SSD [33]                  0.7325   0.8228   0.7204   0.7518   0.7773
SSAE [20]-CPM             0.8401   0.8697   0.8002   0.8802   0.8335
Single SAE-CPM            0.8191   0.8298   0.8028   0.8234   0.8161
TSAE-CPM                  0.8273   0.8233   0.8334   0.8212   0.8283
Two-layer encoders-CPM    0.7913   0.7967   0.7823   0.8004   0.7894
Two-layer CNN-CPM         0.8464   0.8328   0.8659   0.8262   0.8490
Five-layer CNN-CPM        0.8507   0.8509   0.8524   0.8510   0.8516
Eight-layer CNN-CPM       0.8278   0.8344   0.8209   0.8322   0.8276
DBN-CPM                   0.8536   0.8726   0.8279   0.8792   0.8497
Our SSAE-CPM              0.8719   0.8923   0.8579   0.8978   0.8748
Fig. 11(a) is a magnified ROI extracted from a poorly differentiated HCC pathological image, in which there are a number of adhesive or overlapped nuclei. Fig. 11(b)–(e) are the noise images with the same mean value (m = 0) and different variances (v = 0.05, v = 0.1, v = 0.15 and v = 0.2). Fig. 11(f)–(j) are the nucleus detection results on Fig. 11(a)–(e) using our SSAE-CPM. These detection results show that a smaller variance noise has little impact on the detection performance of SSAE-CPM, while a larger variance noise may cause performance degradation. They also demonstrate that image enhancement and denoising are necessary before nucleus detection. As previously mentioned, CPM is designed for refining coarse detection, and its role is to reduce FPR and locate adhesive or overlapped nuclei.
Fig. 10. Nucleus detection on whole HCC pathological images.
Fig. 11. The nucleus detection results on an original image and its noise images using SSAE-CPM. (a) An original image. (b) The noise image (v=0.05). (c) The noise image (v=0.1). (d) The noise image (v=0.15). (e) The noise image (v=0.2). (f) The detection result based on (a). (g) The detection result based on (b). (h) The detection result based on (c). (i) The detection result based on (d). (j) The detection result based on (e).
Fig. 12. The phased detection results of CPM. (a) The coarse detection result on Fig. 11(a). (b) The nucleus detection result without CPM. (c) The nucleus detection result generated by CPM with only considering FPR. (d) The nucleus detection result generated by CPM with only considering adhesive or overlapped nuclei. (e) The nucleus detection result generated by CPM. Red rectangles represent the false positive nucleus centroids and blue rectangles show the unseparated adhesive or overlapped nuclei.
Fig. 12 shows the phased detection results of CPM. Fig. 12(a) is the coarse detection result of Fig. 11(a) and Fig. 12(b) is the centroid localization result based on Fig. 12(a) without CPM. Fig. 12(c) is the nucleus detection result which only removes false positive nuclei via CPM, and Fig. 12(d) shows the nucleus detection result which only considers adhesive or overlapped nuclei via CPM. Fig. 12(e) is the nucleus detection result using our SSAE-CPM. Red rectangles represent the false positive nucleus centroids and blue rectangles show the unseparated adhesive or overlapped nuclei. Finally, the average ACC, P, R(TPR), TNR and F1 of nucleus detection on HCC pathological images using different CPMs are presented in Table 7.
Table 7. The average ACC, P, R(TPR), TNR and F1 of nucleus detection on HCC pathological images using different CPMs.

Metric    Without CPM   CPM considering only FPR   CPM considering only adhesive or overlapped nuclei
ACC       0.7745        0.8142                     0.8337
P         0.7664        0.8141                     0.8398
R(TPR)    0.8024        0.8144                     0.8246
TNR       0.7466        0.8140                     0.8428
F1        0.7840        0.8143                     0.8321
Fig. 13. The nucleus detection results of a colon cancer pathological image by different models. (a) SSAE [20]-CPM. (b) Single SAE-CPM. (c) TSAE-CPM. (d) Two-layer encoders-CPM. (e) Two-layer CNN-CPM. (f) Five-layer CNN-CPM. (g) Eight-layer CNN-CPM. (h) DBN-CPM. (i) Our SSAE-CPM.
5.3. Nucleus detection on pathological images of colon cancer

This section presents the nucleus detection results on pathological images of colon cancer (a public dataset) [23] using our SSAE-CPM. Similarly, the trained models (via breast cancer images) in Table 4 are tested on colon cancer images directly, and then CPM is performed on these coarse detection results. Fig. 13 illustrates the nucleus detection results of the same case using different models: Fig. 13(a)–(i) are the nucleus detection results using SSAE [20]-CPM, single SAE-CPM, TSAE-CPM, two-layer encoders-CPM, two-layer CNN-CPM, five-layer CNN-CPM, eight-layer CNN-CPM, DBN-CPM and our SSAE-CPM, respectively. The average ACC, P, R(TPR), TNR and F1 of nucleus detection on colon cancer pathological images are presented in Table 8. We can see that our SSAE-CPM achieves satisfactory detection results on pathological images of colon cancer.
Table 8. The average ACC, P, R(TPR), TNR and F1 of nucleus detection on colon cancer pathological images.

Models                    ACC      P        R(TPR)   TNR      F1
SSAE [20]-CPM             0.8473   0.8401   0.8002   0.8841   0.8197
Single SAE-CPM            0.8060   0.7765   0.7426   0.8504   0.7592
TSAE-CPM                  0.8416   0.8117   0.7802   0.8817   0.7956
Two-layer encoders-CPM    0.7763   0.7950   0.7446   0.8080   0.7693
Two-layer CNN-CPM         0.8598   0.8381   0.7996   0.8991   0.8184
Five-layer CNN-CPM        0.8492   0.8593   0.8022   0.8982   0.8298
Eight-layer CNN-CPM       0.8264   0.8589   0.7748   0.8780   0.8147
DBN-CPM                   0.8408   0.8233   0.7811   0.8826   0.8016
Our SSAE-CPM              0.8684   0.8607   0.8118   0.9080   0.8355
channels’ inputs. Besides, we also use different image preprocessing and data normalization methods to further improve the model’s performance. Our SSAE + LRC model is thus determined based on the best experimental results. As a goal of this paper is to obtain nucleus detection results with high precision and recall, data augmentation is performed on testing images to enhance the recall rate of detection using multi-scale scaling and channel
separations, which not only considers different channels' information but also takes advantage of the foreground (nuclei) and background (non-nuclei) information at different scales (Fig. 3). Note that the testing images of the G and B channels are all detected by the trained SSAE + LRC model, and this can also be regarded as a transfer learning process.

Comparing the results of the SSAE + LRC model with other models in Table 4, our results confirm the same conclusion as in [20], that is, SSAE is more effective than different SAEs (single SAE and TSAE) and some other DL models (CNN and DBN) with respect to feature extraction ability for nucleus detection. DBN is also an unsupervised learning algorithm and is composed of multiple RBMs. An RBM extracts the high-level representation based on the probability distribution of samples, while an SAE learns the hidden representation of samples through nonlinear transformation. Thus, SAE has more advantages than RBM in the extraction of single-target features. On the other hand, the results of the CNN-based model are slightly inferior to SSAE for our nucleus detection, and a main reason is that the designed CNN architecture is only composed of two convolution layers (explained in Section 5.1). CNN learns a set of locally connected neurons through local receptive fields for feature extraction, whereas SAE is an encoder-decoder architecture which extracts features by minimizing the discrepancy between an input and its reconstruction. In other words, CNN is a partial connection model which focuses on local information, while SAE is a full connection model that learns a single global weight matrix for feature representation. For our detection task, each input patch may contain a single target (nucleus) that is represented more effectively by a full connection model. Besides, the detection results (Tables 4, 6 and 8) obtained by CNN models with different numbers of layers demonstrate that nucleus detection may not require a very deep CNN structure; this is because the shallow features of a CNN model focus more on edge and corner information, which are beneficial for identifying whether each image patch contains a complete nucleus.

To illustrate the advantages of a two-layer architecture, Fig. 14 shows the visualization of the learned multilayer features in the first, second and third hidden layers from training patches with SSAE. The first-order features are sensitive to the boundaries and corners of nuclei or non-nuclei (Fig. 14(a)), the second-order representations are combinations of different boundaries (Fig. 14(b)), and the third-order features present an overfitting problem due to the lack of low-level information (Fig. 14(c)).

Fig. 14. The visualization of learned features with a three-layer SSAE model. (a) The learned first-order representations, showing some boundary and corner information of nuclei. (b) The learned second-order representations, showing combinations of these first-order representations. (c) The learned third-order representations, which cause overfitting by lack of low-level information (e.g., boundary or corner information or their combinations).

The difference of the training strategy from [20] is that our SSAE + LRC model is trained using single-channel images (R) rather than color images (RGB). Moreover, we apply our trained SSAE + LRC model to other cancers' pathological images directly without retraining or fine-tuning. The experimental results show that our SSAE + LRC model is also robust for other cancers' pathological images.
In addition, we further compare the proposed SSAE-CPM with CD [29], EM [30], SR-CNN [31], YOLOv3 [32] and SSD [33], which are some of the popular nucleus detection and natural image detection models. CD [29] deconvolved the color information of H&E images and calculated the contribution of each of the applied stains; its detection results are influenced by image quality. The core of EM [30] was to integrate expectation-maximization with a geodesic active contour (GAC) to detect and segment nuclei; it could be impacted by a high prevalence of overlapping nuclei. These two methods require no training phase. SR-CNN [31] was a deep learning strategy which extended the concept of a single-pixel regression of the center-of-the-pixel CNN (CPCNN) into a full-patch regression. YOLOv3 [32] and SSD [33] have been successfully applied in natural image detection, and we re-implement these two methods for our nucleus detection task. The initial weights of YOLOv3 are pre-trained based on the Darknet-53 model using natural images and then fine-tuned via the breast cancer data. The training process of SSD is similar to YOLOv3 except for replacing Darknet-53 with VGG-16. The thresholds of intersection over union (IOU) and confidence coefficient are set to 0.4 and 0.5, respectively. The visual and quantitative results show their disadvantages in the nucleus detection task, because (1) YOLOv3 and SSD are both essentially patch-level detection methods and thus a larger nucleus density will cause more serious missed detections, and (2) the outputs of YOLOv3 and SSD are both bounding boxes and cannot be used by CPM for reducing FPR and locating adhesive or overlapped nuclei. As Figs. 9 and 13 show, our SSAE-CPM achieves more satisfactory detection results than the other methods.

The main contribution of this paper is to develop CPM based on the adaptive thresholds of each case (patient), which can reduce FPR and locate adhesive or overlapped nuclei. Table 6 demonstrates that CPM is effective for refining the coarse detection results acquired by different models. According to Fig. 12 and Table 7, we can see that the different CPM variants address their specific issues and then improve detection performance. It is a fact that there is decreased detection performance for HCC and colon cancer pathological images compared to breast cancer. For both cases, one reason is the difference in the H&E image acquisition process. Specifically, there are a number of adhesive or overlapped nuclei on HCC pathological images with serious differentiation. For colon cancer, the variations of nucleus size and shape (compared to the training nuclei) could impact the detection performance.

Future work may be considered based on our SSAE-CPM. Fig. 15 shows the coarse detection result of an example. Fig. 15(a) is an original HCC pathological image and Fig. 15(b) is the coarse detection result using the trained SSAE + LRC model. Fig. 15(c) presents Fig. 15(b) superimposed on the grayscale image of Fig. 15(a).

Fig. 15. The coarse detection result of a pathological image. (a) The original image. (b) The coarse detection result. (c) The coarse detection result superimposed on the grayscale image of (a).

An interesting idea is that the coarse detection result
Fig. 15. Coarse detection result of a pathological image. (a) The original image. (b) The coarse detection result. (c) The coarse detection result superimposed on the grayscale image of (a).
An interesting idea is that the coarse detection result could also be regarded as the initial contour of a level set method for segmenting nuclei. For CPM, we plan to consider additional forms of classification tasks to handle more complex nuclei.

7. Conclusions

This paper proposes an automated nucleus detection framework based on SSAE and CPM in a coarse-to-fine manner. Our SSAE + LRC model is trained on breast cancer images and can be used for nucleus detection on the pathological images of other cancers. The augmented inputs adequately exploit information from different channels and scales, so nuclei can be detected reliably. To handle false positive nuclei and adhesive or overlapped nuclei, CPM, an adaptive threshold determination method, is proposed for refining the coarse detection results. The experimental results and analysis demonstrate that SSAE-CPM achieves satisfactory nucleus detection performance and outperforms existing detection methods.

Conflicts of interest

The authors declare that they have no competing interests.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 61472073, No. 61872075).

References

[1] S.A. Hoda, R.S. Hoda, Rubin's pathology: Clinicopathologic foundations of medicine, 5th edition, J. Am. Med. Assoc. (JAMA) 298 (17) (2007) 2070–2075, doi:10.1001/jama.298.17.2073.
[2] E. Meijering, Cell segmentation: 50 years down the road [life sciences], IEEE Signal Process. Mag. 29 (5) (2012) 140–145, doi:10.1109/msp.2012.2204190.
[3] M. Gurcan, L. Boucheron, A. Can, A. Madabhushi, N. Rajpoot, B. Yener, Histopathological image analysis: a review, IEEE Rev. Biomed. Eng. 2 (2009) 147–171, doi:10.1109/rbme.2009.2034865.
[4] H. Irshad, A. Veillard, L. Roux, D. Racoceanu, Methods for nuclei detection, segmentation, and classification in digital histopathology: a review; current status and future potential, IEEE Rev. Biomed. Eng. 7 (2014) 97–114, doi:10.1109/rbme.2013.2295804.
[5] C. Maurer, R. Qi, V. Raghavan, A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2) (2003) 265–270, doi:10.1109/tpami.2003.1177156.
[6] G. Blanchet, M. Charbit, Digital Signal and Image Processing using MATLAB®, ISTE, 2006.
[7] R.O. Duda, P.E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Commun. ACM 15 (1) (1972) 11–15, doi:10.1145/361237.361242.
[8] J. Matas, O. Chum, M. Urban, T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions, in: Proceedings of the British Machine Vision Conference 2002, British Machine Vision Association, 2002.
[9] D. Reisfeld, H. Wolfson, Y. Yeshurun, Context-free attentional operators: the generalized symmetry transform, Int. J. Comput. Vis. 14 (2) (1995) 119–130, doi:10.1007/bf01418978.
[10] G. Lin, U. Adiga, K. Olson, J.F. Guzowski, C.A. Barnes, B. Roysam, A hybrid 3d watershed algorithm incorporating gradient cues and object models for automatic segmentation of nuclei in confocal image stacks, Cytometry 56A (1) (2003) 23–36, doi:10.1002/cyto.a.10079.
[11] H. Chang, J. Han, A. Borowsky, L. Loss, J.W. Gray, P.T. Spellman, B. Parvin, Invariant delineation of nuclear architecture in glioblastoma multiforme for clinical and molecular association, IEEE Trans. Med. Imaging 32 (4) (2013) 670–682, doi:10.1109/tmi.2012.2231420.
[12] C. Bergmeir, M.G. Silvente, J.M. Benítez, Segmentation of cervical cell nuclei in high-resolution microscopic images: a new algorithm and a web-based software framework, Comput. Methods Progr. Biomed. 107 (3) (2012) 497–512, doi:10.1016/j.cmpb.2011.09.017.
[13] Z. Lu, G. Carneiro, A.P. Bradley, Automated nucleus and cytoplasm segmentation of overlapping cervical cells, in: Proceedings of the Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, Springer Berlin Heidelberg, 2013, pp. 452–460.
[14] Q. Yang, B. Parvin, Harmonic cut and regularized centroid transform for localization of subcellular structures, in: Proceedings of the Object Recognition Supported by User Interaction for Service Robots, IEEE Computer Society, 2003.
[15] P. Khurd, L. Grady, A. Kamen, S. Gibbs-Strauss, E.M. Genega, J.V. Frangioni, Network cycle features: application to computer-aided Gleason grading of prostate cancer histopathological images, in: Proceedings of the IEEE International Symposium on Biomedical Imaging: From Nano to Macro, IEEE, 2011.
[16] J. Gall, A. Yao, N. Razavi, L.V. Gool, V. Lempitsky, Hough forests for object detection, tracking, and action recognition, IEEE Trans. Pattern Anal. Mach. Intell. 33 (11) (2011) 2188–2202, doi:10.1109/tpami.2011.70.
[17] H. Su, F. Xing, X. Kong, Y. Xie, S. Zhang, L. Yang, Robust cell detection and segmentation in histopathological images using sparse reconstruction and stacked denoising autoencoders, in: Deep Learning and Convolutional Neural Networks for Medical Image Computing, Springer International Publishing, 2017, pp. 257–278.
[18] B. Liu, J. Huang, L. Yang, C. Kulikowsk, Robust tracking using local sparse appearance model and k-selection, in: Proceedings of the CVPR, IEEE, 2011.
[19] L. Hou, V. Nguyen, D. Samaras, T.M. Kurc, Y. Gao, T. Zhao, J.H. Saltz, Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images, Pattern Recogn. 86 (2019) 188–200.
[20] J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, A. Madabhushi, Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images, IEEE Trans. Med. Imaging 35 (1) (2016) 119–130, doi:10.1109/tmi.2015.2458702.
[21] Y. Xue, N. Ray, Cell detection with deep convolutional neural network and compressed sensing, arXiv preprint, arXiv:1708.03307, 2017.
[22] Y. Liu, K. Gadepalli, M. Norouzi, G.E. Dahl, T. Kohlberger, A. Boyko, S. Venugopalan, A. Timofeev, P.Q. Nelson, G.S. Corrado, et al., Detecting cancer metastases on gigapixel pathology images, arXiv preprint, arXiv:1703.02442, 2017.
[23] K. Sirinukunwattana, S.E.A. Raza, Y.-W. Tsang, D.R.J. Snead, I.A. Cree, N.M. Rajpoot, Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images, IEEE Trans. Med. Imaging 35 (5) (2016) 1196–1206, doi:10.1109/tmi.2016.2525803.
[24] Q. Sun, H. Jiang, G. Zhu, S. Li, S. Gong, B. Yang, L. Zhang, HDR pathological image enhancement based on improved bias field correction and guided image filter, BioMed Res. Int. 2016 (2016) 1–11, doi:10.1155/2016/7478219.
[25] T. Hara, T. Kobayashi, S. Ito, X. Zhou, T. Katafuchi, H. Fujita, Quantitative analysis of torso FDG-PET scans by using anatomical standardization of normal cases from thorough physical examinations, PLOS ONE 10 (5) (2015) e0125713, doi:10.1371/journal.pone.0125713.
[26] A. Janowczyk, A. Madabhushi, Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases, J. Pathol. Inf. 7 (1) (2016) 29, doi:10.4103/2153-3539.186902.
[27] A. Ng, Sparse autoencoder, CS294A Lecture Notes 72 (2011) 1–19.
[28] S. Li, H. Jiang, Y.-d. Yao, B. Yang, Organ location determination and contour sparse representation for multiorgan segmentation, IEEE J. Biomed. Health Inf. 22 (3) (2018) 852–861, doi:10.1109/jbhi.2017.2705037.
[29] A.C. Ruifrok, D.A. Johnston, et al., Quantification of histochemical staining by color deconvolution, Anal. Quant. Cytol. Histol. 23 (4) (2001) 291–299.
[30] H. Fatakdawala, J. Xu, A. Basavanhally, G. Bhanot, S. Ganesan, M. Feldman, J.E. Tomaszewski, A. Madabhushi, Expectation–maximization-driven geodesic active contour with overlap resolution (EMaGACOR): application to lymphocyte segmentation on breast cancer histopathology, IEEE Trans. Biomed. Eng. 57 (7) (2010) 1676–1689, doi:10.1109/tbme.2010.2041232.
[31] Y. Xie, F. Xing, X. Kong, H. Su, L. Yang, Beyond classification: structured regression for robust cell detection using convolutional neural network, in: Lecture Notes in Computer Science, Springer International Publishing, 2015, pp. 358–365.
[32] J. Redmon, A. Farhadi, YOLOv3: an incremental improvement, arXiv preprint, arXiv:1804.02767, 2018.
[33] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A.C. Berg, SSD: single shot MultiBox detector, in: Computer Vision – ECCV 2016, Springer International Publishing, 2016, pp. 21–37.
[34] H.-C. Shin, H.R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, R.M. Summers, Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Med. Imaging 35 (5) (2016) 1285–1298, doi:10.1109/tmi.2016.2528162.

Siqi Li received the B.S. degree from the Department of Mathematics and Software Sciences, Sichuan Normal University (SNU), China in 2011, and the M.S. degree from the Department of Mathematics, Bohai University, China in 2015. He is currently pursuing the Ph.D. degree in the Department of Software College, Northeastern University, China. His research interests include medical image detection, segmentation, classification, machine learning and deep learning.
Huiyan Jiang received the B.S. degree from the Department of Mathematics, Bohai University, China in 1986, the M.S. degree in Computer Application in 2000 and the Ph.D. degree in Control Theory and Control Engineering in 2009, both from Northeastern University, China. From Oct. 2001 to Sep. 2002, she was a visiting scholar at Gifu University, Japan, where she carried out research on image processing and medical image computer-aided diagnosis (CAD) technology. She is currently a professor and the director of the Department of Digital Media Technology, Software College, Northeastern University, China, and a council member of the 3D Images Technology Association, China Society of Image and Graphics. Her main research interests focus on digital image processing and analysis, pattern recognition, 3D visualization, 3D video processing, artificial intelligence, and medical image computer-aided diagnosis (CAD).
Jie Bai received the B.S. degree from the Department of Software Engineering, Northeastern University, China in 2017. She is currently a graduate student in the Sino-Dutch Biomedical and Information Engineering School, Northeastern University. Her research interests include pathological image detection and classification.
Ye Liu received the B.S. degree from the Department of Clinical Medicine, Jiamusi University of Medical Sciences, Heilongjiang, China in 2000, the M.S. degree from the Department of Ophthalmology, Jilin University, Jilin, China in 2003, and the Ph.D. degree from the Department of Pathology, Yamaguchi University Graduate School of Medicine, Yamaguchi, Japan in 2008. He is the Vice Director, Associate Professor and Associate Senior Doctor of the Department of Pathology in the Fifth Affiliated Hospital, Sun Yat-sen University.
Yu-dong Yao received the B.Eng. and M.Eng. degrees from Nanjing University of Posts and Telecommunications, Nanjing, China, in 1982 and 1985, respectively, and the Ph.D. degree from Southeast University, Nanjing, in 1988, all in electrical engineering. From 1987 to 1988, he was a visiting student at Carleton University, Ottawa, Canada. From 1989 to 2000, he was with Carleton University, Spar Aerospace Ltd., Montreal, Canada, and Qualcomm Inc., San Diego, USA. Since 2000, he has been with Stevens Institute of Technology, Hoboken, USA, where he is currently a Professor and Chair of the Department of Electrical and Computer Engineering. He holds one Chinese patent and 13 U.S. patents. His research interests include wireless communications, machine learning and deep learning techniques, and healthcare and medical applications. He served as an Associate Editor for the IEEE COMMUNICATIONS LETTERS (2000 to 2008) and the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY (2001 to 2006) and as an Editor for the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS (2001 to 2005). For his contributions to wireless communications systems, he was elected a Fellow of IEEE (2011), the National Academy of Inventors (2015), and the Canadian Academy of Engineering (2017).