Glomerulosclerosis identification in whole slide images using semantic segmentation


Computer Methods and Programs in Biomedicine 184 (2020) 105273

Contents lists available at ScienceDirect

Computer Methods and Programs in Biomedicine journal homepage: www.elsevier.com/locate/cmpb

Gloria Bueno a,∗, M. Milagro Fernandez-Carrobles a, Lucia Gonzalez-Lopez b, Oscar Deniz a

a University of Castilla-La Mancha, ETSI Industriales, VISILAB, Ciudad Real, Spain
b Hospital General Universitario de Ciudad Real, Ciudad Real, Spain

Article info

Article history: Received 13 July 2019; Revised 12 November 2019; Accepted 10 December 2019

Keywords: Semantic segmentation; Deep learning; Consecutive segmentation-classification CNN; Digital pathology; Glomeruli detection; Sclerotic glomeruli; SegNet; U-Net

Abstract

Background and Objective: Glomeruli identification, i.e., detection and characterization, is a key procedure in many nephropathology studies. In this paper, semantic segmentation based on convolutional neural networks (CNN) is proposed to detect glomeruli in Whole Slide Images (WSI), followed by a classification CNN that divides the glomeruli into normal and sclerosed.

Methods: U-Net and SegNet CNNs are compared for pixel-level segmentation, considering both a two-class and a three-class problem, that is, a) non-glomerular and glomerular structures and b) non-glomerular, normal glomerular and sclerotic structures. The two-class semantic segmentation result is then used for a CNN classification in which glomerular regions are divided into normal and globally sclerosed glomeruli.

Results: These methods were tested on a dataset composed of 47 WSIs of human kidney sections stained with Periodic Acid Schiff (PAS). The best approach was SegNet for two-class segmentation followed by a fine-tuned AlexNet network to characterize the glomeruli. An accuracy of 98.16% was obtained with this process of consecutive CNNs (SegNet-AlexNet) for segmentation and classification.

Conclusion: The results obtained demonstrate that the sequential CNN segmentation-classification strategy achieves higher accuracy and reduces misclassified cases, and it is therefore the methodology proposed for glomerulosclerosis detection.

© 2019 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail address: [email protected] (G. Bueno).

https://doi.org/10.1016/j.cmpb.2019.105273
0169-2607/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Glomeruli are clusters of capillaries responsible for removing waste substances and excess fluid that the body does not need. Glomerular diseases can be classified according to clinical manifestations, etiology, immunopathology or morphological alterations. Regarding morphological alterations, the glomerular lesion known as glomerulosclerosis is characterized by different degrees of sclerosis, depending on whether it affects the glomerulus globally or partially [21]. A sample is shown in Fig. 1, where global diffuse glomerulosclerosis is illustrated with almost all glomeruli fully sclerosed.

In daily practice, the total number of glomeruli found in each cut of a renal biopsy must be quantified. About 20 to 30 cuts are performed per renal biopsy. In addition, it is necessary to indicate those glomeruli that are globally sclerosed (the entire glomerulus). If focal sclerosis is also detected (not the entire glomerulus), it gives clues about the possible disease that the patient has. This should be reflected in each pathology report, since the number of glomeruli evaluated must be representative enough to make a diagnosis. It also helps to provide adequate treatment: if the sample has many sclerosed glomeruli, there is chronic kidney damage with dead glomeruli and, therefore, the patient is not suitable for certain treatments [18]. In addition, the nephrologist includes these data in the national glomerulonephritis registry.

The glomerulus count is very tedious and time consuming. Therefore, image processing tools capable of accurately detecting and classifying glomeruli are needed. Artificial intelligence provides tools worth exploring for this problem. Deep learning in particular is emerging strongly, opening new horizons in pathology [25]. This paper presents a methodology based on deep learning and, more specifically, on semantic segmentation, for the detection and classification of glomerular conditions.


G. Bueno, M.M. Fernandez-Carrobles and L. Gonzalez-Lopez et al. / Computer Methods and Programs in Biomedicine 184 (2020) 105273

Fig. 1. Global diffuse glomerulosclerosis. Sclerosis is presented in all glomeruli (green rectangles) being also global in each glomerulus.

Table 1. Previous works and performance metrics obtained.

| Author | Method | Classes | Performance Metrics |
|---|---|---|---|
| Glomeruli segmentation | | | |
| Hirohashi et al. [9] | HOG + SVM | 2 | 82.1% F1-Score |
| Kato et al. [11] | S-HOG + SVM | 2 | 86.6% F1-Score |
| Maree et al. [17] | Ellipsoidal shape + Tree Classifier | 2 | 87% F1-Score |
| Temerinac-Ott et al. [28] | Mutual information + CNN | 2 | 76% F1-Score |
| Gagermayr et al. [5] | U-Net | 2 | 90% Dice Coefficient |
| Gallego et al. [6] | AlexNet + post pixel-wise analysis | 2 | 93.7% F1-Score |
| Govind et al. [8] | Butterworth band-pass filter | 2 | 84% Specificity, 85% Recall |
| Bukowy et al. [3] | Faster RCNN | 2 | 96.94% Precision, 96.79% Recall |
| Proposed Method | SegNet-VGG19 | 2 | 99.88% Precision, 99.24% F1-Score |
| Glomeruli lesion classification | | | |
| Barros et al. [2] | LoG + kNN | 3 | 88.3% Accuracy |
| Marsh et al. [16] | Fully CNN | 3 | 84.75% F1-Score |
| Kannan et al. [10] | Inception v3 | 3 | 92.36% Accuracy |
| Glomeruli segmentation and classification | | | |
| Proposed Method | SegNet-VGG19 + AlexNet | 3 | 98.16% Accuracy, 99.4% F1-Score |

Table 2. Properties of the databases used in previous works.

| Author | Database properties |
|---|---|
| Glomeruli segmentation | |
| Hirohashi et al. [9] | 1 WSI, Aperio Scan Scope XT, diaminobenzidine-based and hematoxylin |
| Kato et al. [11] | 20 WSIs, Aperio Scan Scope XT, 3,3-diaminobenzidine |
| Maree et al. [17] | 200 WSI, Hamamatsu NanoZoomer, 20x, Trichrome Masson |
| Temerinac-Ott et al. [28] | 20 WSIs, Aperio AT2, 20x, Jones H&E, PAS, Sirius Red, CD10 |
| Gagermayr et al. [5] | 24 WSIs, Hamamatsu NanoZoomer 2.0HT, 20x, PAS |
| Gallego et al. [6] | 108 WSIs (98 training, 10 test), Aperio ScanScope XT, 20x, PAS |
| Govind et al. [8] | 106 images, immunofluorescence and auto-fluorescence |
| Glomeruli lesion classification | |
| Barros et al. [2] | 811 images, Nikon E600, H&E, PAS |
| Marsh et al. [16] | 48 WSIs, Aperio Scanscope CS, 20x, H&E |
| Kannan et al. [10] | 275 images, Nikon Eclipse TE-2000, 40x, Trichrome |
| Glomeruli segmentation and classification | |
| Proposed Method | 47 WSIs, Aperio ScanScope CS, 20x, PAS |

Semantic segmentation networks have already been used in histopathological images [14,30] but not in the segmentation of glomeruli. Previous studies based on handcrafted feature-extraction techniques obtained a segmentation accuracy of up to 87% and a classification accuracy of 88%. With deep learning techniques based on convolutional neural networks (CNN), these values rose to up to 96% and 92% on different datasets and stains. Tables 1 and 2 show the previous methods, the results obtained and the datasets used to segment and classify glomeruli into two (non-glomerular and glomerular) and three (non-glomerular, glomerular, sclerosed) classes. The proposed method is also included in the tables for comparison.

The purpose of this paper is twofold: (a) to perform glomerular segmentation and (b) to classify glomeruli as normal or sclerosed. The classification builds on our previous studies [6,19,27]. To this end, two powerful semantic segmentation methodologies, SegNet and U-Net, have been compared for pixel-wise segmentation. U-Net has already been employed for medical image segmentation, including glomerular segmentation tasks, as shown in Table 1. However, this is the first time that the SegNet architecture is applied to glomerular segmentation, which makes it a novel strategy for this task. Considering both architectures, two different experiments were carried out:


Fig. 2. Glomerular structures in nephropathology images stained with PAS. (a) Non-glomerular structures, (b) Normal glomeruli and (c) Sclerosed glomeruli.

1. Three-class semantic segmentation. That is, a comparison of the SegNet and U-Net methodologies for segmentation into non-glomerular structures, normal glomeruli and completely sclerosed glomeruli.
2. Two-class semantic segmentation followed by classification. Based on [6], a classification CNN is proposed where the glomerular regions obtained by semantic segmentation are divided into normal and completely sclerosed glomeruli.

The remainder of this paper is organized as follows: Sections 2 and 3 describe the database, the theoretical methods and the proposed approach. Results and Conclusions are presented in Sections 4 and 5.

2. Materials

2.1. AIDPATH kidney database

The digital tissue images used in this work were obtained from the AIDPATH Kidney Database (see Acknowledgements). This dataset is composed of 5 different datasets of WSIs of human kidney tissue cohorts acquired and digitized at three European institutions: Castilla-La Mancha’s Healthcare services (Spain), The Andalusian Health Service (Spain) and The Vilnius University Hospital Santaros Klinikos (Lithuania). Tissue samples were collected with a biopsy needle having an outer diameter between 100 μm and 300 μm. Afterwards, paraffin blocks were prepared using tissue sections of 4 μm and stained using PAS. PAS stain is commonly used because it efficiently dyes polysaccharides, which are present in kidney tissue, and highlights glomerular basement membranes [22]. Digital WSI acquisition was performed with the Leica Aperio ScanScope CS scanner and the images were stored in SVS file format. As a result, a dataset of 47 kidney WSIs was obtained.

Images at 20x magnification were selected since this magnification maintains image quality and information while significantly reducing computational time. Lower resolutions have the disadvantage of losing image quality and therefore information about the glomerulus. On the other hand, magnifications such as 40x imply larger images, increasing the model size and slowing down the training.

2.2. Kidney database processing

Once WSIs at 20x magnification were collected, they were split into 2000x2000 pixel patches, selecting only those which contained tissue. This set of patches was examined and labeled into three classes (see Fig. 2): (i) Non-glomerular structures: kidney tissue structures such as proximal and distal tubules, blood vessels, connective tissue stroma or inflammatory cells; (ii) Normal glomeruli, characterized by thin glomerular capillary loops and a regular number of endothelial and mesangial cells, where the aspect of the tubules surrounding the glomerulus is normal; and (iii) Sclerosed glomeruli, where the whole (or nearly the whole) glomerulus presents sclerosis.

As a result of the previous steps, a dataset with a total of 1055 kidney tissue images was obtained. Glomeruli contours were annotated, generating a mask for each image. In total, 1245 glomerular structures were annotated: 303 were sclerosed glomeruli and the remaining 942 were normal glomeruli.

CNN architectures typically require large image datasets to obtain valuable results. For that reason, a data augmentation process was performed to increase the number of samples. Color normalization is one of the most common data augmentation methods used in digital pathology. Although immunohistochemical processes use the same staining marker, some color variations can appear in the tissue. These variations mainly depend on the commercial provider, but they directly affect image analysis. Color normalization methods overcome this issue by applying a color transfer between images. Reinhard’s method (RM) [20] was selected for color normalization. This decision is supported by the study performed in [4], where four different methods used for color standardization were studied: histogram matching (HM) [26], Macenko’s method (MM) [15], RM and the non-linear spline mapping method (SM) [12]. Color transfer was applied with 5 different references, therefore extending the dataset to 5275 images.

Another technique widely used for data augmentation is to apply minor affine transformations to the images, such as flips, mirroring, translations and rotations. Therefore, together with the RM, rotations of 90° and 270° as well as a vertical flip were performed. Finally, considering these image transformations, the dataset was composed of 25,320 images.

Fig. 3. SegNet-VGG16 architecture. Encoder-Decoder network constituted by 26 convolutional layers, 13 in the encoder and 13 in the decoder. Decoder inputs depend on pool indices from the encoder.

3. Methods

Semantic segmentation is a deep learning methodology that has emerged relatively recently. It is also called dense prediction and is mainly characterized by classifying individual pixels rather than objects in the image. For this purpose, a one-hot encoding is performed on the input labeled images, creating a distinct image for each class [7]. The most direct form of a semantic segmentation network stacks convolutional layers (with "same" padding, and therefore the same size throughout the network layers) and ends with a segmentation map as output. It is well known that the first layers of a CNN learn low-level features of the input image while the final layers learn high-level concepts. Therefore, many CNNs perform downsampling operations such as pooling and strided convolutions to reduce computational effort. However, the final target of semantic segmentation is to produce a full-resolution labeled image aligned with the input image. Thus, the semantic segmentation process is generally considered an encoder-decoder approach, where the encoder learns to distinguish classes by performing downsampling operations and the decoder is responsible for reversing them with upsampling operations, finally obtaining a full-resolution segmentation map. Depending on the selected strategy, the downsampling and upsampling operations apply different methodologies.

This paper focuses on two semantic segmentation methodologies, which differ in the CNN selected to build the internal network structure. On the one hand, an encoder-decoder methodology was built using a SegNet (VGG16 and VGG19) architecture [1], called SegNet-VGG16 and SegNet-VGG19 in this work. On the other hand, we used the U-Net architecture [23] with 4 (as in the original U-Net implementation) and 5 encoder blocks, respectively. In this work, they are called U-Net and U-Net-5 in order to distinguish their designs. A brief description of both architectures is provided below.
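The one-hot encoding of labeled images mentioned in this section can be illustrated with a minimal NumPy sketch. The class indices (0 = non-glomerular, 1 = normal glomerulus, 2 = sclerosed glomerulus) and the helper name `one_hot_mask` are assumptions made for illustration; the paper does not state how its masks are encoded.

```python
import numpy as np

def one_hot_mask(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an (H, W) integer label mask into an (H, W, num_classes) stack of binary images."""
    # Row k of the identity matrix is the one-hot vector for class k;
    # indexing with the label mask picks one row per pixel.
    return np.eye(num_classes, dtype=np.uint8)[labels]

# Hypothetical 2x3 mask: 0 = non-glomerular, 1 = normal glomerulus, 2 = sclerosed glomerulus
mask = np.array([[0, 1, 1],
                 [2, 2, 0]])
encoded = one_hot_mask(mask, num_classes=3)

# encoded[..., k] is the binary image of class k; argmax recovers the label mask
recovered = encoded.argmax(axis=-1)
```

Each channel of `encoded` is the "distinct image for each class" described above, and every pixel is set in exactly one channel.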

Encoder-Decoder SegNet architecture [1]:
• Input layer
• 5 convolutional blocks for the encoder, constituted by several 3x3 convolution layers, batch normalization layers, ReLU layers and 2x2 max-pooling layers (stride=2).
• 5 convolutional blocks for the decoder, composed of max-unpooling layers and several 3x3 convolution layers, batch normalization layers and ReLU layers.
• Softmax layer which calculates pixel-wise classification scores
• Output layer with the prediction pixel map
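The pairing of 2x2 max-pooling in the encoder with max-unpooling in the decoder, listed above, can be sketched in plain NumPy on a single feature map. This is an illustrative toy under the assumption of even height and width, not the SegNet implementation used by the authors; the function names are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling (stride 2) on an (H, W) map with even H and W.
    Returns the pooled map and, per window, the position (0..3) of its maximum."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 windows, flattened to length 4
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    idx = windows.argmax(axis=-1)
    pooled = np.take_along_axis(windows, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def max_unpool_2x2(pooled, idx):
    """Place each pooled value back at the position recorded by max_pool_2x2; zeros elsewhere."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    np.put_along_axis(windows, idx[..., None], pooled[..., None], axis=-1)
    return windows.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [6, 2, 3, 3]])
pooled, idx = max_pool_2x2(x)
restored = max_unpool_2x2(pooled, idx)
```

The unpooled map is sparse: only the locations of the original maxima are filled, which is why the decoder follows each unpooling step with convolution layers that densify the map.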

The principal difference between SegNet-VGG16 and SegNet-VGG19 lies in the number of convolutional layers in each convolutional block. Convolutional blocks in SegNet-VGG16 are composed of two or three convolutional layers, depending on the block depth, while SegNet-VGG19 has convolutional blocks composed of two or four convolutional layers. Fig. 3 shows the SegNet architecture based on a VGG16 network.

Encoder-Decoder U-Net architecture [23]:
• Input layer
• 4 convolutional blocks for the encoder, consisting of two 3x3 convolution layers, ReLU layers, 2x2 max-pooling layers (stride=2) and two dropout layers to prevent overfitting.
• 4 convolutional blocks for the decoder, with up-convolution (transposed convolution) layers, upReLU layers and depth concatenation layers followed by two 3x3 convolution layers and ReLU layers.
• 1x1 convolution layer to map the 64 extracted feature maps
• Softmax layer which calculates pixel-wise classification scores
• Output layer with the prediction pixel map
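The last three items of the list above (1x1 convolution, softmax, output map) amount to a per-pixel linear map followed by a per-pixel normalization. The sketch below shows this with NumPy; the 8x8 spatial size, the random features and the untrained weights are placeholders chosen for illustration only.

```python
import numpy as np

def conv1x1(features, weights, bias):
    """1x1 convolution: the same linear map applied independently at every pixel.
    features: (H, W, C_in), weights: (C_in, C_out), bias: (C_out,)"""
    return features @ weights + bias

def pixelwise_softmax(scores):
    """Softmax over the class axis, computed separately for each pixel."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 64))    # stand-in for the 64 decoder feature maps
weights = rng.normal(size=(64, 3)) * 0.1  # hypothetical, untrained weights for 3 classes
bias = np.zeros(3)

probs = pixelwise_softmax(conv1x1(features, weights, bias))  # (8, 8, 3) class scores
pred = probs.argmax(axis=-1)                                  # (8, 8) prediction pixel map
```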

U-Net-5 has the same structure as U-Net, with an additional convolutional block in both the encoder and the decoder. Fig. 4 illustrates the original (4 encoders) U-Net architecture.

The main difference between both architectures (SegNet and U-Net) is found in the decoder upsampling operations. SegNet applies a 2x2 max-unpooling operation in which the indices of the maximum activations stored by the corresponding max-pooling layer are used to perform the max-unpooling [29]. In the U-Net architecture, the decoder is characterized by an up-convolution layer, also called transposed convolution, fractionally-strided convolution or deconvolution. This layer applies learned kernels to upsample the feature map, an operation often loosely described as the inverse of convolution. It is followed by an upReLU layer and a depth concatenation layer that produces the concatenated feature map. As these kinds of networks perform a pixel-level classification, they delineate each structure rather than only labeling the image, which is an advantage with respect to classification-only networks.

Fig. 4. U-Net architecture. 23 convolutional layers distributed in an encoder-decoder methodology. The encoder contains 10 convolutional layers and the decoder combines 4 up-convolutional and 9 convolutional layers.

3.1. Semantic segmentation for glomerular lesion detection: SegNet vs U-Net

As previously mentioned, the SegNet and U-Net architectures are applied to perform a pixel-wise segmentation categorizing non-glomerular structures, normal and sclerosed glomeruli (three classes). The workflow followed by this approach is shown in Fig. 5. Images at 20x magnification are resized to 400x400 pixels in order to reduce network computation without losing precision. This resampling is performed using bicubic interpolation with antialiasing. 75% of the total image patches are employed for training. Thus, a total of 18990 images are used to train the pixel-wise segmentation model.

Fig. 5. Detailed workflow process for the three-class segmentation: non glomerulus, normal glomerulus and sclerosed glomerulus.

SegNet is trained using a stochastic gradient descent optimization algorithm with a momentum of 0.9 to accelerate gradient vectors, a value of 1e−4 for L2 regularization and an initial learning rate of 0.1 and 0.05, respectively. The step decay schedule drops the learning rate by a factor of 0.1 every 2 epochs. A mini-batch size of 4 and 2 epochs are used, giving a total of 9801 iterations for training. For the U-Net methodology, network parameters are set to the same values as in the SegNet methodology. The only difference is the number of epochs, which is increased to 4, so training involves a total of 19602 iterations. With both methodologies, we apply a transfer learning strategy using a model pretrained on the ImageNet database [24].

3.2. Consecutive CNNs for segmentation and classification

This approach proposes the use of consecutive CNNs for segmentation and classification. First, a semantic segmentation network is used to detect only glomerular structures (two classes); then, the true positive glomerular structures obtained are employed to train an AlexNet network [13] in order to classify them into normal or sclerosed glomeruli. AlexNet is a well-known CNN architecture that has been used in several classification tasks and


Fig. 6. AlexNet architecture.

Fig. 7. Detailed workflow for the sequential segmentation and classification process using SegNet and AlexNet CNNs. True positive results (glomerular structures) from SegNet-VGG19 segmentation are used to train an AlexNet network in order to classify normal and sclerosed glomeruli.

has been selected due to its competitive accuracy/computational time ratio. This architecture mainly includes 3 convolutional blocks followed by fully-connected layers and, finally, a softmax layer (see Fig. 6). Convolutional blocks are composed of convolutional, ReLU and normalization layers followed by a max-pooling downsampling layer. Moreover, after some fully-connected layers, a dropout layer is applied in order to randomly deactivate units. The softmax layer returns prediction scores, which are finally mapped to the predicted class in the output layer.

SegNet and U-Net methodologies are used for semantic segmentation. The best model obtained for this first step, in this case the model obtained by SegNet-VGG19, is used to train the AlexNet network. Fig. 7 illustrates the methodology followed by this approach for the detection and classification of glomerular structures in kidney WSIs. Therefore, the aim is to evaluate which approach achieves better accuracy: a) a semantic segmentation for normal and sclerosed glomeruli or b) consecutive CNNs, in which a segmentation into non-glomerular and glomerular structures is followed by an AlexNet network that classifies glomerular structures into sclerosed or normal glomeruli.

For glomerular segmentation, we used the same parameters as in the three-class segmentation except for the number of SegNet epochs, which is increased to 3, giving a total of 14700 iterations for training. In the case of AlexNet training, we used a stochastic gradient descent optimization algorithm with a momentum of 0.9, a value of 1e−4 for L2 regularization and an initial learning rate of 1e−5. As in previous networks, a step decay schedule drops the learning rate by a factor of 0.1 every 2 epochs. We selected a mini-batch size of 40 and 60 epochs, with a total of 3120 iterations. A dataset of 2340 glomeruli images was used to train the network.

3.3. Validation metrics

An appropriate interpretation is essential when a diagnosis is produced. In a detection process, there are two possible results, positive and negative. However, errors may cause positive cases to be classified as negative and vice versa. These cases are commonly called false negatives and false positives, respectively. Thus, four possible results, that is, true positive (TP), true negative (TN), false positive (FP) and false negative (FN), must be considered for interpretation. Based on these values, the performance metrics shown in Table 3 have been calculated.

4. Results

4.1. Performance of semantic segmentation

Semantic segmentation with the SegNet and U-Net methodologies was performed, categorizing kidney tissue into

non-glomerular structures, normal or sclerosed glomeruli (three classes). Testing was accomplished using 25% of the original dataset, taking a total of 6330 image patches to carry out pixel predictions. Table 4 shows the results obtained. In summary, the SegNet and U-Net CNNs give valuable results when segmenting and detecting glomerular lesions in kidney WSIs. Regarding metrics, the global accuracy is above 92%, which indicates the suitability of these CNNs for detecting glomeruli in WSI. However, precision drops, indicating that false positive cases increase when normal and sclerosed glomeruli are distinguished. Other region measures such as MeanIoU and MeanBFS are not as satisfactory as expected. MeanIoU measures the overlap between predicted and expected regions, so low values imply over- or under-segmentation. MeanBFS measures boundary matching between predicted and expected regions, so differences in contours lead to lower values. A possible explanation for these results will be provided later in the manuscript.

Table 3. Performance metrics applied.

| Metric | Equation |
|---|---|
| Global accuracy (GlobalACC) | $\frac{TP + TN}{TP + FP + FN + TN}$ |
| Error | $1 - ACC$ |
| Specificity | $\frac{TN}{TN + FP}$ |
| Recall (Sensitivity) | $\frac{TP}{TP + FN}$ |
| Precision | $\frac{TP}{TP + FP}$ |
| F1-Score | $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$ |
| Matthews correlation coefficient (MCC) | $\frac{(TP \cdot TN) - (FP \cdot FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ |
| Cohen’s Kappa coefficient (Kappa) | $\frac{predicted\ acc - expected\ acc}{1 - expected\ acc}$ |
| Mean IoU (Intersection over Union) | $\frac{TP_i}{FP_i + FN_i + TP_i}$, computed per class $i$ and averaged |
| Mean BFS (Boundary F1-Score) | This metric computes the F1-measure from precision and recall values considering a distance error tolerance θ over the boundary pixels. |

Where, for Kappa, the expected accuracy is $pe_{positive} + pe_{negative}$, with
$pe_{positive} = \frac{TP + FP}{TP + FP + FN + TN} \cdot \frac{TP + FN}{TP + FP + FN + TN}$ and
$pe_{negative} = \frac{FN + TN}{TP + FP + FN + TN} \cdot \frac{FP + TN}{TP + FP + FN + TN}$.

Table 4. Metrics calculated for glomerular lesion detection from the three-class segmentation problem (Non-Glomerulus / Normal Glomerulus / Sclerosed Glomerulus).

| Method/Measure | SegNet-VGG16 | SegNet-VGG19 | U-Net | U-Net-5 |
|---|---|---|---|---|
| GlobalACC | 94.77% | 96.67% | 94.24% | 92.79% |
| Error | 5.25% | 3.33% | 5.76% | 7.20% |
| Specificity | 97.66% | 97.44% | 97.31% | 96.61% |
| Recall | 92.53% | 92.05% | 92.73% | 90.85% |
| Precision | 66.8% | 74.99% | 63.93% | 60.85% |
| F1-Score | 75.61% | 81.91% | 72.86% | 69.69% |
| MCC | 68.46% | 75.69% | 65.72% | 61.39% |
| Kappa | 88.23% | 92.51% | 87.03% | 83.78% |
| MeanIoU | 63.78% | 71.37% | 61.07% | 57.44% |
| MeanBFS | 46.66% | 56.02% | 44.20% | 33.00% |

Focusing on the accuracy of the classes (see Table 5), this approach obtained valuable results, achieving approximately 94% and 96% accuracy for non-glomerular structures and normal glomeruli and more than 83% for sclerosed glomeruli. According to the results, the SegNet methodologies perform best on non-glomerular structures and normal glomeruli, and the original U-Net version on sclerosed glomeruli.

Table 5. Accuracy obtained for each class in the three-class semantic segmentation (Non-Glomerulus / Sclerosed Glomerulus / Normal Glomerulus).

| Method/ClassAcc | SegNet-VGG16 | SegNet-VGG19 | U-Net | U-Net-5 |
|---|---|---|---|---|
| Non-Glomerulus | 94.69% | 96.86% | 94.18% | 92.68% |
| Normal Glomerulus | 98.05% | 96.06% | 96.6% | 96.46% |
| Sclerosed Glomerulus | 84.84% | 83.22% | 87.41% | 83.41% |

Fig. 8 shows some visual results for the methodologies adopted, overlaying segmentation results on the original WSI patches. Fig. 8 also compares the same WSI patch segmented by the different methodologies. In this case, sclerosed glomeruli are highlighted in purple and normal glomeruli in green. The U-Net methodologies show some cases where both glomeruli classes are mixed up, as shown in the last sample.

Data features and the detection process cause some errors in the results which can be visually identified. Segmented results illustrate four errors, mainly related to:

(i) False positive and negative pixels located in external areas of the glomeruli boundary. These cases are present with all networks. Although this case is considered a pixel classification error, it arises mainly from the labeling process. This explains the low mean BFS values, since this error directly affects that measure.
(ii) Kidney tissue structures confused with glomeruli. Some structures are misclassified due to their similarity to glomerular structures.
(iii) Some focal segmental glomerulosclerosis cases were classified as both normal and sclerosed.
(iv) False positive pixels generated by U-Net methodologies. U-Net methodologies also create false positives distributed throughout the image, especially false positives related to non-glomerular regions.

Except for the first problem, these errors have been detected sporadically and they do not affect overall performance significantly. However, further analysis is needed in order to adjust the parameters for the semantic segmentation training. An example of these errors for each methodology is shown in Fig. 9, where magenta, green, white and black colours represent false positives, false negatives, true positives (normal and sclerosed glomeruli) and true negatives (non-glomerular structures), respectively.

4.2. Performance of consecutive CNNs

In the case of the sequential segmentation and classification process, semantic segmentation is first applied for glomerular segmentation. As in the three-class approach, 6534 image patches were used for testing. Table 6 shows the results obtained with SegNet and U-Net methodologies.

Table 6. Metrics calculated for glomeruli segmentation from the two-class semantic segmentation (Non-Glomerulus / Glomerulus).

| Method/Measure | SegNet-VGG16 | SegNet-VGG19 | U-Net | U-Net-5 |
|---|---|---|---|---|
| GlobalACC | 98.58% | 98.58% | 96.73% | 94.72% |
| Error | 1.42% | 1.42% | 3.27% | 5.28% |
| Specificity | 97.96% | 98.32% | 96.50% | 98.18% |
| Recall | 98.62% | 98.60% | 96.75% | 94.48% |
| Precision | 99.86% | 99.88% | 99.75% | 99.87% |
| F1-Score | 99.23% | 99.24% | 98.23% | 97.10% |
| MCC | 89.47% | 89.55% | 79.01% | 71.37% |
| Kappa | 89.12% | 89.17% | 77.50% | 67.94% |
| MeanIoU | 90.01% | 97.41% | 81.05% | 74.45% |
| MeanBFS | 85.51% | 85.30% | 65.23% | 57.15% |
The global accuracy is above 94% for all methodologies, reaching 98.58% for the SegNet architectures. Specificity, precision and mean IoU are higher for SegNet-VGG19, indicating a better segmentation than with SegNet-VGG16.
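As a rough illustration of how the count-based metrics of Table 3 follow from TP, TN, FP and FN, the sketch below computes them for a binary problem. The confusion counts are invented for the example and are not taken from the paper; the function name `binary_metrics` is hypothetical, and the boundary-based Mean BFS is omitted because it needs contour distances rather than counts.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Count-based metrics for one positive class, from confusion counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    specificity = tn / (tn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Expected (chance) agreement for Cohen's Kappa
    pe_positive = (tp + fp) / total * (tp + fn) / total
    pe_negative = (fn + tn) / total * (fp + tn) / total
    expected_acc = pe_positive + pe_negative
    kappa = (acc - expected_acc) / (1 - expected_acc)
    iou = tp / (tp + fp + fn)
    return {"acc": acc, "error": 1 - acc, "specificity": specificity,
            "recall": recall, "precision": precision, "f1": f1,
            "mcc": mcc, "kappa": kappa, "iou": iou}

# Invented counts, for illustration only
m = binary_metrics(tp=90, tn=80, fp=10, fn=20)
```

For a multi-class segmentation, the IoU would be computed once per class in this way and then averaged to give the Mean IoU.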


Fig. 8. Segmentation results obtained from the three-class semantic segmentation. Normal and sclerosed glomeruli are highlighted in green and purple respectively.

Table 7. Accuracy obtained for glomeruli segmentation in the two-class semantic segmentation (Non-Glomerulus / Glomerulus).

| Method/ClassAcc | SegNet-VGG16 | SegNet-VGG19 | U-Net | U-Net-5 |
|---|---|---|---|---|
| Non-Glomerulus | 98.62% | 98.60% | 96.75% | 94.48% |
| Glomerulus | 97.96% | 98.32% | 96.50% | 98.18% |

Regarding accuracy per class (see Table 7), high segmentation percentages were obtained: values greater than 94% and 96% were obtained for the segmentation of non-glomerular and glomerular structures, respectively. The best segmentation accuracy per class is obtained by the SegNet-VGG19, with values greater than 98% for both classes. Fig. 10 shows the segmentation results obtained, where glomeruli are highlighted in cyan. Once the semantic segmentation using the SegNet-VGG19 has detected non-glomerular and glomerular structures, true positive regions are classified into sclerosed or normal glomeruli. Adopting an AlexNet network for this purpose, a 99.57% accuracy over true positive regions was achieved, significantly improving the glomerular classification process. Table 8 shows the measures calculated from the confusion matrix of the results. The average accuracy obtained with the sequential segmentation and classification process is 98.16%, with 98.58% accuracy for the semantic segmentation and 99.57% for the AlexNet classification. This improves on the previous result obtained with the three-class segmentation.

Fig. 9. Types of errors in the segmentation process, identified with colour labels: FP (magenta), FN (white). Each error sample comes from a different semantic segmentation network. (a) False positive and negative pixels located in external areas of glomeruli. (b) Kidney tissue structures confused with glomeruli, in this case a dilated tubule. (c) Focal segmental glomerulosclerosis. (d) False positive pixels distributed throughout the image.

Fig. 10. Segmentation results obtained from the SegNet-VGG19 where glomerular structures are highlighted in cyan.

Table 8 Metrics calculated from the AlexNet classification (Normal Glomerulus / Sclerosed Glomerulus). Results show a perfect classification for sclerosed glomeruli and few normal misclassified glomeruli.

ACC      Error    Spec.    Recall    Prec.    F1-Score    MCC      Kappa
99.57    0.43     99.15    100       100      99.57       99.15    99.15
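The measures listed in Table 8 are standard functions of a binary confusion matrix, and a minimal sketch of how they can be computed is given below. The counts used are hypothetical and do not reproduce the confusion matrix of this study, and taking sclerosed glomeruli as the positive class is an assumption. The sketch also checks the combined figure: the text does not state how the 98.16% value is derived, but it coincides with the product of the two stage accuracies (0.9858 × 0.9957 ≈ 0.9816), so the product is used here as one reading consistent with the reported numbers. The function names `binary_metrics` and `sequential_accuracy` are illustrative only.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Measures of the kind listed in Table 8, from confusion-matrix counts.

    Assumes the positive class is 'sclerosed glomerulus' and that both
    classes occur in the ground truth and in the predictions, so that no
    denominator is zero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    specificity = tn / (tn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - chance) / (1 - chance)
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "specificity": specificity,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


def sequential_accuracy(segmentation_acc, classification_acc):
    """Accuracy of two consecutive stages, read as the product of both.

    This is an assumption: the paper reports the combined figure without
    giving a formula.
    """
    return segmentation_acc * classification_acc


# Hypothetical counts, not the confusion matrix of the study.
metrics = binary_metrics(tp=50, fp=1, fn=0, tn=49)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}")

# Stage accuracies reported in the text: 98.58% and 99.57%.
combined = sequential_accuracy(0.9858, 0.9957)
print(f"combined: {100 * combined:.2f}%")
```

Under this reading, an error in either stage counts as an error of the whole process, which is why the combined figure is lower than either stage accuracy.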

5. Conclusion

This paper has presented a methodology to perform glomerulosclerosis detection by means of semantic segmentation. Several previous studies have applied deep learning methodologies for glomeruli detection but, as far as the authors know, none of them has focused on semantic segmentation techniques, and more specifically on the SegNet network, which makes this paper a novel contribution in this field. A comparison of different approaches, defining a three-class and a two-class problem, has been done in this study. Four different CNNs were considered to build the SegNet and U-Net methodologies, i.e., SegNet-VGG16, SegNet-VGG19, U-Net and U-Net-5. The best results were obtained with the use of consecutive CNNs for segmentation and classification. Thus, we first apply a SegNet-VGG19 to separate glomerular from non-glomerular structures, followed by a classification with AlexNet that divides the previously segmented glomerular structures into normal and sclerosed glomeruli. The accuracy obtained with this two-class procedure was 98.16%, versus 96.67% obtained with the SegNet-VGG19 for the three-class segmentation problem. This is mainly because, having previously segmented the glomerular structures, the AlexNet classification reduces predictive confusions between normal and sclerosed glomeruli. Therefore, it is shown that the sequential CNN segmentation-classification strategy is a relevant solution for glomerulosclerosis detection and for the distinction between global sclerotic glomeruli and normal glomeruli.

Declaration of Competing Interest

The authors declare no conflict of interest regarding the manuscript entitled ‘Glomerulosclerosis Detection in Whole Slide Images using Semantic Segmentation’.

Acknowledgments

The authors acknowledge financial support from the AIDPATH European project num. 612471, http://aidpath.eu.


Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.cmpb.2019.105273.