
G Model

ARTICLE IN PRESS

ASOC 3489 1–13

Applied Soft Computing xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc

Effective segmentation and classification of thyroid histopathology images


J. Angel Arul Jothi, V. Mary Anita Rajam ∗ Department of Computer Science and Engineering, College of Engineering Guindy, Anna University, Chennai, India

a r t i c l e   i n f o

Article history:
Received 15 June 2015
Received in revised form 18 December 2015
Accepted 18 February 2016
Available online xxx

Keywords:
Computer Aided Diagnosis (CAD)
Multilevel thresholding
Segmentation
Feature extraction
Feature reduction
Rule generation
Classification
Particle Swarm Optimization (PSO)
Variable Precision Rough Sets (VPRS)
Histopathology image

a b s t r a c t

This paper proposes a Computer Aided Diagnosis (CAD) system that semi-automatically segments and classifies H&E-stained thyroid histopathology images into two classes, Normal Thyroid (NT) or Papillary Thyroid Carcinoma (PTC), based on nuclear texture features. Our system segments the given histopathology image into different binary images using Particle Swarm Optimization (PSO)-based Otsu's multilevel thresholding. From the segmented binary images, a binary image containing the nuclei is chosen manually. Nuclei are extracted from the manually selected binary image by imposing an area constraint and a roundness constraint. The intensity variations of pixels within the nuclei are quantified by extracting texture features. A Variable Precision Rough Sets (VPRS)-based β-reduct is used to identify redundant features and to generate rules. The rules are then stored in a rule base. A novel closest-matching-rule (CMR) algorithm is proposed to classify a new test sample as PTC or NT using the rules in the rule base. We verified experimentally that the proposed CAD system provides promising results, and it is expected to assist pathologists in their decisions.
© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Histopathology is a branch of pathology in which biopsy samples of tissues are examined under a microscope by pathologists for the diagnosis of cancer. Specifically, a histopathology study is essential to diagnose the different subtypes of a particular type of cancer. The tissue biopsy samples procured are processed, stained and fixed onto glass slides before being examined by a pathologist under a microscope. Papillary Thyroid Carcinoma (PTC) is the most common histological subtype of thyroid cancer, accounting for eighty percent of the total thyroid cancers diagnosed [1–4]. Unlike other histological subtypes of thyroid cancer, an early diagnosis of PTC provides better treatment planning and patient prognosis [4,5]. The diagnosis of PTC is based on a histopathology study that relies on nuclear features [4]. PTC histopathology images contain orphan Annie-eye nuclei. These nuclei are enlarged and may contain nuclear grooves and nuclear pseudo-inclusions [1,2,4,5]. Nuclear grooves result from irregularity of the nuclear contour [2]. The pseudo-inclusions in the orphan Annie-eye nucleus are due to the accumulation of cytoplasm in prominent nuclear grooves. The centres of orphan Annie-eye nuclei lack chromatin, while a thin ring of chromatin is present at their periphery. In contrast, the nuclei in Normal Thyroid (NT) histopathology images are smaller and have dense chromatin. The presence of orphan Annie-eye nuclei is one of the essential nuclear features for the diagnosis of PTC.

This work is motivated by the following reasons. PTC incidence has increased in recent years [6,7]. Pathologists examine the slides under a microscope to provide a diagnosis; this is a laborious and time-consuming task that can result in intra-observer and inter-observer variations. Additionally, the accuracy of the diagnosis depends upon the experience of the pathologist. An automated diagnosis system can assist the pathologist in the routine clinical examination of slides. With recent advancements in digital pathology, which enables whole slide images to be acquired, stored and transmitted electronically, there is a need for developing Computer Aided Diagnosis (CAD) systems for histopathology images. CAD of medical images uses computers to process and interpret images with algorithms from image processing and pattern recognition. CAD systems can act as a second opinion and help doctors in diagnosis.

∗ Corresponding author. Tel.: +91 9840700071. E-mail addresses: [email protected] (J. Angel Arul Jothi), [email protected] (V. Mary Anita Rajam).

http://dx.doi.org/10.1016/j.asoc.2016.02.030 1568-4946/© 2016 Elsevier B.V. All rights reserved.

Please cite this article in press as: J. Angel Arul Jothi, V. Mary Anita Rajam, Effective segmentation and classification of thyroid histopathology images, Appl. Soft Comput. J. (2016), http://dx.doi.org/10.1016/j.asoc.2016.02.030


Fig. 1. Sample images from the dataset. (a) Normal Thyroid (NT) histopathology image. (b) Papillary Thyroid Carcinoma (PTC) histopathology image.


The biological and staining properties of the nuclei present in NT and PTC images vary, and this fact has been exploited to develop a CAD system. The centre of the orphan Annie-eye nucleus does not stain owing to the lack of chromatin, but its periphery stains faintly blue due to the thin ring of chromatin [8]. A nucleus in an NT histopathology image has dense chromatin and stains deep blue. Fig. 1 shows sample NT and PTC histopathology images from the dataset. The images in Fig. 1 are cropped for better visibility.

This paper proposes a Computer Aided Diagnosis (CAD) system to classify H&E-stained thyroid histopathology images into two classes: NT or PTC. It consists of the following stages: pre-processing, segmentation, Region of Interest (ROI) extraction, feature extraction, feature reduction, rule generation and classification. Fig. 2 shows the steps involved in the proposed system. Initially, the digital images of the biopsy samples are acquired using a microscope and a camera. The acquired input images in RGB colour space are given as input to the system. The pre-processing stage converts the input image into gray scale, removes noise by median filtering and enhances the contrast of the image using Contrast Limited Adaptive Histogram Equalization (CLAHE). The segmentation stage uses PSO-based Otsu's multilevel thresholding to partition the image into different binary images containing different regions. The binary image containing the nuclei is selected manually and is given to the ROI extraction stage. The ROI extraction stage extracts the nuclei from the image, eliminating unwanted artefacts. The feature extraction stage extracts Gray Level Co-occurrence Matrix (GLCM) texture features from the image containing the nuclei. The feature reduction stage uses the VPRS-based β-reduct to identify redundant features and removes them from the list of extracted features. The rule generation stage generates decision rules and stores them in the rule base. The classification stage uses the closest-matching-rule (CMR) algorithm to classify unseen test samples. The test samples are generated from query images, which pass through all stages of the CAD system, such as pre-processing, segmentation and feature extraction, and are finally classified as either PTC or NT.

The rest of this paper is organized as follows. Section 2 details the related work. Section 3 provides background on the techniques used in this work. Section 4 details the various stages of the decision system. The dataset, parameter tuning, metrics used for the evaluation of classifiers, experiments conducted and the results obtained are provided in Section 5. Finally, the conclusion is given in Section 6.

2. Related work

This section reviews related work in the field of histopathology image analysis.

Fig. 2. Steps involved in the Computer Aided Diagnosis of thyroid H&E stained histopathology images.


2.1. Automated image analysis in histopathology

Automated image analysis and knowledge discovery from biopsy images are revolutionizing the field of digital pathology. Several Computer Aided Diagnosis (CAD) systems are being developed that analyze histopathology images automatically, segment the region of interest (nuclei, glands, cells, lymphocytes and mitoses), classify images into histological subtypes, provide diagnosis and determine cancer grades. CAD systems have been developed for classifying different types of breast lesions [9] and thyroid lesions [10–12], for diagnosing Cervical Intraepithelial Neoplasia (CIN) [13], and for grading prostatic carcinoma [14–16], breast cancer [17–19] and hepatocellular carcinoma images [20].

2.2. Automated image analysis in thyroid histopathology


In the recent past, there have been several attempts to develop automated image analysis systems for thyroid histopathology images. A majority of these works aimed to distinguish between follicular thyroid lesions [10,11,21]. Other works differentiate between papillary and medullary carcinoma fine needle aspiration cytology (FNAC) images [22]. A multi-classifier system for discriminating between malignant and benign H&E-stained FNAC thyroid images was also developed [23]. Compared to the earlier methods for automated classification of thyroid histopathology images, our approach differs in the following ways. First, our method utilizes PSO-based Otsu's multilevel thresholding for segmentation and Variable Precision Rough Sets (VPRS) theory for feature reduction and rule generation, combined with a novel closest-matching-rule (CMR) algorithm, to provide accurate results. The proposed method demonstrates that the extracted texture features provide excellent discrimination capability to differentiate between the two classes of histopathology images used in this study. Second, some of the previous studies [10,11,21] that rely on nuclear morphometry used the specialized Feulgen staining technique (which stains only DNA), and the images were obtained at 100× oil immersion magnification. The proposed approach is advantageous from the clinical practice point of view, since it uses the routine and simple H&E staining protocol and 40× optical magnification of the objective lens. Third, this work concentrates on images obtained from tissue biopsy samples, as opposed to some of the previous approaches that work on FNAC images [22,23]. Fourth, the existing work concentrates on differentiating between follicular lesions of the thyroid [10,11,21] or between papillary and medullary thyroid carcinoma [22]. The proposed work differentiates between PTC and NT.


3. Background



This section provides background on thresholding, Particle Swarm Optimization (PSO), Rough Set Theory (RST) and Variable Precision Rough Sets (VPRS).


3.1. Thresholding



Thresholding binarizes an image by partitioning its pixels into a pre-defined number of classes. There are two types of thresholding, based on the number of classes: bi-level thresholding partitions the pixels in an image into two classes, whereas multilevel thresholding partitions the pixels into more than two classes. Partitioning is done by selecting intensity thresholds from the image histogram.

Consider a gray scale image I with L gray levels denoted as [1, . . ., L]. To partition the pixels in the image into two classes C_1 and C_2, a single threshold at level t is required. The partitioning is done such that pixels with gray levels [1, . . ., t] are grouped under class C_1 and pixels with gray levels [t + 1, . . ., L] are grouped under class C_2. Similarly, to partition the pixels into three classes, two thresholds at levels t_1 and t_2 are required; C_1 contains pixels with gray levels [1, . . ., t_1], C_2 contains pixels with gray levels [t_1 + 1, . . ., t_2] and C_3 contains pixels with gray levels [t_2 + 1, . . ., L]. The number of thresholds required to partition the pixels in an image depends on the number of classes K. Generally, to segment an image into K classes, we need K − 1 thresholds denoted as t_1, . . ., t_{K−1}. These K − 1 thresholds constitute a threshold vector T.

3.1.1. Otsu's thresholding
Otsu's method [24] is a non-parametric and unsupervised approach that automatically selects the optimum intensity thresholds for segmenting an image. The method utilizes a discriminant criterion calculated from the gray level histogram of the image. The main objective of the discriminant criterion is to group pixels into classes providing maximum separation between the classes. According to Otsu's method, the probability distribution of an image is given by Eq. (1):

p_i = n_i / N   (1)

where i represents an intensity level, i = 0, 1, 2, . . ., L − 1; n_i denotes the total number of pixels at level i; and N denotes the total number of pixels in the image. Otsu's discriminant criterion for computing the optimum thresholds for multilevel thresholding is given by Eq. (2):

σ_B² = Σ_{k=1}^{K} ω_k (μ_k − μ_I)²   (2)

where σ_B² denotes the between-class variance; ω_k represents the probability that a pixel is assigned to class C_k and is given by Eq. (3); μ_k represents the mean intensity value of the pixels assigned to class C_k and is given by Eq. (4); μ_I represents the mean pixel intensity of the entire image and is given by Eq. (5); p_i denotes the probability of occurrence of the pixels at level i; and L is the maximum pixel intensity value of the image.

ω_k = Σ_{i=1}^{t_k} p_i, if k = 1;   Σ_{i=t_{k−1}+1}^{t_k} p_i, if 1 < k < K;   Σ_{i=t_{K−1}+1}^{L} p_i, if k = K   (3)

μ_k = (Σ_{i=1}^{t_k} i·p_i)/ω_k, if k = 1;   (Σ_{i=t_{k−1}+1}^{t_k} i·p_i)/ω_k, if 1 < k < K;   (Σ_{i=t_{K−1}+1}^{L} i·p_i)/ω_k, if k = K   (4)

μ_I = Σ_{i=0}^{L−1} i·p_i   (5)
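As a concrete illustration of Eqs. (2)–(5), the following minimal Python/numpy sketch evaluates the discriminant criterion for a given threshold vector. The function name and the 0-based level indexing are our own choices, not the paper's.

```python
import numpy as np

def between_class_variance(hist, thresholds):
    # Otsu's discriminant criterion (Eq. (2)) for a threshold vector.
    # hist: normalized gray-level histogram p_i (Eq. (1), sums to 1).
    # thresholds: sorted cuts t_1 < ... < t_{K-1}; class k covers the
    # levels (t_{k-1}, t_k], with t_0 = -1 and t_K = L - 1 (0-based).
    p = np.asarray(hist, dtype=float)
    levels = np.arange(p.size)
    mu_I = (levels * p).sum()                                 # Eq. (5)
    bounds = [-1] + list(thresholds) + [p.size - 1]
    var_b = 0.0
    for k in range(len(bounds) - 1):
        lo, hi = bounds[k] + 1, bounds[k + 1] + 1
        omega = p[lo:hi].sum()                                # Eq. (3)
        if omega > 0:
            mu_k = (levels[lo:hi] * p[lo:hi]).sum() / omega   # Eq. (4)
            var_b += omega * (mu_k - mu_I) ** 2               # Eq. (2)
    return var_b
```

An exhaustive search would evaluate this criterion for every candidate threshold vector; Section 3.2 introduces PSO precisely to avoid that cost.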


In the case of multilevel thresholding, an exhaustive search for the optimum thresholds satisfying the discriminant criterion is a computationally intensive task.


3.2. Particle Swarm Optimization (PSO)


PSO is a search and optimization technique introduced by Kennedy and Eberhart [25], based on the flocking behaviour of birds. PSO is an iterative method that uses a set of particles, called a swarm, which moves in the search space to find a globally optimal solution. A particle s is characterized by its position p_s(t) and velocity v_s(t) at time t. The position of a particle denotes a candidate solution to the optimization problem. The velocity of a particle controls its motion in the search space. The position and velocity of the particles are constrained by maximum and minimum values denoted as p_max, p_min, v_max and v_min respectively. A particle's fitness value f(p_s) is calculated from its current position using a fitness function. The velocity and the position of the particles at time (t + 1) are updated based on Eqs. (6) and (7) respectively.

v_s(t + 1) = w·v_s(t) + ϑ_1·r_1·(pbest_s(t) − p_s(t)) + ϑ_2·r_2·(gbest(t) − p_s(t))   (6)

p_s(t + 1) = p_s(t) + v_s(t + 1)   (7)

where w is the inertia weight; ϑ_1, ϑ_2 are the acceleration constants; r_1, r_2 are uniformly distributed random numbers in the interval [0, 1]; pbest_s(t) denotes the particle's best position; and gbest(t) denotes the best position of the entire swarm at time t. The particles travel in the search space by updating their velocity and position. Usually, the algorithm stops after a fixed number of iterations or when a specified error bound is reached. After the algorithm stops, gbest gives the optimal solution.

3.3. Rough Set Theory (RST)

Rough Set Theory (RST), introduced by Zdzislaw Pawlak [26], is based on the assumption that there is information about examples from a universe of discourse U. The information observed about the examples in the universe is characterized by a set of attributes, and each example in the universe is described by the values of these attributes. In RST, a dataset or information about examples in the universe is represented as a table, where each row represents an object (an event, example or entity) and each column represents an attribute that can be measured for an object. This table is called an information system or an information table. The set of all elements in the information system is known as the universe.

3.3.1. Information system
Formally, an information system T is a quadruple T = (U, A, V, ρ), where A is a non-empty, finite set of attributes; V = ∪_{a∈A} V_a is the set of attribute values of all attributes, where V_a is the set of possible values of attribute a; and ρ : U × A → V is an information function such that, for every element x ∈ U, ρ(x, a) ∈ V_a is the value of attribute a for element x.

3.3.2. Decision system
A decision system or decision table is an information table that additionally contains decision attributes. It is defined as D = (U, A ∪ {d}, V, ρ), where attribute d is the decision attribute and A is the set of condition attributes.

3.3.3. Indiscernibility
Any two examples are indiscernible (indistinguishable) from each other whenever they assume identical values for every condition attribute in a set of attributes. In an information system, the set of all examples that are indiscernible is an elementary set [27]. In other words, an elementary set [x]_R is a subset of the examples (elements) with the same values of the condition attributes. Let B ⊆ A and let x_g and x_h be members of U. The indiscernibility relation is defined as R(B) = {(x_g, x_h) ∈ U × U | ∀ a ∈ B, ρ(x_g, a) = ρ(x_h, a)}. R(B) is an equivalence relation.

3.3.4. Concepts
Subsets of the set of all examples (elements) with the same value of the decision attribute are called concepts [27]. The concept X ⊆ U is the set of elements of U that have a particular value (say t) of the decision attribute d; that is, X = {x ∈ U | ρ(x, d) = t}. Examples of the universe belonging to the concept are called positive examples, and the rest are called negative examples.

3.3.5. Set approximation
For each concept, the greatest definable set contained in the concept is called the lower approximation of the concept, and the least definable set containing the concept is called the upper approximation of the concept. The set containing the elements of the upper approximation of the concept that are not members of the lower approximation is called the boundary region. A set is said to be rough if the boundary region is non-empty, and crisp if the boundary region is empty. The lower approximation of the concept is also called the positive region of the concept. The set of the elements in the universe other than the elements in the upper approximation is called the negative region of the concept. The lower and upper approximations of a set are interior and closure operations in a topology generated by the indiscernibility relation.

The lower approximation of the concept X ⊆ U, with respect to U and an equivalence relation R on U, is the union of the elementary sets [x]_R of U with respect to R that are contained in X, denoted R̲X = {x | [x]_R ⊆ X}. That is, the lower approximation of X is the set of all elements that can be certainly classified as elements of X with respect to R. The upper approximation of X is the union of the elementary sets of U with respect to R that have a non-empty intersection with X, denoted R̄X = {x | [x]_R ∩ X ≠ ∅}. That is, the upper approximation of X is the set of all elements that can possibly be classified as elements of X with respect to R. The set BN_R(X) = R̄X − R̲X is called the boundary region of X. Thus, if the boundary region is non-empty, there are elements in the universe that cannot be certainly classified, and hence the set is rough [26–28].

3.4. Variable Precision Rough Sets (VPRS)

Variable Precision Rough Sets (VPRS) is a generalization of rough sets that can be used in the classification of uncertain information [29]. In the VPRS formulation, the lower and upper approximations are interpreted in probabilistic terms, leading to generalized notions of rough set approximations. The VPRS model leads to set approximations with a broader positive region and a narrower boundary region than the original rough set model [30]. This section presents the basic notions of VPRS.

3.4.1. β-Positive and β-negative regions
The conditional probability of a concept X given an elementary set [x]_R is the ratio of the number of elements of the elementary set


that are present in the concept to the total number of elements in the elementary set, and is given by Eq. (8):

P(X | [x]_R) = |[x]_R ∩ X| / |[x]_R|   (8)

The asymmetric VPRS generalization of the original rough set model is based on the values of a lower and an upper limit certainty threshold parameter. Let β_l and β_u denote the lower and upper limit certainty threshold parameters respectively, such that 0 ≤ β_l < P(X | [x]_R) < β_u ≤ 1 [30,31]. The β_u-positive region is defined by the upper limit parameter β_u, and the β_l-negative region is defined by the lower limit parameter β_l [32]. The β_u-positive region is the union of the elementary sets whose conditional probability is greater than or equal to β_u. Similarly, the β_l-negative region is the union of the elementary sets whose conditional probability is less than β_l. The VPRS model is symmetric if β_u = 1 − β_l, and the upper limit certainty threshold parameter is then denoted as β; note that β_l = 1 − β. In the symmetric model, the β_u-positive region and β_l-negative region of a concept X are referred to as the β-positive region POS_β(X) and the β-negative region NEG_β(X), respectively.

(1) The ˇ-positive region of the reduct (POSA (X)) is the same as the ˇ-positive region of the original set of conditional attributes A (POSA (X)). In other words, (POSA (X)) = (POSA (X)). (2) The reduct A is minimal, that is, no attribute a can be removed from the reduct without changing the original ˇ-positive region / (POSA (X)). given as (POSA −a (X)) =


There are usually several such reducts. Let the set containing all possible reducts be denoted as REDUCT. The reduct with the minimum cardinality of attributes among all reducts is called the minimal reduct (RED). The set of attributes that are common to all reducts is called the core, that is, Core = ∩ REDUCT.
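To ground Eq. (8) and the β-positive region in code, here is a small Python sketch on a toy decision table. The function name, the attribute names a/b/c, the class labels and the β value are all hypothetical, and the symmetric VPRS model is assumed.

```python
from collections import defaultdict

def beta_positive_region(rows, labels, attrs, concept_label, beta):
    # POS_beta(X): union of the elementary sets [x]_R whose conditional
    # probability P(X | [x]_R) (Eq. (8)) is at least beta.
    blocks = defaultdict(list)   # elementary sets of the indiscernibility relation
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in blocks.values():
        p = sum(labels[i] == concept_label for i in members) / len(members)
        if p >= beta:
            pos.update(members)
    return pos

# toy decision table: three condition attributes, one decision attribute,
# with one deliberately inconsistent example (index 4)
rows = [{'a': 1, 'b': 0, 'c': 0}, {'a': 1, 'b': 0, 'c': 1},
        {'a': 0, 'b': 1, 'c': 0}, {'a': 0, 'b': 1, 'c': 1},
        {'a': 1, 'b': 0, 'c': 0}]
labels = ['PTC', 'PTC', 'NT', 'NT', 'NT']
```

A subset of attributes whose β-positive region equals that of the full attribute set, for every concept, is a β-reduct candidate; this membership test is what a β-reduct search performs repeatedly. Note how coarser attribute subsets change the elementary sets and hence the region.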


4. Methods


This section details the proposed system for classifying PTC and NT histopathology images.


4.1. Pre-processing


The pre-processing step prepares the digital images for segmentation. The acquired histopathology images exhibit poor contrast due to insufficient lighting conditions and are also affected by noise. The noise in the images is eliminated by a filtering operation and the contrast of the images is enhanced by a contrast enhancement operation. We convert the input images in RGB color space into gray scale by retaining the red (R) channel of the image. This is because the nuclei region is well highlighted in the R channel. The gray scale image is filtered using the median filter to remove the effect of noise to give the filtered image. The contrast of the filtered image is enhanced by manipulating its histogram using the contrast limited adaptive histogram equalization (CLAHE) operation. The resultant image after pre-processing is the pre-processed image Ipre−processed .
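A minimal numpy sketch of the first two pre-processing steps (red-channel extraction and 3×3 median filtering). In practice the filtering and the CLAHE step would come from a library such as OpenCV (cv2.medianBlur, cv2.createCLAHE); the function name here is our own.

```python
import numpy as np

def preprocess(rgb):
    # keep the red (R) channel as the gray-scale image, since the
    # nuclei are well highlighted in R for these H&E-stained slides
    gray = rgb[..., 0].astype(np.float64)
    # 3x3 median filter to suppress impulse noise (numpy-only sketch;
    # the CLAHE contrast-enhancement step is omitted here)
    padded = np.pad(gray, 1, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1))
```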


4.2. Segmentation


This work uses the PSO-based Otsu's multilevel thresholding method to segment the pre-processed image I_pre-processed into different binary images corresponding to different classes. The literature shows that PSO-based multilevel image thresholding is superior in terms of speed, quality of solution, robustness and stability when compared to the Genetic Algorithm (GA) [33], the Bacterial Foraging Algorithm (BFA), Differential Evolution (DE), Simulated Annealing (SA) and Tabu Search (TS) [34]. The formulation of the PSO-based Otsu's multilevel thresholding problem [35,36] and the initial values of the parameters used are similar to our previous work, where this technique was used to segment nuclei from breast histopathology images [37]. It is presented here for completeness as follows. The Particle Swarm Optimization (PSO) algorithm searches for the optimum thresholds, considering the histogram of the image as the search space. The position of a particle p_s(t) corresponds to a threshold vector, and the number of classes K determines the dimension of the threshold vector; that is, p_s(t) = (t_1, t_2, . . ., t_{K−1}), where each t_j denotes a threshold value. When the algorithm terminates, the gbest value yields the optimal threshold vector, which contains the K − 1 threshold values. Otsu's discriminant criterion is modeled as the objective function; the thresholds that maximize the between-class variance are the optimum thresholds, as given in Eq. (9):

T_opt = arg max_{1 ≤ t_1 < · · · < t_{K−1} < L} σ_B²   (9)

where T_opt is the optimum threshold vector, K is the number of classes into which the image is to be segmented and σ_B² is as given in Eq. (2). The pre-processed image I_pre-processed is partitioned into K binary images corresponding to the K classes by thresholding it using the K − 1 threshold values as follows. Let t_1, t_2, . . ., t_{K−1} be the thresholds obtained after applying PSO-based Otsu's multilevel thresholding, and let B_j, j ∈ {1, 2, . . ., K}, represent the corresponding binary images. If I(x, y) is the intensity value of the pixel in the input image at location (x, y), the binary image B_j(x, y) is constructed as given in Eq. (10).

B_j(x, y) = 1, if ((1 ≤ I(x, y) ≤ t_1) ∧ (j = 1)) ∨ ((t_{j−1} < I(x, y) ≤ t_j) ∧ (1 < j < K)) ∨ ((t_{K−1} < I(x, y) ≤ L) ∧ (j = K));  B_j(x, y) = 0, otherwise.   (10)
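The segmentation stage can be sketched end-to-end as follows. For brevity, an exhaustive search over Eq. (9) stands in for the PSO search (feasible here only because the toy image has few gray levels; a real implementation would use the update rules of Eqs. (6) and (7) or a PSO library), and the function names are our own.

```python
import numpy as np
from itertools import combinations

def otsu_thresholds(img, K):
    # Exhaustive search of Eq. (9) over all threshold vectors; the
    # paper uses PSO to perform this search on 256-level histograms.
    L = int(img.max()) + 1
    hist = np.bincount(img.ravel(), minlength=L) / img.size
    levels = np.arange(L)
    mu_I = (levels * hist).sum()
    best, best_T = -1.0, None
    for T in combinations(range(L - 1), K - 1):
        bounds = (-1,) + T + (L - 1,)
        var_b = 0.0
        for k in range(K):
            lo, hi = bounds[k] + 1, bounds[k + 1] + 1
            w = hist[lo:hi].sum()
            if w > 0:
                mu = (levels[lo:hi] * hist[lo:hi]).sum() / w
                var_b += w * (mu - mu_I) ** 2        # Eq. (2)
        if var_b > best:
            best, best_T = var_b, T
    return best_T

def binary_images(img, thresholds):
    # Eq. (10): one binary image per class, partitioning the gray range
    bounds = (-1,) + tuple(thresholds) + (int(img.max()),)
    return [((img > bounds[k]) & (img <= bounds[k + 1])).astype(np.uint8)
            for k in range(len(bounds) - 1)]
```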

In this work, the number of classes (K) was determined experimentally and is initialized to 3. The three binary images obtained after the application of the PSO-based Otsu's thresholding comprise certain regions of the input image, such that the pixels in a particular binary image have the same visual characteristics. Fig. 3 shows sample NT and PTC images and their segmentation results using the PSO-based Otsu's multilevel thresholding. Note that all images are cropped for better visibility.

4.3. ROI extraction

This section explains the extraction of the nuclei (ROI) from the segmented binary images. Each binary image B_j obtained after segmentation contains different regions of the original image based on the intensity values of the pixels. Since the diagnosis of PTC is based on nuclear features, the next step is to extract the nuclei regions alone from the segmented binary images. Out of the three binary images obtained after segmentation, only one contains the nuclear regions. The image that contains the nuclei regions, B_nuclei, is selected manually.


Fig. 3. Segmentation results of sample images from the dataset. (a) NT image. (b)–(d) Three binary images obtained by PSO-based Otsu's multilevel thresholding of image (a). (e) PTC image. (f)–(h) Binary images obtained by PSO-based Otsu's multilevel thresholding of image (e). Note: All images are cropped for better visibility.


Along with the nuclei regions, the selected image contains other unwanted artefacts, so it is crucial to separate the nuclei regions from the selected image before further processing. On visually inspecting the selected binary image B_nuclei, it is noted that the nuclei regions are smaller in area than the artefacts and are approximately round to elliptical in shape. Taking advantage of this fact, we eliminate from the selected binary image those regions that do not conform to an area constraint and a roundness constraint. The area of an object is the total number of pixels in the object, and the roundness measure defines how circular the object is. The area and roundness value of an ideal nucleus were found experimentally to be 250 pixels and 0.80 respectively; these values are treated as the area threshold and the roundness threshold of an ideal nucleus. Let the binary image B_nuclei selected for further processing consist of a set of m blobs or connected components denoted as C = {c_1, c_2, . . ., c_m}. Let A_{c_j}, P_{c_j} and R_{c_j} denote the area, perimeter and roundness of a connected component c_j respectively, where j ∈ {1, 2, . . ., m} [38,39]. The roundness measure of a connected component is calculated using Eq. (11):

R_{c_j} = 4π·A_{c_j} / P_{c_j}²   (11)

where the perimeter P_{c_j} is the sum of the distances between each adjoining pair of pixels around the boundary of a connected


component. A connected component c_j ∈ C is considered a valid nucleus and is retained in the binary image Bnuclei if its area is less than or equal to the area threshold and its roundness is greater than or equal to the roundness threshold, as given by Eq. (12); otherwise, it is considered an artefact and is eliminated from the binary image.

(A_cj ≤ α_nuclei) ∧ (R_cj ≥ ρ_nuclei)    (12)

Small regions whose size is less than 50 pixels are removed from the resulting binary image using a morphological size filtering operation. Algorithm 1 illustrates the steps involved in the ROI extraction.

Algorithm 1. ROI extraction

Fig. 4 shows sample NT and PTC images and their ROI extraction results using the proposed approach. Note that all images are cropped for better visibility.

4.4. Texture feature extraction

As the staining characteristics of the orphan Annie-eye nuclei in PTC images differ from those of the nuclei in NT images, the properties of pixels within these nuclei vary greatly. Texture characteristics of the nuclear areas are used to measure these variations. Therefore, in this work, texture features are extracted from the binary image Bnuclei containing the separated nuclei. Statistical texture features derived from the gray level co-occurrence matrix (GLCM) and the intensity histogram are used in this study. The following paragraphs detail the texture features extracted for this work.

4.4.1. GLCM features
A gray level co-occurrence matrix is constructed from a gray-scale image and measures texture by characterizing the intensity distribution and the relative positions of pixels in the image. It was originally proposed by Haralick [40], who extracted fourteen different features from the GLCM. Formally, a gray level co-occurrence matrix G is an L × L matrix, where L denotes the maximum number of gray levels in an image. An entry G[u, v] denotes the probability that a gray level value u co-occurs with a value v at distance d and angle θ in the given image. In this work, four GLCM features, namely GLCM correlation, GLCM contrast, GLCM homogeneity and GLCM energy, are extracted with the distance between neighboring pixels set to d = 1. All four features are extracted in four directions, θ = 0°, 45°, 90° and 135°. Hence, a total of sixteen GLCM features are extracted. The four GLCM features are computed as given by Eqs. (13)-(16).

contrast = Σ_{u=1}^{L} Σ_{v=1}^{L} (u − v)² P_{uv}    (13)

correlation = Σ_{u=1}^{L} Σ_{v=1}^{L} (u − μ_row)(v − μ_col) P_{uv} / (σ_row σ_col)    (14)

homogeneity = Σ_{u=1}^{L} Σ_{v=1}^{L} P_{uv} / (1 + |u − v|)    (15)

energy = Σ_{u,v} P²_{uv}    (16)

where μ_row, μ_col, σ_row, σ_col represent the means and standard deviations of the row and column values of the GLCM respectively, and P_{uv} is the (u, v)th term of G divided by the sum of the elements of G. The sixteen features used in this study are represented as GLCM^θ_contrast, GLCM^θ_correlation, GLCM^θ_homogeneity and GLCM^θ_energy, where θ ∈ {0°, 45°, 90°, 135°}.

4.4.2. Intensity features
Two intensity histogram-based features are extracted for this work. They are as follows. Mean (μ): the average pixel intensity of the entire image, given by Eq. (5). Standard deviation (σ): the average contrast of the image, given by Eq. (17).

σ = ( Σ_{i=1}^{L} (i − μ)² p_i )^{1/2}    (17)

where p_i is as given in Eq. (1). Hence, a total of eighteen texture features are extracted.

4.5. Problem formulation using VPRS

In this work, the Variable Precision Rough Sets (VPRS)-based β-reduct method is used to remove redundant attributes and to generate rules. Rough set theory (RST) has been applied in areas such as image processing, machine learning, computer networks, finance, medicine, vibration analysis, conflict resolution, intelligent agents, control theory, process industry and marketing [27,28]. In the field of medical image processing, RST is used to segment and identify the region of interest, to reduce image features, to classify images and to diagnose diseases [41-44]. Rough set theory has been used with various medical images such as Magnetic Resonance Images (MRI) [45], mammograms [46] and Computed Tomography (CT) images [47]. On the other hand, no work has been reported on using RST for H&E-stained histopathology images.

This section explains the problem formulation. We assume that the images in the dataset represent the knowledge about the domain. The set of images is described by an information table. Each row (x ∈ U) in the information table corresponds to an image; the 219 images in the dataset correspond to the rows of the information table. The set of condition attributes A corresponds to the eighteen texture features extracted from the nuclei. For the purpose of data analysis, we have replaced the original condition attribute values with discrete ranges using equal-width binning. For example, the attribute GLCM^0_correlation is discretized into 7 categories denoted as 1, 2, 3, 4, 5, 6 and 7. Category 1 stands for the range (0.80, 0.82], category 2 stands for the range (0.82, 0.84], category 3 stands for the range (0.84, 0.86], category 4 stands for the range (0.86, 0.88], category 5 stands for the range (0.88, 0.90], category 6

stands for the range (0.90, 0.92] and category 7 stands for the range (0.92, 0.96]. The information table is then converted into a decision table by adding a decision attribute d and its corresponding value for every row. In our work, the decision attribute can take two values, 0 or 1; thus Vd = {0, 1}. The decision attribute's value is also called a class label. If a given image is PTC, the value of the decision attribute is 1, denoted (x, d) = 1; if a given image is NT, the value is 0, denoted (x, d) = 0. This decision table is used to classify a given image as either PTC or NT. We define the concept X ⊆ U as the set containing all the PTC images; that is, for each element of X the value of the decision attribute is 1. Images that have the same values of the condition attributes are indiscernible and belong to the same elementary set. The conditional probability of the concept X given an elementary set [x_i]_R is defined as the ratio of the number of PTC images in the elementary set [x_i]_R to the total number of images in the elementary set, and is given by Eq. (18).

P(X | [x_i]_R) = |X ∩ [x_i]_R| / |[x_i]_R|    (18)

Fig. 4. Nuclei (ROI) extraction results of sample images from the dataset. (a, d) Original NT and PTC images respectively. (b, e) Binary images containing nuclei that are selected manually. (c, f) Binary images showing the extracted nuclei from (b, e) respectively after imposing area and roundness constraints. Note: All images are cropped for better visibility.
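Assuming per-component area and perimeter measurements are available (e.g., from a connected-component labelling step), the area and roundness filtering of Eqs. (11)-(12), which produces images like Fig. 4(c) and (f), can be sketched as follows; the threshold constants are the ones stated in Section 4.3, and the function names are illustrative:

```python
import math

AREA_THRESHOLD = 250        # pixels, from Section 4.3
ROUNDNESS_THRESHOLD = 0.80  # from Section 4.3

def roundness(area, perimeter):
    # Eq. (11): R = 4*pi*A / P^2; equals 1.0 for a perfect circle.
    return 4.0 * math.pi * area / (perimeter ** 2)

def is_valid_nucleus(area, perimeter):
    # Eq. (12): keep a component iff it is small enough AND round enough.
    return area <= AREA_THRESHOLD and roundness(area, perimeter) >= ROUNDNESS_THRESHOLD

def filter_components(components):
    """components: iterable of (area, perimeter) pairs for the blobs in Bnuclei."""
    return [c for c in components if is_valid_nucleus(*c)]
```

A roughly circular blob of radius 8 (area about 201, perimeter about 50) passes both constraints, while an elongated artefact of the same area, or a round blob larger than 250 pixels, is rejected.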

The β-positive region of the set X, POS_β(X), consists of the elementary sets for which the conditional probability P(X | [x_i]_R) is greater than or equal to β, and is given by Eq. (19).

POS_β(X) = ⋃_{P(X | [x_i]_R) ≥ β} {[x_i]_R ∈ R(B)}    (19)

The β-negative region of the set X, NEG_β(X), consists of the elementary sets for which the conditional probability P(X | [x_i]_R) is less than β, and is given by Eq. (20).

NEG_β(X) = ⋃_{P(X | [x_i]_R) < β} {[x_i]_R ∈ R(B)}    (20)

In this work, the images in the elementary sets in the β-positive region are considered to be PTC. Similarly, the images in the elementary sets in the β-negative region are considered NT.
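The conditional probability of Eq. (18) and the β-region split of Eqs. (19)-(20) can be sketched as follows; the representation of images as set members is an assumption for illustration:

```python
def conditional_probability(elementary_set, ptc_images):
    # Eq. (18): |X ∩ [x_i]_R| / |[x_i]_R|
    return sum(1 for img in elementary_set if img in ptc_images) / len(elementary_set)

def beta_regions(elementary_sets, ptc_images, beta=0.5):
    # Eqs. (19)-(20): split the elementary sets into the beta-positive
    # and beta-negative regions of the concept X (the PTC images).
    positive, negative = [], []
    for es in elementary_sets:
        (positive if conditional_probability(es, ptc_images) >= beta else negative).append(es)
    return positive, negative
```

With β = 0.5 (the value used in Section 5.2), an elementary set is placed in the positive region as soon as at least half of its images are PTC.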

4.6. Feature reduction


The aim of attribute reduction is to reveal those condition attributes that are redundant and do not provide any additional knowledge to the classification process. Removing redundant attributes saves computation time and space. In this work, the Variable Precision Rough Sets (VPRS)-based β-reduct is used to identify and eliminate the redundant attributes. The advantage of this method is that it does not require any additional information about the data. The redundant condition attributes are identified as follows. Elementary sets are identified from the information table (with all eighteen condition attributes A). Let X = {[x_1]_R, [x_2]_R, ..., [x_n]_R} denote the set of all elementary sets identified, where n denotes the total number of elementary sets. Among these, the elementary sets that fall in the β-positive region of the concept X are found using Eq. (19); let this region be denoted β_pos. From the set of condition attributes A, a condition attribute b is removed and the elementary sets are identified as before. Let X′ = {[x′_1]_R′, [x′_2]_R′, ..., [x′_n′]_R′} denote the resulting set of elementary sets, where n′ denotes their total number. The elementary sets that fall in the β-positive region of the concept X are identified; let this be denoted β_pos−b. If β_pos and β_pos−b are equal, that is, if they contain the same elements, then the attribute b is deemed redundant and can be removed from the set of condition attributes A; otherwise, the attribute b is retained. This process is repeated for all the condition attributes in the information table, and the redundant attributes are found. The remaining subset of condition attributes (i.e., the attributes that are not removed from the original set A) is called a reduct A′. Similarly, all


possible reducts are identified using the above procedure. From the set of identified reducts REDUCT, the one with the minimal cardinality is chosen (RED). Algorithm 2 details the steps involved in the attribute-reduction process. The information table restricted to the RED set of condition attributes results in a reduct table RT.

Algorithm 2. Feature reduction
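A minimal sketch of the reduct search in Algorithm 2 follows; the greedy drop order and the row/decision representation are assumptions about the implementation, not taken from the paper:

```python
def beta_positive_region(rows, attrs, decisions, beta=0.5):
    """Group rows into elementary sets on `attrs` and return the frozen set
    of row indices whose elementary set lies in the beta-positive region."""
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(i)
    pos = set()
    for members in groups.values():
        p = sum(decisions[i] for i in members) / len(members)  # Eq. (18)
        if p >= beta:
            pos.update(members)
    return frozenset(pos)

def find_reduct(rows, attrs, decisions, beta=0.5):
    """Drop each attribute in turn; leave it out if the beta-positive
    region is unchanged (the attribute is redundant)."""
    kept = list(attrs)
    full = beta_positive_region(rows, kept, decisions, beta)
    for a in list(kept):
        trial = [b for b in kept if b != a]
        if trial and beta_positive_region(rows, trial, decisions, beta) == full:
            kept = trial
    return kept
```

In the toy table below, f3 duplicates f1, so one of the pair is redundant and the reduct keeps only two attributes while preserving the β-positive region.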

4.7. Rule generation

The rule generation step aims at creating rules to be stored in the rule base RDB. A rule base is a repository of rules. A rule R has two parts, the rule antecedent R(A) and the rule consequent R(Q), and is denoted as R(A) → R(Q), where → denotes implication. The rule antecedent is composed of a conjunction of attribute-value pairs (ra, v(ra)), where ra represents an attribute in a rule and v(ra) denotes the value of attribute ra in the rule. The rule consequent denotes an outcome implied by the rule. Hence, a rule's antecedent specifies the attribute values that must hold for the rule's consequent to be met. In this work, rules are generated from the reduct table. A rule can be a positive rule R+ or a negative rule R−, corresponding to the partition induced by the value of the decision attribute. Let X_R = {[x^R_1]_R, [x^R_2]_R, ..., [x^R_t]_R} denote the set of all elementary sets identified from the reduct table (RT), where t denotes the total number of elementary sets. For every identified elementary set [x^R_i]_R ∈ X_R, a corresponding positive or negative rule is stored in the rule base as follows. If the elementary set belongs to the β-positive region of the concept X according to Eq. (19), a positive rule is generated and stored in the rule base. Similarly, if the elementary set belongs to the β-negative region of the concept X according to Eq. (20), a negative rule is generated and stored in the rule base.

The generated rule's antecedent R(A) is formed by the conjunction of the condition attributes and their corresponding values in the elementary set. The generated rule's consequent R(Q) value is 0 if the elementary set is in the β-negative region (NT) or 1 if the elementary set is in the β-positive region (PTC). Let a_i1, a_i2, ..., a_ip denote the condition attributes in an elementary set [x^R_i]_R, let v(a_i1), v(a_i2), ..., v(a_ip) denote the corresponding values of those condition attributes in the elementary set, and let p denote the total number of condition attributes in the elementary set. A positive rule as given in Eq. (21) is generated and stored in the rule base if the elementary set is in the β-positive region of the concept; otherwise, a negative rule as given in Eq. (22) is generated and stored in the rule base.

R+ = (a_i1, v(a_i1)) ∧ (a_i2, v(a_i2)) ∧ ... ∧ (a_ip, v(a_ip)) → 1    (21)

R− = (a_i1, v(a_i1)) ∧ (a_i2, v(a_i2)) ∧ ... ∧ (a_ip, v(a_ip)) → 0    (22)

Algorithm 3 gives the steps involved in the rule generation process.

Algorithm 3. Rule generation
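The mapping from elementary sets to the rules of Eqs. (21)-(22) can be sketched as follows, assuming each elementary set is summarized by its attribute values and its conditional probability from Eq. (18):

```python
def generate_rules(elementary_sets, beta=0.5):
    """elementary_sets: iterable of (attribute_values: dict, p_positive: float),
    where p_positive is the conditional probability from Eq. (18).
    Returns (antecedent, consequent) pairs per Eqs. (21)-(22)."""
    rule_base = []
    for attribute_values, p_positive in elementary_sets:
        consequent = 1 if p_positive >= beta else 0  # beta-positive -> PTC, else NT
        rule_base.append((dict(attribute_values), consequent))
    return rule_base
```

Each antecedent is the conjunction of discretized attribute values; the consequent is 1 (PTC) for β-positive elementary sets and 0 (NT) otherwise.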

4.8. Classification based on closest-matching-rule (CMR)

After the construction of the rule base, the efficiency of the system is tested by providing unknown test samples and determining how well the system identifies the class label of each test sample using the rules in the rule base. This is a classification problem that aims at assigning a class label to a test sample. In this work, a novel closest-matching-rule (CMR) classification algorithm is proposed to classify a given test sample. This approach computes a significance value for all the condition attributes of the reduct table. When a test sample is supplied to the CAD system, a closeness value is computed between the test sample and every rule in the rule base. The rule that has the maximum closeness value is taken as the closest matching rule and is used to compute the class label l of the test sample. When more than one rule qualifies as the closest matching rule (i.e., several rules have the same closeness value), a majority voting strategy is used to compute the class label l of the test sample. The procedure for identifying the closest matching rule and assigning a class label to a test sample is elaborated below.

Significance of an attribute S(a). We assign every attribute a ∈ A in the reduct table RT a significance value (weight), which


describes its importance. In this work, an attribute's significance S(a) is calculated from the information required by the attribute, Info_a(RT), for classifying a test sample [48], and is given by Eq. (23).

Info_a(RT) = Σ_{k=1}^{v} (|T_k| / |RT|) Entropy(T_k)    (23)

where v represents the number of distinct values of attribute a, T_k denotes the set of samples from the reduct table RT that have value k for attribute a, |T_k| is the number of samples in T_k, and |RT| is the total number of samples in RT. Entropy(T_k) denotes the expected information needed to classify a sample in T_k; it is also known as the entropy of T_k and is computed according to Eq. (24).

Entropy(T_k) = − Σ_{c=1}^{m} pr_c log2(pr_c)    (24)


where m denotes the number of distinct values of the decision attribute d in the data partition T_k, and pr_c denotes the fraction of samples in T_k whose decision attribute value is c. Let |T_c| represent the number of samples in T_k with decision attribute value c; then pr_c is given by Eq. (25).

pr_c = |T_c| / |T_k|    (25)

|Tc | |Tk |

Depending upon the information required by an attribute to classify a test sample, it is assigned a significance value as given in Eq. (26).

711

S(a) = (1 − Infoa (RT ))

712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736

It is obvious from Eq. (26) that an attribute is more significant if it requires less amount of information to classify a test sample and vice versa. Similar attribute. Let t denote a test sample. Let p represent the total number of condition attributes in a rule’s antecedent as well as the test sample. Let tai and rai represent the ith condition attribute in the test sample and the rule’s antecedent respectively. Let v(tai ) and v(rai ) denote the value of the condition attribute i in the test sample and the rule’s antecedent respectively. A condition attribute in a test tuple tai is similar to the corresponding attribute rai in a rule’s antecedent if they have same values (i.e.,) v(tai ) = v(rai ) for any attribute i, i ∈ 1, 2, . . ., p. Closeness ( (R)). When a test sample is given for classification to the CAD system, a closeness value is calculated for every rule in the rule base with the test sample. The notion of closeness between a rule and the given test sample is to depict the number of attributes that are similar in the rule and test sample. However, we found out that counting solely the number of attributes that are similar is not a good technique to quantify closeness. Therefore, we formulated a precise way to compute closeness, which is explained below. Closeness of a rule with a test sample is the ratio of the cumulative sum of the significance values of the condition attributes that are similar in the test sample and in a given rule to the sum of the significance values of all the attributes in the antecedent of the rule and is given by Eq. (27).


ψ(R) = Σ_{i : v(ra_i) = v(ta_i)} S(a_i) / Σ_{i=1}^{p} S(a_i)    (27)

Consequently, a rule is assigned a higher closeness value if many of its attributes are similar to those of the test sample and the similar attributes have high significance values.

Closest matching rule (Rclosest). The closest matching rule Rclosest for a test sample is the rule in the rule base RDB that has the maximum value of the closeness ψ. To qualify as the closest matching rule, a rule must satisfy two criteria: first, it should be maximally similar to the test sample; second, the similar attributes should have high significance values. Sometimes more than one rule may be close to a test sample. When only one rule is found as the closest rule, the class label of the given test sample is set as PTC (1) or NT (0) according to whether the closest matching rule Rclosest is positive or negative. When more than one rule has the same closeness value for a test sample, the set of all closest rules is chosen, and each rule in the set casts a vote towards the class label of the test sample: a positive rule casts a positive vote and a negative rule casts a negative vote. Let vote+ and vote− denote the total numbers of positive and negative votes respectively. The class label of the test sample is assigned as PTC (1) if vote+ > vote−, that is, if the positive votes outnumber the negative votes; similarly, the test sample is classified as NT (0) if vote− > vote+. Algorithm 4 gives the algorithm for the closest-matching-rule.

Algorithm 4. Closest-matching-rule (CMR)


5. Experimental results


5.1. Dataset description


The dataset used in this work contains 219 H&E-stained thyroid histopathology images taken from biopsy samples of 12 patient cases. The images are digitized and acquired using an Olympus microscope and camera. All images are taken at 40× optical magnification of the objective lens. All the images in the dataset are in JPEG format, and the size of each image is 640 × 480 pixels. The images are diagnosed into two classes (Normal Thyroid or Papillary Thyroid Carcinoma) by a pathologist. Out of the 219 images, 155








images are diagnosed as Papillary Thyroid Carcinoma (PTC) and 64 images are diagnosed as Normal Thyroid (NT).


5.2. Parameter tuning





In this work, the values for a few parameters are determined as follows. The parameters are the number of classes K for PSO-based Otsu's multilevel thresholding; the inertial weight, acceleration constants and other parameters of PSO; the distance d between the co-occurring pairs of pixels in the GLCM texture feature extraction; and the value of the threshold β. To identify the value of the number of classes K in PSO-based Otsu's multilevel thresholding, experiments were conducted by setting K to 2, 3 and 4. When K = 2, the nuclei are merged with other regions of the image and hence extracting the nuclei regions accurately was hard. When K = 4, due to the inhomogeneity of the intensity of the pixels within the nuclei, the nuclei are split across multiple segmented binary images, and the nuclei regions had to be assembled from multiple images. When K = 3, the nuclei are partitioned in a way that makes them straightforward to extract. Moreover, the histopathology images contain three regions corresponding to cytoplasm, nuclei and background, which further supports setting K to 3. The tuning of PSO algorithm parameters such as the inertial weight and acceleration constants is a topic that warrants in-depth research but is beyond the scope of this work. However, we found that the following parameter values provide good results; they are also based on our previous work on segmenting nuclei from breast histopathology images. The initial positions of the particles are assigned randomly within the range of intensity values [0, L − 1] of the pixels in an image. The inertial weight (w), the acceleration constants (ϑ1, ϑ2), the minimum and maximum velocity (vmin and vmax), the number of generations or iterations and the number of particles are initialized as 1.2, 0.8, [−3, 3], 50 and 25 respectively.
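A single standard PSO update step with the parameter values listed above can be sketched as follows; the function name and the clamping of positions to the gray-level range are illustrative assumptions:

```python
import random

# Parameter values from Section 5.2: w = 1.2, acceleration constants = 0.8,
# velocity clamped to [-3, 3]; 50 iterations and 25 particles are used overall.
W, C1, C2 = 1.2, 0.8, 0.8
V_MIN, V_MAX = -3.0, 3.0

def pso_step(position, velocity, pbest, gbest):
    """One standard PSO velocity/position update for a threshold particle,
    with velocity clamping and positions kept in the gray-level range."""
    r1, r2 = random.random(), random.random()
    velocity = W * velocity + C1 * r1 * (pbest - position) + C2 * r2 * (gbest - position)
    velocity = max(V_MIN, min(V_MAX, velocity))
    position = max(0.0, min(255.0, position + velocity))
    return position, velocity
```

Because the velocity is clamped to [-3, 3], a particle can move at most three gray levels per iteration, which keeps the threshold search stable.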
The distance d between the co-occurring pairs of pixels when computing GLCM texture features has been set in the literature to 1, 2, 4, 8, 32, . . . [49,50]. However, choosing a small value for d is found to provide better classification accuracy. Moreover, a small value of d captures the texture information of the pixels well, because a pixel is highly likely to be more similar to its neighbour than to a distant pixel. In this work, we conducted experiments to find the value of d that yields the maximum classification accuracy, by extracting GLCM features with d set to 1, 2, 4 and 8 and measuring the classification accuracy of the proposed method. The classification accuracy was found to be 100%, 74.43%, 83.23% and 83.52% for d = 1, 2, 4 and 8 respectively. The results indicate that the classification accuracy is maximum when d = 1 and decreases for the other values of d; hence, we have taken d = 1 for all our experiments. The value of the threshold β is application specific. In this work, setting β = 0.5 gives 100% classification accuracy; hence, in all our experiments, the value of β is set to 0.5.

5.3. Identifying redundant attributes

In this work, an exhaustive search is made to identify all the reducts. We found that, for our information table, the subset of three attributes {GLCM^0_contrast, GLCM^0_energy, GLCM^45_energy} is redundant. Thus, eliminating this subset of redundant attributes, we get the reduct REDUCT = {GLCM^0_correlation, GLCM^0_homogeneity, GLCM^45_contrast,

834

GLCM45 corelation ,

835

GLCM90 energy ,

GLCM45 homogeneity , GLCM90 homogeneity ,

GLCM90 contrast , GLCM135 contrast ,

GLCM90 corelation , GLCM135 corelation ,

11

35 GLCM135 energy , GLCMhomogeneity , Mean and Standard deviation. Eliminating any of the attributes from the REDUCT did not provide the original ˇ-positive region. Since, this is the only reduct we obtained, this is the minimal reduct, that is, in this work, 45 RED = {GLCM0corelation , GLCM0homogeneity , GLCM45 contrast , GLCMcorelation , 90 90 90 GLCM45 homogeneity , GLCMcontrast , GLCMcorelation , GLCMenergy ,

836 837 838 839 840 841

135 135 135 GLCM90 homogeneity , GLCMcontrast , GLCMcorelation , GLCMenergy , 35 GLCMhomogeneity , Mean and Standard deviation}. The core

842

is also the same as the minimal reduct in this case. The identified minimal reduct is used for generating the rules.

5.4. Classifier performance evaluation metrics

843 844 845

846

The classification performance of the proposed method is evaluated over a test set by computing sensitivity (SN), specificity (SP) and accuracy (ACC), which are evaluated using true positive (TP), true negative (TN), false positive (FP) and false negative (FN) values [48]. True positive (TP) refers to the total number of PTC images that are correctly labeled as PTC by the classification algorithm. True negative (TN) refers to the total number of NT images that are correctly labeled as NT by the classification algorithm. False positive (FP) refers to the total number of NT images that are incorrectly labeled as PTC by the classification algorithm. False negative (FN) refers to the total number of PTC images that are incorrectly labeled as NT by the classification algorithm. Let N denote total number of images in the test set. Let POS and NEG denote the total number of PTC images and NT images in the test set respectively. Then sensitivity, specificity and accuracy are given as follows. Sensitivity (SN) quantifies the ability of the proposed method to identify and label PTC images as PTC. It can also be termed as true positive recognition rate. It is the fraction of the PTC images that are correctly labeled and is given by Eq. (28). Specificity is the ability of the proposed method to identify and label NT images as NT. It is the fraction of NT images that are correctly labeled and is given by Eq. (29). Accuracy gives the efficiency of the proposed method to label both NT and PTC images correctly. It is the ratio of the sum of true positives and true negatives to the total number of elements in the test set and is given by Eq. (30).

847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874

Sensitivity =

TP POS

(28)

875

Specificity =

TN NEG

(29)

876

(30)

877

Accuracy =

TN + TP N

5.5. Classification performance The performance of the proposed system (VPRS-CMR) is compared with classifiers like K-nearest neighbour (K-NN) and Naive Bayes. The performance is evaluated using ten-fold cross-validation and 80-20 training and testing strategy. To perform 10-fold cross-validation, the dataset is partitioned into ten mutually exclusive subsets denoted as D1 , D2 , D3 , D4 , D5 , D6 , D7 , D8 , D9 and D10 each containing an equal number of images from the entire dataset. Training and testing is done repeatedly for ten times. For every iteration, one of the ten folds is used for testing and the remaining nine folds are used for training the classifier [48]. The evaluation parameters (sensitivity, specificity and accuracy) are tabulated for each fold and the average values are found. Table 1 summarizes the results of the classification performance of all the classifiers using ten-fold cross-validation. It is evident from

Please cite this article in press as: J. Angel Arul Jothi, V. Mary Anita Rajam, Effective segmentation and classification of thyroid histopathology images, Appl. Soft Comput. J. (2016), http://dx.doi.org/10.1016/j.asoc.2016.02.030

878

879 880 881 882 883 884 885 886 887 888 889 890 891 892

G Model

ARTICLE IN PRESS

ASOC 3489 1–13

J. Angel Arul Jothi, V. Mary Anita Rajam / Applied Soft Computing xxx (2016) xxx–xxx

12

Table 1 Classification performance by 10-fold cross-validation. Method

SN

SP

Accuracy

K-NN Naive Bayes VPRS-CMR

0.9813 0.9938 1

0.9690 1 1

0.9771 0.9955 1

Table 2 Classification performance by 80-20 training and testing strategy

893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928

929

930 931 932 933 934 935 936 937 938 939 940 941

Method

SN

SP

Accuracy

K-NN Naive Bayes VPRS-CMR

1 0.9955 1

1 1 1

1 0.9938 1

Table 1 that the classification accuracies of the traditional classifiers K-NN and Naive Bayes are 97.71% and 99.55% respectively, whereas the classification accuracy of the VPRS-CMR method is 100%. The results show that the classification accuracy of our proposed method (VPRS-CMR) is on par with Naive Bayes classifier and is better when compared with the K-NN classifier. In order to evaluate the classification performance of the classifiers (VPRS-CMR, Naive Bayes and K-NN) using 80-20 training and testing strategy, 80% of the examples are randomly selected from the dataset and are used for training the classifiers. The remaining 20% of the examples in the dataset are used for testing. We repeated this process 10 times. The evaluation parameters (sensitivity, specificity and classification accuracy) are tabulated every time and the average values of the parameters are found. Table 2 details the values of the evaluation parameters obtained for the various classifiers using 80-20 training and testing strategy. Table 2 summarizes that the classification accuracies of the classifiers K-NN and Naive Bayes are 100% and 99.38% respectively, whereas the classification accuracy of the VPRS-CMR method is 100%. The 100% classification accuracy of the K-NN method using the 80-20 training and testing strategy is due to the random selection of the training and testing samples. The results show that the classification accuracy of our proposed method (VPRS-CMR) is on par with the Naive Bayes classifier and the K-NN classifier. However, in this study, we have considered a small data set. The difference in the classification performance between the classifiers would be visible if a large data set is considered. Use of soft computing techniques such as Particle Swarm Optimization (PSO) and Variable Precision Rough Sets (VPRS) combined with closest-matching-rule (CMR) algorithm in the CAD system provides the intended results. 
Using PSO to search for the optimal thresholds for Otsu's multilevel thresholding is a simple yet effective way to segment the input images. The VPRS theory, together with the closest-matching-rule (CMR) algorithm, helps to eliminate redundant features and forms a rule-based classifier that performs on par with traditional classifiers.

6. Conclusions

This paper proposes a CAD system that semi-automatically segments and classifies H&E-stained thyroid histopathology images into two classes: Normal Thyroid (NT) and Papillary Thyroid Carcinoma (PTC). PSO-based Otsu's thresholding is used to segment the images into different binary images. The stringent area and roundness constraints eliminate artefacts and restrict the images to contain only nuclei regions. Moreover, the differences in the biological and staining characteristics of the nuclei present in NT and PTC images are exploited by quantifying 18 texture features. The CAD system uses the VPRS-based β-reduct to eliminate unnecessary features and to generate the positive and negative rules of the concept. The proposed approach resorts to a closest-matching-rule (CMR) algorithm that assigns a significance value to each attribute based on the information requirement of that attribute. When a test sample is provided to the CAD system, a closeness value is computed for all the rules, and the class label of the maximally similar rule, i.e., the rule with the highest closeness to the test sample, is assigned to the test sample.

From the results, we conclude the following: the proposed approach of separating the nuclei from other structures (cytoplasm, background) and extracting texture features from the nuclei is able to distinguish NT and PTC H&E-stained thyroid histopathology images. The closest-matching-rule (CMR) algorithm does not require any extra parameters, yet it performs on par with traditional classifiers. We note that the segmentation method is not fully automated: the CAD system needs manual intervention to select a binary image that contains the nuclei for the ROI extraction stage. In the future, we plan to devise a fully automated segmentation approach and study its effect on the classification accuracy. Moreover, this work can be extended into a complete CAD system for PTC by integrating automated image analysis for other major variants of PTC, such as the follicular variant and the tall cell variant.

Please cite this article in press as: J. Angel Arul Jothi, V. Mary Anita Rajam, Effective segmentation and classification of thyroid histopathology images, Appl. Soft Comput. J. (2016), http://dx.doi.org/10.1016/j.asoc.2016.02.030


