Deep feature learning for soft tissue sarcoma classification in MR images via transfer learning


Accepted Manuscript

Deep Feature Learning For Soft Tissue Sarcoma Classification In MR Images Via Transfer Learning
Haithem Hermessi, Olfa Mourali, Ezzeddine Zagrouba

PII: S0957-4174(18)30745-0
DOI: https://doi.org/10.1016/j.eswa.2018.11.025
Reference: ESWA 12324

To appear in: Expert Systems With Applications

Received date: 22 April 2018
Revised date: 16 November 2018
Accepted date: 17 November 2018

Please cite this article as: Haithem Hermessi, Olfa Mourali, Ezzeddine Zagrouba, Deep Feature Learning For Soft Tissue Sarcoma Classification In MR Images Via Transfer Learning, Expert Systems With Applications (2018), doi: https://doi.org/10.1016/j.eswa.2018.11.025

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


HIGHLIGHTS
- Deep learning-based MRI classification of soft tissue sarcomas of the extremities.
- Impact of type-2 fuzzy entropy MRI fusion in the shearlet domain on transfer learning.
- The depth of knowledge transfer for soft tissue sarcoma MRI classification.


Deep Feature Learning For Soft Tissue Sarcoma Classification In MR Images Via Transfer Learning

Haithem Hermessi1, Olfa Mourali2 and Ezzeddine Zagrouba3

University of Tunis El Manar - Intelligent Systems in Imaging and Artificial Vision Research Team (SIIVA), Laboratory of Informatics, Modeling and Information and Knowledge Processing (LIMTIC), Higher Institute of Computer Science, Ariana, Tunisia
[email protected] [email protected] [email protected]

Abstract—Medical image analysis is motivated by the emergence of deep learning and the increase in computational power. Meanwhile, relevant deep features can significantly enhance the performance of learnable expert and intelligent systems and reduce diagnosis time and effort. This paper presents a deep learning-based radiomics framework for the aided diagnosis of soft tissue sarcomas of the extremities. MR images with histologically confirmed liposarcoma (LPS) and leiomyosarcoma (LMS) were retrieved from The Cancer Imaging Archive database and pre-processed to extract ROIs from MR scans with delineated tumors. This study investigates the significance and impact of medical image fusion on deep feature learning based on transfer learning from the natural domain to the medical domain. Towards this end, we propose to fuse T1 with T2FS or STIR modalities using type-2 fuzzy sets in the non-subsampled shearlet domain. After decomposition, low-frequency sub-images were fused using local energy and type-2 fuzzy entropy, while high-frequency coefficients were selected according to the maximum of the absolute value. Experimental results indicated that the proposed fusion framework outperformed state-of-the-art fuzzy logic-based fusion techniques in terms of entropy and mutual information. Accordingly, we fine-tuned the pre-trained AlexNet deep convolutional neural network (CNN) with stochastic gradient descent (SGD), first with the pre-processed dataset and second with the fused images. As a result, the average classification accuracy using training data augmented by image rotation and flipping was 97.17% with the raw data and 98.28% with the fused images, which highlights the usefulness of complementary information for deep feature learning. One crucial concern was to investigate the depth of knowledge transferability. We incrementally fine-tuned the pre-trained CNN to assess the level required to achieve performance improvements in STS classification. Through layer-wise fine-tuning, our study further confirms the potential of middle and deep layers for performance improvement. Moreover, transferred features were found to perform better than random weights. Encouraged by the classification results, our aided diagnosis framework may be in the pipeline to assist radiologists in classifying LPS and LMS.

Keywords—Convolutional Neural Networks, Soft tissue sarcoma, Multi-modal medical image fusion and classification, Type-2 fuzzy logic, Transfer learning.

1. Introduction

The recent deluge of successes of deep learning (DL) in computer vision tasks has led to a wealth of applications in medical imaging. Classification is one of the fields that attracts major research contributions in computer-aided diagnosis, and it has recently been approached by burgeoning deep learning architectures with significant results. In medical image processing, exam classification (Litjens et al., 2017) typically feeds one or more input images to a model in order to recover a single diagnostic variable as output, indicating the presence or absence of disease in the case of binary classification. In contrast, object or lesion classification engages local regions of the image together with global contextual information for accurate classification. Deep learning architectures refer to a class of machine learning models able to learn hierarchical features by constructing high-level information from low-level cues. Markedly, visual understanding tasks have been investigated with emerging deep architectures such as Restricted Boltzmann Machines (RBMs), Autoencoders (AEs), Sparse Coding (SC) and Convolutional Neural Networks (CNNs). Nevertheless, CNNs have been reported to be the most notable deep networks and have demonstrated effectiveness for image content understanding, offering state-of-the-art results on image segmentation, detection, retrieval and classification (Guo et al., 2016). Thus, several typical CNN models were proposed in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) classification competition (Russakovsky et al., 2015), for instance AlexNet, VGGNet, and GoogLeNet. In fact, those models were trained on large-scale, well-annotated datasets, which are notably unavailable in the medical domain (millions vs. hundreds/thousands of samples). Accordingly, transfer learning (TL) has been applied to bridge the gap between datasets stemming from multiple domains by efficaciously transferring knowledge, specifically from the natural domain to the medical domain (Greenspan, Van Ginneken, & Summers, 2016). The usual approach is to copy the first n layers of the pre-trained network to the first n layers of a target network. In light of that, transfer learning can be grouped into two approaches: (1) fine-tuning a pre-trained model using medical data, and (2) adopting a pre-trained model as a feature extractor, which is more suitable since no training is required (Litjens et al., 2017). In the same way, a recent survey (Tajbakhsh et al., 2016) was dedicated to studying the effectiveness of knowledge transfer in medical imaging. It revealed that classification results based on pre-trained models with adequate fine-tuning outperform CNNs trained from scratch in various medical applications and with different imaging modalities, notably in tumor identification and staging.



Soft tissue sarcomas (STS) describe a category of histologically rare cancers (approximately 1%) that develop in different soft tissues such as muscles, fat or nerves. More than 50 histologic subtypes are recognized by the American Cancer Society; however, liposarcoma and leiomyosarcoma are reported to be the most frequently seen. In 2016, 40% of patients suffering from STS died and 25% of them developed distant metastases (Farhidzadeh et al., 2017). Furthermore, Segal et al. (Segal et al., 2003) classified subclasses of STS based on functional genomics and support vector machines. Hence, gene expression has proved to be efficient not only in STS classification but also in targeting new therapies. Under those circumstances, and owing to its noninvasive and prevalent nature, medical imaging (MRI, CT, and PET) has efficiently assisted the computer-aided diagnosis of STS, specifically in the investigation of primary tumors, where radiotherapy and chemotherapy treatment aims not to jeopardize the patient's chance of a curative operation. However, there have been few studies in the past aiming to enhance the diagnosis of STS. In particular, Mayerhoefer et al. (Mayerhoefer et al., 2008) showed that texture analysis is informative for the differentiation of benign and malignant STS in MR scans. From a machine learning perspective, Juntu et al. (Juntu, Sijbers, De Backer, Rajan, & Van Dyck, 2010) studied the performance of three shallow classifiers on texture analysis features extracted from T1-weighted MRI scans to discriminate between malignant and benign STS. The authors concluded that a small dataset of less than one hundred images is sufficient for such a standard machine learning-based classification task. Furthermore, in (Farhidzadeh et al., 2015) the prediction of STS progression was approached by analyzing sub-regions of MRIs, where textural features were selected in order to differentiate metastatic vs. non-metastatic tumors using three classifiers. However, standard machine learning techniques require specific feature extractors for each type of data. From a deep learning perspective, rhabdomyosarcoma (RMS), a pediatric STS, was classified into subtypes (embryonal vs. alveolar) based on a pre-trained deep CNN model fine-tuned with multiparametric MR scans (Banerjee, Crawley, Bhethanabotla, Daldrup-Link, & Rubin, 2017).

With the intention of providing relevant and complementary information for CNN training, multi-modal medical image fusion can play a crucial role. Image fusion in the medical context has been established as an important research issue to extract and fuse the maximum of relevant features at three levels: pixel, feature, and decision. A detailed overview of multi-modal medical image fusion is given in (Du, Li, Lu, & Xiao, 2016). Existing methods include, but are not restricted to, sparse representation methods, pyramid methods, and multiscale decomposition (MSD) based methods. Typically, transform algorithms include the pyramid transform, wavelet transform, contourlet transform, curvelet transform and shearlet transform (Jiang, Jin, Lee, & Yao, 2017). However, pyramid-based methods suffer from blocking effects around edges, while wavelets do not capture directional information well, in addition to the lack of shift-invariance caused by the down-sampling process (Tirupal, Mohan, & Kumar, 2017). To overcome such limitations, shearlets (Easley, Labate, & Lim, 2008) have been reported to provide anisotropy when dealing with sharp transitions and geometric features, thanks to the unlimited number of directions. Another concern intersects with the fusion rule, which seeks to maintain relevant features and restrain insignificant ones. Several algorithms have been used in the literature, such as the maximum and averaging rules, principal component analysis and fuzzy logic (Du et al., 2016). Indeed, medical images are constrained by fuzzy effects related to poor illumination. Consequently, fuzzy set theory has been adopted to remove the uncertainty of luminance and contrast (Ramya & Sujatha, 2016). Equally important, the combination of fuzzy logic and MSD approaches has been reported to improve feature analysis. Recent literature has approached medical image fusion by fuzzy sets (Tirupal, Mohan, & Kumar, 2017; Ramya & Sujatha, 2016) and MSD algorithms in the fuzzy domain, such as the stationary wavelet transform (Jiang et al., 2017), the non-subsampled contourlet transform (Yang, Que, Huang, & Lin, 2016) and the non-subsampled shearlet transform (Jiang, Jin, Hou, Lee, & Yao, 2018).


This paper develops a soft tissue sarcoma computer-aided diagnosis (CAD) framework that classifies the liposarcoma (LPS) and leiomyosarcoma (LMS) STS subtypes. We propose to fuse pre-registered MRI based on type-2 fuzzy logic in the non-subsampled shearlet domain. Then, in a transfer learning fashion, we fine-tune the pre-trained AlexNet CNN twice, with the retrieved scans and with the fused MR images, with the goal of shedding light on the impact of medical image fusion on deep feature learning. The depth of knowledge transfer is experimentally investigated with the fused images in order to explore the necessary level that achieves improvement in classification performance. We incrementally fine-tuned the pre-trained model starting from lower layers to upper layers.


The remainder of this paper is organized as follows. Section 2 covers the state of the art in transfer learning-based medical image analysis, with highlighted related works on soft tissue sarcoma aided diagnosis. Section 3 presents the proposed fusion method, the materials, and the pre-trained CNN fine-tuning process. Section 4 puts forward the results, comparisons with fusion methods of the literature, as well as the classification results achieved by the deep CNN on both datasets. Limitations and future recommendations are also addressed. Finally, conclusions and future works are summarized in Section 5.

2. Transfer learning in CNN-based medical image analysis

Large-scale medical training data is a requirement that should be satisfied to train deep learning models designed for aided diagnosis. This challenge may be difficult to meet in the medical domain, where diseases are rare in datasets and expert annotation is expensive. Accordingly, fine-tuning a pre-trained CNN has imposed itself as an alternative to training from scratch, implementing the idea of knowledge transfer from a source domain with large-scale annotated datasets to a target domain with limited labeled data. A recent study (Tajbakhsh et al., 2017) proposed a layer-wise fine-tuning of different pre-trained CNNs (AlexNet (Krizhevsky, Sutskever, & Hinton, 2012) and the deeper architectures VGGNet (Simonyan & Zisserman, 2015) and GoogLeNet (Szegedy et al., 2015)) to examine the impact of the fine-tuning depth on knowledge transfer. It has been observed that the desired level differs from one medical application to another. Thereupon, intermediate fine-tuning was concluded to be sufficient to achieve optimal performance for colonoscopy frame classification, while deeper levels were suitable for polyp detection. Thus, the optimal depth should take into account the challenging problems of convergence and overfitting. The main advantage of deeply fine-tuned CNNs is rapid convergence, while intermediate levels may lead to an undesirable local minimum. In addition, deep levels outperform intermediate ones when limited training data are available. Moreover, the performance gap between fine-tuned CNNs and those trained from scratch increases when the size of the training dataset is reduced (Tajbakhsh et al., 2017). Similarly, the authors in (Tajbakhsh et al., 2016) further confirmed that CNN layer-wise fine-tuning, either shallow or deep, depends on the application and the amount of available labeled data. Another key point was to quantify the transferability of features from each layer of a CNN, for which the study (Yosinski, Clune, Bengio, & Lipson, 2014) reveals their degree of generality or specificity, thereby evaluating where the transition takes place. Consequently, experiments emphasize that transferability is negatively affected by: (1) the specialization of higher layers to the original task at the cost of the performance on the target task, and (2) splitting networks in the middle of fragilely co-adapted layers, making the optimization more difficult. Transferring features from distant tasks, however, still performs better than randomly initialized weights.

From an aided-diagnosis perspective, Banerjee et al. (Banerjee et al., 2017) adapted the transfer learning approach, in which the pre-trained AlexNet model was fine-tuned on fused multimodal MR scans for RMS soft tissue sarcoma classification. In fact, on a limited-sized dataset, the main hypothesis of knowledge transfer, despite the disparity between natural and medical images, was motivated by two factors. The first is the advantage of earlier CNN layers, which contain more generic features useful for any image analysis task, strengthening the generalization property. The second supports the idea that initialization with random weights has a high chance of overfitting, one of the most challenging DL problems. Other CAD tasks have also been approached by knowledge transfer with CNNs, such as segmentation (Tajbakhsh et al., 2017; Wang, MacKenzie, Ramachandran, & Chen, 2016), prognosis (Nie, Zhang, Adeli, Liu, & Shen, 2016), and retrieval (Margeta, Criminisi, Cabrera Lozoya, Lee, & Ayache, 2017). Mainly, knowledge adaptation is viewed as a classification problem, particularly when joined with a fusion process of architectures (Gao, Hui, & Tian, 2017) or of multi-modalities (Banerjee et al., 2017). Notably, the fusion of architectures outperforms the fusion of multi-modalities; however, the latter is less time-consuming and less sensitive to overfitting. Moreover, TL has also been investigated for diagnostic knowledge sharing between radiologists by developing transferable deep features in a domain adaptation fashion (Shen et al., 2016).


3. Proposed method and material

The emerging CNN models have demonstrated effectiveness in understanding image content without relying on any handcrafted features, even without large-scale datasets (Gao et al., 2017). Towards this end, we propose to classify two STS subtypes, liposarcoma and leiomyosarcoma, based on transfer learning: first from MR scans (T1, STIR and T2FS), and second from fused MR scans. We fused T1 with T2FS or STIR scans based on type-2 fuzzy logic in the shearlet domain. Then, we fine-tuned the AlexNet deep CNN with the fused MRIs on one hand and with the retrieved scans on the other hand.


3.1. Dataset



One key advantage of MRI is its better discrimination of soft tissues, in addition to its ability to capture tumor changes and heterogeneity (Farhidzadeh et al., 2017). The publicly available Cancer Imaging Archive1 (TCIA) database was retrospectively queried. The approved dataset is composed of MR/PET/CT scans of 51 patients with histologically proven STS of the extremities (Vallières, Freeman, Skamene, & Naqa, 2015). In this study, a dataset2 of 21 patients is analyzed: 11 with pathologically confirmed liposarcomas (LPS), which arise in deep soft tissue fat cells (age 29-82 years), and 10 with leiomyosarcomas (LMS), affecting muscle cells (age 24-83 years). The cohort includes 12 males and 9 females scanned during a median follow-up period of 31 months. For each patient, the ground truth differentiating the histopathological subtypes was confirmed by surgery. Tumors were localized in primary sites including the thigh, the biceps and the pelvis. We propose to analyze three types of MRI sequences used for training, as in clinical protocols, namely T1-weighted (T1), T2-weighted fat-saturated (T2FS) and short tau inversion recovery (STIR). For all T1 sequences, the acquisition was performed in the axial plane, while T2FS and STIR sequences were acquired in different orientations (axial, sagittal and coronal). Additionally, we note that the MR scan slice thickness was 5.5 mm for T1 and 5 mm for both T2-weighted fat-saturated and STIR. The in-plane resolution was 0.63 mm2, 0.74 mm2 and 0.86 mm2 for T2FS, T1 and STIR scans, respectively (Vallières et al., 2015).

1 https://wiki.cancerimagingarchive.net/display/Public/Soft-tissue-Sarcoma
2 http://doi.org/10.7937/K9/TCIA.2015.7GO2GSKS


3.2. Data preparation and ROI extraction

The retrieved dataset includes 4338 MR scans: 2357 scans with LPS and 1981 scans with LMS of the extremities. Tumors were manually delineated by an expert radiation oncologist on T2FS scans and then propagated to T1 and STIR based on a rigid registration (Vallières et al., 2015). Furthermore, patients with confirmed LMS and LPS were diagnosed with a tumor, and some of them with visible edema around the tumor. Contours were provided as RTstruct DICOM objects for all patients and sequences. We first extract them from the DICOM metadata in order to delineate the tumor region. Fig. 1 shows examples of 2D slices of the different MRI types (Fig. 1 (a)-(b)-(c) with LPS and Fig. 1 (d)-(e)-(f) with LMS), where the visible edema (highlighted by a red box) encapsulating the tumor (highlighted by a yellow box) are both demarcated. Images were resized to 227 x 227 pixels. We note that only slices including tumors are kept from the whole volume; likewise, we are interested only in contours specifying tumor masses. TABLE I. provides a summary of the cohort among the different MRI types after pre-processing. For more reliability, segmented scans are automatically cropped to restrain the 2D region of interest (ROI) using the bounding box. Fig. 2 shows examples of ROIs with segmented tumors.

Fig. 1. Examples of annotated slices: (First row) LPS patient with visible edema (highlighted by red box) and tumors (highlighted by yellow box), (a) T1; (b) T2FS and (c) STIR. (Second row) LMS patient (d) T1; (e) T2FS and (f) STIR.

TABLE I. MULTIMODAL DATASET

        T1     T2FS   STIR   Total
LPS     676    457    219    1352
LMS     472    409    63     944

Fig. 2. Examples of ROIs with segmented tumors.
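The cropping step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes pydicom and OpenCV, axial slices with identity orientation, and hypothetical file paths and ROI indices.

```python
# Sketch: crop a tumor ROI from one MR slice using RTstruct contours and resize it
# to the AlexNet input size. Assumes axial, unrotated acquisitions.
import numpy as np
import pydicom
import cv2

def crop_roi(mr_slice_path, rtstruct_path, roi_index=0, contour_index=0, out_size=227):
    ds = pydicom.dcmread(mr_slice_path)
    rt = pydicom.dcmread(rtstruct_path)

    # ContourData is a flat list [x1, y1, z1, x2, y2, z2, ...] in patient coordinates (mm).
    contour = rt.ROIContourSequence[roi_index].ContourSequence[contour_index]
    pts = np.array(contour.ContourData, dtype=float).reshape(-1, 3)

    # Map patient coordinates to pixel indices.
    origin = np.array(ds.ImagePositionPatient, dtype=float)   # (x0, y0, z0)
    spacing = np.array(ds.PixelSpacing, dtype=float)           # (row, col) spacing in mm
    cols = np.round((pts[:, 0] - origin[0]) / spacing[1]).astype(int)
    rows = np.round((pts[:, 1] - origin[1]) / spacing[0]).astype(int)

    # Bounding box of the delineated tumor, then crop and resize.
    img = ds.pixel_array
    roi = img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    return cv2.resize(roi.astype(np.float32), (out_size, out_size))
```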

3.3. Proposed multimodal MR fusion method


Medical image fusion is the process by which two or more images of the same or different modalities are merged, aiming to enhance features and improve quality with respect to similar anatomical positions and cells (Du et al., 2016). Fusion techniques are broadly grouped into two families: spatial domain and transform domain fusion methods. Specifically, in the NSST domain, source images are decomposed into sub-images, then fusion rules are applied to combine the coefficients. Finally, the fused image is obtained by applying the inverse transform. Indeed, the pre-processed dataset regroups three MR sequence types, where T2FS and STIR have similar textures, different from the T1 texture (Vallières et al., 2015). In light of that, we propose to fuse the ROIs of T1 scans with either T2FS or STIR images. ROIs are resized to 227x227 to fit the AlexNet input size. The fused scans are used to fine-tune the pre-trained CNN for LPS and LMS classification, in order to assess the impact of image fusion on deep feature learning.
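The transform-domain pipeline just described (decompose, fuse the low- and high-frequency sub-bands, invert the transform) can be sketched as below. The names nsst_decompose and nsst_reconstruct are placeholders for an NSST implementation (the paper relies on a GPU discrete shearlet transform), and fuse_low/fuse_high stand for the rules detailed in Sections 3.3.3 and 3.3.4.

```python
# Sketch of the generic transform-domain fusion pipeline of Section 3.3.
import numpy as np

def fuse_pair(img_a, img_b, nsst_decompose, nsst_reconstruct, fuse_low, fuse_high):
    """Fuse two pre-registered images of identical size in the NSST domain."""
    low_a, highs_a = nsst_decompose(img_a)   # one coarse sub-band + directional sub-bands
    low_b, highs_b = nsst_decompose(img_b)

    fused_low = fuse_low(low_a, low_b)                    # type-2 fuzzy entropy + local energy
    fused_highs = [fuse_high(ha, hb)                      # max-absolute-value rule
                   for ha, hb in zip(highs_a, highs_b)]

    return nsst_reconstruct(fused_low, fused_highs)       # inverse transform
```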



3.3.1. Non-subsampled shearlet transform

Shearlets (Kutyniok & Labate, 2012) are equipped with a rich mathematical structure that enables them to optimally and sparsely represent sharp transitions. In dimension two, they form an affine system with composite dilations, as written in Eq. 1:

$SH(\psi) = \left\{ \psi_{a,s,t}(x) = |\det M_{a,s}|^{-1/2}\, \psi\!\left(M_{a,s}^{-1}(x - t)\right) : a > 0,\ s \in \mathbb{R},\ t \in \mathbb{R}^2 \right\}$   (1)

where $\psi_{a,s,t}$ is the shearlet atom, and $a$, $s$ and $t$ are the scale, shear and translation parameters, respectively. $M_{a,s} = S_s A_a$, where $A_a = \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix}$ stands for the anisotropic scaling matrix and $S_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}$ is the shear matrix, both invertible. The matrix $A_a$ controls the scale decomposition while $S_s$ controls the orientation decomposition. Mainly, shearlets constitute a tight frame of well-localized waveforms for $L^2(\mathbb{R}^2)$ at different scales and directions. Readers may refer to (Easley et al., 2008) and (Kutyniok & Labate, 2012) for more theoretical background. However, the shearlet transform with its subsampling scheme lacks shift-invariance, which is responsible for the Gibbs phenomenon (Jiang et al., 2018). To overcome this limitation, the non-subsampled shearlet transform (NSST) was proposed to eliminate sub-sampling effects (Easley et al., 2008). As shown in Fig. 3, it consists of two steps: the non-subsampled Laplacian pyramid filters (NSLP), which ensure the multiscale decomposition, and shearing filters


that perform the directional decomposition. Moreover, to exploit the GPU capability during the fusion process, we used the GPU discrete shearlet transform framework (Gibert, Patel, Labate, & Chellappa, 2014) (ShearGPU3) and the GPU engine for Matlab (GPUmat4).

3 http://www.umiacs.umd.edu/~gibert/ShearCuda.zip
4 http://sourceforge.net/projects/gpumat/


Fig. 3. Schematic diagram of image decomposition process with NSST

3.3.2. Type-2 fuzzy sets

Fuzzy set theory was proposed by Zadeh (Zadeh, 1965) and has been successfully applied to image fusion due to its particular ability to model uncertainty. As a decision operator, it can quantify the affiliation relationship of pixels (coefficients) according to a membership function (MF) lying in the interval [0, 1], where values represent different membership degrees (Yang et al., 2016). In general, a fuzzy logic model comprises a fuzzifier, an inference engine, a defuzzifier, fuzzy sets and fuzzy rules (Yang et al., 2016). Significantly, considering a set $X = \{x_1, x_2, \ldots, x_n\}$, a fuzzy set $A$ of $X$ (Jiang et al., 2017) can be mathematically expressed in Eq. 2 as:

$A = \{ (x, \mu_A(x)) \mid x \in X \}$   (2)


where $\mu_A(x) \in [0, 1]$ is the membership degree (MF) of $x \in X$. Indeed, the non-membership degree is given as $1 - \mu_A(x)$. However, conventional (type-1) fuzzy set MFs are not flexible in modeling and minimizing certain uncertainties because of their precision and because they are totally crisp. To cope with this challenge, interval type-2 fuzzy sets (IT-2FS) have been reported to better handle more uncertainties (Wu, 2013). Hence, an IT-2FS lies in an interval of [0, 1] and can alternatively be defined using type-1 MFs as:

$\tilde{A} = \left\{ \left((x, u), \mu_{\tilde{A}}(x, u)\right) \mid x \in X,\ u \in J_x \subseteq [0, 1] \right\}$   (3)

where $\mu_{\tilde{A}}(x, u)$ is the secondary type-2 MF, and $x$ and $u$ are the primary and secondary variables, respectively. $\underline{\mu}_{\tilde{A}}(x)$ and $\overline{\mu}_{\tilde{A}}(x)$ are the lower and upper MFs of the IT-2FS that encapsulate the type-1 MF:

$\mu_{\tilde{A}}(x) = \left[\, \underline{\mu}_{\tilde{A}}(x),\ \overline{\mu}_{\tilde{A}}(x) \,\right]$   (4)

Basically, lower and upper MFs (Eq. 4) can be characterized based on a reciprocal parameter controlling the footprint of uncertainty (FOU), which adds an extra dimension to the type-1 FS aiming to better characterize uncertainty (Yang et al., 2016). In fact, a recent study (Biswas & Sen, 2015) concluded that good experimental results are obtained when this parameter is kept within a suitable interval. Fig. 4 illustrates the FOU of the Gaussian MF of an interval type-2 fuzzy set. Basically, $FOU(\tilde{A}) = \bigcup_{x \in X} J_x$ depicts the union of all primary MFs.

Fig. 4. Type-2 interval Gaussian membership function with its lower and upper membership functions.

3.3.3. Fusion rule of low-frequency sub-images

The low-frequency sub-images $L^{A}(i,j)$ and $L^{B}(i,j)$ (where A and B are the input images and $(i,j)$ denotes the coefficient position), given by the NSST, are approximations of the source images at the coarse scale. We assume that they can be efficiently combined based on type-2 fuzzy sets, since the low frequencies depict large features with uncertainty and vagueness around edges. Another assumption lies in the fact that the image pixel distribution can be viewed as a normal one, following the same behavior as a Gaussian distribution (Eq. 5) (Jiang et al., 2018). In this work, a new fusion strategy is presented that exploits the advantages of local image information in the shearlet domain. We propose to extend the application of the interval type-2 Gaussian MF (T2GMF) to fuzzify the low-frequency sub-images, considered as type-2 fuzzy sets, and to use it to calculate the importance degree of each coefficient. The FOU parameter is chosen so that the upper and lower MFs symmetrically envelop the conventional GMF, aiming to better represent the fuzzy degrees of the NSST coefficients.

$\mu\!\left(L(i,j)\right) = \exp\!\left( -\dfrac{\left(L(i,j) - m\right)^2}{2\sigma^2} \right)$   (5)

$\overline{\mu}\!\left(L(i,j)\right) = \left[\mu\!\left(L(i,j)\right)\right]^{1/\alpha}$   (6)

$\underline{\mu}\!\left(L(i,j)\right) = \left[\mu\!\left(L(i,j)\right)\right]^{\alpha}$   (7)

Upper and lower MFs are defined in Eq. 6 and Eq. 7, where m is the corresponding mean value of the low-frequency sub-band, σ denotes its standard deviation, and α is the reciprocal FOU parameter introduced above.
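A minimal sketch of this fuzzification step is given below. The exponent-based upper/lower envelopes and the value of alpha are assumptions consistent with the reciprocal FOU parameter described above, not the authors' exact implementation.

```python
# Sketch of the interval type-2 Gaussian membership used to fuzzify a low-frequency
# sub-band (Eqs. 5-7).
import numpy as np

def t2_gaussian_membership(low_band, alpha=2.0):
    """Return (lower, upper) membership maps of an NSST low-frequency sub-band."""
    m = low_band.mean()                # mean of the sub-band (Eq. 5)
    sigma = low_band.std() + 1e-12     # standard deviation, guarded against zero
    mu = np.exp(-((low_band - m) ** 2) / (2.0 * sigma ** 2))   # type-1 Gaussian MF
    upper = mu ** (1.0 / alpha)        # upper MF: wider envelope (Eq. 6)
    lower = mu ** alpha                # lower MF: narrower envelope (Eq. 7)
    return lower, upper
```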


Furthermore, the concept of fuzzy entropy, proposed by Zadeh (Zadeh, 1968), is an important topic in the theory of fuzzy systems, used to describe the degree of fuzziness of a fuzzy set. Several research works have discussed the fuzzy entropy of IT-2FS. In particular, Deng et al. (Deng, Wang, Wang, & Sheng, 2012) improved previous versions, and their definition can be expanded to the 2D image plane as follows:

[fuzzy entropy of an interval type-2 fuzzy set, computed from the lower and upper membership degrees of the coefficients]   (8)

Basically, the best fused coefficients correspond to a higher fuzzy entropy (Yang et al., 2016). Bearing in mind the definition of Zheng (Zheng & Yin, 2013), we propose to expand the fuzzy entropy of the T2GMF to the 2D image plane, introduced as follows:

[regional type-2 fuzzy entropy of the low-frequency coefficients within a local window]   (9)

where m x n denotes the size of the window (m = n = 3) centered on the low-frequency coefficient. On the other hand, it is evident that most of the image energy lies in the low frequencies (Yang et al., 2016). To this extent, we exploit the neighborhood correspondence between local features in a window of size m x n. The local energy is written as:

$E^{X}(i,j) = \sum_{p=-1}^{1} \sum_{q=-1}^{1} \left[ L^{X}(i+p,\ j+q) \right]^{2}, \qquad X \in \{A, B\}$   (10)

Therefore, to obtain a better low-pass sub-band that maximally preserves information, we use the regional fuzzy entropy (Eq. 9) and the local energy (Eq. 10) to choose the fused coefficients $L^{F}(i,j)$ according to Eq. 11:

[selection of $L^{F}(i,j)$ between $L^{A}(i,j)$ and $L^{B}(i,j)$ according to the larger combination of regional fuzzy entropy and local energy]   (11)

3.3.4. Fusion rule of high-frequency sub-images

The NSST decomposition process produces a series of high-frequency sub-images $H^{l,k}(i,j)$, where $l$ and $k$ denote the scale and the k-th direction, respectively. To retain more boundary information, the fused high-frequency coefficients $H_{F}^{l,k}(i,j)$ are selected according to the larger absolute value as follows:

)(

)

{

(

)(

) |

(

)(

) |

(

)(

)|

|

(

)(

)|

|

(

)(

)|

(

)(

)|

(12)

$H_{A}^{l,k}(i,j)$ and $H_{B}^{l,k}(i,j)$ (Eq. 12) are the NSST high-frequency coefficients of input images A and B, respectively. Fig. 5 outlines the block diagram of the multimodal fusion process.
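A sketch of the maximum-absolute-value rule of Eq. 12 applied to one pair of high-frequency sub-bands (fuse_high is the function name assumed in the pipeline sketch of Section 3.3):

```python
# Eq. 12: keep, at each position, the coefficient with the larger absolute value.
import numpy as np

def fuse_high(high_a, high_b):
    return np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)
```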


Fig. 5. The overall block diagram of the proposed fusion scheme


3.4. Transfer learning and model fine-tuning

The basic motivation behind transfer learning lies in the fact that knowledge can be successfully transferred from large-scale datasets, known as the source domain, to the medical domain with its limited-sized datasets, despite the disparity between the fields, and with reduced overfitting. Thus, we propose to re-train the AlexNet

deep CNN classifier (1000 classes, trained with 1.2 million images) (Krizhevsky et al., 2012) and to fine-tune its parameters with back-propagation (BP) based on both the pre-processed dataset and the fused MR dataset. We would also like to quantify the impact of image fusion on the performance of transfer learning. The network is trained twice: first with 2296 pre-processed images (T1, T2FS and STIR), and second with 1148 fused images produced by the proposed fusion method. An Amazon Virtual Machine (AWS instance p2.xlarge) equipped with an NVIDIA Tesla K80 GPU was used (2496 parallel processing cores and 12 GB of GPU memory). We trained the CNN on Matlab R2017a with the MatConvNet deep learning framework (Vedaldi & Lenc, 2014), accelerated by the CUDA5 parallel computing platform and the GPU-accelerated Deep Neural Network library (cuDNN6). In order to prevent overfitting, we further artificially augmented the training datasets by two angles of rotation, with horizontal and vertical flipping for each rotation. Thus, the dataset size was increased ninefold (x9), yielding a total of 12168 examples in the LPS class and 8496 examples in the LMS class, and half as many fused scans in both classes for the fused MRI dataset. Fig. 6 shows examples of augmented data for LPS scans (Fig. 6 (a)) and LMS scans (Fig. 6 (b)). The columns present horizontal and vertical flipping for each rotation, while the rows illustrate rotation by 45° and 90°.



5 https://developer.nvidia.com/cuda-downloads
6 https://developer.nvidia.com/cuDNN
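The ninefold augmentation described above can be sketched as follows; this is an illustrative assumption using scipy/numpy rather than the authors' Matlab code.

```python
# Sketch: original ROI plus rotations of 45 and 90 degrees, each combined with a
# horizontal and a vertical flip (3 x 3 = 9 images per input ROI).
import numpy as np
from scipy.ndimage import rotate

def augment_ninefold(roi):
    rotations = [roi,
                 rotate(roi, 45, reshape=False, mode='nearest'),
                 rotate(roi, 90, reshape=False, mode='nearest')]
    augmented = []
    for r in rotations:
        augmented.extend([r, np.fliplr(r), np.flipud(r)])
    return augmented   # 9 images
```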


The AlexNet CNN (Krizhevsky et al., 2012) was trained on 1.2 million images of the ImageNet dataset with 1000 class labels. With 60 million parameters, it contains eight learned layers: five convolutional and three fully-connected (conv1-conv2-conv3-conv4-conv5-fc6-fc7-fc8). All convolutional and fully-connected layers are followed by the rectified linear unit (ReLU) activation function. Max-pooling sub-samples the outputs of the first, the second and the last convolutional layers. Notably, we keep the parameters (weights and biases) of all layers to fine-tune them with our datasets, except for the last fully-connected layer (FCL). Consequently, we randomly initialized the last FCL with the Xavier algorithm (Glorot & Bengio, 2010), while biases are initialized to zero. In fact, the Xavier algorithm adaptively initializes weights according to the number of input and output neurons, which helps the CNN to converge, since a wrong initialization makes gradients too large or too small, leading to difficult weight updates. Furthermore, we change the number of output nodes in the last FCL to fit the two-subtype STS classification objective (LPS and LMS) and feed them to the softmax classification layer. Nevertheless, the learning rate is one of the most important parameters affecting convergence, and choosing an optimal learning rate for different layers is a challenging problem that is still under research (Smith, 2017; Singh, De, Zhang, Goldstein, & Taylor, 2016). In this work, we initially set the learning rate to 0.001 for all layers except the modified one, and to 0.01 for the last two layers. Throughout training, we drop it by a factor of 10 after 30 epochs. Furthermore, we also halve the learning rate during training based on the fused dataset. Equally important, we normalize the training data by centering it around zero mean, which helps the CNN to converge faster (LeCun, Bottou, Orr, & Müller, 1998). To do so, we subtracted the average image from the training images. We fine-tuned the pre-trained CNN weights by back-propagation. Towards this end, stochastic gradient descent (SGD) is used to minimize the

loss function during 60 epochs with a batch size of 50 in the first fine-tuning and 30 in the fused dataset-based fine-tuning. The momentum is set to 0.9 and the weight decay to 0.0005. Weights are updated with the following rule:

$v_{t+1} = 0.9\, v_{t} - 0.0005\, \varepsilon\, w_{t} - \varepsilon \left\langle \dfrac{\partial L}{\partial w} \Big|_{w_{t}} \right\rangle, \qquad w_{t+1} = w_{t} + v_{t+1}$   (13)

where $v_t$ denotes the momentum variable at iteration $t$, $\partial L / \partial w$ is the derivative of the softmax loss function with respect to the weights, and $\varepsilon$ is the learning rate. For evaluation, we used a 5-fold cross-validation strategy to select the training and validation sets. For each of the 5 folds, feature selection and classification were performed using 4 folds as the training set, and the aided diagnostic ability was evaluated on the 5th validation fold. In order to reduce overfitting, we adopted the dropout regularization method (Krizhevsky et al., 2012), where the output of a hidden neuron is set to zero with probability 0.5, which eliminates its contribution in the forward and backward passes.
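The fine-tuning setup described in this section was implemented in MatConvNet; the following is a hedged PyTorch sketch of the same configuration (new two-class fc8 with Xavier initialization, per-layer learning rates, SGD with momentum and weight decay, learning rate dropped tenfold after 30 epochs), offered only as an illustration.

```python
# Sketch (PyTorch, not the authors' code) of the fine-tuning configuration.
import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(pretrained=True)

# Replace fc8 for the two STS subtypes (LPS vs. LMS); Xavier weights, zero biases.
model.classifier[6] = nn.Linear(4096, 2)
nn.init.xavier_uniform_(model.classifier[6].weight)
nn.init.zeros_(model.classifier[6].bias)

# Transferred layers get a small learning rate; the last two layers a larger one.
last_two = list(model.classifier[4].parameters()) + list(model.classifier[6].parameters())
last_two_ids = {id(p) for p in last_two}
base_params = [p for p in model.parameters() if id(p) not in last_two_ids]

optimizer = torch.optim.SGD(
    [{'params': base_params, 'lr': 1e-3},
     {'params': last_two, 'lr': 1e-2}],
    momentum=0.9, weight_decay=5e-4)

# Drop all learning rates by a factor of 10 after 30 epochs; cross-entropy on softmax outputs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
criterion = nn.CrossEntropyLoss()
```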

CR IP T

produced by the proposed fusion method. An Amzon Virtual Machine (AWS Instance p2.xlarge) equipped with NVIDIA Tesla K80 GPU was used (2496 parallel processing cores and 12 GB of GPU memory). We trained the CNN on Matlab R2017a with the deep learning MatConvNet framework (Vedaldi & Lenc, 2014) accelerated by the CUDA5 parallel computing platform and the Deep Neural Network GPU-accelerated library (cuDNN6). In order to prevent overfitting, we further artificially augmented training datasets by two angles of rotation, horizontal and vertical flipping for each rotation. Thus, datasets size was increased by X9. A total of 12168 examples in the LPS class and 8496 examples in the LMS class, meanwhile the half of fused scans in both classes for the fused MRIs dataset. Fig. 6. shows examples of augmented data of LPS scans (Fig. 6. (a)) and LMS scans (Fig. 6. (b)). The columns present horizontal and vertical flipping for each rotation, while rows illustrate rotation according to 45o and 90o.

CR IP T

ACCEPTED MANUSCRIPT

AN US

(a)

(b)

Fig. 6. Examples of augmented data: (a) Liposarcoma (LPS) and (b) Leiomyosarcomas (LMS)

4. Experimental results and discussion

In this section, detailed experiments on multimodal MR image fusion and on the classification of STS based on the fine-tuned deep AlexNet CNN are presented. The impact of the fine-tuning level on knowledge transfer and STS classification is also discussed.

4.1. Multimodal MR fusion method assessment

4.1.1. Experimental settings


The effectiveness of the proposed fusion method is verified subjectively and objectively. T1, T2FS and STIR scans are of the same size and originally pre-registered. The comparison methods are: type-2 fuzzy logic (T2FL), type-2 fuzzy logic in the non-subsampled contourlet domain (T-2FS-NSCT) (Yang et al., 2016), the type-2 near fuzzy-based technique (T2FNF) (Biswas & Sen, 2015) and neuro-fuzzy in NSCT (NF-NSCT) (Das & Kundu, 2013). Objective assessment is performed using widely used evaluation metrics such as entropy (EN), the mean standard deviation (STD), mutual information (MI), spatial frequency (SF) and the edge-based similarity measure (QAB/F). EN measures the amount of information in the resulting image; a larger EN means more abundant information. STD reflects the spread of gray levels and also depicts the contrast quality. MI describes the amount of information preserved from the source images. SF reflects the level of clarity and returns the resolution of the image. QAB/F considers the amount of edge information preserved from the input images. We refer the reader to (Jagalingam & Hegde, 2015) for more details about the quality metrics used for objective assessment. Since the source codes of the compared methods are not shared, we choose a pair of pre-registered CT and T1-weighted MR scans conventionally used by the above techniques, as illustrated in Fig. 7.
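Two of these metrics can be computed directly from image histograms; the sketch below shows EN of a fused image and MI between one source image and the fused image, assuming 8-bit intensities and 256 bins (an illustration, not the exact implementation used in the paper).

```python
# Sketch: entropy (EN) and mutual information (MI) from 256-bin histograms.
import numpy as np

def entropy(img, bins=256):
    hist, _ = np.histogram(img.ravel(), bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(src, fused, bins=256):
    joint, _, _ = np.histogram2d(src.ravel(), fused.ravel(),
                                 bins=bins, range=[[0, 255], [0, 255]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))
```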

4.1.2. Comparative results and discussion

Visual assessment of the fusion scheme can be conducted through Fig. 7. The results present a good visual representation of tissue density and soft tissue information in the fused image, except for the T2FL method. It is clearly observed that our fusion method produces an informative and clear image with good contrast. However, there is a slight difference in boundary preservation and contrast between the given methods. In the images of Fig. 7 (d) and Fig. 7 (e), a reduction of contrast in brightness can be observed compared with the source images in Fig. 7 (a) and Fig. 7 (b). The images in Fig. 7 (f), Fig. 7 (d) and Fig. 7 (g) are more likely to retain contrast. Obviously, blurred edges are given by the T2FNF technique. Overall, little visual difference can be seen between the results of the proposed method, T-2FS-NSCT and NF-NSCT. In addition to the subjective evaluation, TABLE II. provides the quantitative performance in terms of EN, STD, MI, SF and QAB/F. Through the significantly higher EN value, it can be clearly concluded that more information is retained by our algorithm. Moreover, a greater spatial frequency compared with that of the T2FNF scheme indicates more detailed information with a higher activity and clarity level, proving the advantages of the MSD tools. A larger MI is obtained by the proposed method, which transferred more information from the source images than the other methods did. Meanwhile, the MSD-based methods give comparable QAB/F and STD with slight differences. A fairly similar STD shows that these methods produce the same contrast, which can also be found in the visual results, while competitive QAB/F values indicate conveyed edge information about the source images.

TABLE II. COMPARISON OF DIFFERENT METHODS WITH THE PROPOSED FUSION ALGORITHM

Method            EN     STD     MI     SF     QAB/F
T2FL              2.58   81.06   1.79   0.49   0.92
T-2FS-NSCT        6.37   60.87   4.56   6.74   0.78
T2FNF             –      58.25   2.27   –      0.69
NF-NSCT           –      58.91   4.39   –      0.78
Proposed method   –      –       4.98   –      0.77


4.2. STS subtypes classification results and discussion

4.2.1. Fine-tuning with the retrieved dataset

The learning process of the CNN with the retrieved MR images can be analyzed through Fig. 8. It can be observed that the validation error decreases consistently with the training error along the 60 epochs, with only slightly higher values of the validation error, which indicates that no overfitting occurs. The network converges at epoch 50, where the objective function curve and the error curve begin to saturate. Each training epoch took around one minute using the GPU and cuDNN. The training and validation performance of the CNN during the 60 epochs is shown in Fig. 8 (a). By the end of the training process, the CNN fine-tuning reaches low validation and training errors and correspondingly high training and validation accuracies. The confusion matrix relative to the retrieved-images validation set is illustrated in TABLE III. The LPS and LMS classification accuracy rates are 98.43% and 95.92%, respectively. We obtain 90.5% average validation accuracy during the overall fine-tuning, with a variance of 0.08, as shown in TABLE IV. This reflects, on the one hand, that no overfitting has been encountered. On the other hand, the transfer learning approach has successfully transferred knowledge from the source to the target domain despite the limited number of training images. In fact, the deep features extracted with the fine-tuned parameters appear to be representative for STS subtype (LPS and LMS) classification. Consequently, the fine-tuned model can provide beneficial diagnostic interpretation of the MR scans by efficiently discriminating LPS from LMS.


Fig. 7. Comparative analysis of visual fusion results of CT and MRI provided by T2FL, T-2FS-NSCT, T2FNF, NF-NSCT and the proposed method: (a) CT; (b) MRI; (c) T2FL; (d) T-2FS-NSCT; (e) T2FNF; (f) NF-NSCT; (g) proposed method.

TABLE III. CONFUSION MATRIX FOR RETRIEVED MRS VALIDATION SET

          LPS    LMS    Accuracy rate (%)
LPS       2396   38     98.43
LMS       69     1624   95.92
Average                 97.17
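The per-class rates in TABLE III follow directly from the confusion matrix: each row is divided by its total and the two class rates are averaged. A short sketch of this computation:

```python
# Per-class accuracy rates and their average from the confusion matrix of TABLE III.
import numpy as np

cm = np.array([[2396, 38],     # true LPS: predicted LPS, predicted LMS
               [69, 1624]])    # true LMS: predicted LPS, predicted LMS

per_class = cm.diagonal() / cm.sum(axis=1)
print(per_class * 100, per_class.mean() * 100)   # approximately 98.43, 95.92 and 97.17
```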


Fig. 8. Learning curves (error rate and objective function) of the fine-tuning using the raw MR scans

4.2.2. Transfer learning on fused MR images


The second training process dealt with the fused MR scans produced by the proposed fusion method (Section 3.3). The number of examples is half that of the pre-processed dataset (10332 images after augmentation), but with more information inside each image, as depicted by the high entropy and mutual information values. Fig. 9 shows the evaluation performance of the fine-tuning during 60 epochs using the fused MRIs. The objective function decreases for both training and validation, with a slightly higher validation error at the end of the training process. Training took about half an hour. Basically, no severe overfitting is observed during the 60 epochs, since the training and validation errors decrease smoothly, with the validation error remaining slightly above, as depicted in Fig. 9 (a). After 60 epochs, the average validation accuracy equals 98.28%, which is higher than the accuracy given by the raw-images-based fine-tuning (97.17%). Accordingly, the confusion matrix of the validation data is given in TABLE V., which indicates classification accuracies of 98.76% and 97.81% for the LPS and LMS classes, respectively. Despite the small number of examples, and even though the LPS class contains more examples than LMS, the complementary information brought by the fusion method proved to be very useful for learning discriminative features to classify LPS and LMS, owing to the small batch size and the 5-fold cross-validation. The CNN converged rapidly when fine-tuned with the fused images. It can be concluded from Fig. 9 (a)-(b) that around epoch 40 the learning curves saturate along with the objective, a more rapid convergence than in the first fine-tuning process; only 50 epochs are needed to reach higher performance. TABLE IV. indicates that, with only half of the number of retrieved images, the 60-epoch fine-tuning of the deep CNN achieves an average classification accuracy of about 88.9%, slightly lower than when using the pre-processed dataset (90.5%). The low variance shows that overfitting was avoided. Typically, being inspired by the deep hierarchical structures of the visual cortex (Liu et al., 2017), feature learning with fused images can benefit from the soft tissue information merged in the fused MRIs. How such a deep feature learning model can exploit complementary information and knowledge transfer can also be explored in Fig. 10. The classification accuracy based on the fused MRIs appears higher and more stable by the end of the 60 epochs, but with more fluctuations at the beginning of training, which could be explained by the small batch size. However, the pre-processed data achieved a more gradual and stable increase, due to the larger validation set and the large enough batch size. The fluctuations of the accuracy stabilize from epoch 40, when the CNN begins to converge. Moreover, we investigated the behavior of the first convolutional filters, which look directly at the raw data, by visualizing their weights. Fig. 11 shows the learned filters of the first convolutional layer after fine-tuning with both datasets. The filters appear smooth and without any noisy patterns, which indicates that both datasets have led to an efficient fine-tuning of the deep CNN with suitable regularization, converging nicely without falling into overfitting problems. Under those circumstances, the proposed CAD system may be in the pipeline to assist radiologists in classifying STS subtypes (LPS and LMS), given the encouraging classification results obtained from the fine-tuning of the deep CNN based on the fusion method.


Fig. 9. Learning curves (error rate and objective function) of the fine-tuning using the fused MR scans


Fig. 10. Classification accuracy of fused and raw dataset


Fig. 11. First convolutional layer learned filters: (a) fine-tuning based on raw data, (b) fine-tuning using fused MR scans.

TABLE IV. COMPARISON OF CLASSIFICATION RESULTS ON FUSED AND RETRIEVED DATASETS (MEAN±VARIANCE) AT OVERALL TRAINING

                    Examples   Average Classification Accuracy (%)
Retrieved dataset   20664      90.5±0.08
Fused images        10332      88.9±0.013

TABLE V. CONFUSION MATRIX FOR FUSED MRS VALIDATION SET

          LPS    LMS    Accuracy rate (%)
LPS       1201   15     98.76
LMS       18     807    97.81
Average                 98.28

4.2.3. Evaluation of transfer learning depth

We experimentally assess the level of knowledge transfer in a layer-wise manner, aiming to evaluate the effect of incremental fine-tuning on how transferable the features are, on one hand, and to determine the necessary depth that achieves performance improvement in STS subtype classification, on the other. Basically, earlier CNN layers extract more generic features, such as edges and curves, useful for any image analysis task, while the last layers build more abstract concepts specific to the target task. We incrementally fine-tuned the network with the fused images, starting from lower layers to upper ones (conv1-conv2-conv3-conv4-conv5-fc6-fc7). The last FCL (fc8) is not considered in the fine-tuning depth since it is always randomly initialized to fit the classification goal. Towards this aim, we randomly initialized the weights of the layers learned from scratch using the Xavier algorithm (Glorot & Bengio, 2010), while the remaining layer weights are initialized with the pre-trained AlexNet weights. We considered the LPS class as positive and the LMS class as negative, and we evaluated the model on the cross-validation set. In order to find the optimal depth of fine-tuning, we used the accuracy, the sensitivity and the specificity among the different training scenarios. Sensitivity measures the true positive rate, while specificity indicates the true negative rate. Owing to the GPU capabilities, each training scenario took about 34 minutes. TABLE VI. shows the measured values of accuracy, sensitivity and specificity at each level of transfer learning. We can clearly observe that the accuracy decreases with the fine-tuning depth, as do the sensitivity and specificity. Specifically, the more levels are randomly initialized, the less knowledge is transferred from the source to the target domain; consequently, less discriminative features appropriate to the target domain are learned for the classification of STS subtypes. Despite the limited number of examples, features transferred from a different task are better than randomly initialized weights. Significant performance degradation is observed when the middle layers are updated (conv4-fc7), and the lowest performance is observed when only the fc6 layer is fine-tuned. Thus, the third and the fourth convolutional layers can be considered key points in transferability, along with the higher task-specific layers (fc7-fc8). Performance increased when more convolutional layers were included in the fine-tuning process.
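One such layer-wise scenario can be sketched as follows. PyTorch is used for illustration only, and the mapping of a scenario label to the set of layers that keep the pre-trained weights (all others being Xavier re-initialized and learned from scratch, with fc8 always replaced) is an assumption based on the description above.

```python
# Sketch of building one layer-wise transfer scenario for TABLE VI.
import torch.nn as nn
from torchvision import models

LAYERS = {'conv1': ('features', 0), 'conv2': ('features', 3), 'conv3': ('features', 6),
          'conv4': ('features', 8), 'conv5': ('features', 10),
          'fc6': ('classifier', 1), 'fc7': ('classifier', 4)}

def build_scenario(transferred=('conv4', 'conv5', 'fc6', 'fc7')):
    model = models.alexnet(pretrained=True)
    model.classifier[6] = nn.Linear(4096, 2)            # fc8: always randomly re-initialized
    nn.init.xavier_uniform_(model.classifier[6].weight)
    nn.init.zeros_(model.classifier[6].bias)

    for name, (block, idx) in LAYERS.items():
        if name not in transferred:                      # learned from scratch
            layer = getattr(model, block)[idx]
            nn.init.xavier_uniform_(layer.weight)
            nn.init.zeros_(layer.bias)
    return model
```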

TABLE VI. PERFORMANCE OF DIFFERENT FINE-TUNING LEVELS TO DEPICT THE OPTIMAL DEPTH: ACCURACY, SENSITIVITY AND SPECIFICITY

Fine-tuning depth   Accuracy   Sensitivity   Specificity
conv1-fc7           0.9872     0.9950        0.9791
conv2-fc7           0.9741     0.9776        0.9706
conv3-fc7           0.9608     0.9712        0.9504
conv4-fc7           0.9527     0.9662        0.9392
conv5-fc7           0.9328     0.9835        0.8822
fc6-fc7             0.9583     0.9862        0.9304
fc7                 0.9313     0.9615        0.9011

4.2.4. Limitations and perspectives

Considering the experimental results of our type-2 fuzzy logic-based image fusion along with the transfer learning-based MR classification, it is clearly observed that the image fusion approach helps to improve the efficiency of the deep vision task. Correspondingly, the multi-resolution decomposition with NSST has brought improvements, specifically with the fuzzification of the low-frequency sub-bands, where most of the information is concentrated. The highest entropy depicts more information in the fused image compared with the others, in addition to a higher activity and clarity level given by the highest spatial frequency value. In light of that, the complementary information given by the proposed fusion method showed its effectiveness in knowledge transfer. STS subtype differentiation has progressed compared with the fine-tuning using the raw MRI dataset.


However, the limitations of this study are twofold. First, the limited number of examples in both training cases drove the model to relatively overfit during the first epochs of fine-tuning, which constrained us to decrease the learning rate and increase the number of training epochs in order to help the model converge. Second, the learning rate strategy, based on a naïve heuristic, is not efficient in the transfer learning approach, where manual co-adaptation is performed. A future direction would be to better optimize the learning rate across layers and during the training process by adopting a cyclical learning rate (Smith, 2017) or a layer-specific adaptive learning rate (Singh, De, Zhang, Goldstein, & Taylor, 2016). We are also planning to classify other STS subtypes in different modalities like MRI and PET. In future work, we will explore the semantic segmentation of STS images based on CNNs. Another field of concern will investigate STS staging based on other deep learning architectures (VGGNet (Simonyan & Zisserman, 2015) and GoogLeNet (Szegedy et al., 2015)).


5. Conclusion

We proposed an aided diagnosis system for the classification of liposarcoma (LPS) and leiomyosarcoma (LMS) STS of the extremities. We adopted the transfer learning approach to fine-tune the deep AlexNet CNN using two datasets. The first dataset includes T1, T2FS and STIR MR images retrieved from The Cancer Imaging Archive and pre-processed to extract ROIs with segmented tumors, while the second consists of fused scans of the first. The main objective was to highlight the effect of medical image fusion on knowledge transfer and deep feature learning. We proposed a fusion method based on type-2 fuzzy logic in the shearlet domain that outperformed the fuzzy logic-based state of the art in terms of entropy and mutual information. It has been shown that the relevant complementary information provided by our fusion method was very useful for deep feature learning-based STS classification, with an average classification accuracy of 98.28%, better than the fine-tuning using the retrieved raw data, which achieved 97.17%. The results indicate the potential of image fusion in deep feature learning. Thus, both findings make the proposed CAD system an efficient tool to assist radiologists in decision making. Our experiments further confirm the efficiency of knowledge transfer from the natural to the medical domain despite the limited dataset size. The transfer learning depth was explored by incrementally fine-tuning the deep CNN in a layer-wise manner in order to investigate the necessary level that improves the classification performance. Our study revealed that transferability proved to be better than random weights. Also, middle convolutional layers along with deep layers are more likely to benefit fine-tuning for LPS and LMS classification improvement.


Future works will be oriented towards the aided diagnosis of other STS subtypes, such as fibrosarcoma, along with different modalities like PET, with the aim of expanding the expert system field. Being interested in deeper architectures, we are also planning to fine-tune deeper CNNs in order to study the compromise between depth and accuracy. Another future direction would be to propose a new method to optimize the learning rate across layers based on the theoretical background of layer-specific adaptive learning rates. We would also like to extend our framework to investigate other clinical problems, such as the segmentation of STS tumors in multimodal images based on transfer learning.

ACKNOWLEDGMENT

The authors would like to thank the RosettaHUB team and AWS for providing the p2.xlarge instance equipped with a Tesla K80 GPU. We would also like to acknowledge The Cancer Imaging Archive (TCIA) for hosting the medical images of STS, accessible for public download.

CONFLICT OF INTEREST

The authors certify that there is NO conflict of interest in relation to this work.

REFERENCES

Banerjee, I., Crawley, A., Bhethanabotla, M., Daldrup-Link, H. E., & Rubin, D. L. (2017). Transfer learning on fused multiparametric MR images for classifying histopathological subtypes of rhabdomyosarcoma. Computerized Medical Imaging and Graphics, 65, 167–175. https://doi.org/10.1016/j.compmedimag.2017.05.002

Biswas, B., & Sen, B. K. (2015). Medical image fusion technique based on type-2 near fuzzy set. In 2015 IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 102–107). https://doi.org/10.1109/ICRCICN.2015.7434218

Das, S., & Kundu, M. K. (2013). A neuro-fuzzy approach for medical image fusion. IEEE Transactions on Biomedical Engineering, 60(12), 3347– 3353. https://doi.org/10.1109/TBME.2013.2282461 Deng, T.-Q., Wang, Z.-J., Wang, P.-P., & Sheng, C.-D. (2012). Study on fuzzy entropy of type-2 fuzzy sets. Control and Decision, 27(3), 408–412. Du, J., Li, W., Lu, K., & Xiao, B. (2016). An overview of multi-modal medical image fusion. Neurocomputing, 215, 3–20. https://doi.org/10.1016/j.neucom.2015.07.160 Easley, G., Labate, D., & Lim, W. Q. (2008). Sparse directional image representations using the discrete shearlet transform. Applied and Computational Harmonic Analysis, 25(1), 25–46. https://doi.org/10.1016/j.acha.2007.09.003 Farhidzadeh, H., Goldgof, D. B., Hall, L. O., Gatenby, R. A., Gillies, R. J., & Raghavan, M. (2015). Texture Feature Analysis to Predict Metastatic and Necrotic Soft Tissue Sarcomas. In 2015 IEEE International Conference on Systems, Man, and Cybernetics (pp. 2798–2802). https://doi.org/10.1109/SMC.2015.488 Farhidzadeh, H., Goldgof, D. B., Hall, L. O., Scott, J. G., Gatenby, R. A., Gillies, R. J., & Raghavan, M. (2016). A quantitative histogram-based approach to predict treatment outcome for Soft Tissue Sarcomas using pre- and post-treatment MRIs. In 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016 - Conference Proceedings (pp. 4549– 4554). https://doi.org/10.1109/SMC.2016.7844948 Gao, X. W., Hui, R., & Tian, Z. (2017). Classification of CT brain images based on deep learning networks. Computer Methods and Programs in Biomedicine, 138, 49–56. https://doi.org/10.1016/j.cmpb.2016.10.007 Gibert, X., Patel, V. M., Labate, D., & Chellappa, R. (2014). Discrete shearlet transform on GPU with applications in anomaly detection and denoising. Eurasip Journal on Advances in Signal Processing, 2014(1), 64. https://doi.org/10.1186/1687-6180-2014-64 Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (Vol. 9, pp. 249–256). https://doi.org/10.1.1.207.2059


Greenspan, H., Van Ginneken, B., & Summers, R. M. (2016). Guest Editorial Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique. IEEE Transactions on Medical Imaging, 35(5), 1153–1159. https://doi.org/10.1109/TMI.2016.2553401

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3), 211–252. https://doi.org/10.1007/s11263-015-0816-y

Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep learning for visual understanding: A review. Neurocomputing, 187, 27– 48. https://doi.org/10.1016/j.neucom.2015.09.116

Segal, N. H., Pavlidis, P., Antonescu, C. R., Maki, R. G., Noble, W. S., DeSantis, D., … Cordon-Cardo, C. (2003). Classification and subtype prediction of adult soft tissue sarcoma by functional genomics. American Journal of Pathology, 163(2), 691–700.

Jiang, Q., Jin, X., Hou, J., Lee, S., & Yao, S. (2018). Multi-sensor Image Fusion Based on Interval Type-2 Fuzzy Sets and Regional Features in Nonsubsampled Shearlet Transform Domain. IEEE Sensors Journal, 18(6), 2494–2505. https://doi.org/10.1109/JSEN.2018.2791642

Jiang, Q., Jin, X., Lee, S.-J., & Yao, S. (2017). A Novel Multi-Focus Image Fusion Method Based on Stationary Wavelet Transform and Local Features of Fuzzy Sets. IEEE Access, 5, 20286–20302. https://doi.org/10.1109/ACCESS.2017.2758644

Juntu, J., Sijbers, J., De Backer, S., Rajan, J., & Van Dyck, D. (2010). Machine learning study of several classifiers trained with texture analysis features to differentiate benign from malignant soft-tissue tumors in T1-MRI images. Journal of Magnetic Resonance Imaging, 31(3), 680–689.

Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations (pp. 1–14). https://doi.org/10.1016/j.infsof.2008.09.005 Singh, B., De, S., Zhang, Y., Goldstein, T., & Taylor, G. (2016). Layerspecific adaptive learning rates for deep networks. In Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015 (pp. 364–368). IEEE. https://doi.org/10.1109/ICMLA.2015.113 Smith, L. N. (2017). Cyclical learning rates for training neural networks. In Proceedings - 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017 (pp. 464–472). IEEE. https://doi.org/10.1109/WACV.2017.58

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25 (pp. 1–9). Curran Associates, Inc.

Shen, W., Zhou, M., Yang, F., Dong, D., Yang, C., Zang, Y., & Tian, J. (2016). Learning from experts: Developing transferable deep features for patient-level lung cancer prediction. In 2016 International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 (Vol. 9901 LNCS, pp. 124–131). Springer, Cham.

Jagalingam, P., & Hegde, A. V. (2015). A Review of Quality Metrics for Fused Image. Aquatic Procedia, 4, 133–142. https://doi.org/10.1016/j.aqpro.2015.02.019

Kutyniok, G., & Labate, D. (2012). Introduction to Shearlets. In Shearlets: Multiscale Analysis for Multivariate Data (pp. 1–38). Boston: Birkhäuser Boston. https://doi.org/10.1007/978-0-8176-8316-0_1

LeCun, Y. A., Bottou, L., Orr, G. B., & Müller, K. R. (1998). Efficient backprop. In Neural Networks: Tricks of the Trade (Vol. 1524 LNCS, pp. 9–50). Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_2

Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., … Sánchez, C. I. (2017). A survey on deep learning in medical image analysis. Medical Image Analysis, 42, 60–88. https://doi.org/10.1016/j.media.2017.07.005

Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., & Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234, 11–26. https://doi.org/10.1016/j.neucom.2016.12.038

Margeta, J., Criminisi, A., Cabrera Lozoya, R., Lee, D. C., & Ayache, N. (2017). Fine-tuned convolutional neural nets for cardiac MRI acquisition plane recognition. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 5(5), 339–349. https://doi.org/10.1080/21681163.2015.1061448

Mayerhoefer, M. E., Breitenseher, M., Amann, G., & Dominkus, M. (2008). Are signal intensity and homogeneity useful parameters for distinguishing between benign and malignant soft tissue masses on MR images? Objective evaluation by means of texture analysis. Magnetic Resonance Imaging, 26(9), 1316–1322. https://doi.org/10.1016/j.mri.2008.02.013

Nie, D., Zhang, H., Adeli, E., Liu, L., & Shen, D. (2016). 3D deep learning for multi-modal imaging-guided survival time prediction of brain tumor patients. In 2016 Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 (Vol. 9901 LNCS, pp. 212–220). Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_25

Ramya, H. R., & Sujatha, B. K. (2016). A novel approach for medical image fusion using fuzzy logic type-2. In 2016 International Conference on Circuits, Controls, Communications and Computing (I4C) (pp. 1–5). IEEE. https://doi.org/10.1109/CIMCA.2016.8053286

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 1–9). https://doi.org/10.1109/CVPR.2015.7298594

Tajbakhsh, N., Shin, J. Y., Gurudu, S. R., Hurst, R. T., Kendall, C. B., Gotway, M. B., & Liang, J. (2016). Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? IEEE Transactions on Medical Imaging, 35(5), 1299–1312. https://doi.org/10.1109/TMI.2016.2535302

Tajbakhsh, N., Shin, J. Y., Gurudu, S. R., Todd Hurst, R., Kendall, C. B., Gotway, M. B., & Liang, J. (2017). On the necessity of fine-tuned convolutional neural networks for medical imaging. In L. Lu, Y. Zheng, G. Carneiro, & L. Yang (Eds.), Advances in Computer Vision and Pattern Recognition (pp. 181–193). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-42999-1_11

Tirupal, T., Mohan, B. C., & Kumar, S. S. (2017). Multimodal medical image fusion based on Sugeno's intuitionistic fuzzy sets. ETRI Journal, 39(2), 173–180. https://doi.org/10.4218/etrij.17.0116.0568

Vallières, M., Freeman, C. R., Skamene, S. R., & El Naqa, I. (2015). A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Physics in Medicine and Biology, 60(14), 5471–5496. https://doi.org/10.1088/0031-9155/60/14/5471

Vedaldi, A., & Lenc, K. (2014). MatConvNet - Convolutional Neural Networks for MATLAB. In Proceedings of the 23rd ACM International Conference on Multimedia - MM '15 (pp. 689–692). New York, NY, USA: ACM Press. https://doi.org/10.1145/2733373.2807412

Wang, J., MacKenzie, J. D., Ramachandran, R., & Chen, D. Z. (2016). A Deep Learning Approach for Semantic Segmentation in Histology Tissue Images. In 2016 International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 (Vol. 9901 LNCS, pp. 176–184). Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_21

Wu, D. (2013). Two differences between interval type-2 and type-1 fuzzy logic controllers: Adaptiveness and novelty. Studies in Fuzziness and Soft Computing, 301, 33–48. https://doi.org/10.1007/978-1-4614-6666-6-3

Yang, Y., Que, Y., Huang, S., & Lin, P. (2016). Multimodal Sensor Medical Image Fusion Based on Type-2 Fuzzy Logic in NSCT Domain. IEEE Sensors Journal, 16(10), 3735–3745. https://doi.org/10.1109/JSEN.2016.2533864

Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems 27 (pp. 3320–3328).

Zadeh, L. A. (1968). Probability measures of fuzzy events. Journal of Mathematical Analysis and Applications, 23(2), 421–427. https://doi.org/10.1016/0022-247X(68)90078-4

Zheng, G., & Yin, S. W. (2013). An Entropy of Interval Type-2 Fuzzy Sets. Applied Mechanics and Materials, 321–324, 1999–2003. https://doi.org/10.4028/www.scientific.net/AMM.321-324.1999

Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353. https://doi.org/10.1016/S0019-9958(65)90241-X