NF-RCNN: Heart localization and right ventricle wall motion abnormality detection in cardiac MRI

NF-RCNN: Heart localization and right ventricle wall motion abnormality detection in cardiac MRI

Physica Medica 70 (2020) 65–74 Contents lists available at ScienceDirect Physica Medica journal homepage: www.elsevier.com/locate/ejmp Original pap...

1MB Sizes 0 Downloads 21 Views

Physica Medica 70 (2020) 65–74

Contents lists available at ScienceDirect

Physica Medica journal homepage: www.elsevier.com/locate/ejmp

Original paper

NF-RCNN: Heart localization and right ventricle wall motion abnormality detection in cardiac MRI Saeed Kermania,1, Mostafa Ghelich Oghlib,c,

⁎,1

T

, Ali Mohammadzadehc, Raheleh Kafiehd

a

School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran Research and Development Department, Med Fanavarn Plus Co., Karaj, Iran c Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Science, Tehran, Iran d Medical Image & Signal Processing (MISP) Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran b

A R T I C LE I N FO

A B S T R A C T

Keywords: Medical localization Cardiac magnetic resonance imaging Faster RCNN Convolutional networks

Convolutional neural networks (CNNs) are extensively used in cardiac image analysis. However, heart localization has become a prerequisite to these networks since it decreases the size of input images. Accordingly, recent CNNs benefit from deeper architectures in gaining abstract semantic information. In the present study, a deep learning-based method was developed for heart localization in cardiac MR images. Further, Network in Network (NIN) was used as the region proposal network (RPN) of the faster R-CNN, and then NIN Faster-RCNN (NF-RCNN) was proposed. NIN architecture is formed based on “MLPCONV” layer, a combination of convolutional network and multilayer perceptron (MLP). Therefore, it could deal with the complicated structures of MR images. Furthermore, two sets of cardiac MRI dataset were used to evaluate the network, and all the evaluation metrics indicated an absolute superiority of the proposed network over all related networks. In addition, FROC curve, precision-recall (PR) analysis, and mean localization error were employed to evaluate the proposed network. In brief, the results included an AUC value of 0.98 for FROC curve, a mean average precision of 0.96 for precision-recall curve, and a mean localization error of 6.17 mm. Moreover, a deep learning-based approach for the right ventricle wall motion analysis (WMA) was performed on the first dataset and the effect of the heart localization on this algorithm was studied. The results revealed that NF-RCNN increased the speed and decreased the required memory significantly.

1. Introduction Cardiovascular diseases (CVDs), resulting in a great deal of mortality and morbidity, are the number one cause of death in the world. As reported by the World Health Organization, more than 23 million annual deaths will occur due to CVDs by 2030 [1]. The most important factor in controlling and treating these types of diseases is an early diagnosis, which is achievable by utilizing various diagnostic tools such as imaging systems. Among various imaging modalities available to visualize the heart, magnetic resonance imaging (MRI) has demonstrated a strong capability for diagnosing CVDs and evaluating cardiac function. In recent decades, a wealth of approaches has been offered to extract clinically relevant information from Cardiac magnetic resonance (CMR) images [2,59] and echocardiography [60,61]. Determining a region of interest (ROI) centered on the heart decreases the computational load in all

these analyses. Heart localization is extremely useful in fully automated, and especially deep learning, applications although it is less likely to interest practicing clinicians. The main advantage of heart localization is increasing the efficiency of the forehead algorithms (such as segmentation or sequence processing etc.) by forcing the procedure to only face the heart and ignoring the organs nearby [3]. As shown in Fig. 1, localized heart can appear in different sizes and in the presence or absence of lung. Object detection and localization are two core tasks in computer vision as they are applied in many real-world applications, such as autonomous vehicles and robotics. Introducing region-based convolutional network (R-CNN) in 2014 made a major breakthrough in object detection [4]. R-CNN inspired several further methods such as fast RCNN [5], faster R-CNN [6], YOLO [7], SSD [8] and R-FCN [9]. In addition, R-CNN is considered a robust and accurate object detection technique in natural images [4]. It presents a simple and scalable



Corresponding author at: 10th St. Shahid Babaee Blvd. Payam Special Economic Zone, Karaj, Iran. E-mail address: [email protected] (M. Ghelich Oghli). 1 Both authors contributed equally to this manuscript. https://doi.org/10.1016/j.ejmp.2020.01.011 Received 7 June 2019; Received in revised form 31 December 2019; Accepted 9 January 2020 1120-1797/ © 2020 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

Physica Medica 70 (2020) 65–74

S. Kermani, et al.

Fig. 1. Axial (top row) and short axis (bottom row) cardiac MR images. The yellow box delineates the localized heart. Variety in shape, size and position of the heart in different images makes the localization procedure difficult. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

considered the vectors as random processes. This profile was modelled by a Markov chain with unknown ordering. Finally, the locations of the salient points in the image were accumulated using a Hough transform [17] array to vote for the most likely center position and radius for the myocardium. This approach roughly allows for sketching a circle positioned on the left ventricle (LV). In another study, Pavani et al. proposed a method [18] based on a popular face detection approach [19] introducing an extension of Haar-like features [20]. In their study, optimal weights were assigned to the rectangles of the Haar-like features in order to optimize the weights. Three different techniques were suggested for computing optimal weights including genetic algorithms, brute-force search, and Fisher's linear discriminant analysis. In addition, Lin et al. devised a method for cardiac localization based on a Fourier analysis of the image [21]. This method was developed based on the fact that when there is a moving structure in an image sequence in the course of time, a pixel-wise Fourier transform along the time axis gives a series of harmonic images. Accordingly, processing the first harmonic image provides the possibility of localizing the heart since it is the only moving organ in an MR image. Pednekar et al. proposed an automated approach for segmenting myocardium in short axis CMR images [22]. In their method, a motion map was computed, which is the pixel by pixel intensity difference between two near ED phases of cardiac cycle frames. After applying a Hough transform, the obtained image contained circular region around the LV. Further, Zhong et al. [23] developed a spectrum-based tool for automatic LV localization. They used Fourier analysis on the image sequence, considering that the right ventricle (RV) represents a higher brightness than the LV in all harmonic images. Quantitative and qualitative analyses were conducted to determine different patterns of LV and RV, and then an anisotropic weighted circle Hough detection was performed on the first and fifth harmonic images.

detection algorithm, which can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals. The bottom-up strategy uses small-segmented areas, which are merged together and form larger segmented areas. R-CNN algorithm includes region proposal generation by using selective search algorithm [10], feature extraction by a convolutional neural network (AlexNet as mentioned in the original paper [11]) and finally linear Support Vector Machine (SVM) for classification [12]. Fast R-CNN [5] uses selective search for generating object proposals as well, but applies the feature extractor unit (CNN) on the whole image instead of extracting features from region proposals independently. This minor change makes the procedure faster and allows the model to be end-to-end differentiable [13] and easier to train. In faster R-CNN, as the third generation of R-CNN series [6], the selective search algorithm is eliminated and a region proposal network (RPN) is added to the network. RPN produces the objects according to an “objectness” score and makes the model completely trainable from scratch. Dai et al. [9] proposed R-FCN (region-based fully convolutional net), which was inspired by faster R-CNN. However, its features were taken from the last layer instead of using the same layer, where region proposals were predicted. This approach minimized the amount of perregion computation. Liu et al. [8] suggested SSD (single-shot detector), which performed region proposals and region classifications in a single shot. In the present study, a novel architecture based on faster R-CNN is proposed to localize the heart in CMR images. In the proposed network, the RPN of faster R-CNN was replaced by a novel convolutional network architecture [14] to represent CMR features more powerfully. This alternation was based on the following reasons. Faster R-CNN was opted since it is the most accurate object detector so far, based on a speed/accuracy trade-off study [15]. Our evaluation of CMR images revealed intrinsic complexities regarding the structure of anatomical organs in the images, where the gray levels of organs were close to each other. Network in Network (NIN) is able to represent these complexities in a nonlinear fashion. In the last stage of the present study, the effect of this RPN promotion was investigated by implementing a wall motion analysis (WMA) on CMR images. The results indicated a better performance in comparison with other R-CNN series.

2.2. Deep learning based approaches With the advent of deep learning-based approaches, several new methods have been developed for anatomical organ localization in medical images. Different parts of the body were considered in the methods including lung, liver, kidney, and neck in CT images [24–27], pulmonary embolism in ultrasound images [28,29], polyps in colonoscopy videos [30], and heart, aortic valve, and aortic arch in 3D ultrasound and MRI images [31], etc. To the best of our knowledge, no deep learning-based approach has been adopted for whole heart localization in CMR images. Therefore, some approaches to the left ventricle localization are discussed in the following section. In addition, some deep learning-based methods focused on localizing other organs are considered.

2. Related works 2.1. Traditional approaches Several approaches to cardiac localization have been proposed in previous studies. Jolly et al. [16], plotted four cross sections through the ventricle, formed their gray levels to a 1 × 100 vector, and 66

Physica Medica 70 (2020) 65–74

S. Kermani, et al.

linear filter of the convolutional layer with a micro network, which is called “MLPCONV” by the authors his structure combination makes NIN capable of approximating more abstract representations of the hidden concepts. Traditional CNN architectures use linear filters to extract features from images. Therefore, conventional CNN implicitly makes the assumption that the latent concepts are linearly separable. However, a straight line may not always be capable of representing the features. In NIN, the convolution layer is replaced by a mini network, the so-called MLPCONV.

2.2.1. Left ventricle localization Due to the importance of LV in cardiac functionality, a large number of studies performed by deep learning-based methods conduct LV analysis. For instance, Avendi et al. [32] introduced a CNN-based approach for LVlocalization in cardiac MRI images. They used a CNN with a convolution layer with the size of 11 × 11 and depth of 100 filters and a pooling layer with the size of 6 × 6. Then, the pooled features were unpacked as a vector and the fully connected layer with 1024 outputs was used. The result was a mask with the size 32 × 32 centered on the LV, which specified the bounding box. In another study, Emad et al. [33] proposed a fully-automatic approach based on CNN for LV localization in short-axis MR images. Their method scanned the input image at four different scales, i.e. the pyramids of scales, in order to consider the different sizes of the heart. In addition, a six-layer CNN with different kernel sizes for feature extraction and a Softmax classifier layer [34] was used in the study. The proposed approach was tested on a public cardiac MRI dataset from York University [35], and accuracy, sensitivity, and specificity were used to evaluate the dataset.

3.1.2. A brief look at R-CNN series The basic idea behind the R-CNN is “recognition using regions”, which has been successfully tested for object detection [4]. R-CNN used convolutional neural networks for feature extraction and about 2000 region proposals were generated in the test phase, from each of which CNN extracted several fixed-length feature vectors. Then, each region was classified using linear support vector machine (SVM) [12]. In addition, the authors used image warping in order to change the sizes of proposals into fixed-size, which was simply resizing the region proposal to a fixed-size region without considering its original size. Fast R-CNN improves the detection speed through performing only one CNN over the entire image to extract the image features instead of performing 2000 CNNs over 2000 overlapping regions and replacing the SVM classifier with a softmax layer to avoid creating a new model for predictions. However, the selective search algorithm for generating region proposals is considered as the main drawback of R-CNN, which has not been improved in fast R-CNN. In order to resolve this issue, the authors of faster RCNN paper introduced a new convolution neural network, namely RPN, to be replaced with the selective search algorithm. RPN creates region proposals in three steps. First, a 3 × 3 kernel scans the feature map of the last layer in a CNN and maps it to a lower dimension. Next, RPN generates multiple possible regions or bounding boxes for the location of each kernel. Accordingly, different boxes, such as a tall, wide, and large box, are created in each location in the last feature map centered on the location. Then, an objectness score is assigned to each region. In the last step, a pooling layer, some fully-connected layers, a softmax classification layer, and bounding box regressor are added to the network.

2.2.2. Other organs localization In a study conducted by de Vos et al. [36] a prominent trait of CNNs, which is automatically finding hierarchical feature representations from raw images, was employed to localize heart, aortic arch, and descending aorta in 3D CT volume. They posed 3D organ detection as a 2D problem by 2D parsing the 3D CT volume. Further, Tajbakhsh et al. [28] tried to answer this question that among end-to-end training and fine tuning, which was more appropriate for medical image analysis with respect to the scarcity of the labelled data in medical context. They studied several aspects of medical image analysis and various imaging modalities. Despite the valuable subject of their study, it seems that they were entrapped in the limited case studies. Furthermore, Ghesu et al. [31] introduced marginal space deep learning (MSDL), which took advantage of object parametrization in hierarchical marginal spaces and automatic feature design in deep neural networks. In their proposed approach, the probabilistic boosting tree in marginal space learning was replaced with sparse adaptive deep neural networks and it cascaded the filtering for localization of aortic arch in 3D ultrasound and MRI images. 3. Materials and methods

3.1.3. Cardiac MRI feature extraction: NIN MLPCONV layer is considered as the key concept in NIN architecture. MLPCONV layer is introduced in the study of Lin et al. [14] since the authors believe that the convolution filter in CNN is a generalized linear model (GLM) with which the level of abstraction is low. Therefore, a more powerful nonlinear function could enhance the abstraction level. To this end, MLP was chosen in NIN, as a trainable universal function approximator [14], which was compatible with the structure of convolutional neural networks, and could be a deep model itself [39]. It could be considered as a 1 × 1 convolution, equivalent to a fully-connected layer in the context of CNNs [14]. MLPCONV consists of one convolution layer and two MLP layers (Fig. 2). Furthermore, NIN has a global average pooling, which averaged the last feature maps and prepared the input of the softmax layer. Table 1 presents a summary of NIN layers.

3.1. NIN faster R-CNN (NF-RCNN) 3.1.1. Overview of the proposed model The results of comparing networks confirmed that faster R-CNN is the most accurate object detection and localization algorithm in natural images [15]. While object detection and object localization are often used synonymously, they have different definitions. Localization is related to determining a region, usually a bounding box, in an image encompassing the target object while detection is associated with discovering whether an object exists a in an image or not. The present study describes the procedure of finding the heart in the CMR images as a localization problem. Despite the outstanding ability of faster R-CNN in object detection and localization, medical images, especially MRI, have intrinsic complexities due to the close gray level values of different structures and partial volume effects in a 3D object image [37], which restricts the ability of feature extraction in such networks. In order to address this issue, we proposed to implement faster R-CNN object detector with a more vigorous RPN instead of vgg-16 [38]used in original faster R-CNN. This alternative network should be able to deal with MR image complexities and adjacent slice effects and have a great performance similar to that of vgg-16 in the classification task. Among various potential CNNs, which can be used instead of vgg-16, NIN [14] has a performance like vgg-16 and contains a special structure, combining a convolution layer with multilayer perceptron (MLP). NIN replaces the

3.1.4. The proposed network architecture NIN has an outstanding ability in extracting features from images as well as a remarkable performance in image classification. The following published evidence supports this claim. 1- NIN + dropout + data augmentation has an 8.81% error rate for test set [14] of CIFAR 10 dataset [40], as the authors of original paper indicate. 2- NIN + batch normalization + dropout has an accuracy of 91.9% and vgg-16 + batch normalization + dropout has an accuracy of 67

Physica Medica 70 (2020) 65–74

S. Kermani, et al.

Fig 2. A comparative Schematic of (a) linear convolution layer and (b) MLPCONV layer [14]. The linear convolution layer utilizes a linear operation, whereas the MLPCONV layer includes a micro network to abstract complex structures within a non-linear function.

92.45% [41]. In this regard, dropout [42] and batch normalization [43] are considered as two helpful tools for regularizing and normalizing CNNs which avoid overfitting. The above mentioned results and the existence of MLPCONV layer in the NIN architecture make NIN the best choice for replacing vgg-16 and constructing a more robust network, called NIN faster R-CNN (NFRCNN). As delineated in the faster R-CNN paper [6], the authors’ goal was to share computation of RPN with a fast R-CNN network [5]. Computation sharing is regarded as an effective technique which makes the region proposal fining procedure a nearly cost-free procedure. The authors slided a small network over the output of the last shared convolution layer and used 3 × 3 spatial windows of the convolution feature map as the input for this network. The spatial windows were mapped to a lower dimensional feature and were fed into a classification layer and a boxregression layer. In the proposed network in the present paper, i.e. NF_RCNN, cccp6 layer (the last layer of the third MLPCONV structure) was used as the feature map, where the RPN was disported. This layer has a high level of abstraction for fine details. The architecture of the proposed network is illustrated in Fig. 3. 3.2. Memory footprint and speed Fig 3. NF-RCNN architecture. NIN is used as the region proposal network in this model.

As mentioned in the introduction section, the heart localization algorithm is considered as a pre-processing step for further analysis such Table 1 Network In Network (NIN) architecture in detail. Layer Number

Layer name

Type

Input size

Kernel size

Number of kernels

stride

pad

Output size

1

input

image input layer

32 × 32 × 3









32 × 32 × 3

2 3 4 5 6

conv1 cccp1 cccp2 pool1 drop1

convolution layer MLP layer MLP layer max pooling layer dropout layer

32 32 32 32 15

3 192 160 96 96

5 1 1 3 –

× × × ×

5×3 1 × 192 1 × 96 3

192 160 96 96 –

1 1 1 2 –

2 0 0 0 –

32 32 32 15 15

7 8 9 10 11

conv2 cccp3 cccp4 pool2 drop2

convolution layer MLP layer MLP layer max pooling layer dropout layer

15 × 15 × 96 15 × 15 × 192 15 × 15 × 192 15 × 15 × 192 7 × 7 × 192

5 1 1 3 –

× × × ×

5×3 1 × 192 1 × 192 3

192 192 192 192 –

1 1 1 2 –

2 0 0 0 –

15 × 15 × 192 15 × 15 × 192 15 × 15 × 192 7 × 7 × 19 27 × 7 × 192

12 13 14 15

conv3 cccp5 cccp6 pool3

convolution layer MLP layer MLP layer average pooling layer

7 7 7 7

3 1 1 7

× × × ×

3×3 1 × 192 1 × 192 7

192 192 10 10

1 1 1 1

1 0 0 0

7 7 7 1

16

softmax

softmax layer

1 × 1 × 10









× × × × ×

× × × ×

32 32 32 32 15

7 7 7 7

× × × ×

× × × × ×

192 192 192 10



68

× × × × ×

× × × ×

32 32 32 15 15

7 7 7 1

× × × ×

× × × × ×

192 160 96 96 96

192 192 10 10

Physica Medica 70 (2020) 65–74

S. Kermani, et al.

in the Introduction Section, the heart localization is not a useful independent CMR image analysis tool in clinical routine. However, it is quite helpful in reducing the computational load of the automatic approaches, such as left or right ventricle segmentation in short-axis view and WMA in long-axis and axial view. For this purpose, we provided two distinct datasets containing images from both short-axis and axial view.

as left and right ventricular segmentation and WMA. Therefore, it would be helpful to observe the impact of the NF-RCNN on the memory usage and speed of a WMA algorithm. To this end, we have prepared a comprehensive dataset (see section 5.1) for RV wall motion analysis using a well-known video classification algorithm, namely long-term recurrent convolutional network (LRCN) [44]. However, the existing problem has three classes: (1) Normal wall motion: the patient whose RV has no wall motion abnormality, (2) Akinesia: the patient with akinetic RV wall motion (localized lack of RV wall motion), and (3) Dyskinesia: the patient with dyskinetic RV wall motion (abnormal or impairment movement: instead of contracting in systole, the diskinetic part of myocardium inflates outward in systole). In LRCN, temporal features are learned by long short-term memory (LSTM) networks [45], which are a special type of recurrent neural networks (RNNs) capable of learning long-term dependencies. Video classification using LRCN works by passing each video frame through a feature extractor, i.e. CNN, in order to produce a fixed-length vector representation (ϕt). Then, the sequence model, i.e. LSTM, manages the feature space of video frames, which is produced by the CNN. To be more specific, T individual frames are fed into T CNNs and the resulted feature vectors are the input of a single-layer LSTM with 256 hidden units. Finally, the label of the class including normal, akinetic, and dyskinetic in our case was predicted. Original LRCN uses Alexnet model as its convolutional network. Alexnet architecture outperformed any existing method in ImageNet Challenge 2012. As mentioned earlier, cardiac MRI images have intrinsic complexities due to the anatomical geometrical features, leading us to use NIN instead of vgg-16 for our NF-RCNN approach. This is another reason which assured us of modifying the LRCN architecture by convolution layers with smaller field of view and more feature maps. Table 2 indicates our proposed architecture to be used as the modified LRCN. The effects of using NF-RCNN on the WMA algorithm in terms of the required memory and speed of algorithm are further discussed in the Results Section.

1- A comprehensive dataset containing 3250 axial CMR images with a matrix size of 208 × 170. The CMR dataset is obtained from CMR images for 65 patients with ARVD, considering that each patient has various axial images at different levels. The dataset was collected from Rajaei Cardiovascular Medical and Research Center of Iran University of Medical Sciences using an Avanto Siemens MR machine. The images included a variety of scales, gray levels, window centers, widths, and field of views. The proposed architecture acts as an object detector with both classification and regression operators since it is a modified version of faster R-CNN. Therefore, both positive and negative images were required, where a positive image means an axial CMR image containing the heart and negative image was obtained from the same plane but the lung covers the heart. In this regard, half of the images were positive (n = 1625) and the other half were negative. All images were normalized by applying zero mean and unit variance transformations. 2- A public dataset of short-axis cardiac MR images were provided by Andreopoulos et al. [35] at York University. The dataset contained cardiac MRI images for 33 patients, in addition to the ground truth manual segmentation of the LV endocardial and epicardial borders. Since we required a balanced dataset of positive and negative samples, 1320 images of the dataset from base to apex were used, including 660 images with the heart (positive), and 660 images without the heart (negative: from the apex). For each patient, there were 4D images consisting of 20 temporal steps in a cardiac cycle and 8–15 short-axis slices with a matrix size of 256 × 256.

4. Results In order to obtain more reliable evaluation results, a cross-validation technique was employed, indicating that all the 3250 images were first stratified and partitioned into 5 equally sized folds to ensure that each fold was a robust representative of the entire dataset. Next, 5 iterations of training and validation were performed so that one of the folds was kept for validation in each iteration and the remaining folds were used for the training process. During further iterations, other folds were considered as the test dataset. Eventually, the mean of evaluation results was reported.

First, this section discusses two datasets used for training and evaluating the proposed model. Then, the proposed model is assessed in terms of accuracy, runtime speed, and memory efficiency. Finally, the model is compared to the alternative techniques and the results are presented. 4.1. Training and evaluation data The network was trained and evaluated using two datasets including cardiac MR images in axial and short-axis view. As mentioned Table 2 Architecture of the modified LRCN model, used in this paper. Layer Number

Layer name

Type

Input size

Kernel size

Number of kernels

stride

pad

Output size

1

Input

2 3 4 5 6 7 8 9 10 11 12 13 14

conv1 conv2 pool1 conv3 conv4 pool2 conv5 conv6 pool3 flat1 drop1 lstm1 softmax

image input layer

35 × 35 × 3









35 × 35 × 3

convolution layer convolution layer max pooling layer convolution layer convolution layer max pooling layer convolution layer convolution layer max pooling layer flatten layer dropout layer LSTM layer softmax layer

35 × 35 × 3 18 × 18 × 3 18 × 18 × 3 9×9×3 9×9×3 9×9×3 4×4×3 4×4×3 4×4×3 2×2×3 1 × 1 × 384 1 × 1 × 384 1 × 1 × 384

7 3 2 3 3 2 3 3 2 – – – –

8 8 8 16 16 16 32 32 32 – – – –

2 1 2 1 1 2 1 1 2 – – – –

3 1 0 1 1 0 1 1 0 – – – –

18 × 18 × 3 18 × 18 × 3 9×9×3 9×9×3 9×9×3 4×4×3 9×9×3 9×9×3 2×2×3 1 × 1 × 384 1 × 1 × 384 1 × 1 × 384 1×1×3

69

× × × × × × × × ×

7 3 2 3 3 2 3 3 2

×3 ×3 ×3 ×3 ×3 ×3

Physica Medica 70 (2020) 65–74

S. Kermani, et al.

NVIDIA GeForce GTX 970 with 1664 CUDA cores was used as the processing unit and MATLAB R2018b (9.5.0.944444) was employed as the software platform.

4.2. Data augmentation Both prepared and public datasets suffered from a critical issue, which was “insufficient size” for training NF-RCNN with a high number of parameters. Hence, we decided to enlarge the size of datasets utilizing augmentation algorithms. The main concern in the augmentation is selecting the best method. The produced data should appear in a way that real-world data would. Therefore, three augmentation techniques were applied:

4.5. Performance analysis and comparison with state-of-the-arts Since two different series of cardiac MRI datasets were utilized to train and evaluate the NF-RCNN, we adopted distinct approaches to assess network performance using each dataset. For the prepared dataset, the proposed model performance was compared with R-CNN, fast R-CNN, faster R-CNN, and NF-RCNN. This comparison was aimed to evaluate the proposed network since the prepared dataset was unique and, to the best of our knowledge, the present study was the first to analyze this dataset. Although there are some approaches that have worked on cardiac MRI localization using the second dataset, i.e. shortaxis slices from York University, the dataset was published for segmentation purpose and the ground truth for cardiac localization does not exist globally. Therefore, we had to prepare the localization ground truth for this dataset in order to compare other approaches with NFRCNN [33,48].

1- Rotation: As the heart appears in various angles in the CMR images, rotating the cropped image creates the appropriate frames. We rotated the cropped video frames by ± 5 and ± 10 degrees and produced new images. 2- Horizontal and vertical scaling: The original and rotated images were scaled by a random coefficient between 10 and 20% of the frame dimension. The scaling of images was performed horizontally and vertically and it augmented the dataset by a factor of three. 3- Brightness and contrast manipulation: We manipulated the brightness and contrast of the cropped images and their rotated and scaled versions in several steps. First of all, 20% of the average of the pixel values was added and subtracted from the pixels. Second, the images were filtered by an unsharp masking filter. Finally, the contrast was enhanced by Edge-aware local contrast manipulation [46].

4.5.1. Evaluation metrics Different evaluation metrics including free response operating characteristic (FROC) curve, precision and recall analysis [49], and mean localization errors were used in the present study. In order to have a comprehensive definition of FROC and precision-recall (PR) curves, four concepts derived from the confusion matrix should be defined first, including true positive (TP), true negative (TN), false positive (FP), and false negative (FN). These concepts need to be defined since the prediction results for each sample may or may not match the actual status of the sample in the classification. A criterion is required to define the positive and negative results in order to use the concepts in other computer vision tasks like object detection. A practical and sensible way is to use the Intersection over Union (IoU) of the predicted and actual bounding boxes to determine the level of success in detection [50]. Accordingly, a threshold of 0.75 for IoU was designated for using this measure in TP, TN, FP and FN calculations. The detection was marked as correct detection (positive) if the IoU exceeded the threshold and as incorrect detection (negative) if the IoU fell below the threshold.

The dataset was augmented by a factor of 100 by applying these techniques and the number of video frames reached a reasonable value for training. This “reasonable” term refers to the number of network parameters, which is 439567. By augmenting the videos, we obtained 37,900 videos or 947,500 frames, which was more than the double of network parameters. 4.3. Ground truth delineation procedure The training data of the cardiac localization algorithm needs to be labeled like any other supervised machine learning algorithm. In this case, the heart is delineated by a bounding box, which is determined by four digits: {xmin, ymin, width, and height}. Since two types of cardiac MR images, i.e. axial and short-axis views, were used in the present study, it should be mentioned that the bounding box in both axial and short-axis frames contained the heart only and no other surrounding organs. In the short-axis view, the basal to apex slices were considered, which were necessary for ventricular segmentation. Fig. 1 illustrates further details. As described in Section 5.1, the prepared datasets included the images containing the heart object and images without the heart object. For the negative images, i.e. CMR images not containing the heart object, we did not define the bounding box. Therefore, the four-digit vector was turned into an empty vector for these images.

4.5.2. FROC analysis FROC is inspired from ROC [51] curve, which indicates the true positive rate (TPR = sensitivity or recall) as a function of the false positive rate (FPR = 1-specificity). Therefore, the x-axis is defined as:

FPR =

FP TN =1− = 1 − specificity (FP + TN ) (FP + TN )

(1)

And the y axis is defined as:

TPR = 4.4. Implementation details and processing hardware

TP (TP + FN )

(2)

The ideal ROC curve is a step function corresponding to 100% sensitivity at every FPR. Thus, increasing the accuracy leads the ROC curve to move to the upper left corner of the graph. Region-based analysis can be summarized using FROC curves [52]. FROC curve is similar to ROC curve, except that the FPR on the x-axis is changed to the number of false positives per frames. The area under the curve (AUC) is a measure commonly derived from a FROC curve, which is an indication of the overall performance of the observer. The NF-RCNN is evaluated using FROC analysis and is compared with various networks. As mentioned in Section 4.4, NF-RCNN was compared with R-CNN, fast R-CNN, and faster R-CNN using dataset #1, and was compared with the previous studies [33,48] using dataset #2. FROC curves and corresponding AUCs are depicted in Fig. 4, indicating

The NIN was trained end-to-end by backpropagation stochastic gradient descent (SGD), similar to the original RPN in faster RCNN model. With respect to data augmentation, both datasets had a sufficient size to train a model from scratch. Therefore, all layers were randomly initialized using Gaussian distribution function with zero mean and 0.01 standard deviation. Since the cccp6 was used as the input of the extraction unit in the region proposal, the regions were pulled out from 192 feature maps with the size of 7 × 7. The NF-RCNN was trained with stochastic gradient descent (SGD) [47] with the momentum and weight decay of 0.9 and 0.0005, respectively. The training parameters in the learning phase included learning rate = 0.001, batch size = 64, max epoch = 100, and decay = 10–5. The network was trained through a graphics processing unit (GPU). 70

Physica Medica 70 (2020) 65–74

S. Kermani, et al.

Fig 4. FROC curves. (a) Evaluation of NF-RCNN using the first dataset and comparison with RCNN series. (b) Evaluation of NF-RCNN using the second dataset and comparison with [30,40].

frames, were used for the algorithm of RV wall motion analysis. This section evaluates the effect of NF-RCNN on memory and speed of LRCN and the results are presented in Table 5. The LRCN was trained and evaluated on the images of dataset #1 with and without implementing the cardiac localization algorithm. In both cases, the memory used for training (in Mb) and the speed of predicting the class of the video (in video per minute) were calculated. It should be noted that the implementation hardware is similar to the system described in Section 5.3. As Table 5 shows, the memory decreased to about half of its initial value in the presence of NF-RCNN while the runtime speed increased significantly (11.3 vs. 5.9 videos per minute).

that NF-RCNN has the best performance among various approaches using both datasets. 4.5.3. Mean localization errors Localization error is defined as the absolute difference between the area positions of the predicted and ground truth bounding box. This metric was calculated for all instances and the average value was reported. Table 3 presents the mean localization error for cardiac localization using both datasets. The results revealed that our model achieved the best error in comparison with other approaches. 4.5.4. Precision and recall analysis The precision-recall (PR) curve shows the relationship between precision and recall which are defined by positive predictive value and true positive rate (sensitivity), respectively. Relation (3) indicates the positive predictive value.

precision =

TP (TP + FP )

4.5.6. The effect of data augmentation The augmentation makes the dataset size suitable for training NFRCNN. For evaluating the effect of data augmentation on the network accuracy, we trained and evaluated the network with our datasets in the following states.

(3)

1234-

In the object localization context, precision refers to the proportion of organs which are correctly localized, and recall refers to the proportion of the correct reported localizations [49]. In addition, mean average precision (MAP) was used to quantify the PR curve by a scaler value. MAP is the mean of average precisions (AP) for each class, and AP is calculated through the weighted mean of precisions achieved at each threshold. Fig. 5 compares the precision-recall curve of our proposed model with other approaches such as previous evaluations in Sections 5.5.2 and 5.5.3. The results indicated that NF-RCNN gained the highest MAP compared to the other approaches for both dataset #1 and dataset #2. The average precisions for different recall values for NF-RCNN compared with the related approaches are summarized in Table 4.

Pure dataset without any augmentation, the dataset + rotated images, the dataset + rotated images + scaled image, the dataset + rotated images + scaled image + brightness manipulated image.

In this regard, the data were augmented step by step and the network was evaluated at each step. This procedure was performed for both our prepared and public datasets [35]. Table 6 indicates the results of the evaluations in terms of accuracy. Additionally, we investigated the effect of adding Rician noise to the images on the model accuracy. Rician distribution is the most relevant equation that can describe intrinsic noise of MRI images [53] and it can be modelled by the following equation.

4.5.5. The impact of NF-RCNN on memory and speed In the present study, the images of dataset #1, i.e. axial cardiac MRI

f (x|ν , σ ) =

x −(x 2 + ν 2) ⎞ xν exp ⎛ I0 ⎛ 2 ⎞ 2 σ2 2 σ σ ⎠ ⎝ ⎝ ⎠ ⎜



(4)

Table 3 Bounding box localization errors in mm ± standard deviations for dataset1 dataset2. The table also compares results for NF-RCNN with related works. dataset1

dataset2

Method

mean localization error

method

mean localization error

NF-RCNN R-CNN Fast RCNN Faster RCNN

7.33 19.2 12.1 10.5

NF-RCNN Emad et. al. [30] Mousa et. al. [40]

6.17 ± 6.63 16.3 ± 14.8 19.12 ± 16.8

± ± ± ±

8.1 18.8 8.1 9.9

71

Physica Medica 70 (2020) 65–74

S. Kermani, et al.

Fig 5. Precision-Recall curves. (a) Evaluation of NF-RCNN using the first dataset and comparison with RCNN series. (b) Evaluation of NF-RCNN using the second dataset and comparison with [30,40].

5. Discussion

Table 4 Precision-recall results for NF-RCNN vs RCNN object detector series, trained on dataset1. Our proposed model gives highest precision at all recall levels.

The present study aimed to propose a deep neural network architecture for automatic heart localization in CMR images. The procedure was based on using a nonlinear convolutional neural network as the region proposal network of faster RCNN. NIN was selected due to its capabilities in approximating more abstract representations of the hidden concepts. This replacement reinforced the network to find more complicated features in CMR images. The proposed heart localization algorithm outperformed in terms of explicit boundaries between the heart and encompassing organs. For example, the first image of the top row in Fig. 1 has an unclear lung, which increases the contrast between the heart and the surroundings. On the contrary, the last image of the top row has a low contrast with the surroundings. Compared with other object detectors, faster R-CNN utilizes the pyramids of reference boxes, which serve as references at multiple scales and aspect ratio and are called “anchor” boxes by the authors, in an innovative way [6]. Further, since a single CNN was used to both carryout region proposals and classification (Fig. 3), only one CNN needed to be trained and we obtained region proposals with almost zero computational cost. Furthermore, using a convolutional network as region proposer instead of a computationally expensive algorithm (e.g. selective search used in R-CNN and fast R-CNN) increases the accuracy of the model since it considers k different object boxes, such as a tall, wide, and large box. This allows the faster R-CNN to be more accurate. The majority of studies on heart localization have used the pre-

Recall level

NF-RCNN Faster RCNN Fast RCNN RCNN

0%

25%

50%

75%

1 1 1 1

1 1 1 1

1 1 0.98 0.95

0.99 0.97 0.91 0.85

Table 5 The impact of NF-RCNN on memory usage and speed of wall motion analysis algorithm. Method

Memory (Mb)

Speed (video/minute)

LRCN without NF-RCNN LRCN with NF-RCNN

1413 756

5.9 11.3

where I0(z) is the modified Bessel function of the first kind with order zero [54], σ is the standard deviation, and ν is the distance between the reference point and the center of the bivariate distribution. In our experiment, the Rician noise is generated with three different values of σ (i.e. 0.1, 0.01, and 0.001) and the resulted accuracy is reported. The results show a significant decrease in the accuracy in presence of noise.

Table 6 Study the effect of image augmentation on NF-RCNN accuracy using both datasets. The first column shows augmentation steps: Pure dataset corresponds to the dataset without any augmentation. Augmented data-S1 corresponds to the pure dataset + rotation. Augmented data-S2 corresponds to the augmented dataS1 + scaling. Augmented data-S3 corresponds to the augmented data-S2 + brightness manipulation. Augmented data-S4-σ corresponds to the augmented dataS3 + noisy images using Rician distribution with different values of σ. The best results are highlighted.

72

Physica Medica 70 (2020) 65–74

S. Kermani, et al.

amount of noise artificially makes unreal dataset which corrupts the performance of the model. The major limitation of the present study is lack of heart localization ground truth in the public dataset (dataset #2), which is due to the purpose of releasing this dataset, which was image segmentation and not heart localization. Altogether, there is a growing need to publish a validated public dataset with the bounding boxes of ground truth in order to provide the opportunity of reasonably comparing various methods.

processing for the LV segmentation and most of them have focused on LV localization. The localization of whole heart benefits from considering it as a compact organ. This fact increases the performance of localization algorithm. On the other hand, the algorithm in the present study acts as a prerequisite of LV wall motion analysis, which needs whole heart localization. The proposed method for automatic localization showed a great improvement in the performance after the training data were augmented. Convolutional networks are known to perform well using a reasonable training dataset. This strategy was opted since providing a large set of annotated data in medical images is usually a tedious process. Accordingly, we designed a heart localization tool with relatively small training set with a fast and easy process for providing the ground truth. This characteristic of our proposed network enables it to be used for localization of other anatomical structures. The proposed architecture ignores the temporal dependencies among cardiac cycle frames whereas the dependencies can strongly help the network to localize the heart in MRI images. Among various approaches able to handle the temporal accordance, LSTM [45] can be considered as an appropriate option. In the future, we aim to train an LSTM model with the sequences of video frames slices, which probably provides information regarding temporal relations. Since two different datasets were used for training and evaluating the network, we required several approaches to compare our method with. When the network was trained with our prepared dataset, which was axial cardiac MR images, it was compared with other R-CNN series networks, including R-CNN, fast R-CNN, and faster R-CNN, in order to evaluate the capability of the network to localize the heart. However, the network was compared with our implementation of the two other approaches, which used dataset #2 as well [33,48], when it was trained with another dataset which was a public dataset of short axis cardiac MR images from York University. FROC analysis, widely used in object detection algorithms [28,55–58], graphically represents the performance of an object detection system. FROC curve is similar to the ROC curve, except that the FPR on the x-axis is replaced with the number of false positives per images. The AUC of the FROC curve can yield a good insight into the performance of the network. We evaluated the NF-RCNN network and compared it with other approaches based on the abovementioned dataset. The results showed that our proposed network achieved the best performance among all the methods in both datasets. FROC curves exhibited a high degree of sensitivity level within the range of clinically relevant FP/image rates in the two evaluated datasets of the cardiac MR imaging. Mean localization error was another evaluation metric used to study the robustness and accuracy of the network. Despite large variability in both datasets, a mean error of only 7.33 mm for dataset #1 and 6.17 mm for dataset #2 was obtained, which was undoubtedly satisfactory for the purpose of heart localization. Errors in short-axis images were relatively less than axial images due to the prominence of the left and right ventricles in short-axis images. PR analysis is another helpful tool to evaluate the performance of a network in various recall levels. Moreover, mean average precision is a quantitative measure derived from the PR curve and represents the integral of the PR curve (area under the PR curve). Based on the mathematical definition, decreasing the precision results in increasing the recall, which indicates that the area under the PR curve would be less than or equal to one. Fig. 5 displays that NF-RCNN obtained maximum MAP compared with the other methods and precision and it had the best precision at all recall levels (Table 4). The effect of various augmentation methods on the model accuracy was investigated in our work. The results show a great improvement of the accuracy when rotation, scaling, and gray-scale manipulation are used. On the other hand, adding different levels of noise decreases the accuracy remarkably (Table 6). This is due to the nature of MRI noise which is produced by the acquisition equipment and altering the

6. Conclusion The present study proposed NIN faster R-CNN using a nonlinear structure as RPN, and tested it on CMR images. We evaluated our proposed network with qualitative and quantitative evaluation metrics and yielded perfectly acceptable results. Due to the complicated structure of medical images, a nonlinear convolutional neural network was used to be trained for interpreting and representing these structures. Further research could focus on using this structure for other medical object detection fields and incorporating temporal information of the cardiac cycle into the localization procedure. 7. Funding sources This study was funded by Isfahan University of Medical Sciences, Isfahan, Iran. The grant number was 395679. Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. References [1] Ward KK, Shah NR, Saenz CC, McHale MT, Alvarez EA, Plaxe SC. Cardiovascular disease is the leading cause of death among endometrial cancer patients. Gynecol Oncol 2012;126:176–9. https://doi.org/10.1016/J.YGYNO.2012.04.013. [2] Litjens G, Ciompi F, Wolterink JM, de Vos BD, Leiner T, Teuwen J, et al. State-ofthe-art deep learning in cardiovascular image analysis. JACC Cardiovasc Imaging 2019;12:1549–65. [3] Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60–88. https://doi.org/10.1016/J.MEDIA.2017.07.005. [4] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, p. 580–7. [5] Girshick R. Fast R-CNN. IEEE Int. Conf. Comput. Vis., 2015, p. 1440–8. [6] Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 2017;39:1137–49. https://doi.org/10.1109/TPAMI.2016.2577031. [7] Redmon J, Divvala S, Girshick R, Farhadi A. You Only Look Once: Unified, RealTime Object Detection. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, p. 779–88. [8] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, et al. SSD: Single Shot MultiBox Detector. Eur. Conf. Comput. Vis., Springer, Cham; 2016, p. 21–37. doi: 10.1007/978-3-319-46448-0_2. [9] Dai J, Li Y, He K, Sun J. R-FCN: Object Detection via Region-based Fully Convolutional Networks. ArXiv Prepr ArXiv160506409 2016. [10] Uijlings JR, van de Sande KEA, Gevers T, Smeulders AWM. Selective search for object recognition. Int J Comput Vis 2013;104:154–71. https://doi.org/10.1007/ s11263-013-0620-5. [11] Krizhevsky A, Sutskever I, Hinton GE. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 25, Curran Associates, Inc.; 2012, p. 1097–105. [12] Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intell Syst Their Appl 1998;13:18–28. https://doi.org/10.1109/5254.708428. [13] Rocktäschel T, Riedel S. End-to-end differentiable proving. Adv. Neural Inf. Process. Syst. 2017:3788–800. [14] Lin M, Chen Q, Yan S. Network In Network. ArXiv: 13124400 2013. [15] Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, et al. Speed/accuracy trade-offs for modern convolutional object detectors. IEEE Conf. Comput. Vis. Pattern Recognition, 2017, p. 3296–305. doi: 10.1109/CVPR.2017.351. [16] Jolly M-P, Duta N, Funka-Lea G. Segmentation of the left ventricle in cardiac MR images. IEEE Int. Conf. Comput. Vision, vol. 1, IEEE Comput. Soc; 2001, p. 501–8. doi: 10.1109/ICCV.2001.937558.

73

Physica Medica 70 (2020) 65–74

S. Kermani, et al. [17] Kerbyson D, Atherton T. Circle detection using Hough transform filters. Image Process Appl 1995:370–4. [18] Pavani S-K, Delgado D, Frangi AF. Haar-like features with optimally weighted rectangles for rapid object detection. Pattern Recognit 2010;43:160–72. https:// doi.org/10.1016/J.PATCOG.2009.05.011. [19] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition., vol. 1, IEEE Comput. Soc; 2001, p. I-511-I–518. doi: 10.1109/CVPR.2001.990517. [20] Papageorgiou CP, Oren M, Poggio T. General framework for object detection. Proc. IEEE Int. Conf. Comput. Vis., IEEE; 1998, p. 555–62. doi: 10.1109/iccv.1998. 710772. [21] Lin X, Cowan BR, Young AA. Automated Detection of Left Ventricle in 4D MR Images: Experience from a Large Study. Int. Conf. Med. Image Comput. Comput. Interv., Springer, Berlin, Heidelberg; 2006, p. 728–35. doi: 10.1007/11866565_89. [22] Pednekar A, Kurkure U, Muthupillai R, Flamm S, Kakadiaris I. Automated left ventricular segmentation in cardiac MRI. IEEE Trans Biomed Eng 2006;53:1425–8. https://doi.org/10.1109/TBME.2006.873684. [23] Zhong L, Zhang JM, Zhao X, Tan RS, Wan M. Automatic localization of the left ventricle from cardiac cine magnetic resonance imaging: a new spectrum-based computer-aided tool. PLoS ONE 2014:9. https://doi.org/10.1371/journal.pone. 0092382. [24] Ramachandran S, George J, Skaria S, Varun V. Using YOLO based deep learning network for real time detection and localization of lung nodules from low dose CT scans. Med. Imaging 2018 Comput. Diagnosis, 2018, p. 105751I. [25] Bellver M, Maninis K-K, Pont-Tuset J, Giro-i-Nieto X, Torres J, Van Gool L. Detection-aided liver lesion segmentation using deep learning. ArXiv: 171111069 2017. [26] Ghesu FC, Georgescu B, Mansi T, Neumann D, Hornegger J, Comaniciu D. An Artificial Agent for Anatomical Landmark Detection in Medical Images. Int. Conf. Med. Image Comput. Comput. Interv., 2016, p. 229–37. [27] Lu X, Xu D, Liu D. Robust 3d organ localization with dual learning architectures and fusion. Deep Learn. Data Labeling Med. Appl., 2016, p. 12–20. [28] Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, et al. Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans Med Imaging 2016;35:1299–312. https://doi.org/10.1109/ TMI.2016.2535302. [29] Tajbakhsh N, Gotway MB, Liang J. Computer-aided pulmonary embolism detection using a novel vessel-aligned multi-planar image representation and convolutional neural networks. Int. Conf. Med. Image Comput. Comput. Interv., 2015, p. 62–9. [30] Tajbakhsh N, Gurudu SR, Liang J. Automatic polyp detection in colonoscopy videos using an ensemble of convolutional neural networks. Biomed. Imaging (ISBI), 2015 IEEE 12th Int. Symp., 2015, p. 79–83. [31] Ghesu FC, Krubasik E, Georgescu B, Singh V, Zheng Y, Hornegger J, et al. Marginal space deep learning: efficient architecture for volumetric image parsing. IEEE Trans Med Imaging 2016;35:1217–28. [32] Avendi M, Kheradvar A, Jafarkhani H. A combined deep-learning and deformablemodel approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med Image Anal 2016;30:108–19. [33] Emad O, Yassine IA, Fahmy AS. Automatic localization of the left ventricle in cardiac MRI images using deep learning. Eng Med Biol Soc 2015:683–6. [34] Bishop CM. Pattern recognition and machine learning. Springer; 2006. [35] Andreopoulos A, Tsotsos JK. Efficient and generalizable statistical models of shape and appearance for analysis of cardiac MRI. Med Image Anal 2008;12:335–57. [36] de Vos BD, Wolterink JM, de Jong PA, Viergever MA, Isgum I. 2D image classification for 3D anatomy localization: employing deep convolutional neural networks. Med. Imaging 2016 Image Process., 2016, p. 97841Y. [37] Suri JS, Wilson D, Laxminarayan S. Handbook of biomedical image analysis. 2005. [38] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. ArXiv: 14091556 2014.

[39] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 2013;35:1798–828. [40] Krizhevsky A, Hinton G. Learning multiple layers of features from tiny images. 2009. [41] Zagoruyko S. 92.45% on CIFAR-10 in Torch 2016. https://github.com/szagoruyko/ cifar.torch (accessed October 20, 2016). [42] Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. ArXiv Prepr ArXiv12070580 2012. [43] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ArXiv Prepr ArXiv150203167 2015. [44] Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, et al. Long-term recurrent convolutional network for visual recognition and description. IEEE Conf. Comput. Vis. pattern Recognit., 2015, p. 2625–34. [45] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9:1735–80. [46] Paris S, Hasinoff SW, Kautz J. Local Laplacian filters: edge-aware image processing with a Laplacian pyramid. ACM Trans Graph 2011;30:68–71. [47] Robbins H, Monro S. A stochastic approximation method. Ann Math Stat 1951:400–7. [48] Mousa D, Zayed N, Yassine IA. Automatic cardiac MRI localization method. Biomed Eng Conf 2014:153–7. [49] Salton G, Lesk ME. Computer evaluation of indexing and text processing. J ACM 1968;15:8–36. [50] Yu J, Jiang Y, Wang Z, Cao Z, Huang T. UnitBox: An Advanced Object Detection Network. 24th ACM Int. Conf. Multimed., 2016, p. 516–20. [51] Metz C. Evaluation of digital mammography by ROC analysis. Excerpta Med 1996;1119:61–8. [52] Chakraborty DP, Winter L. Free-response methodology: alternate analysis and a new observer-performance experiment. Radiology 1990;174:873–81. [53] Gudbjartsson H, Patz S. The Rician distribution of noisy MRI data. Magn Reson Med 1995;34:910–4. [54] Weisstein E. Bessel Function of the First Kind. MathWorld – a Wolfram Web Resour n.d. http://mathworld.wolfram.com/BesselFunctionoftheFirstKind.html. [55] Li J, Liang X, Shen S, Xu T, Feng J, Yan S. Scale-aware fast R-CNN for pedestrian detection. IEEE Trans Multimed 2018;20:985–96. https://doi.org/10.1109/TMM. 2017.2759508. [56] Roth H, Lu L, Liu J, Yao J, Seff A. Improving computer-aided detection using convolutional neural networks and random view aggregation. IEEE Trans Med Imaging 2016;35:1170–81. https://doi.org/10.1109/TMI.2015.2482920. [57] Van Grinsven MJJP, Van Ginneken B, Hoyng CB, Theelen T, Sánchez CI. Fast convolutional neural network training using selective data sampling: application to hemorrhage detection in color fundus images. IEEE Trans Med Imaging 2016;35:1273–84. https://doi.org/10.1109/TMI.2016.2526689. [58] Setio AAA, Ciompi F, Litjens G, Gerke P, Jacobs C, Van Riel SJ, et al. Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans Med Imaging 2016;35:1160–9. https://doi.org/10. 1109/TMI.2016.2536809. [59] Ghelich Oghli M, Mohammadzadeh A, Kafieh R, Kermani S. A hybrid graph-based approach for right ventricle segmentation in cardiac MRI by long axis information transition. Physica Medica 2018;54:103–16. https://doi.org/10.1016/j.ejmp.2018. 09.011. [60] Moradi S, Ghelich Oghli M, Alizadehasl A, Shiri I, Oveisi N, Oveisi M, et al. MFPUnet: A novel deep learning based approach for left ventricle segmentation in echocardiography. Physica Medica 2019;67:58–69. https://doi.org/10.1016/j. ejmp.2019.10.001. [61] Liu S, Wang Y, Yang X, Lei B, Liu L, Li SX, et al. Deep learning in medical ultrasound analysis: a review. Engineering 2019;5(2):261–75. https://doi.org/10.1016/j.eng. 2018.11.020.

74