Deep learning based early stage diabetic retinopathy detection using optical coherence tomography

Xuechen Li a, Linlin Shen a,∗, Meixiao Shen b,∗, Fan Tan b, Connor S. Qiu c,d

a College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, PR China
b School of Ophthalmology & Optometry, Wenzhou Medical College, Wenzhou, PR China
c St Mary's Hospital, Isle of Wight NHS Trust, Isle of Wight, United Kingdom
d Faculty of Medicine, Imperial College London, London, United Kingdom

∗ Corresponding authors. E-mail address: [email protected] (M. Shen).

Article history: Received 15 January 2019; Revised 3 July 2019; Accepted 24 August 2019; Available online xxx. Communicated by Dr. Leyuan Fang.

Keywords: Computer-aided diagnosis; Diabetic retinopathy; Optical coherence tomography; Deep learning

Abstract

Diabetic retinopathy (DR) is one of the leading causes of preventable blindness globally. Performing retinal examinations on all diabetic patients is an unmet need, and detection at an early stage can provide better control of the disease. The objective of this study is to provide an optical coherence tomography (OCT) image based diagnostic technology for automated early DR diagnosis, covering both grades 0 and 1. This work can help ophthalmologists with evaluation and treatment, reducing the rate of vision loss and enabling timely and accurate diagnosis. We developed and evaluated a novel deep network, OCTD_Net, for early-stage DR detection; one of its sub-networks extracts features from the original OCT image, while the other extracts retinal layer information. The accuracy, sensitivity and specificity were 0.92, 0.90 and 0.95, respectively. Our analysis of retinal layers and the features learned by the proposed network suggests that grade 1 DR patients present with significant changes in the thickness and reflection of certain retinal layers, whereas grade 0 DR patients do not. The heatmaps of the trained network also suggest that patients with early DR show different textures around the myoid and ellipsoid zones, inner nuclear layers, and photoreceptor outer segments, which should all receive dedicated attention for early DR diagnosis. © 2019 Elsevier B.V. All rights reserved.

1. Introduction

Diabetic retinopathy (DR) is one of the leading causes of preventable blindness globally. It is estimated that currently 210 million people have diabetes worldwide [1–3], of which about 10–18% have had or will develop DR [3–6]. The global prevalence of diabetes is estimated to increase to 366 million by 2030 [7], so DR will become an increasingly important and serious problem in the coming years. DR is the commonest complication of diabetes and remains the leading cause of blindness among working-aged individuals in most developed countries [8]. Thus, a protocol is needed to identify the individuals at greatest risk of DR-related visual impairment before permanent changes in the retina occur.

Color fundus images have been used by ophthalmologists for DR diagnosis [9,10]. Typically, DR can be classified into five grades; the details are given in Table 1. Most previous studies focus on the classification of grades 1 to 4 [11]. Rahim et al. [12] and others [13–18] presented automatic detection of DR in eye fundus images by employing image processing techniques and machine learning.




Handcrafted features were used in these studies for optic disk, fovea, blood vessel and maculopathy detection, which are the foundations for successful DR detection. These methods are rather complicated; errors can accumulate across the tissue detection stages, which affects the overall efficiency and robustness of the system [19]. In contrast, deep learning based methods can extract features automatically, and have achieved state-of-the-art performance for DR detection [20–26]. These algorithms employ convolutional neural networks to extract features from the color fundus images directly, and use a softmax layer or SVM to perform classification. Fundus imaging is more commonly used in present DR-detecting systems since it uses the same concepts as traditional indirect ophthalmoscopy to form a wide view of the retina and adequately presents systemic diseases. However, the retinal changes caused by diabetes are not always visible through a regular eye fundus examination. Optical coherence tomography (OCT), with micrometer resolution and cross-sectional imaging capabilities, has become a promising biomedical tissue-imaging technique, particularly suitable for ophthalmic applications requiring micrometer resolution and millimeter penetration depth [27]. Additionally, OCT is cost-effective, supports quantitative measurements, can evaluate changes without human bias, and has become a critical tool for baseline retinal evaluation before initiation of therapy and subsequent monitoring of therapeutic effects [28,29].




Table 1. Diabetic Retinopathy (DR) severity scale.

Disease severity level               Findings
Grade 0: No apparent retinopathy     No visible sign of abnormalities
Grade 1: Mild NPDR                   Presence of microaneurysms only
Grade 2: Moderate NPDR               More than just microaneurysms but less than severe NPDR
Grade 3: Severe NPDR                 Any of the following: more than 20 intraretinal hemorrhages; venous beading; intraretinal microvascular abnormalities; no signs of PDR
Grade 4: PDR                         Either or both of the following: neovascularization; vitreous/pre-retinal hemorrhage

Studies have shown that OCT images can be employed for early DR detection [30,31], before DR can be detected using fundus imaging. Early detection of DR lesions can be based on OCT biomarkers, such as retinal volume and total thickness, together with microaneurysms and blood-retinal barrier status [32,33]. There have been several computer-aided diagnosis systems employing OCT images to detect retinal diseases such as DR, glaucoma and age-related macular degeneration (AMD) [30,31,34–38]. Previous studies of OCT-based DR detection rely on layer detection and handcrafted feature extraction. Bernardes et al. [30] employed the standardized histogram information of OCT scans as features for discriminating between early DR and normal cases, with support vector machines (SVMs) for classification. The experimental results showed that about 72% of DR and 65% of normal cases could be accurately classified; these features were not designed based on prior knowledge of retinal disease diagnosis and cannot achieve good performance. Eltanboly et al. [31] proposed a DR detection method based on features designed using prior clinical knowledge. First, the retinal layers were segmented using a second-order Markov-Gibbs random field (MGRF) model. Then handcrafted features, e.g., the reflection, thickness and curvature of the retinal layers, were extracted. Finally, an autoencoder based classifier was used for DR detection. The reported sensitivity and specificity were 0.83 and 1, respectively.

Most of the prior works mainly employed handcrafted features, and used classifiers such as k-Nearest Neighbor [31,36], SVM [30,36] and random forest [31]. However, there has recently been a revolutionary step forward in machine learning with advances in deep learning, where a neural network with a large number of layers can be trained to learn its convolutional filters purely from training data. Within ophthalmology, deep learning has been applied to fundus photograph based DR detection, visual field perimetry in glaucoma patients, grading of nuclear cataracts, and segmentation of the foveal microvasculature [39–42], and recently to OCT image segmentation [43,44], OCT-based retinal disease diagnosis [45,46] and referral [47]. Fang et al. [43] presented a framework combining CNN and graph search methods (termed CNN-GS) for the automatic segmentation of layer boundaries on OCT images. CNN-GS first utilizes a patch-based CNN to extract specific retinal layer boundaries; a graph search method then uses the probability maps created by the CNN to find the final boundaries. Compared with patch-based CNNs, fully convolutional networks (FCNs) achieve end-to-end segmentation, needing neither patch extraction as preprocessing nor a graph search as postprocessing. Roy et al. [44] presented such a U-Net like FCN, ReLayNet, for OCT layer segmentation. The network uses a contracting path of convolutional blocks (encoders) to learn a hierarchy of contextual features, followed by an expansive path of convolutional blocks (decoders) for semantic segmentation.

A joint loss combining logistic (cross-entropy) loss and Dice loss was employed as the loss function. Both CNN-GS and ReLayNet achieved good layer segmentation performance. Kermany et al. [45] proposed a deep learning based network for detecting three retinal diseases, i.e., choroidal neovascularization, diabetic macular edema and drusen; the classification accuracy was above 95%. Fang et al. [46] proposed a lesion-aware convolutional neural network (LACNN) for retinal OCT image classification on the same dataset as [45], in which retinal lesions within the OCT images are used to guide the CNN towards more accurate classification. Because the LACNN mimics the way ophthalmologists focus on local lesion-related regions when analyzing an OCT image, it achieves more efficient and accurate OCT classification (an increase of about 2% over [45]). De Fauw et al. [47] proposed a retinal disease referral system consisting of two parts, a segmentation network and a classification network, in which subjects are grouped into four classes, i.e., Urgent, Semi-urgent, Routine and Observation. These deep learning based methods achieved impressive performance; the overall diagnostic accuracy for the four classes was 93.6%. However, to the best of our knowledge, deep learning has not been used for analysing early retinal changes caused by diabetes.

In this study, we present a deep learning based early DR (grades 0 and 1) detection network. It focuses on imaging changes within the human retina of diabetic patients, aiming for better characterization and DR detection at the very early stages, even when these changes cannot be detected in the eye fundus. As shown in Fig. 1, the proposed OCTD_Net includes two networks: Org_Net and Seg_Net. The Org_Net (red dotted arrows) uses DenseNet blocks [48] integrated with Squeeze-and-Excitation (SE) blocks [49] to extract features from the original OCT images. The Seg_Net (green arrows) contains a ReLayNet [44] based OCT layer segmentation block and a convolutional block for feature extraction. The classification block combines the output features of both networks by element-wise addition and classifies the OCT images as early DR or normal.

This paper is organized as follows. Section 2 introduces the databases used to train and evaluate the networks, and the OCTD_Net employed for early DR detection. Section 3 presents the experimental results of the proposed methods. Finally, Section 4 concludes the paper.

2. Methods

2.1. Network details

The original OCT images were cropped, resized to 224×224 and input to both networks directly.
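To make the overall structure of Fig. 1 concrete, the following is a minimal sketch of the two-branch design, written with the Keras API the networks were reportedly implemented in. The two branch builders are simplified, hypothetical stand-ins for the Org_Net and Seg_Net feature extractors described below, and the layer sizes are illustrative assumptions rather than the paper's exact configuration:

```python
from tensorflow.keras import layers, Model

def org_net_features(x):
    # Stand-in for the dense+SE feature branch; returns a feature vector.
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return layers.Dense(128, activation="relu")(x)

def seg_net_features(x):
    # Stand-in for the (frozen) segmentation-based feature branch.
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return layers.Dense(128, activation="relu")(x)

inputs = layers.Input(shape=(224, 224, 1))       # cropped/resized OCT B-scan
f_org = org_net_features(inputs)                 # deep image features
f_seg = seg_net_features(inputs)                 # retinal-layer features
merged = layers.Add()([f_org, f_seg])            # element-wise sum, not concat
outputs = layers.Dense(3, activation="softmax")(merged)  # DR_1 / DR_0 / normal
octd_net = Model(inputs, outputs, name="OCTD_Net_sketch")
```

The structural point to note is the Add() merge: the two feature vectors are summed element-wise rather than concatenated, so the classifier input width does not grow (see Section 2.1).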


Fig. 1. The structure of the proposed OCTD_Net.

The Org_Net extracted the deep features of the OCT images, while the Seg_Net extracted the layer segmentation as additional features for OCT image classification. As shown in Fig. 2, our Org_Net consists of four dense blocks [48] integrated with SE blocks [49]. For each dense block, shown as the middle image in Fig. 2, the convolutional block consists of three convolutional layers, of size 3 × 3, 1 × 1 and 3 × 3, respectively. The depth and growth rate of the dense block were selected as 13 and 12, respectively. Increasing the depth and growth rate could bring small performance gains; however, considering the model size and computation cost, we selected a small depth and growth rate for this application. A detailed performance comparison of different depths and growth rates is given in Section 3.2. Each convolutional block in the dense block is followed by rectified linear units (ReLUs) and drop-out layers. Average-pooling of size 2 × 2 was applied at the end of each dense block. After each dense block, an SE block was applied to adaptively recalibrate channel-wise feature responses by explicitly modeling interdependencies between channels. The final layer is a global average-pooling layer, which outputs the extracted deep features for DR classification.

The DenseNet we employed in this study is one of the state-of-the-art networks. Instead of drawing representational power from extremely deep or wide architectures, the dense block architecture exploits the potential of the network through feature reuse, yielding condensed models that are easy to train and require fewer parameters. Concatenating feature maps learned by different layers increases variation in the input of subsequent layers and improves efficiency, and each layer has direct access to the gradients from the loss function and the original input signal, leading to implicit deep supervision. The SE block provides a mechanism that allows the network to perform feature recalibration, through which it can learn to use global information to selectively emphasize informative features and suppress less useful ones. The structure of the SE block is given in the right image of Fig. 2. Here W, H and C denote the width, height and channel number of the input feature map, respectively; r (set to 6 in this work) is a parameter for dimension reduction.

In this work, we integrated the SE block with DenseNet; to the best of our knowledge, such an integration has not been reported before. We tested two different integration methods. One is to place SE blocks between dense blocks (see the orange pathway in the left image of Fig. 2); the other is to place an SE block inside a dense block, between each pair of convolutional blocks (shown as the red pathway and red dotted box in Fig. 2). The output features of the two integration methods are given as (1) and (2), respectively.

$f_1 = [W_{1,\ldots,n} \cdot D_1,\; W_{2,\ldots,n} \cdot D_2,\; \ldots,\; W_n \cdot D_n]$  (1)

$f_2 = [W_1 \cdot C_1,\; W_2 \cdot C_2,\; \ldots,\; W_m \cdot C_m]$  (2)

Here f represents the output feature of the integration of dense and SE blocks; $W_n$ and $D_n$ in (1) represent the outputs of the n-th SE and dense block, respectively; $W_m$ and $C_m$ in (2) represent the outputs of the m-th SE and convolutional block, respectively. The experimental results suggest that the first method performed better than DenseNet while the second performed worse. Therefore, we employed the first method to integrate SE and dense blocks in the Org_Net.
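A minimal sketch of this first (better-performing) integration scheme, with an SE block recalibrating the output of each dense block, is given below. The dense block is reduced to a few concatenating convolutions and all sizes are illustrative assumptions, not the exact Org_Net configuration:

```python
from tensorflow.keras import layers

def dense_block(x, growth_rate=12, num_layers=3):
    # Simplified dense block: each conv's output is concatenated to its input.
    for _ in range(num_layers):
        y = layers.Conv2D(growth_rate, 3, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, y])
    return x

def se_block(x, r=6):
    # Squeeze-and-Excitation: global pooling, bottleneck (C/r), channel gates.
    c = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(c // r, activation="relu")(s)
    s = layers.Dense(c, activation="sigmoid")(s)
    return layers.Multiply()([x, layers.Reshape((1, 1, c))(s)])

inputs = layers.Input(shape=(224, 224, 1))
x = layers.Conv2D(24, 3, padding="same", activation="relu")(inputs)
for _ in range(4):                       # four dense blocks, SE between them
    x = dense_block(x)
    x = se_block(x)                      # recalibrate channels (method 1)
    x = layers.AveragePooling2D(2)(x)
features = layers.GlobalAveragePooling2D()(x)
```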

The Seg_Net mainly used the ReLayNet [44] for retinal layer segmentation, which is a U-Net like fully convolutional network that achieves state-of-the-art performance on retinal layer segmentation of OCT images (see Fig. 3). According to Roy et al. [44], ReLayNet performed better than U-Net and FCN on an OCT image segmentation task. In [44], the OCT images were cropped into 512×64 for data augmentation, while in this study the Seg_Net uses the whole image for feature extraction. Therefore, we slightly adjusted the structure of ReLayNet by changing the kernel size of the convolutional layers to 3 × 3 and increasing the depth of the network to 4 layers. The Seg_Net was pre-trained using the WMU_S database: 80% of the images of each class were selected for training and the remaining 20% were used for validation. The Dice coefficient of layer segmentation was approximately 0.9. When training finished, the weights of all layers in Seg_Net were frozen, and the first half of the network was used as a feature extractor. The output after a 14 × 14 convolutional layer was employed as the layer features, since the layer segmentation result can be obtained from this feature by convolution and upsampling.

The classification block consists of feature merging, fully connected layers and softmax classification. In this work, we employed sum instead of concatenation for feature merging: while concatenation doubles the number of feature maps, the sum operation does not increase the complexity of the network.

2.2. Data augmentation

Dataset augmentation has been a particularly effective technique to improve the performance of deep networks for classification. Images are high dimensional and subject to an enormous variety of variations, many of which can be easily simulated. Operations such as translating the training images a few pixels in each direction, rotating, or scaling have been proven effective for network training [50]. In this work, additional training instances were generated by applying random crop (cropping ratio ranging from 0.8 to 1), zoom-in (ratio ranging from 1 to 1.2) and horizontal mirroring to the images of the training set. Define the original image size as S. For the random crop operation, we randomly crop a sub-image of size equal to S × cropping ratio and resize the cropped sub-image back to S. For the zoom-in operation, we resize the image to S × zoom-in ratio and crop it from the center back to S. Since excessive cropping and zooming may cause the loss of information that is key to classification, we selected relatively conservative values, i.e., 0.8 and 1.2, as the limits for the cropping and zooming operations, respectively; similar choices were made for data augmentation in other works [51,52]. As the OCT scans of the left and right eyes are mirror-symmetric, it is reasonable to apply horizontal mirroring to generate more training data.

Neural networks do not seem to be very robust to noise [53]. One of the most efficient ways to improve robustness is to apply random noise to the training data. Since excessive noise may affect the classification performance of the proposed network, we added random Gaussian noise with zero mean and a standard deviation of 0.001 to the training images for data augmentation.
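A sketch of this augmentation pipeline under the stated ranges (crop ratio 0.8–1, zoom ratio 1–1.2, horizontal mirroring, Gaussian noise with standard deviation 0.001); the input is assumed to be a normalized single-channel image tensor:

```python
import numpy as np
import tensorflow as tf

def augment(image):
    """image: float32 tensor of shape (H, W, 1), values in [0, 1]."""
    h, w = image.shape[0], image.shape[1]
    # Random crop: keep a sub-image of 0.8-1.0 of the size, resize back.
    ratio = np.random.uniform(0.8, 1.0)
    image = tf.image.random_crop(image, (int(h * ratio), int(w * ratio), 1))
    image = tf.image.resize(image, (h, w))
    # Zoom-in: enlarge by 1.0-1.2, then center-crop back to the original size.
    zoom = np.random.uniform(1.0, 1.2)
    image = tf.image.resize(image, (int(h * zoom), int(w * zoom)))
    image = tf.image.resize_with_crop_or_pad(image, h, w)
    # Horizontal mirroring (left/right eyes are mirror-symmetric).
    image = tf.image.random_flip_left_right(image)
    # Zero-mean Gaussian noise with standard deviation 0.001.
    image = image + tf.random.normal(tf.shape(image), stddev=0.001)
    return image
```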


Fig. 2. The network details of Org_Net.

Fig. 3. The network details of Seg_Net.

2.3. Network training

In this work, the networks were implemented using the Keras toolbox and trained with a mini-batch size of 32 using four GPUs (GeForce GTX TITAN X, 12 GB RAM). The initial learning rate was set to 0.001. Adam [54], instead of traditional stochastic gradient descent (SGD), was employed as the optimization algorithm to iteratively update the network weights based on the training data. Training was stopped when the network converged, i.e., when the validation loss did not drop for 10 epochs.

2.4. Evaluation metrics

In this work, we employed accuracy, sensitivity and specificity for network evaluation. For multi-class classification (n classes, N images), let M represent the confusion matrix and $M_i^j$ the number of class-i images classified as class j. Accuracy, sensitivity and specificity are defined as follows:

$\mathrm{accuracy} = \frac{1}{N}\sum_{i=1}^{n} M_i^i$  (3)

$\mathrm{sensitivity} = \frac{1}{n}\sum_{i=1}^{n} \frac{M_i^i}{\sum_{j=1}^{n} M_i^j}$  (4)

$\mathrm{specificity} = 1 - \frac{1}{n}\sum_{i=1}^{n} \frac{\sum_{j \neq i} M_j^i}{N - \sum_{j=1}^{n} M_i^j}$  (5)
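As a consistency check of definitions (3)–(5), the sketch below evaluates them on the confusion matrix reported later in Table 6 (rows are true classes, columns are predicted classes) and reproduces the headline figures of about 0.92, 0.90 and 0.95:

```python
import numpy as np

# Rows: true class (DR_1, DR_0, normal); columns: predicted class.
M = np.array([[192, 30, 6],
              [11, 393, 3],
              [2, 21, 201]], dtype=float)
N, n = M.sum(), M.shape[0]

accuracy = np.trace(M) / N                         # Eq. (3)
sensitivity = np.mean(np.diag(M) / M.sum(axis=1))  # Eq. (4)
# Per-class false positive rate: off-diagonal column sum over negatives.
fpr = (M.sum(axis=0) - np.diag(M)) / (N - M.sum(axis=1))
specificity = 1.0 - np.mean(fpr)                   # Eq. (5)

print(accuracy, sensitivity, specificity)          # ≈ 0.92, 0.90, 0.95
```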

We also employed the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) to evaluate the performance of the proposed OCTD_Net for binary classification. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true positive rate is also known as sensitivity or recall in machine learning, and the false positive rate as the fall-out, which can be calculated as (1 − specificity); the ROC curve is thus the sensitivity as a function of the fall-out. In general, if the probability distributions for both successful detection and false alarm are known, the ROC curve can be generated by plotting the cumulative distribution function (the area under the probability distribution from 0 to the discrimination threshold) of the detection probability on the y-axis versus the cumulative distribution function of the FPR on the x-axis.
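In practice the curve is computed from per-image scores rather than known distributions; a short sketch with scikit-learn (the label and score arrays are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 1])               # ground-truth binary labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9])   # positive-class probabilities
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)                          # area under the ROC curve
```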


3. Experimental results

3.1. Datasets

The proposed method was trained and evaluated using a database of OCT images provided by Wenzhou Medical University (WMU). The images of the WMU database were captured using a custom-built spectral domain OCT (SD-OCT) system. In this system, a charge-coupled device camera (AViiva EM4; E2v Technologies, Chelmsford, England), with a line-scan rate of up to 70,000 A-lines per second, was used to record the interference signal. To balance system sensitivity and acquisition time, the acquisition rate was set at 50 kHz. An ocular lens (60 D; Volk Optical, Mentor, Ohio, USA) was adapted to a slit-lamp system to image the posterior segment of the eye. The OCT images were acquired from each subject using a cross-line scan protocol, with a scan width of 8 mm and the number of A-lines and B-scans for each eye set to 2048 and 32, respectively.

The database contains 4168 OCT images collected from 155 patients: 1112 images from 45 patients with grade 1 DR (DR_1), 1856 images from 64 patients with grade 0 DR (DR_0), and 1200 images from 46 normal subjects. While grade 1 DR was diagnosed on the basis of clinical examination with seven-field stereo fundus color photography, grade 0 DR was defined as diabetic without visible abnormalities, and the normal images were captured from healthy people without diabetes. All OCT images were annotated by three DR specialists from Wenzhou Medical University; if no agreement could be reached, a senior expert was consulted. The images of the WMU database have a size of 2048×2048 pixels and a grayscale depth of 8 bits. The age distribution of patients from the three classes is given in Fig. 4; the distributions are similar for all three classes, so the effects of age on the retina can be considered the same for each. This work has received approval and authorization from the institutional review board.

Fig. 4. The age distribution of patients from the three different classes.

An additional database that contains both the original OCT images and the corresponding layer segmentation masks was employed for training the Seg_Net of the proposed network. This database was also provided by Wenzhou Medical University (WMU_S) and was captured using the same system as the WMU database. It contains 120 OCT images, 40 for each class (DR_0, DR_1 and normal).

Nine boundaries of the intra-retinal layer structures were first annotated using custom-built software [55], which mainly employed graph theory and a shortest-path search based on a dynamic-programming optimization algorithm [56]. All of the boundaries were then visually checked by two clinicians to correct any segmentation errors. Finally, the areas between these boundaries were marked using different grayscales, as shown in Fig. 5.

In this study, we evaluated the DR classification performance of different features using the WMU database, which consists of OCT images at DR grades 0 and 1 and normal. The networks were trained using about 80% of the database and tested using the remaining 20%; see Table 2 for details about the number of images used for training and validation. Images for training and testing were selected from different patients.

Table 2. The details of training and testing images: number of B-scans (number of patients).

Class     Training      Testing      Total
DR_1      884 (28)      228 (17)     1112 (45)
DR_0      1449 (40)     407 (24)     1856 (64)
Normal    976 (27)      224 (19)     1200 (46)
Total     3309 (95)     859 (60)     4168 (155)

3.2. Comparison of parameters employed for Org_Net

According to Huang et al. [48], the depth and growth rate employed for DenseNet are closely related to the performance of the network. Therefore, in this section we evaluated the performance of Org_Net with different depths and growth rates and list the accuracies in Table 3. From Table 3 we can see that the accuracy of Org_Net and OCTD_Net generally improves as the depth and growth rate increase. For example, the accuracy of Org_Net improved from 80% to 83% when the depth of the network was increased from 13 to 21; however, the number of parameters increased significantly, from 7.1M to 27.2M. The accuracy of the proposed OCTD_Net varies even less with depth and growth rate. To achieve a trade-off between accuracy and model size, we selected 13 and 12 as the depth and growth rate of Org_Net, respectively.

3.3. Comparison of handcrafted and deep features

According to Eltanboly et al. [31], the thickness (in pixels) and reflection (gray-level intensity of the OCT image) of the retinal layers are key features for DR detection using OCT images. We first used the 120 OCT images with manual segmentation (the WMU_S dataset) to analyze the thickness and reflection values of each retinal layer. All OCT images were horizontally flipped so that the effect of left/right eyes was suppressed. The distributions of thickness and reflection for all layers in Fig. 6 show that the changes of the INL and OS layers caused by DR are relatively more significant than those of other layers. The average thicknesses of the INL and OS layers, shown in the third row of Fig. 6(a), suggest that the INL layer of DR_1 patients and the OS layer of normal subjects are thicker than those of other subjects. The average reflections of the INL and OS layers, shown in the third row of Fig. 6(b), suggest that the INL layer of DR_1 patients presents with higher reflection, while the OS layer of normal subjects has lower reflection.

After this statistical analysis of thickness and reflection using manually segmented images, we then used the WMU dataset automatically segmented by Seg_Net for the following experiments.
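These per-layer statistics can be computed directly from a layer-label mask, whether manual (WMU_S) or produced by Seg_Net; a sketch under the assumption that each retinal layer carries its own integer label, as in Fig. 5:

```python
import numpy as np

def layer_stats(oct_image, label_mask, layer_id):
    """Per-layer mean thickness (pixels per A-scan) and mean reflection.

    oct_image: 2-D grayscale B-scan; label_mask: same-shape integer mask
    in which each retinal layer carries its own label (as in Fig. 5).
    """
    layer = label_mask == layer_id
    # Thickness: number of labeled pixels in each column (A-scan),
    # averaged over the columns where the layer is present.
    per_column = layer.sum(axis=0)
    thickness = per_column[per_column > 0].mean()
    # Reflection: mean gray-level intensity inside the layer.
    reflection = oct_image[layer].mean()
    return thickness, reflection
```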


Fig. 5. The annotations for eight retinal layers and their corresponding abbreviations (the retinal layers were marked using different colors).

Table 3. The performance of Org_Net and OCTD_Net with different depth and growth rate.

Depth   Growth rate   Org_Net accuracy   Number of parameters   Model size   OCTD_Net accuracy
13      12            0.80               7.1M                   83M          0.92
13      18            0.81               15.5M                  182M         0.92
13      24            0.81               27.3M                  320M         0.92
16      12            0.80               12.3M                  145M         0.92
19      12            0.82               19.1M                  224M         0.93
21      12            0.83               27.2M                  320M         0.93

Table 4. The comparison of handcrafted and deep features for DR classification (the last three columns give accuracy).

Features                   Feature set   Classifier     DR_1 vs. DR_0+normal   DR_0 vs. normal   DR_1 vs. DR_0 vs. normal
Thickness and Reflection   Top 1         Thresholding   0.69                   0.62              0.50
Thickness and Reflection   All           SVM            0.80                   0.64              0.59
Seg_Net                    Top 1         Thresholding   0.78                   0.76              0.61
Seg_Net                    All           SVM            0.82                   0.78              0.67
Org_Net                    Top 1         Thresholding   0.86                   0.84              0.74
Org_Net                    All           SVM            0.91                   0.89              0.80
OCTD_Net                   All           Softmax        0.95                   0.96              0.92

Table 4 lists the classification accuracy of the best performing features for three different tasks on the test set, when thresholding was used for classification. Among all of the thickness and reflection features, the thickness of the INL layer achieved the best accuracy, 69%, for grade 1 DR detection. It seems that patients with grade 1 DR have the greatest changes in the thickness of the INL layer. The thickness and reflection of the OS layer seem to be the best features to distinguish DR_0 from normal, so the OS layer might be the earliest layer affected by diabetes. Though the thickness of the INL showed the greatest differences among the three classes, the corresponding accuracy is as low as 50%, and the improvement from combining the other features with the best one is very small: when a support vector machine (SVM) was used as the classifier, only a 9 percentage point accuracy increase was recorded for the three-category task. In summary, while DR_1 patients present with relatively significant changes in the thickness and reflection of certain retinal layers, DR_0 patients do not, and such features cannot discriminate well between the three classes.

As data-driven approaches, deep networks can learn the features needed to detect DR directly from the training samples. To quantitatively identify the features that differ significantly between two classes, we define the criterion FD to measure the difference between features $F_A$ and $F_B$:

$FD_{A\,\mathrm{vs}\,B}(i) = \dfrac{|F_A(i) - F_B(i)|}{\max(F_A(i), F_B(i)) + e}, \quad i = 1, \ldots, n$  (6)
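Equation (6) translates directly into code; the sketch below applies it to two class-averaged feature vectors and ranks the dimensions, which is how features such as #O_176 can be selected (the vectors shown are made-up examples, not measured features):

```python
import numpy as np

def feature_difference(f_a, f_b, e=1e-8):
    """Eq. (6): per-dimension relative difference between two
    class-averaged feature vectors (assumed non-negative, as after ReLU)."""
    return np.abs(f_a - f_b) / (np.maximum(f_a, f_b) + e)

# Example: rank feature dimensions by how well they separate two classes.
f_dr1 = np.array([0.9, 0.1, 0.5])     # illustrative average features, DR_1
f_norm = np.array([0.2, 0.1, 0.45])   # illustrative average features, normal
fd = feature_difference(f_dr1, f_norm)
top = np.argsort(fd)[::-1]            # most discriminative dimensions first
```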

where i is the feature ID, n is the length of the feature vector and e is a small positive number. Fig. 7 shows the FD values among DR_1, DR_0 and normal for each dimension of the average feature vectors extracted by Seg_Net and Org_Net, respectively. The larger the FD value, the bigger the difference between the deep features of the two classes. One can observe from the figure that the deep features of DR_1 samples are significantly different from those of DR_0 and normal samples, i.e., they have larger FD values in most dimensions. While the features of DR_0 samples look similar as a whole to those of normal samples, i.e., with smaller FD values, they still differ significantly in some dimensions. As shown in the figure, features #S_13 and #O_176 (marked in blue) are the most discriminative features learned by Seg_Net and Org_Net for detecting DR_1, while features #S_132 and #O_141 (marked in red) are the most significant features learned by Seg_Net and Org_Net for discriminating DR_0 from normal. Fig. 8 shows the mean and standard deviation of these four top features for samples from the different categories, for patients with grade 1 DR (a) and grade 0 DR (b), which confirms that they are discriminative.


Fig. 6. The histograms and average curves of thickness (a) and reflection value (b) of different retinal layers for DR_0, DR_1 and normal.

The classification performance of the top feature and of all features learned by Seg_Net and Org_Net is also listed in Table 4. As listed in the table, features #S_13 and #O_176 achieved 78% and 86% accuracy for DR_1 detection, respectively, and features #S_132 and #O_141 achieved 76% and 84% accuracy for DR_0 detection, respectively. The classification performances of the top features learned by the deep networks for discriminating DR_0 from normal, and DR_1 from DR_0 and normal, are thus similar. It seems that the network has learned good features for discriminating DR_0 samples.

To visualize and analyze the features learned by the deep networks, we show in Fig. 9 heatmaps of the top features learned by Org_Net for discriminating DR_1 (#O_176) and DR_0 (#O_141); the images have been cropped for better visualization. The red color marks the high-response areas of Org_Net. For DR_1 discrimination, taking feature #O_176 as an example, the red areas of the heatmaps mainly focus on retinal layers with higher reflection values and mark the edges between neighboring layers, especially the first and last few layers, e.g., INL, MEZ and OS. Other heatmaps also suggest that patients with DR_1 show greater differences in texture on those retinal layers.


Fig. 7. The average deep feature vectors extracted by Seg_Net (a) and Org_Net (b) for DR_1, DR_0 and normal cases (blue: #S_13 and #O_176; red: #S_132 and #O_141).

Table 5. The significant layers suggested by analysis of handcrafted and deep features.

Features                   DR_1 vs. DR_0+normal   DR_0 vs. normal
Thickness and Reflection   INL                    OS
Deep network               INL, MEZ, OS           NFL, MEZ, OS

Table 6. The confusion matrix of the proposed OCTD_Net (rows: true class; columns: predicted class).

          DR_1   DR_0   Normal
DR_1      192    30     6
DR_0      11     393    3
Normal    2      21     201

Fig. 8. The mean and standard deviation of the top features for (a) DR_1 and (b) DR_0.

Table 7. The performance of VGG, Xception, DenseNet, Org_Net and OCTD_Net for three-category classification.

Method          Accuracy   Sensitivity   Specificity
VGG [57]        0.72       0.70          0.85
Xception [58]   0.77       0.74          0.88
DenseNet [48]   0.77       0.76          0.87
Org_Net         0.80       0.78          0.89
OCTD_Net        0.92       0.90          0.95

For DR_0 discrimination, the heatmaps of feature #O_141 suggest that the significant layers are mainly NFL, MEZ and OS, which seem to be the earliest retinal layers affected by diabetes. We list in Table 5 the significant layers found by analyzing the handcrafted and deep features. One can observe from the table that both the handcrafted features and the deep networks suggest that patients with DR_1 and DR_0 have the greatest changes in the INL and OS layers, respectively. Patients with both DR_1 and DR_0 showed different textures at the MEZ and OS layers. These layers should receive more attention from doctors when diagnosing early DR.

Fig. 9. The heatmaps of the Org_Net features (left: top feature for DR_1 vs. DR_0 and normal; right: DR_0 vs. normal).

3.4. Performance of OCTD_Net and comparison with the state-of-the-art
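Heatmaps of this kind can be generated by upsampling one channel of an intermediate feature map and normalizing it; a minimal sketch (the layer name and channel index are placeholders, and this is an assumed visualization recipe rather than the paper's exact procedure):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model

def feature_heatmap(model, feature_layer_name, channel, image):
    """Upsample one channel of an intermediate feature map to image size.

    image: array of shape (H, W, 1); returns a heatmap in [0, 1].
    """
    extractor = Model(model.input,
                      model.get_layer(feature_layer_name).output)
    fmap = extractor(image[np.newaxis, ...])[0]      # (h, w, channels)
    heat = fmap[..., channel].numpy()
    heat = np.maximum(heat, 0)                       # keep positive responses
    heat /= heat.max() + 1e-8                        # normalize to [0, 1]
    return tf.image.resize(heat[..., np.newaxis],
                           image.shape[:2]).numpy()[..., 0]
```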

Table 6 shows the classification confusion matrix M of the proposed OCTD_Net. The principal diagonal of the matrix represents the number of accurately classified images, $M_i^i$. One can observe from the matrix that most of the images were accurately classified: the overall classification accuracy was 0.92. While two normal images were classified as DR_1 and six DR_1 images were classified as normal, DR_0 is involved in most of the misclassifications, i.e., DR_1 to DR_0 or normal to DR_0. We also compared the performance of our networks, i.e., Org_Net and OCTD_Net, with the widely used DenseNet [48] and with other classic deep networks, i.e., VGG [57] and Xception [58], and report the results in Table 7. As shown in the table, DenseNet performed similarly to Xception (equal accuracy, 0.02 higher sensitivity and 0.01 lower specificity), and both performed much better than VGG. By adding SE blocks, Org_Net achieved slightly better results than DenseNet: improvements of 3%, 2% and 2% in accuracy, sensitivity and specificity, respectively. By employing the


Fig. 10. The ROC curves of different approaches for DR_1 detection (a) and DR_0 detection (b).

Table 8. The performance of the networks on DR_1 and DR_0 detection.

DR type   Database (number)   Method                  SEN    SPE    AUC
DR_1      WMU (859)           DenseNet                0.62   0.96   0.89
DR_1      WMU (859)           Org_Net                 0.77   0.96   0.92
DR_1      WMU (859)           OCTD_Net                0.87   0.97   0.97
DR_1      Private (52)        Eltanboly [31]          0.83   1      N/A
DR_0      WMU' (631)          DenseNet                0.72   0.89   0.78
DR_0      WMU' (631)          Org_Net                 0.86   0.95   0.88
DR_0      WMU' (631)          OCTD_Net                0.96   1      0.99
DR_0      Private (68)        Bernardes et al. [30]   0.71   0.65   N/A

layer features extracted using Seg_Net, the proposed OCTD_Net achieved much better performance than the Org_Net overall: the classification accuracy improved from 80% to 92%, and improvements of 12% and 6% were achieved for sensitivity and specificity, respectively.

The works of Bernardes et al. [30] and Eltanboly et al. [31] are currently the only available methods using OCT for early DR detection, and they only consider the detection of DR_0 [30] and DR_1 [31], respectively. Therefore, we also implemented binary classification for DR_0 and DR_1 detection. For DR_1 detection, we combined DR_0 and normal into one class. Table 8 lists the sensitivity and specificity of the proposed networks at the operating point marked with circles in the ROC curves shown in Fig. 10(a), together with those of DenseNet; the area under the ROC curve (AUC) of each network is also included. While the sensitivity and specificity of Org_Net were 0.77 and 0.96, respectively, those of OCTD_Net increased to 0.87 and 0.97, respectively. Similarly, the AUC improved from 0.92 to 0.97. For DR_0 detection, we removed DR_1 cases from the test set; if DR_0 was classified as normal, or normal was classified as DR_1 or DR_0, we considered it misclassified. The performances of the proposed networks and DenseNet are also given in Table 8 and Fig. 10(b). As with DR_1 detection, OCTD_Net achieved better performance than the other two networks. The sensitivity and specificity of the methods proposed by Bernardes et al. [30] and Eltanboly et al. [31] are included directly in Table 8 for comparison. As listed in the table, our approach achieved much better performance than [30] and comparable performance to [31], i.e., slightly higher sensitivity and lower specificity. However, the number of subjects (407) with DR_1 included in this study was significantly larger than the 26 subjects in [31].
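For these binary evaluations, one convenient way to obtain a positive-class score from the three-class softmax output is to take the probability assigned to the target grade; the grouping below follows the text above, but the exact scoring used for Fig. 10 is not specified in the paper, so this is an assumption:

```python
import numpy as np

# probs: (num_images, 3) softmax outputs, columns = [DR_1, DR_0, normal].
probs = np.array([[0.70, 0.20, 0.10],
                  [0.10, 0.60, 0.30],
                  [0.05, 0.15, 0.80]])

# DR_1 detection: DR_0 and normal are merged into the negative class.
score_dr1 = probs[:, 0]
# DR_0 detection (DR_1 cases removed from the test set beforehand).
score_dr0 = probs[:, 1]
```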

4. Conclusion and future work

In this study, we proposed a novel deep network, OCTD_Net, for early-stage DR classification using OCT images. The network consists of two independent sub-networks that combine deep image features and layer information. For multi-category classification, i.e., grade 0 DR, grade 1 DR and normal, the accuracy, sensitivity and specificity were 0.92, 0.90 and 0.95, respectively. For binary classification, i.e., grade 1 DR detection, the sensitivity, specificity and AUC of the proposed OCTD_Net were 0.87, 0.97 and 0.97, respectively. OCTD_Net outperformed previous OCT image based methods for early-stage DR detection. To the best of our knowledge, this is the first study to give diagnostic suggestions for early DR detection from OCT images by visualizing deep features. The study suggests that patients with early DR show greater textural differences around the INL, MEZ and OS layers, which should receive more attention in early DR diagnosis.

Overall, the experimental results show the potential of OCT images to detect early-stage DR in a cost-effective and time-efficient manner. The application of such a CAD algorithm for DR diagnosis could reduce the rate of vision loss attributed to DR, improve clinical management, and create a novel diagnostic workflow for disease detection and referral. For clinical application of our method, further testing and optimization may be necessary to ensure the network's robustness and minimize false-negative rates. Cross-modality recognition [59–61] is a promising way of increasing diagnostic accuracy; in future work, it may also be important to add clinical history and other data sources, e.g., fundus images, for cross-modality early DR detection.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work is supported by the National Natural Science Foundation of China (Grant Nos. 61672357, 61702337 and U1713214) and the Science and Technology Project of Guangdong Province (Grant No. 2018A050501014).

References

[1] S. Wild, G. Roglic, A. Green, R. Sicree, H. King, Global prevalence of diabetes: estimates for the year 2000 and projections for 2030, Diabetes Care 27 (5) (2004) 1047–1053.


[2] C. Day, The rising tide of type 2 diabetes, Br. J. Diabetes Vasc. Dis. 1 (1) (2001) 37–43.
[3] J.E. Shaw, R.A. Sicree, P.Z. Zimmet, Global estimates of the prevalence of diabetes for 2010 and 2030, Diabetes Res. Clin. Pract. 87 (1) (2010) 4–14.
[4] R.L. Thomas, F. Dunstan, S.D. Luzio, C.S. Roy, S.L. Hale, R.V. North, R.L. Gibbins, D.R. Owens, Incidence of diabetic retinopathy in people with type 2 diabetes mellitus attending the diabetic retinopathy screening service for Wales: retrospective analysis, BMJ 344 (2012) e874.
[5] C.S. Fox, M.J. Pencina, J.B. Meigs, R.S. Vasan, Y.S. Levitzky, R. D'Agostino Sr., Trends in the incidence of type 2 diabetes mellitus from the 1970s to the 1990s: the Framingham Heart Study, Circulation 113 (25) (2006) 2914–2918.
[6] R. Raman, P.K. Rani, R.S. Reddi, P. Gnanamoorthy, S. Uthra, G. Kumaramanickavel, T. Sharma, Prevalence of diabetic retinopathy in India: Sankara Nethralaya diabetic retinopathy epidemiology and molecular genetics study report 2, Ophthalmology 116 (2) (2009) 311–318.
[7] D.S.W. Ting, G.C.M. Cheung, T.Y. Wong, Diabetic retinopathy: global prevalence, major risk factors, screening practices and public health challenges: a review, Clin. Experiment. Ophthalmol. 44 (4) (2016) 260–277.
[8] J.W. Yau, S.L. Rogers, R. Kawasaki, E.L. Lamoureux, J.W. Kowalski, T. Bek, S.-J. Chen, J.M. Dekker, A. Fletcher, J. Grauslund, Global prevalence and major risk factors of diabetic retinopathy, Diabetes Care 35 (3) (2012) 556–564.
[9] Early Treatment Diabetic Retinopathy Study Group, Grading diabetic retinopathy from stereoscopic color fundus photographs: an extension of the modified Airlie House classification, Ophthalmology 98 (5) (1991) 786–806.
[10] T. Wang, A contribution of image processing to the diagnosis of diabetic retinopathy: detection of exudates in color fundus images of the human retina, IEEE Trans. Med. Imaging 21 (10) (2002) 1236–1243.
[11] B. Antal, A. Hajdu, An ensemble-based system for microaneurysm detection and diabetic retinopathy grading, IEEE Trans. Biomed. Eng. 59 (6) (2012) 1720–1726.
[12] S.S. Rahim, V. Palade, J. Shuttleworth, C. Jayne, Automatic screening and classification of diabetic retinopathy and maculopathy using fuzzy image processing, Brain Inform. 3 (4) (2016) 1–19.
[13] C. Sinthanayothin, J.F. Boyce, T.H. Williamson, H.L. Cook, E. Mensah, S. Lal, D. Usher, Automated detection of diabetic retinopathy on digital fundus images, Diabet. Med. 19 (2) (2002) 105–112.
[14] O. Faust, A.U. Rajendra, E.Y.K. Ng, K.H. Ng, J.S. Suri, Algorithms for the automated detection of diabetic retinopathy using digital fundus images: a review, J. Med. Syst. 36 (1) (2012) 145–157.
[15] J. Nayak, P.S. Bhat, R. Acharya, C.M. Lim, M. Kagathi, Automated identification of diabetic retinopathy stages using digital fundus images, J. Med. Syst. 32 (2) (2008) 107–115.
[16] M. Ramaswamy, D. Anitha, S.P. Kuppamal, R. Sudha, A study and comparison of automated techniques for exudate detection using digital fundus images of human eye: a review for early identification of diabetic retinopathy, Int. J. Comput. Technol. Appl. 2 (5) (2011) 1503–1516.
[17] U.R. Acharya, C.M. Lim, E.Y. Ng, C. Chee, T. Tamura, Computer-based detection of diabetes retinopathy stages using digital fundus images, Proc. Inst. Mech. Eng. H 223 (5) (2009) 545–553.
[18] S. Ravishankar, A. Jain, A. Mittal, Automated feature extraction for early detection of diabetic retinopathy in fundus images, pp. 210–217.
[19] A.R. Youssif, A.Z. Ghalwash, A.R. Ghoneim, Optic disk detection from normalized digital fundus images by means of a vessels' direction matched filter, IEEE Trans. Med. Imaging 27 (1) (2008) 11–18.
[20] G. Quellec, K. Charriere, Y. Boudi, B. Cochener, M. Lamard, Deep image mining for diabetic retinopathy screening, Med. Image Anal. 39 (2017) 178–193.
[21] H. Pratt, F. Coenen, D.M. Broadbent, S.P. Harding, Y. Zheng, Convolutional neural networks for diabetic retinopathy, Procedia Comput. Sci. 90 (2016) 200–205.
[22] Q. Abbas, I. Fondon, A. Sarmiento, S. Jiménez, P. Alemany, Automatic recognition of severity level for diagnosis of diabetic retinopathy using deep visual features, Med. Biol. Eng. Comput. 6 (2017) 1–16.
[23] H.H. Vo, A. Verma, New deep neural nets for fine-grained diabetic retinopathy recognition on hybrid color space, pp. 209–215.
[24] R. Gargeya, T. Leng, Automated identification of diabetic retinopathy using deep learning, Ophthalmology 124 (7) (2017) 962–969.
[25] T.Y. Wong, N.M. Bressler, Artificial intelligence with deep learning technology looks into diabetic retinopathy screening, JAMA 316 (22) (2016) 2366–2367.
[26] V. Gulshan, L. Peng, M. Coram, M.C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, JAMA 316 (22) (2016) 2402–2410.
[27] A. Zysk, F. Nguyen, A. Oldenburg, D. Marks, S. Boppart, Optical coherence tomography: a review of clinical development from bench to bedside, J. Biomed. Opt. 12 (5) (2007) 051403.
[28] P.A. Keane, P.J. Patel, S. Liakopoulos, F.M. Heussen, S.R. Sadda, A. Tufail, Evaluation of age-related macular degeneration with optical coherence tomography, Surv. Ophthalmol. 57 (5) (2012) 389–414.
[29] T. Ilginis, J. Clarke, P.J. Patel, Ophthalmic imaging, Br. Med. Bull. 111 (1) (2014) 77–88.
[30] R. Bernardes, P. Serranho, T. Santos, V. Gonçalves, J. Cunha-Vaz, Optical coherence tomography: automatic retina classification through support vector machines, Eur. Ophthalmic Rev. 6 (4) (2012) 200–203.
[31] A. Eltanboly, M. Ismail, A. Shalaby, A. Switala, A. El-Bazy, S. Schaal, G. Gimel'Farb, M. El-Azab, A computer aided diagnostic system for detecting diabetic retinopathy in optical coherence tomography images, Med. Phys. 44 (3) (2017) 914–923.

[32] M.R. Hee, C.R. Baumal, C.A. Puliafito, J.S. Duker, E. Reichel, J.R. Wilkins, J.G. Coker, J.S. Schuman, E.A. Swanson, J.G. Fujimoto, Optical coherence tomography of age-related macular degeneration and choroidal neovascularization, Ophthalmology 103 (8) (1996) 1260–1270.
[33] D. Mitry, C. Bunce, D. Charteris, Anti-vascular endothelial growth factor for macular oedema secondary to branch retinal vein occlusion, Cochrane Database Syst. Rev. 1 (1) (2013) CD009510.
[34] A. Pachiyappan, U.N. Das, T.V. Murthy, T. Rao, Automated diagnosis of diabetic retinopathy and glaucoma using fundus and OCT images, Lipids Health Dis. 11 (73) (2012).
[35] C.S. Lee, D.M. Baughman, A.Y. Lee, Deep learning is effective for the classification of OCT images of normal versus age-related macular degeneration, arXiv:1612.04891, 2016.
[36] L. Guillaume, R. Mojdeh, M. Joan, C.Y. Cheung, T.Y. Wong, L. Ecosse, M. Dan, M. Fabrice, S. Désiré, Classification of SD-OCT volumes using local binary patterns: experimental validation for DME detection, J. Ophthalmol. 6 (2016) 3298606.
[37] P.P. Srinivasan, L.A. Kim, P.S. Mettu, S.W. Cousins, G.M. Comer, J.A. Izatt, S. Farsiu, Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images, Biomed. Opt. Express 5 (10) (2014) 3568–3577.
[38] Y.Y. Liu, M. Chen, H. Ishikawa, G. Wollstein, J.S. Schuman, J.M. Rehg, Automated macular pathology diagnosis in retinal OCT images using multi-scale spatial pyramid and local binary patterns in texture and shape encoding, Med. Image Anal. 15 (5) (2011) 748–759.
[39] R. Asaoka, H. Murata, A. Iwase, M. Araie, Detecting preperimetric glaucoma with standard automated perimetry using a deep learning classifier, Ophthalmology 123 (9) (2016) 1974–1980.
[40] M.D. Abràmoff, Y. Lou, A. Erginay, W. Clarida, R. Amelon, J.C. Folk, M. Niemeijer, Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning, Investig. Ophthalmol. Vis. Sci. 57 (13) (2016) 5200–5206.
[41] X. Gao, S. Lin, T.Y. Wong, Automatic feature learning to grade nuclear cataracts based on deep learning, pp. 632–642.
[42] P. Prentašic, M. Heisler, Z. Mammo, S. Lee, A. Merkur, E. Navajas, M.F. Beg, M. Šarunic, S. Loncaric, Segmentation of the foveal microvasculature using deep learning networks, J. Biomed. Opt. 21 (7) (2016) 75008.
[43] L. Fang, D. Cunefare, C. Wang, R.H. Guymer, S. Li, S. Farsiu, Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search, Biomed. Opt. Express 8 (5) (2017) 2732–2744.
[44] A.G. Roy, S. Conjeti, S.P.K. Karri, D. Sheet, A. Katouzian, C. Wachinger, N. Navab, ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks, Biomed. Opt. Express 8 (8) (2017) 3627–3642.
[45] D.S. Kermany, M. Goldbaum, W. Cai, C.C.S. Valentim, H. Liang, S.L. Baxter, A. Mckeown, G. Yang, X. Wu, F. Yan, Identifying medical diagnoses and treatable diseases by image-based deep learning, Cell 172 (5) (2018) 1122–1131.e9.
[46] L. Fang, C. Wang, S. Li, H. Rabbani, X. Chen, Z. Liu, Attention to lesion: lesion-aware convolutional neural network for retinal optical coherence tomography image classification, IEEE Trans. Med. Imaging 38 (8) (2019) 1959–1970.
[47] J. De Fauw, J.R. Ledsam, B. Romera-Paredes, S. Nikolov, N. Tomasev, S. Blackwell, H. Askham, X. Glorot, B. O'Donoghue, D. Visentin, G. van den Driessche, B. Lakshminarayanan, C. Meyer, F. Mackinder, S. Bouton, K. Ayoub, R. Chopra, D. King, A. Karthikesalingam, C.O. Hughes, R. Raine, J. Hughes, D.A. Sim, C. Egan, A. Tufail, H. Montgomery, D. Hassabis, G. Rees, T. Back, P.T. Khaw, M. Suleyman, J. Cornebise, P.A. Keane, O. Ronneberger, Clinically applicable deep learning for diagnosis and referral in retinal disease, Nat. Med. 24 (2018) 1342–1350.
[48] G. Huang, Z. Liu, K.Q. Weinberger, L. van der Maaten, Densely connected convolutional networks, in: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, 2017, pp. 2261–2269.
[49] J. Hu, L. Shen, G. Sun, Squeeze-and-Excitation networks, arXiv:1709.01507, 2017.
[50] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[51] C. Wang, H. Yang, C. Bartz, C. Meinel, Image captioning with deep bidirectional LSTMs, pp. 988–997.
[52] L. Mansheng, O. Chunjuan, L. Huan, F. Qing, Image recognition of camellia oleifera diseases based on convolutional neural network & transfer learning, Trans. Chin. Soc. Agric. Eng. 34 (18) (2018) 194–201.
[53] Y. Tang, C. Eliasmith, Deep networks for robust visual recognition, pp. 1055–1062.
[54] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, arXiv:1412.6980, 2014.
[55] L. Xinting, S. Meixiao, H. Shenghai, L. Lin, Z. Dexi, L. Fan, Repeatability and reproducibility of eight macular intra-retinal layer thicknesses determined by an automated segmentation algorithm using two SD-OCT instruments, PLoS One 9 (2) (2014) e87996.
[56] A. Bagci, M. Shahidi, R. Ansari, M. Blair, N. Blair, R. Zelkha, Thickness profiles of retinal layers by optical coherence tomography image segmentation, Am. J. Ophthalmol. 146 (5) (2008) 679–687.
[57] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556, 2014.


[58] F. Chollet, Xception: deep learning with depthwise separable convolutions, arXiv:1610.02357, 2016.
[59] C. Peng, N. Wang, J. Li, X. Gao, DLFace: deep local descriptor for cross-modality face recognition, Pattern Recognit. 90 (2019) 161–171.
[60] C. Peng, X. Gao, N. Wang, D. Tao, X. Li, J. Li, Multiple representations-based face sketch–photo synthesis, IEEE Trans. Neural Netw. Learn. Syst. 27 (11) (2015) 2201–2215.
[61] C. Peng, X. Gao, N. Wang, J. Li, Graphical representation for heterogeneous face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 39 (2) (2017) 301–312.

Xuechen Li received the B.Sc. degree in electric information engineering from Northwestern Polytechnical University, Xi'an, China, the M.S. degree in detection technology from Guangdong University of Technology, Guangzhou, China, and the Ph.D. degree in information technology from The University of Newcastle, Callaghan, NSW, Australia, in 2009, 2012, and 2016, respectively. He is currently a postdoctoral researcher at the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include medical image processing, machine learning and their applications.

Linlin Shen received the B.Sc. degree from Shanghai Jiaotong University, Shanghai, China, and the Ph.D. degree from the University of Nottingham, Nottingham, U.K. He was a Research Fellow with the University of Nottingham, working on MRI brain image processing. He is currently a Professor and the Director of the Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include Gabor wavelets, facial recognition, analysis/synthesis, and medical image processing. Prof. Shen was listed as a Most Cited Chinese Researcher by Elsevier. He received the Most Cited Paper Award from the journal Image and Vision Computing. His cell classification algorithms were the winners of the International Contest on Pattern Recognition Techniques for Indirect Immunofluorescence Images held by ICIP 2013 and ICPR 2016.


Meixiao Shen received the B.Sc. and M.Sc. degrees in optics from Zhejiang University, Hangzhou, China, and the Ph.D. degree in ophthalmology from Wenzhou Medical University, Wenzhou, China. She was a senior research assistant at the Bascom Palmer Eye Institute, working on OCT system development and its applications in ophthalmology. She is currently a professor and the principal investigator (PI) of the ocular imaging laboratory, School of Ophthalmology and Optometry, Wenzhou Medical University. Her research interests include the development of optical coherence tomography (OCT) imaging technology and its application in the diagnosis of ocular disease.

Fan Tan received her master's degree in ophthalmology from Wenzhou Medical University in 2018. She is currently a Resident Doctor in the Department of Ophthalmology at Sichuan University West China Hospital, Sichuan, China. She is doing research on the clinical application of optical coherence tomography and optical coherence tomography angiography. Ms. Tan's research interests are in diabetic retinopathy.

Connor S. Qiu received his B.Sc. degree in Medical Sciences with Management from Imperial College London, London, UK, graduating with First Class Honours, and his M.B.B.S. degree in Medicine from Imperial College London, London, UK. He was the only undergraduate recipient of the Outstanding Student Achievement Award in his graduating cohort. He is currently working as an NHS doctor, and he recently won 3rd prize in the Royal College of Ophthalmologists 2018 Essay Prize for Foundation Doctors.
