Medical Image Analysis 61 (2020) 101654
Domain-invariant interpretable fundus image quality assessment

Yaxin Shen a, Bin Sheng a,∗, Ruogu Fang b, Huating Li c, Ling Dai a, Skylar Stolte b, Jing Qin d, Weiping Jia c, Dinggang Shen e,f

a Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
b Department of Biomedical Engineering, University of Florida, United States
c Shanghai Jiao Tong University Affiliated Sixth People's Hospital, China
d School of Nursing, The Hong Kong Polytechnic University, Hong Kong
e Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27510, USA
f Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea
Article info
Article history: Received 2 August 2019; Revised 29 December 2019; Accepted 18 January 2020; Available online 30 January 2020.
Keywords: Fundus image quality assessment; Domain adaptation; Interpretability; Multi-task learning.
Abstract
Objective and quantitative assessment of fundus image quality is essential for the diagnosis of retinal diseases. The major factors in fundus image quality assessment are image artifact, clarity, and field definition. Unfortunately, most existing quality assessment methods focus on the overall image quality, without interpretable quality feedback for real-time adjustment. Furthermore, these models are often sensitive to the specific imaging devices and cannot generalize well under different imaging conditions. This paper presents a new multi-task domain adaptation framework to automatically assess fundus image quality. The proposed framework provides interpretable quality assessment with both quantitative scores and quality visualization for potential real-time image recapture with proper adjustment. In particular, the present approach can detect the optic disc and fovea structures as landmarks to assist the assessment through coarse-to-fine feature encoding. The framework also exploits semi-tied adversarial discriminative domain adaptation to make the model generalizable across different data sources. Experimental results demonstrate that the proposed algorithm outperforms different state-of-the-art approaches and achieves an area under the ROC curve of 0.9455 for the overall quality classification. © 2020 Elsevier B.V. All rights reserved.
1. Introduction
Fundus imaging is the most commonly used modality in diagnosing diabetic retinopathy (DR), glaucoma, age-related macular degeneration, and other retinal diseases (Jelinek and Cree, 2010). The detection of some symptoms, such as microaneurysms and intraretinal microvascular abnormalities, is reliant on high-quality images, where inadequate quality may result in misdiagnoses (Fleming et al., 2006). Recently, automatic diagnosis of retinal diseases has attracted significant research and clinical interest to ease the burden on ophthalmologists and enable large-scale regular examination of diabetes patients via fundus imaging. Computerized screening can facilitate early diagnosis and timely treatment of DR, which helps to avoid blindness. Here, image quality plays a key role in automatic diagnosis, especially at early stages when the symptoms are subtle. Despite the continuous optimization of digital fundus cameras, aging, experience,
lighting, and other non-biological factors resulting from improper operation still result in a high percentage of low-quality fundus images, and reacquisition is time-consuming and sometimes impossible (Fleming et al., 2006; Boucher et al., 2003). Thus, real-time feedback on image quality problems is essential for reacquiring high-quality images on site. Furthermore, large-scale population-level screening always involves a variety of fundus cameras and target populations under different imaging conditions. Robustness to various devices, eye conditions, and imaging environments is key for the clinical deployment of an automatic image quality assessment (IQA) system. Hence, domain-invariant automatic IQA with real-time interpretable feedback is crucial for filtering ungradable images, improving overall image acquisition quality, and supporting subsequent automatic diagnosis. While fundus image quality assessment has been investigated for decades, interpretable and domain-invariant IQA is still under exploration. Generally, conventional methods (Lalonde et al., 2001b; Kohler et al., 2013) mostly consider specific factors affecting fundus image quality using hand-crafted features, such as analyzing the brightness and sharpness of the image. These methods neglect features of the holistic picture and do not generalize well
Fig. 1. Fundus images taken at different sites using different cameras. These images differ in visual information distribution.
to new data. Recently, deep convolutional neural networks (CNNs) have shown promising results in various computer vision tasks. Compared with traditional methods that extract hand-crafted features, deep learning approaches find high-level semantic representations for more comprehensive and robust assessments. Previous approaches based on deep learning (Tennakoon et al., 2016; Yu et al., 2017) usually lack analysis of the factors affecting image quality, such as image clarity and artifacts. Our previous work (Yaxin et al., 2018) evaluates fundus image quality with additional quality factors. However, in clinical applications, existing methodologies cannot handle the case where the source and target domains have different distributions, which happens when images are taken using different types of camera devices, or vary between sites and individuals as shown in Fig. 1. Hence, automatic assessment of retinal image quality remains an open and challenging research direction. To address the above challenges, a novel multi-task deep learning framework is proposed which utilizes domain adaptation and landmark detection to build a domain-invariant, interpretable fundus IQA system. An overview of the proposed architecture is shown in Fig. 2. Specifically, the image quality is analyzed from three clinically established aspects: image artifact, clarity, and field definition. Domain adaptive neural networks with landmark detection are proposed for robust and precise IQA across different devices and imaging conditions. Deep feature learning for discriminative localization is used to visualize problematic locations in the fundus images for each aspect. Experimental results demonstrate that the proposed framework can provide reliable image quality assessment with interpretable quality factor scores. The contributions of the proposed framework are three-fold:
• Interpretable fundus image quality assessment: The fundus image quality assessment is interpreted through three clinically accepted aspects (artifact, clarity, and field definition) and visual feedback for real-time image reacquisition.
• Coarse-to-fine landmark localization: The architecture is integrated with an efficient coarse-to-fine landmark detection (e.g., optic disc, fovea) for robust quality assessment of each aspect of IQA.
• Semi-tied adversarial discriminative domain adaptation: A semi-tied adversarial discriminative domain adaptation model is developed to improve the generalization performance across datasets with different distributions.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the dataset preparation and the definition of the fundus image quality criteria. Section 4 presents the proposed quality assessment networks in detail. Implementation details and experimental results are presented in Section 5, followed by conclusions in Section 6.
2. Related work
This paper focuses on no-reference image quality assessment (NR-IQA), which evaluates image quality without the original undistorted images. NR-IQA approaches for natural images include algorithms using hand-crafted features (Anush Krishna and Alan Conrad, 2011; Mittal et al., 2012; Fang et al., 2014) and deep neural network methodologies (Ma et al., 2018; Bosse et al., 2017). Ma et al. (2018) proposed a multi-task end-to-end optimized deep neural network (MEON) for NR-IQA. Kim and Lee (2017) used FR-IQA metrics to alleviate the discrepancy between FR-IQA and NR-IQA. Deep Neural Networks for Image Quality Assessment (DeepIQA) (Bosse et al., 2017) allows for joint learning of patch quality and patch weights. This section provides an overview of related work in NR-IQA for fundus images, optic disc and fovea detection, and domain transfer learning.
2.1. Fundus image quality assessment
Automated fundus IQA is important to filter ungradable images before ophthalmologists make diagnosis decisions. IQA methods for natural images cannot be directly applied to fundus images: criteria based on natural images do not consider the position of the quality defects, and the quality assessment of natural images is not related to deep semantic information. More specifically, NR-IQA methods (Anush Krishna and Alan Conrad, 2011) are based on image statistics and assume that natural scenes possess certain statistical properties which are altered
Fig. 2. The framework of the proposed automatic fundus image quality assessment system. C in the yellow dashed box represents a classifier to assess the overall quality and quality factors. The semi-tied domain adaptation method is proposed to solve the problem of distribution differences between source and target data, as shown in the blue dashed box. In the green box, class activation maps are utilized to localize the defects. The red box shows the data flow of landmark localization. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
in the presence of distortion, rendering them un-natural; these approaches characterize the un-naturalness using scene statistics. However, the statistical characteristics of natural images are not consistent with those of fundus images. Fundus image quality is closely related to the visibility of specific landmarks such as the optic disc, fovea, and blood vessels, and the geometric relationship between these landmarks is also a key factor in assessing image quality. Pires Dias et al. (2014) defined several quality indicators for fundus images including clarity, field definition, and focus. The ophthalmologists of Shanghai Sixth People's Hospital further defined fundus interpretation criteria with artifact, clarity, and field definition factors and gave a grading standard for each quality factor according to the "Guidelines for image acquisition and interpretation of diabetic retinopathy screening in China" (Ophthalmologist Branch of the Chinese Medical Doctor Association and Eye Foundation Disease Group, 2017). For fundus IQA, the current research can be summarized into the following three categories.
Global image features: Global image features have been explored for fundus IQA, where the overall quality of the image is assessed by simple image analysis methods to avoid complex and time-consuming segmentation of fundus image structures. Lalonde et al. (2001b) used global edge histograms with local intensity measurements to automatically assess the quality of retinal images. Lee and Wang (1999) used a quality index calculated by convolution with a template intensity histogram. Bartling et al. (2009) analyzed the brightness and sharpness of the image, mainly by dividing the image into non-overlapping square areas and analyzing global image features such as contrast and brightness.
Structural image features: Structural image features require segmentation of key structures in fundus images such as vessels, maculae, and optic discs. Usher et al. (2003) were the first to use vessel detection and achieved 84.3% sensitivity and 95.0% specificity in fundus IQA. Hunter et al. (2011) focused on image contrast, blood vessel visibility, and differences between the macular
area and the fundus image background. Although these structure-based approaches are theoretically accurate for the assessment of fundus image quality, image segmentation is hard and error-prone on images of inadequate quality.
Deep learning: Deep learning has shown promising performance in a variety of computer vision tasks such as image classification, detection, and segmentation, and it has also been introduced into fundus IQA in recent years. Tennakoon et al. (2016) proposed shallow CNNs to classify fundus image quality. Yu et al. (2017) extracted unsupervised features from saliency maps and supervised features from CNN feature maps (DRIQC). While saliency maps reflect regions of interest for human interpretation, such as edges and textures, CNN features reflect high-level features. Our previous conference publication (Yaxin et al., 2018) analyzed fundus image quality with several quality factors (MFIQA). However, these works do not apply when the training and target data distributions differ, and they do not provide visual feedback for real-time image re-acquisition.
2.2. Optic disc and fovea detection
The optic disc (OD) and the fovea are both key landmarks in fundus images. The optic disc is a bright circle in fundus images representing the starting point of the optic nerve. The fovea is a dark region in the fundus image located in the center of the macula lutea. For optic disc detection, conventional algorithms usually identify the largest cluster of bright pixels in the retinal images (Lalonde et al., 2001a), using methods including principal component analysis (PCA), active shape models, and the geometric relationship between the optic disc and its main vessel branches. For fovea detection, Siddalingaswamy and Prabhu (2010) detected the fovea using template matching correlation with typical fovea features. Sagar et al. (2007) detected the fovea based on blood vessel detection by finding the dark regions on retina surfaces. Salam et al. (2015) localized the optic disc
Fig. 3. Images (a)–(c) are ungradable for DR due to image artifacts (a), clarity (b), and field definition (c), respectively. Image (d) is a gradable image.
using local vessel-based features and a support vector machine. However, these methods depend on the detection or segmentation of certain structures, which may be difficult in low-quality images. Recently, CNNs have shown outstanding performance in object detection and localization (Girshick, 2015; Shaoqing et al., 2015; He et al., 2017) by using two-stage training algorithms. These algorithms first provide numerous proposals and then refine those candidate object locations for precise detection. These methods work well for multi-object detection tasks, but their effectiveness is limited when detecting the optic disc and fovea of a retinal image, since a single object does not require a large number of proposals.
2.3. Domain transfer learning
Domain transfer learning, or domain adaptation, is a research topic that has attracted significant interest in recent years. The goal of domain transfer learning is to train models on a source domain and maintain high performance on a target domain with a different data distribution. Recent studies have focused on minimizing the distribution differences between the source and target features (Sun and Saenko, 2016; Ghifary et al., 2016). In particular, adversarial learning methods learn a representation that minimizes the domain shift through an adversarial loss. Domain adversarial neural networks (DANN) (Ganin et al., 2017) use a gradient reversal layer to maximize the loss of the domain classifier. Domain separation networks (DSN) (Bousmalis et al., 2016) learn a shared representation by finding the shared space and private space simultaneously. Adversarial Discriminative Domain Adaptation (ADDA) (Tzeng et al., 2017) borrows the idea of GANs by combining adversarial learning with discriminative modeling. However, there is no study so far addressing the unique characteristics of the domain transfer learning problem in fundus image analysis.
3. Dataset
The dataset was collected from patients who participated in the Shanghai Diabetic Retinopathy Screening Program (SDRSP). The assessments of diabetic retinopathy and image quality were conducted objectively by authorized ophthalmologists. According to the "Guidelines for artificial intelligent diabetic retinopathy screening system based on fundus photography" (Intelligent Medicine Special Committee of China Medicine Education Association et al., 2019), all images were first labeled by three ophthalmologists. If the labels were exactly the same, the label was regarded as the ground truth; otherwise, two other ophthalmologists conducted a second round of review and made the final decision. The image quality was graded according to standards defined in terms of three quality factors: artifact, clarity, and field definition, as listed
in Table 1, based on the "Guidelines for image acquisition and interpretation of diabetic retinopathy screening in China" (Ophthalmologist Branch of the Chinese Medical Doctor Association and Eye Foundation Disease Group, 2017). The level of the vascular arch is determined by the diameter of the blood vessels and the level of branching: the level I vascular arch indicates the main vessel branches located near the optic disc, the level II vascular arch is derived from the level I branches in the vessel tree, and the level III vascular arch is the next level of vessel branches. In addition to these quality factors, images are labeled according to whether they are adequate for retinal disease diagnosis, especially diabetic retinopathy (DR): "0" indicates a gradable image and "1" indicates an ungradable image. Some retinal image samples with quality issues are shown in Fig. 3, and Fig. 4 shows samples with quality score annotations corresponding to Table 1. Specifically, image artifacts are defined as irregular patterns (e.g., caused by eyelash shadows and lens stains). Image clarity refers to image blurriness, which makes lesion detection difficult. Field definition checks whether the optic disc and fovea are near the image center (Fleming et al., 2006). The dataset used in this study includes 18,653 retinal images from SDRSP, selected by stratified random sampling. All images are 3-channel RGB images with size 4224 × 3168 × 3. The collected dataset is separated into three domains:
• images taken from desktop fundus cameras in the Jingan, Yangpu, and Songjiang districts: target domain T1;
• images taken from hand-held fundus cameras: target domain T2;
• images taken from desktop fundus cameras in districts other than Jingan, Yangpu, and Songjiang: source domain S.
Domains S and T1 were selected from different districts of Shanghai and have subtle regional differences in distribution. Of the selected images, the training dataset contains 10,000 images from domain S; the test dataset includes 1653 images from domain T1, 6000 images from domain T2, and 1000 images from domain S. The optic disc and fovea localization tasks used the Indian Diabetic Retinopathy Image Dataset (IDRiD) (Sahasrabuddhe and Meriaudeau, 2018), which includes 516 images without quality defects. The localization task was trained and tested on the IDRiD dataset, and the trained model was then applied to the SDRSP dataset for the field definition task.
4. Domain-invariant interpretable quality assessment networks
4.1. Overview
The proposed domain-invariant, interpretable fundus image quality assessment system has three major novel modules: 1)
Table 1
Image quality scoring criteria.

Type | Image quality specification | Score
Artifact | No artifacts | 0
Artifact | Artifacts are outside the vascular arch, with scope less than 1/4 of the image | 1
Artifact | Artifacts do not affect the macular area, with range less than 1/4 | 4
Artifact | Artifacts cover more than 1/4 but less than 1/2 of the image | 6
Artifact | Artifacts cover more than 1/2 without fully covering the posterior pole | 8
Artifact | Artifacts cover the entire posterior pole | 10
Clarity | Only the level I vascular arch is visible | 1
Clarity | The level II vascular arch and a small number of lesions are visible | 4
Clarity | The level III vascular arch and some lesions are visible | 6
Clarity | The level III vascular arch and most lesions are visible | 8
Clarity | The level III vascular arch and all lesions are visible | 10
Field definition | Does not include the OD and fovea | 1
Field definition | Only contains either the OD or the fovea | 4
Field definition | Only contains either the OD or the fovea | 6
Field definition | Only contains either the OD or the fovea | 8
Field definition | The OD and fovea are within 1 PD of the center | 10
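These graded scores are later converted into the binary sub-task labels used by the quality classifiers of Section 4.2 (e.g., whether the artifact score exceeds 0, 1, 4, 6, or 8). As an illustration only, the following minimal Python sketch shows one plausible encoding; the clarity and field definition threshold lists are assumed to follow the same pattern as their rows in Table 1, and the function name is illustrative rather than taken from the original implementation.

```python
# Hypothetical per-factor thresholds derived from the score levels in Table 1.
THRESHOLDS = {
    "artifact": [0, 1, 4, 6, 8],          # stated explicitly in Section 4.2
    "clarity": [1, 4, 6, 8],              # assumed to follow the same pattern
    "field_definition": [1, 4, 6, 8],     # assumed to follow the same pattern
}

def encode_factor_score(factor, score):
    """Multi-label target for one quality factor: one binary flag per threshold,
    indicating whether the graded score exceeds that threshold."""
    return [int(score > t) for t in THRESHOLDS[factor]]

print(encode_factor_score("artifact", 6))  # [1, 1, 1, 0, 0]
```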
Fig. 4. Images with quality score labels. From left to right: artifact, clarity, field definition, and overall quality scores.
multi-task multi-factor image quality assessment with visual feedback; 2) landmark (optic disc and fovea) detection for robust IQA; 3) semi-tied adversarial discriminative domain adaptation. Fig. 2 provides an overview of these modules in the system.
4.2. Multi-task quality assessment with ensemble weights estimation
In this module, the fundus image quality is automatically graded according to standards in terms of three quality factors: artifact, clarity, and field definition, as listed in Table 1. Fundus image quality is unevenly distributed over an image, and the visibility of the optic disc and fovea is an important index for image quality assessment. Thus, to fully evaluate the quality factors and leverage their inter-connections to assess the overall image quality, the CNN encoders extract the features of the global image and the local landmarks (optic disc and fovea) simultaneously, and weights are then assigned to the outputs of each branch to explain their influence on the final predictions. In addition, the proposed algorithm approximates the field definition task based on the center locations of the optic disc and fovea. Overall, the network can filter images of inadequate quality and provide visual feedback for quality factor analysis. The model configuration is described in detail below.
1) Global image feature extraction: The networks of feature encoders and classifiers are shown in Fig. 5. There are two input branches of images in rectangular and polar coordinates, and four output classification tasks in the architecture: overall image quality, artifact, clarity, and field definition. t(x) is a function transforming the input image x into polar coordinates. Subscripts p and r represent the polar and rectangular coordinate branches, and q, a, c, and d denote the image quality, artifact, clarity, and field definition tasks. Let E(x) be a function mapping image x to hidden features shared by the tasks. The backbone network for the feature encoder E(x) is VGG-16, as shown in Fig. 6, where the input size is modified to 800 × 800 × 3 compared with 224 × 224 × 3 in the original architecture. All the convolutional layers use padding to keep the feature size unchanged after 3 × 3 convolution. Through four 2 × 2 max-pooling layers, the feature maps are downsized from 800 × 800 to 50 × 50. Meanwhile, the depth of the output features increases to 512, since the network is designed with an increasing number of kernels to extract deep semantic information. Therefore, the network extracts features with a size of 50 × 50 × 512 through this architecture.
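To make this encoder configuration concrete, the following is a minimal sketch, assuming TensorFlow/Keras (the paper only states that the models were implemented in TensorFlow); the block layout follows the VGG-16 convolutional blocks, but with the modified 800 × 800 × 3 input and only four 2 × 2 max-pooling stages so that the output feature map is 50 × 50 × 512.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_fundus_encoder(input_shape=(800, 800, 3)):
    """VGG-16-style encoder E(x): four pooling stages map 800x800x3 to 50x50x512."""
    x_in = layers.Input(shape=input_shape)
    x = x_in
    # (filters, number of 3x3 conv layers) per block, as in VGG-16,
    # but only the first four blocks are followed by 2x2 max pooling.
    blocks = [(64, 2), (128, 2), (256, 3), (512, 3)]
    for filters, n_convs in blocks:
        for _ in range(n_convs):
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)  # halves the spatial size
    # final block without pooling keeps the 50x50 resolution
    for _ in range(3):
        x = layers.Conv2D(512, 3, padding="same", activation="relu")(x)
    return Model(x_in, x, name="fundus_encoder")

encoder = build_fundus_encoder()
print(encoder.output_shape)  # (None, 50, 50, 512)
```

In the proposed architecture, one such encoder would be instantiated for the rectangular-coordinate branch and another for the polar-coordinate branch fed with t(x).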
Fig. 5. Multi-task deep learning architecture. Feature vectors are extracted from the rectangular and polar coordinate encoders. RoIs of the optic disc and fovea are pooled into fixed-size feature maps in the rectangular coordinate branch. The quality estimates are binary classifiers for the artifact, clarity, and overall quality tasks; the predictions from each quality estimate component are aggregated with estimated weights for the final estimate. In addition, the results of the field definition task come only from the global image branch.
Fig. 6. Architecture of CNN-based fundus image feature encoder.
Due to the limited number of quality-labeled images, the parameters of the encoder are pre-initialized by a DR grading model which is trained without quality assessment. In this way, the network with pre-trained parameters has the capability of fundus image feature extraction, and the issue of data insufficiency is addressed to avoid overfitting to a specific task.
2) Global and landmark feature embedding: Because the visibility of the optic disc and fovea is essential for the assessment of overall quality and quality factors, the networks also analyze the local quality of these landmarks. First, the centers of the optic disc and fovea are predicted through the landmark localization model (details are described in Section 4.3). Then square ROI features of both the optic disc and fovea are extracted, with side lengths chosen randomly in the range of 800–1200 pixels in the raw input image (with size 4224 × 3168 × 3), according to the object centers. The RoI features are mapped to fixed-size features (10 × 10 × 512) through RoIAlign (He et al., 2017), and then embedded into 2048-dimensional feature vectors through a fully connected layer for the following classifications.
3) Classification with weights estimation: The classifiers include two types of estimate components: quality estimation and weight estimation. The quality estimate component makes multiple binary predictions for the three quality factor tasks. The artifact task classifies whether the artifact score is greater than 0, 1, 4, 6, and 8, respectively, and the clarity and field definition tasks are defined similarly based on Table 1. The overall image quality task makes a binary prediction on whether the image is gradable for DR diagnosis. The quality estimate components that follow the embedded vectors perform these quality classifications; each classifier is comprised of two fully-connected layers followed by sigmoid layers, as described in detail in Fig. 5. The weight estimate is a regression component which accounts for the influence of each branch on each classification task. The weight estimation is composed of two fully-connected layers and a ReLU activation function. To guarantee that the weights are positive, the outputs w of the fully-connected layers are activated through ReLU with an additional small term ε, which is set to 0.00001 in the experiments:
w* = max(0, w) + ε    (1)
The final visual qualities y of the artifact, clarity and overall quality tasks can be aggregated as:
y_t = ( Σ_{i ∈ {fovea, OD, image}} w*_{ti} h_{ti} ) / ( Σ_{i ∈ {fovea, OD, image}} w*_{ti} ),   t ∈ {q, a, c}    (2)
where t ∈ {q, a, c} represents the classification task, i ∈ {fovea, OD, image} represents the CNN branch, and h denotes the sigmoid output of the binary quality classifiers. Since the field definition task requires the geometric relationship between the optic disc and fovea, the local feature of a single landmark does not contribute to the prediction. Therefore, the field definition task is evaluated through the global image branch, together with the predicted optic disc and fovea centers, which are used to correct the outputs of the quality estimate. According to Table 1, images with a field definition score larger than 6 are considered valid on the field definition factor. Therefore, the predictions are corrected based on whether the centers of the optic disc and fovea are within 2 papillary diameters (PD) of the image center.
4) Loss function: The goal of jointly training the multi-task quality assessment network is to minimize the loss:
L = Σ_{t ∈ {q, a, c, d}} L_t    (3)

L_q = − Σ_{i=0}^{N} [ y_i^q · log ŷ_i^q + (1 − y_i^q) · log(1 − ŷ_i^q) ]    (4)
where t ∈ {q, a, c, d} represents the classification task. The overall quality task loss L_q is the negative log-likelihood of the label for whether the image is gradable, where y_i^q ∈ {0, 1} denotes the class label and ŷ_i^q represents the aggregated sigmoid prediction. The losses of the artifact, clarity, and field definition tasks are denoted as L_a, L_c, and L_d. These quality factor tasks are learned by multi-label classification with a sigmoid cross-entropy loss function as well.
5) Visual feedback: The system also contains a visual feedback module which can localize the image regions with quality problems using Class Activation Mapping (CAM) (Zhou et al., 2016), as shown in the green dashed box of Fig. 2. CAM is generated from global average pooling and highlights the discriminative defects detected by the CNN. Therefore, the quality assessment system provides ophthalmologists not only with quality classification results but also with the discriminative regions of quality defects.
4.3. Optic disc and fovea detection
Landmark localization is essential for fundus image quality assessment. Identifying the centers of the optic disc (OD) and fovea can be designed as a regression task. The proposed deep regression architecture is depicted in the red dashed box of Fig. 2, and Fig. 7 illustrates the architecture in more detail. The detection of both the optic disc and fovea centers is initially performed through a regression network (global feature encoder). The global CNN encoder localizes both the optic disc and fovea centers simultaneously, which can therefore be used for the selection of the regions of interest (ROIs). It is followed by two separate neural networks (local feature encoders) to localize the optic disc and fovea centers, respectively. The local encoders are designed as components that focus on single-target localization and refine the predictions of the previous step. In this way, the optic disc and fovea localization tasks are modeled simultaneously.
4.3.1. Global encoder
The global CNN encoder, which extracts features of the entire image, localizes the centers of the optic disc and fovea through a CNN encoder with a backbone network of ResNet-50 (He et al., 2015a). To avoid the loss of critical pixel-level information, all the max pooling layers are replaced with average pooling layers compared with the original ResNet architecture.
4.3.2. Local encoder
In the next step, the local features of the optic disc and fovea are extracted separately based on the cropped ROIs. The fundus images in the IDRiD are only labeled with target center coordinates. The fovea is an oval area about 1.5 mm in diameter, and the optic disc is a disc-shaped structure with a diameter of 1.5 mm. The shapes and sizes of the optic disc and fovea are nearly invariant compared with objects in natural images, although the sizes of the target optic disc and fovea may change slightly due to the operation of the photographer and individual differences. Utilizing this prior information, the bounding box can be found based on the predicted center and the approximate shape and scale. The backbone network architecture of the local encoder is VGG-16 (Simonyan and Zisserman, 2014). All the max pooling layers are replaced with average pooling layers. Because the input image is resized to 224 × 224 × 3, which is compatible with the CNN architecture, the corresponding features are deformed as well and may lead to inaccurate predicted localization.
Thus, the architecture does not use a variety of aspect ratios for the candidate regions; instead, it crops square ROIs directly from the raw image, which are not deformed after resizing. The proposed coarse-to-fine network regresses the target center with reference to ROIs of multiple scales. At the training stage, square ROIs of four sizes (H, W ∈ {800, 900, 1000, 1100}) are cropped on the raw images based on the ground-truth centers of the optic disc, and sizes H, W ∈ {900, 1000, 1100, 1200} are used for the fovea because it needs a larger region to determine whether there are blood vessels around. For each scale, images are randomly cropped into five OD-based patches or five fovea-based patches to build the training dataset for the local encoder, as shown in Fig. 8. The proposals may not include the whole target, since sometimes the predicted region only includes part of the target object at the testing stage. Therefore, the selection strategy augments the training data by 20 times, and through random cropping, the target can be localized wherever the object center is. At the test stage, ROIs are selected by cropping a 950 × 950 × 3 region for the optic disc and a 1050 × 1050 × 3 region for the fovea on the raw image based on the current predicted centers. The ROI selection is performed iteratively to refine the predicted target center, as illustrated in Fig. 7, step 2. The described algorithm won first place in the fovea and optic disc detection task of the 2018 Indian Diabetic Retinopathy Image Dataset (IDRiD) Challenge at the IEEE International Symposium on Biomedical Imaging (ISBI) (Sahasrabuddhe and Meriaudeau, 2018).

Fig. 7. The deep regression framework of landmarks localization.

Fig. 8. ROI selection for the local encoder. The white solid boxes indicate the selection of the training set, taking size 800 × 800 × 3 as an example. The yellow dashed box indicates the input ROI at the test stage, which is centered on the coordinates predicted by the previous step. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4.3.3. Learning
The input image is normalized by subtracting the mean RGB value. The regression loss for the center location is a scaled Euclidean loss:

L_{t ∈ {OD, fovea}} = (α1^t v1^t − v̂1^t)^2 + (α2^t v2^t − v̂2^t)^2    (5)

where (v1, v2) is the ground-truth label of the target, (v̂1, v̂2) is the predicted center, t represents the target object, and α1 and α2 denote the normalization coefficients. For the global CNN encoder, the loss function is:

L_global = L_OD + L_fovea    (6)

For the local encoders, the loss functions are:

L_local_OD = L_OD,   L_local_fovea = L_fovea    (7)

We adopt the normalized parameterization of the centers:

α1^t = 1/w^t,   α2^t = 1/h^t    (8)
Fig. 9. An overview of the domain adaptation module in the IQA framework. The source encoders and classifiers are trained first. In the following step, we adversarially train the target encoder by aligning the target feature distribution to the source distribution through the discriminator. The high-level weights are tied but the low-level weights remain trainable, which we call semi-tied ADDA. At the inference stage, the test images are mapped through the target encoder. The layers in dashed lines remain fixed.
where t ∈ {OD, fovea}, wt and ht denote the average weight and height of optic disc and fovea in the training set. 4.4. Semi-tied adversarial discriminative domain adaptation In clinical application, there are many scenarios where the data distributions are different at the training and test stages, such as when images are collected using different devices. There could also be slight distribution differences when data are collected at different areas, and population differences are also common. To make the model generalize better to new data for practical applications, an unsupervised adversarial discriminative domain adaptation (ADDA) method (Tzeng et al., 2017) is exploited to address the domain shift phenomenon between the training dataset and the test dataset. This methodology is improved to fit for similar source domain and target domain. High level weights are shared between the source mapping and target mapping by considering the similarity of deep semantic features named as semi-tied ADDA as shown in the blue dashed box in Fig. 2. 4.4.1. Adversarial training procedure s Let Xs = {(xi , yi )}N represent labeled dataset of Ns samples in i=0 N
t the source domain, and Xt = {(xi )}i=0 represent unlabeled target images. The propose semi-tied ADDA algorithm minimizes the distance between the encoded representations of source images Xs and target images Xt . The “semi-tied” indicates that high level weights are fixed during the adversarial training considering the similarity in deep semantic features. This approach performs three main steps: (1) pre-train a source model on the source dataset only; (2) learn a target encoder to adapt the distributions between source and target domains via adversarial training; and (3) infer to the target dataset with the target encoder and the shared classifier. The whole procedure is illustrated in Fig. 9. Specifically, domain discriminator D is a classifier that determines whether an image is from source domain or target domain. The target encoders of rectangular and polar coordinates are learned by alternating minimization between D and target encoders. In the proposed architecture, the discriminator D consists of three fully connected layers: two layers with 4096 units and a final discriminator layer. The loss of discriminator is:
LD = −Exs ∼Xs logD(Es (xs )) − Ext ∼Xt log(1 − D(Et (xt )))
(9)
where E(x) is a function that maps image x to a hidden representation shared by tasks, the details of E(x) is illustrated in Fig. 5. The
source encoder Es (xs ) and target encoder Et (xt ) have same network structure with independent weights. In order to train the target encoder adversarially, the target domain labels are inverted as GAN loss. The loss to train the target encoders is:
LE = −Ext ∼Xt logD(Et (xt ))
(10)
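As an illustration of the alternating optimization in Eqs. (9) and (10), the following is a minimal sketch, assuming TensorFlow/Keras models whose names are illustrative (the discriminator is assumed to flatten its input internally); it is a sketch under these assumptions, not the released implementation.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def semi_tied_adda_step(x_s, x_t, source_encoder, target_encoder,
                        discriminator, d_opt, e_opt):
    """One alternating update of semi-tied ADDA (Eqs. (9)-(10)).
    The source encoder and classifiers stay frozen; the high-level layers of the
    target encoder were marked non-trainable beforehand ("semi-tied"), so only
    its low-level layers and the discriminator receive gradients."""
    # Discriminator update (Eq. (9)): source features labeled 1, target features 0.
    with tf.GradientTape() as tape:
        f_s = source_encoder(x_s, training=False)
        f_t = target_encoder(x_t, training=False)
        logit_s = discriminator(f_s, training=True)
        logit_t = discriminator(f_t, training=True)
        d_loss = bce(tf.ones_like(logit_s), logit_s) + bce(tf.zeros_like(logit_t), logit_t)
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # Target-encoder update (Eq. (10)): inverted labels so the target features fool D.
    with tf.GradientTape() as tape:
        logit_t = discriminator(target_encoder(x_t, training=True), training=False)
        e_loss = bce(tf.ones_like(logit_t), logit_t)
    e_grads = tape.gradient(e_loss, target_encoder.trainable_variables)
    e_opt.apply_gradients(zip(e_grads, target_encoder.trainable_variables))
    return d_loss, e_loss
```

The "semi-tied" setup would be done before training, for example by copying all source-encoder weights into the target encoder and then setting layer.trainable = False on its deeper convolutional blocks, as described in Section 4.4.2 below.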
4.4.2. Selection of trainable parameters
The parameters of the source encoders and multi-task classifiers remain fixed during adversarial training, since the goal is to learn a target representation matching the source distribution. The learned target representations would probably lead to degenerate solutions without weight sharing and proper initialization. Thus, the target encoders are initialized with all the parameters of the source encoders in both rectangular and polar coordinates: θ_tr = θ_sr, θ_tp = θ_sp, where θ denotes the parameters, t and s represent the target and source domains, and r and p represent the rectangular and polar coordinates, respectively. Fundus images from different regions or taken by different cameras usually differ in low-level features such as color and texture. Thus, the target encoders are learned through a semi-tied weights strategy, which means fixing the high-level parameters that extract deep semantic features and keeping the low-level parameters trainable, as described in Fig. 9. After learning the target rectangular and target polar encoders, target images can be mapped to the corresponding hidden representations, which can then be correctly classified.
5. Experiments and results
All the experiments were performed on an Intel Xeon E5-2630 v4 @ 2.20 GHz CPU and an NVIDIA Tesla K80 GPU on Ubuntu 16.04. All models were implemented in TensorFlow using the stochastic gradient descent algorithm with momentum. The multi-task quality networks were fine-tuned from the DR grading network and trained for 15 epochs. The semi-tied ADDA training proceeded for another 10 epochs. The mini-batch size was set to 64 and the weight decay was set to 0.0001. All newly initialized weights were initialized as in He et al. (2015b). For the optic disc (OD) and fovea localization task, the global encoder was trained for 200 epochs and the local encoders for 30 epochs. The batch sizes were set to 16 and 64 for the global encoder and the two local encoders, respectively. The learning rate was set to 0.01 and was divided by 10 when the error plateaued. Many pre-processing approaches are commonly used in fundus image processing, for example image standardization, enhancement, color space
Table 2
Experimental results of the overall image quality task with the different network architectures presented in the paper. (e)–(g) evaluate several domain adaptation methods.

Method | Sensitivity | Specificity | AUC
(a) global image on rectangular branch only | 0.7778 | 0.829 | 0.904
(b) global image on both coordinate branches | 0.8314 | 0.852 | 0.9259
(c) global image with OD and fovea detection without weight estimate | 0.8456 | 0.8419 | 0.9319
(d) global image with OD and fovea detection with weight estimate | 0.8479 | 0.8645 | 0.9356
(e) scenario (d) + gradient reversal (Ganin et al., 2017) | 0.8094 | 0.9064 | 0.9426
(f) scenario (d) + ADDA (Tzeng et al., 2017) | 0.8257 | 0.9161 | 0.9437
(g) scenario (d) + semi-tied ADDA (ours) | 0.8362 | 0.9172 | 0.9455
Table 3
Experimental results of the overall image quality task compared with existing works.

Method | Sensitivity | Specificity | AUC
Tennakoon et al. (2016) | 0.7235 | 0.8127 | 0.8907
DeepIQA (Bosse et al., 2017) | 0.7907 | 0.7936 | 0.9025
DRIQC (Yu et al., 2017) | 0.8302 | 0.8487 | 0.9251
MEON (Ma et al., 2018) | 0.8444 | 0.8452 | 0.9306
MFIQA (Yaxin et al., 2018) | 0.8 | 0.8968 | 0.9317
WLDFIQA (ours) | 0.8362 | 0.9172 | 0.9455
conversion (RGB to LAB or RGB to YUV), and vessel detection (Usher et al., 2003). However, some fundus lesions are subtle (e.g., microaneurysms), and excessive image processing can cause deformation and blur out key information. To preserve the original image features for quality assessment, the complex image processing methods mentioned above were not adopted. Instead, we cropped the circular field of view (FOV), which was detected by traversing the image from the edges to the center.
5.1. Quality assessment experiments
The proposed model was mainly evaluated with the test dataset from domain T1, as described in Section 3. The source data and target data were selected from different districts of Shanghai. The multi-task neural networks were evaluated in the following architectures without domain adaptation: (a) global image on the rectangular coordinate branch only, (b) global image on the concatenation of the rectangular and polar branches, (c) global image, optic disc, and fovea branches with average aggregation, (d) global image, optic disc, and fovea branches with weight estimation. All scenarios were fine-tuned from the DR grading network. In order to verify the validity of the semi-tied ADDA, different approaches to domain adaptation were evaluated: (e) gradient reversal (Ganin et al., 2017), (f) ADDA (untied weights) (Tzeng et al., 2017), and (g) semi-tied ADDA as introduced in Section 4.4. Tables 2 and 3 report the sensitivity, specificity, and AUC for the overall image quality task using each of the compared approaches. Receiver operating characteristic (ROC) curves of the image quality task for different network settings and architectures are shown in Fig. 10. Fig. 10a plots the ROC for the comparison architectures without domain adaptation; the area under the curve (AUC) of the proposed architecture as shown in Fig. 5 is 0.9356, which is the highest among these experiments. The results demonstrate that combining the local features of the optic disc and fovea effectively improves the prediction performance, and that weighted aggregation of the global image, optic disc, and fovea branches improves the performance over simple averaging. The ROC curves of the different domain adaptation methods are plotted in Fig. 10c. The results indicate that transfer learning can effectively address the problem of inconsistent data distributions; in particular, the semi-tied ADDA reaches the highest AUC. Fig. 10b
shows the results of the proposed architecture and existing methods without domain adaptation. We replicated the experiments from Tennakoon et al. (2016), Bosse et al. (2017), Yu et al. (2017), Ma et al. (2018), and Yaxin et al. (2018). In conclusion, the proposed model outperforms other state-of-the-art methods by effectively making use of local features and the domain adaptation algorithm, obtaining an AUC of 0.9455. Fig. 11 plots ROC curves of the quality factor classifications. The results of the field definition task with a score larger than 6 were corrected based on whether the optic disc and fovea are within 2 PD of the center; the AUC value was improved by about 2%, as shown in Fig. 11d. Semi-tied ADDA between domains S and T2 was also validated. Images from these two domains were taken with different types of fundus cameras (desktop fundus cameras and hand-held fundus cameras). The experiments included S to S, S to T2, and S to T1. All these experiments were evaluated with the proposed architecture without domain adaptation, as in scenario (d), and S to T2 and S to T1 were additionally evaluated with the semi-tied ADDA method. The results are plotted in Fig. 12a and b, which show that models with domain adaptation increase the prediction performance in different target domains. Because the test set from the source data in the S to S experiment has the same distribution as the training set, its AUC is the highest among these experiments. In addition to the simple classification of image quality, the system is also capable of locating areas with quality problems through the visual feedback component. Ophthalmologists can easily locate the quality defects, such as lens stains or improper operation. Fig. 13 shows the Class Activation Maps (CAMs) (Zhou et al., 2016), which can interpret the predictions made by the proposed architecture. According to the CAMs, the network is triggered by different regions for different quality factors, which shows that the network can effectively detect image quality problems. The proposed fundus image quality assessment network outperforms all compared approaches. It combines the prediction results of the global image, optic disc, and fovea through weighted aggregation, and is able to shift the distribution of the target representations to the source distribution by domain adaptation.
5.2. Optic disc and fovea localization experiments
For the optic disc and fovea localization, 350 images were randomly selected from the IDRiD dataset as the training set. No external dataset except the pre-trained ImageNet (Deng et al., 2009) network was used for learning. The global CNN encoder, the optic disc regression network, and the fovea regression network were trained separately. At the test stage, the local encoder was reused to approximate the ground-truth target center multiple times. To avoid deviation of the output of any encoder, the predictions of the previous steps and the current step were merged:

P_in^0 = P_out^0    (11)

P_in^i = (1/2) P_out^i + (1/2) P_out^(i−1),   i ∈ {1, 2, 3, 4}    (12)
Fig. 10. ROC plots image quality tasks. (a) plots the experimental results among different architectures presented in Section 4.2 without domain transfer, (b) plots the results of existing methods and (c) plots the results of different domain adaptation methods.
Fig. 11. ROC plots quality factor tasks. (a) plots the ROC curves of artifact classifications, (b) plots the ROC curves of clarity task, and (c) plots the ROC curves of field definition. ROC_i plots ROC curve of binary classification on whether quality factor score is greater than score i. (d) plots the ROC curve of score 6 for field definition corrected by the optic disc and fovea localization.
Fig. 12. ROC plots the quality task results of adaptation from source domain S to target domain (a) T1 and (b) T2 .
Fig. 13. Class activation maps of fundus images with different quality problems. For different quality factors, especially artifacts, the defect areas can be effectively localized. For example, the CAM of (a) highlights the lens stains, and (b) and (d) show the CAMs of images with ring artifacts.

Table 4
Euclidean distances between manual (ground-truth) and automatically localized optic disc coordinates.

Method | ED/train | ED/validation | ED/test
Mask R-CNN (He et al., 2017) | 21.556 | 26.918 | 36.22
Vessels based (Salam et al., 2015) | 22.712 | 26.638 | 33.538
Only global CNN encoder | 54.731 | 95.369 | 90.425
Global + local | 20.979 | 22.657 | 25.603
Global + local (2 times) without data augmentation | 29.435 | 34.167 | 39.286
Global + local (2 times) | 18.093 | 19.515 | 21.072
where P represents the predicted optic disc or fovea center coordinates, the subscript "out" refers to the output coordinates, "in" refers to the center of the ROI used as the input for the next step, and i denotes the step, i ∈ {1, 2, 3, 4}. P_out^0 denotes the output prediction from the global encoder, and P_out^i denotes the output predictions from the local encoders. Because the training set was randomly cropped, the local encoder can detect the target even if the target object is not completely revealed in the ROI. As shown in Fig. 14, we tested different numbers of iterations of the local encoders to measure the time complexity and model performance, and chose to iterate twice for the local encoders considering the trade-off between time complexity and accuracy.
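A minimal sketch of this coarse-to-fine refinement loop (Eqs. (11) and (12)) follows, assuming NumPy and placeholder global_predict / local_predict functions standing in for the trained global and local encoders; the names are illustrative, not from the original implementation.

```python
import numpy as np

def refine_center(image, global_predict, local_predict, roi_size, n_iters=2):
    """Iterative center refinement: the ROI for each local-encoder step is centered
    on the average of the current and previous outputs (Eqs. (11)-(12))."""
    p_in = np.asarray(global_predict(image), dtype=float)   # Eq. (11): P_in^0 = P_out^0
    p_out_prev = p_in.copy()
    for _ in range(n_iters):
        p_out = np.asarray(local_predict(image, p_in, roi_size), dtype=float)
        # Eq. (12): merge current and previous outputs before the next crop
        p_in = 0.5 * p_out + 0.5 * p_out_prev
        p_out_prev = p_out
    return p_in
```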
The experimental results of the optic disc and fovea localization are presented in Tables 4 and 5. The proposed architecture produced state-of-the-art results on IDRiD. From the experiments, the performance of the single global encoder was inferior to the networks that combine both the global and local encoders. In order to ensure that the inputs of the local encoders are valid, we measured the largest errors for the optic disc and fovea center coordinates when using only the global CNN encoder, and found that the proposed global CNN encoder can robustly catch the regions containing the target optic disc and fovea. The local encoder networks without data augmentation were also evaluated, in which we only cropped each fundus image into a single OD-based
Table 5
Euclidean distances between manual (ground-truth) and automatically localized fovea coordinates.

Method | ED/train | ED/validation | ED/test
Mask R-CNN (He et al., 2017) | 54.31 | 72.643 | 85.40
Vessels based (Salam et al., 2015) | 36.53 | 39.208 | 68.466
Only global CNN encoder | 57.186 | 133.658 | 129.329
Global + local | 38.345 | 42.102 | 64.777
Global + local (2 times) without data augmentation | 42.843 | 47.072 | 71.09
Global + local (2 times) | 36.345 | 37.658 | 64.492
Fig. 14. Euclidean distances on the validation dataset between ground-truth labels and predicted coordinates are plotted with different iterations of the local encoder. The red line refers to the ED changes of the fovea over iterations, and the green line refers to the optic disc. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
image and a single fovea-based image. The results demonstrated the necessity of data augmentation.
5.3. Discussion
The proposed technique can benefit automatic fundus image analysis systems; for instance, it can be used at the initial grading stage of a diabetic retinopathy (DR) screening program. The technique is able to filter out images with inadequate qual-
ity, as shown in Fig. 15. Experiments on the DR grading task with quality assessment were conducted to verify the positive impact of fundus image quality assessment on the performance of a fundus image auto-analysis system. A comparison of the DR grading task with and without quality assessment is shown in Fig. 16; from left to right, it plots the ROC curves of the following classifications: (1) NO DR, (2) mild NPDR or worse diabetic retinopathy, and (3) moderate or worse diabetic retinopathy, under quality factor assessment or overall quality assessment. For the artifact factor, a smaller threshold score (stricter filtering) leads to greater improvement in the DR grading results, while the opposite holds for clarity and field definition. However, as the filtering requirements become more stringent, fewer valid images remain after filtering and the burden of re-acquisition increases; in practice, slight quality problems do not seriously affect the performance of DR grading. Combining the experimental results, the corresponding proportion of filtered gradable images, and the clinical requirements set by professional ophthalmologists, scores of 4, 6, and 6 are selected as the thresholds for artifact, clarity, and field definition to decide whether the image quality or the corresponding quality factor is adequate for DR grading. In the datasets, 13.7% of images had artifact scores larger than 4, 14.1% of images had clarity scores less than 6, and 0.86% of images had field definition scores less than 6. As shown in Fig. 16, with the overall fundus image quality evaluation, the AUC for the classification of NO DR was improved by 0.65%; for the mild NPDR or worse diabetic retinopathy classification, the AUC was increased by 1.01%; and the AUC for moderate or worse diabetic retinopathy was improved by 0.29%. These figures also plot the DR grading results after quality factor assessment; all the experimental results were greater than those for the DR task alone without quality assessment. The experimental results demonstrate that the proposed method can effectively filter out in-
Fig. 15. Application examples of the fundus image quality assessment system. Images in the first row are of inadequate quality, as identified by the proposed system. Images in the second row are re-acquisitions of the same eyes guided by the quality factor results.
Fig. 16. ROC plots DR grading tasks. (a) plots the results of NO DR, (b) plots the results of mild or worse DR, and (c) plots moderate or worse DR. Each figure demonstrates the experimental results under 5 quality conditions, including (1) without quality assessment, (2) with artifact assessment, (3) with clarity assessment, (4) with field definition assessment, and (5) with overall image quality assessment.
adequate quality images and retain images which are gradable for automatic DR classification.
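As a minimal illustration of this filtering rule (not the released system; the function and field names are illustrative), an image would be passed to DR grading only if every factor meets the thresholds quoted above.

```python
def is_gradable(scores, artifact_max=4, clarity_min=6, field_min=6):
    """Filter rule derived from the discussion: keep an image for DR grading only
    if artifact <= 4, clarity >= 6, and field definition >= 6."""
    return (scores["artifact"] <= artifact_max
            and scores["clarity"] >= clarity_min
            and scores["field_definition"] >= field_min)

# Example: an image with a strong artifact is filtered out.
print(is_gradable({"artifact": 6, "clarity": 8, "field_definition": 10}))  # False
```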
6. Conclusion
This paper presented a novel fundus image quality assessment approach using landmark detection and semi-tied adversarial discriminative domain adaptation. The proposed approach evaluates fundus image quality by predicting the overall quality together with a quality factor analysis in terms of artifact, clarity, and field definition. The proposed framework integrates polar transformation, landmark localization, and domain adaptation. The fundus images are analyzed through the weighted aggregation of the global image branch and the local landmark branches. In the landmark detection, the optic disc and fovea are localized through a coarse-to-fine network. We also developed semi-tied domain adaptation for image quality assessment, which can improve the generalization performance between the source domain and the target domain. The experimental results demonstrated that the proposed architecture improves upon state-of-the-art methods in fundus image quality assessment, especially when the test set and training set have large distribution differences. Since image quality is one of the essential prerequisites for diabetic retinal image grading, the proposed image quality assessment module can be integrated with an automated retinal image grading system for DR screening programs.
Declaration of Competing Interest This manuscript has not been published or presented elsewhere and is not under consideration by another journal. We have read and understood your journal’s policies, and we believe that neither the manuscript nor the study violates any of these. There are no conflicts of interest to declare.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant 61872241 and Grant 61572316, in part by the Science and Technology Commission of Shanghai Municipality under Grant 18410750700, Grant 17411952600, and Grant 16DZ0501100, and in part by The Hong Kong Polytechnic University under Grant P0030419 and Grant P0030929.
References
Anush Krishna, M., Alan Conrad, B., 2011. Blind image quality assessment: from natural scene statistics to perceptual quality. IEEE Trans. Image Process. 20 (12), 3350–3364.
Bartling, H., Wanger, P., Martin, L., 2009. Automated quality evaluation of digital fundus photographs. Acta Ophthalmol. 87 (6), 643–647.
Bosse, S., Maniry, D., Muller, K.R., Wiegand, T., Samek, W., 2017. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. PP (99), 1–1.
Boucher, M.C., Gresset, J.A., Angioi, K., Olivier, S., 2003. Effectiveness and safety of screening for diabetic retinopathy with two nonmydriatic digital images compared with the seven standard stereoscopic photographic fields. Can. J. Ophthalmol. 38 (7), 557–568.
Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D., Erhan, D., 2016. Domain separation networks. In: Advances in Neural Information Processing Systems, pp. 343–351.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L., 2009. ImageNet: a large-scale hierarchical image database. In: CVPR 2009.
Fang, Y., Ma, K., Zhou, W., Lin, W., Zhai, G., 2014. No-reference quality assessment of contrast-distorted images based on natural scene statistics. IEEE Signal Process. Lett. 22 (7), 838–842.
Fleming, A.D., Philip, S., Goatman, K.A., Olson, J.A., Sharp, P.F., 2006. Automated assessment of diabetic retinal image quality based on clarity and field definition. Invest. Ophthalmol. Vis. Sci. 47 (3), 1120–1125.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V., 2017. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17 (1), 2096–2030.
Ghifary, M., Kleijn, W.B., Zhang, M., Balduzzi, D., Li, W., 2016. Deep reconstruction–classification networks for unsupervised domain adaptation. In: European Conference on Computer Vision, pp. 597–613.
Girshick, R., 2015. Fast R-CNN. In: International Conference on Computer Vision (ICCV).
He, K., Gkioxari, G., Dollar, P., Girshick, R., 2017. Mask R-CNN. In: ICCV, pp. 2980–2988. doi:10.1109/ICCV.2017.322.
He, K., Zhang, X., Ren, S., Sun, J., 2015. Deep residual learning for image recognition. In: CVPR, pp. 770–778.
He, K., Zhang, X., Ren, S., Sun, J., 2015. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV, pp. 1026–1034.
Hunter, A., Lowell, J.A., Habib, M., Ryder, B., 2011. An automated retinal image quality grading algorithm. In: International Conference of the IEEE Engineering in Medicine and Biology Society, p. 5955.
Intelligent Medicine Special Committee of China Medicine Education Association et al., 2019. Guidelines for artificial intelligent diabetic retinopathy screening system based on fundus photography. doi:10.3760/cma.j.issn.2095-0160.2019.08.001.
Jelinek, H.F., Cree, M.J., 2010. Automated Image Detection of Retinal Pathology. CRC Press, Boca Raton.
Kim, J., Lee, S., 2017. Fully deep blind image quality predictor. IEEE J. Sel. Top. Signal Process. 11 (1), 206–220. doi:10.1109/JSTSP.2016.2639328.
Kohler, T., Budai, A., Kraus, M.F., Odstrcilik, J., 2013. Automatic no-reference quality assessment for retinal fundus images using vessel segmentation. In: IEEE International Symposium on Computer-Based Medical Systems, pp. 95–100.
Lalonde, M., Beaulieu, M., Gagnon, L., 2001. Fast and robust optic disc detection using pyramidal decomposition and Hausdorff-based template matching. IEEE Trans. Med. Imaging 20 (11), 1193–1200. doi:10.1109/42.963823.
Lalonde, M., Gagnon, L., Boucher, M.C., 2001. Automatic visual quality assessment in optical fundus images. In: Vision Interface.
Lee, S.C., Wang, Y., 1999. Automatic retinal image quality assessment and enhancement. Proc. SPIE 3661, 1581–1590.
Ma, K., Liu, W., Zhang, K., Duanmu, Z., Wang, Z., Zuo, W., 2018. End-to-end blind image quality assessment using deep neural networks. IEEE Trans. Image Process. PP (99), 1–1.
Mittal, A., Moorthy, A.K., Bovik, A.C., 2012. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 21 (12), 4695.
Ophthalmologist Branch of the Chinese Medical Doctor Association, Eye Foundation Disease Group of the Ophthalmology Branch of the Chinese Medical Association, 2017. Guidelines for image acquisition and interpretation of diabetic retinopathy screening in China (2017). doi:10.3760/cma.j.issn.0412-4081.2017.12.003.
Pires Dias, J.M., Oliveira, C.M., da Silva Cruz, L.A., 2014. Retinal image quality assessment using generic image quality indicators. Inf. Fusion 19, 73–90.
Porwal, P., Pachade, S., Kamble, R., Kokare, M., Deshmukh, G., Sahasrabuddhe, V., Meriaudeau, F., 2018. Indian diabetic retinopathy image dataset (IDRiD). doi:10.21227/H25W98.
Sagar, A.V., Balasubramanian, S., Chandrasekaran, V., 2007. Automatic detection of anatomical structures in digital fundus retinal images. In: IAPR Conference on Machine Vision Applications, pp. 483–486.
Salam, A.A., Akram, M.U., Abbas, S., Anwar, S.M., 2015. Optic disc localization using local vessel based features and support vector machine. In: IEEE International Conference on Bioinformatics and Bioengineering.
Shaoqing, R., Kaiming, H., Ross, G., Jian, S., 2015. Faster R-CNN: towards real-time object detection with region proposal networks. arXiv:1506.01497.
Siddalingaswamy, P.C., Prabhu, K.G., 2010. Automatic grading of diabetic maculopathy severity levels. In: International Conference on Systems in Medicine and Biology, pp. 331–334.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. Comput. Sci.
Sun, B., Saenko, K., 2016. Deep CORAL: correlation alignment for deep domain adaptation. In: European Conference on Computer Vision, pp. 443–450.
Tennakoon, R., Mahapatra, D., Roy, P., Sedai, S., Garnavi, R., 2016. Image quality classification for DR screening using convolutional neural networks. In: MICCAI Workshop on OMIA 2016, pp. 113–120.
Tzeng, E., Hoffman, J., Saenko, K., Darrell, T., 2017. Adversarial discriminative domain adaptation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2962–2971.
Usher, D., Himaga, M., Dumskyj, M., 2003. Automated assessment of digital fundus image quality using detected vessel area, pp. 81–84.
Yaxin, S., Ruogu, F., Bin, S., Ling, D., Huating, L., Jing, Q., Qiang, W., Weiping, J., 2018. Multi-task fundus image quality assessment via transfer learning and landmarks detection. In: MICCAI Workshop on MLMI, pp. 28–36.
Yu, F.L., Sun, J., Li, A., Cheng, J., Cheng, W., Liu, J., 2017. Image quality classification for DR screening using deep learning. In: Engineering in Medicine and Biology Society, pp. 664–667.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A., 2016. Learning deep features for discriminative localization. In: CVPR.