Deep adversarial model for musculoskeletal quality evaluation


Information Processing and Management 57 (2020) 102146

Contents lists available at ScienceDirect

Information Processing and Management journal homepage: www.elsevier.com/locate/infoproman

Deep adversarial model for musculoskeletal quality evaluation Shenglong Li


Department of Bone and Soft Tissue Tumor Surgery, Cancer Hospital of China Medical University, Liaoning Cancer Hospital & Institute, Shenyang, Liaoning Province, China

ARTICLE INFO

Index Terms: Health evaluation; Deep neural network; Musculoskeletal abnormal; Adversarial learning

ABSTRACT

Radiographic images are among the most commonly used medical imaging modalities, and their interpretation is essential for the diagnosis and treatment of disease. However, interpreting a large number of radiographic images is time-consuming for radiologists, so it is important to develop deep learning techniques that evaluate abnormal sites in radiographic images automatically. Since the release of the musculoskeletal X-ray image dataset MURA, the evaluation of musculoskeletal abnormal sites in radiographic images has received increasing attention. In this paper, we propose a deep neural network based method for evaluating musculoskeletal quality and finding abnormal sites in radiographic images. We develop a deep dilated convolutional neural network (CNN) that automatically learns visual features highly related to musculoskeletal quality. Based on the quality evaluation results, the model is able to locate abnormal sites. To improve the performance of the method, we introduce an adversarial learning based model that guides its training process iteratively. We test the proposed method on the standard dataset for musculoskeletal abnormality evaluation. Compared with state-of-the-art methods, the proposed method achieves the best overall performance and matches or improves on the DenseNet baseline for every study type.

E-mail address: [email protected].
https://doi.org/10.1016/j.ipm.2019.102146
Received 10 July 2019; Received in revised form 11 October 2019; Accepted 14 October 2019; Available online 24 October 2019
0306-4573/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Medical images play an increasingly important role in the diagnosis of diseases. With the advancement of medical digitalization, doctors rely more and more on medical images collected by various devices. However, the analysis of medical images often takes a lot of time, so although the wide application of medical images can improve diagnostic accuracy, it also increases the workload of doctors. Computer-aided diagnosis (CAD) (Giger, Chan, & Boone, 2008; Lodwick, 1966) techniques are therefore designed to assist doctors in medical image analysis. In recent years, CAD techniques have been shown to reduce the time a doctor spends on each diagnosis and to lower the misdiagnosis rate. At the same time, the diagnosis results become more credible, and doctor-patient disputes are reduced.

Convolutional neural networks (CNNs) (Krizhevsky, Sutskever, & Hinton, 2012; LeCun et al., 1990) have demonstrated impressive performance in various research areas in the last decade. CNNs are known to have strong generalization ability. A CNN structure that proves useful in one medical image processing task can often be modified and applied to another task. For example, a network structure designed for diagnosing diabetic retinopathy may be reused for diagnosing cerebral hemorrhage by switching only the training data. Also, the network weights learned for one classification task can be transferred to assist model training in another task through transfer learning techniques. This strong generalization ability makes CNNs widely used in medical tasks, including image classification and segmentation.

Considering the unique characteristics of medical images, it is crucial to fully exploit the performance of CNNs to better assist doctors in diagnosis, reduce diagnosis time, and improve diagnostic accuracy. Improving the performance of deep learning techniques on medical images has been studied extensively. Medical image tasks have higher requirements in terms of processing detail and precision than ordinary image tasks. Since different types of medical data differ in dimensionality, it is often difficult to verify the capabilities of the employed network.

In this paper, we propose an adversarial learning based method to train deep CNNs for musculoskeletal abnormality and quality evaluation. The major contributions of this paper can be summarized as follows:

1) We propose a dilated dense convolutional network to automatically extract important visual features from radiographic images. The visual features are leveraged to determine the health quality of the input radiographic image, and abnormality detection is performed based on the quality evaluation results. Dilated convolutional kernels are introduced to the network to increase the receptive field while the image detail is retained. We apply convolutional kernels of different dilations to form a dense block and use dense connections between the blocks to enhance their feature reuse. This modification allows the network to obtain higher overall accuracy than conventional convolutional networks.

2) We introduce adversarial learning techniques to improve the model performance during training. A generator is trained to produce fake samples for adversarial learning, and a discriminator evaluates the performance of both the generator and a classifier at the same time. The adversarial model guides both the quality evaluation network and the abnormality detection network to generate discriminative features from the input image, and both networks benefit from the enhanced samples that are generated by unsupervised learning.

3) The overall framework can be jointly trained in an end-to-end fashion. We develop a loss function that considers influences from both supervised and unsupervised processes. This property gives the model more freedom in adjusting the learnable parameters based on the training data and increases the generalization ability of the learned model.

2. Related work

2.1. Computer-aided diagnosis in medical science

In recent years, the community has witnessed a rapid development of medical imaging technologies such as CT, X-ray, MRI, and ultrasound. Imaging techniques have gradually become an essential tool in many research areas, including tumor detection and staging. However, the large amount of image data and functional imaging data to be analyzed makes disease diagnosis more complicated and increases its dependence on physician experience. In 1966, Lodwick proposed the concept of computer-aided diagnosis (CAD), which uses computers to analyze medical images (Lodwick, 1966). However, due to the limited technology of the time, research on CAD progressed slowly over the following ten years. In the 1980s and 1990s, thanks to the quick development of computer technology, mathematical algorithms, and statistics, CAD developed rapidly in the field of medical imaging diagnosis, and CAD research for different diseases emerged, especially computer-aided diagnosis (CADx) (Giger et al., 2008). In terms of medical image segmentation, manual segmentation has proved to be time-consuming and often subjective. Therefore, automatic segmentation has attracted great attention in medical image analysis.
While previous efforts in CADx mainly used traditional computer vision techniques or shallow neural networks, significant progress has been made in recent years in integrating deep learning techniques with CADx. Wang et al. introduced a hospital-scale chest X-ray database named ChestX-ray, which provides benchmarks for weakly-supervised classification and localization of common thorax diseases (Wang et al., 2017). Deep CNNs are leveraged in Wang et al. (2017) to generate heat maps for localization. Rajpurkar et al. trained a 121-layer dense CNN named CheXNet on the ChestX-ray dataset to generate radiologist-level diagnosis results (Rajpurkar et al., 2017). Later, Rajpurkar et al. released the MURA dataset for radiologist-level abnormality detection in musculoskeletal radiographs (Rajpurkar et al., 2017). MURA is provided to the community together with a competition to determine whether CNN models can outperform board-certified radiologists in medical imaging.

2.2. Deep neural network

With the quick emergence of deep neural networks (DNNs), neural network based deep learning technologies have shown excellent performance in many research fields. For example, Krizhevsky et al. achieved state-of-the-art performance in the ImageNet classification competition using a deep convolutional neural network (CNN) (Krizhevsky et al., 2012). Many new CNN architectures and algorithms were proposed afterward, including VGG-Net (Simonyan & Zisserman, 2014), GoogLeNet (Szegedy et al., 2015), and the residual network (ResNet) (He et al., 2016). CNNs also exhibit excellent results in other image-related research areas, including object detection, image segmentation, and image super-resolution reconstruction.

CNNs are potent tools for processing images and other spatial structures, whereas recurrent neural networks (RNNs) are better suited to problems with temporal structure. In 2005, Graves and Schmidhuber successfully applied the bi-directional RNN (BRNN) (Schuster & Paliwal, 1997) to framewise phoneme classification (Graves & Schmidhuber, 2005). In 2009, they used RNNs for offline handwriting recognition, improving the accuracy on several test sets (Graves & Schmidhuber, 2009). In 2010, Mikolov et al. demonstrated that an RNN-based language model could yield superior performance to traditional N-gram models (Mikolov, Karafiát, Burget, Černocký, & Khudanpur, 2010). In 2014, Cho et al. proposed the gated recurrent unit (GRU), a simplified RNN model with gated structures. They applied it to statistical machine translation and demonstrated impressive performance (Cho et al., 2014).

Recent research has shown that if there is a "shortcut" (skip connection) between adjacent layers in a deep neural network, a deeper network can be constructed and trained more effectively. For example, Hochreiter and Schmidhuber designed the long short-term memory (LSTM) structure to alleviate the vanishing gradient problem and improve the memory capacity of the RNN (Hochreiter & Schmidhuber, 1997). LSTM introduces the concept of the constant error carousel (CEC), which enables the network to remember a long history. In the CNN domain, the highway network (Srivastava, Greff, & Schmidhuber, 2015) and ResNet (He et al., 2016) have adopted similar ideas to construct deeper networks. However, medical image segmentation tasks involve only a small number of segmentation classes, and higher segmentation accuracy is often more critical, so the additional nonlinearity offered by greater depth does not readily translate into an advantage. The densely connected convolutional network (DenseNet) (Iandola et al., 2014) uses a more straightforward feature transfer method than the residual network. In DenseNet, feature reuse is achieved by directly connecting the feature maps of earlier layers to later layers.

2.3. Adversarial learning

Goodfellow et al. proposed the generative adversarial network (GAN) model in 2014 (Goodfellow et al., 2014). Since then, GANs have demonstrated superior performance among deep generative models. A GAN estimates a generative model through an adversarial process in which a generative model and a discriminative model are trained simultaneously. The generative model captures the distribution of the observed data, while the discriminative model estimates the probability that a sample came from the training data rather than from the generator (Goodfellow et al., 2014). Since they were proposed, adversarial networks have achieved remarkable results in research areas including natural image generation, image style transfer, and image-to-text generation.

Many GAN variants have been proposed for different purposes. The innovations of these models include model structure improvements, theoretical extensions, and new applications. The vanishing gradient problem often occurs in GAN training when the optimization target is discontinuous. Arjovsky et al. proposed Wasserstein GAN (W-GAN), which replaces the traditional Jensen-Shannon divergence with the Earth-Mover distance for measuring the distance between the distributions of real and generated samples (Arjovsky, Chintala, & Bottou, 2017). Also, since the discriminator of a GAN is assumed to have infinite modeling ability, it tends to distinguish real samples from generated samples no matter how complex the target space is. Qi proposed the Loss-sensitive GAN (LS-GAN), which constrains the loss function by requiring the objective function to satisfy a Lipschitz condition, and gave a quantitative analysis of the vanishing gradient problem (Qi, 2017). Odena designed a semi-supervised GAN that takes advantage of the labels of real samples when training the discriminator (Odena, 2016). Conditional GAN (CGAN) adds a label or other auxiliary information when training the model (Mirza & Osindero, 2014). Donahue et al. proposed bidirectional GANs (BiGANs), which add an encoder that maps real data x to the latent variable space (Donahue, Krähenbühl, & Darrell, 2016). InfoGAN considers the mutual information between specific semantics and the latent input variables (Chen et al., 2016). The auxiliary classifier GAN (AC-GAN) was proposed by Odena et al. to solve the multi-class classification problem; its discriminator outputs a probability for each corresponding label (Odena, Olah, & Shlens, 2017). Since the output of a GAN follows a continuous real-valued distribution, a standard GAN cannot generate distributions over discrete spaces. Yu et al. therefore proposed Seq-GAN, a generative model capable of generating discrete sequences (Yu, Zhang, Wang, & Yu, 2017).

3. The proposed model

In this paper, we present a deep model that automatically evaluates the health quality in skeletal images and detects abnormal structures. The overall framework of the proposed model is illustrated in Fig. 1.

Fig. 1. The overall framework of the proposed adversarial learning method.

Fig. 2. An illustration of different dilations of a 3 × 3 filter. Colored squares denote the positions that the filter operates on. Note that the receptive field increases as the dilation is incremented.

3.1. Dilated DenseNet

Health quality evaluation and the detection of abnormal skeletal structures rely on the continuity of the skeletal structure in the image. To obtain more accurate results, we leverage a DenseNet with dilated convolutional kernels for feature extraction. By introducing dilated convolutional kernels to the network, we can increase the receptive field while avoiding the loss of image detail. Fig. 2 shows the effect of introducing dilation in a 3 × 3 filter.

In traditional CNNs, pooling layers are often used to obtain a large receptive field as well as compressed feature maps. Common pooling layers include max-pooling, average-pooling, and stochastic pooling. However, pooling operations often discard details of the image features. Therefore, we replace the pooling layers in the network with dilated convolutional kernels. Specifically, wherever a k × k pooling operation would be needed, we replace the traditional convolutional kernel with a convolutional kernel of dilation k and remove the following pooling layer. Since a standard convolutional kernel is a kernel of dilation 1, the dilation of the kernels grows with depth in step with the pooling operations that are removed. Yu et al. combined dilated convolution with a residual structure in dilated residual networks (DRN) (Yu, Koltun, & Funkhouser, 2017) and demonstrated the validity of this structure. Here we instead build on densely connected block structures.

Our network uses a DenseNet as its backbone. In this structure, we use convolutional kernels of different dilations to form a block and use dense connections between the blocks to enhance their feature reuse. In a densely connected network, dense connections within a block pass features directly to subsequent layers, so both feature reuse and feature reconstruction may contribute to the performance of the network.

When dilated convolution is used instead of pooling to obtain the same receptive field, the resulting CNN acquires a new property: the feature maps produced by all layers have a consistent size. This property helps the network obtain higher accuracy than conventional networks. At the same time, a consistent feature map size allows us to add dense connections directly to intermediate layers. In the absence of pooling layers, each point on the feature map of every intermediate layer corresponds to the same point on the feature map at the output layer. Therefore, we also use dense connections between blocks, which allows information and gradients to flow better through the network. The proposed dilated DenseNet is shown in Fig. 3.
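The effect of dilation on the receptive field and on the feature map size can be illustrated with a small sketch. The code below is not the network used in the paper; it is a pure-Python illustration that assumes stride-1 layers, uses the standard effective kernel size k + (k - 1)(d - 1) for a kernel of size k and dilation d, and picks the dilation schedule [1, 2, 4] purely as a hypothetical example. All function names are illustrative.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span in input pixels covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    field = 1
    for dilation in dilations:
        # Each stride-1 layer widens the field by (effective size - 1).
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """1-D dilated cross-correlation with zero padding (odd kernel length).

    The output has the same length as the input, which mirrors the
    consistent feature map size obtained when pooling is removed.
    """
    half_span = (len(kernel) - 1) * dilation // 2
    output = []
    for center in range(len(signal)):
        total = 0.0
        for tap, weight in enumerate(kernel):
            index = center - half_span + tap * dilation
            if 0 <= index < len(signal):  # positions outside are zero padding
                total += weight * signal[index]
        output.append(total)
    return output


if __name__ == "__main__":
    for d in (1, 2, 4):
        print("dilation", d, "-> effective size", effective_kernel_size(3, d))
    print("stacked field:", stacked_receptive_field(3, [1, 2, 4]))
    print(dilated_conv1d([0, 0, 0, 1, 0, 0, 0], [1, 1, 1], dilation=2))
```

In this sketch the receptive field grows with the dilation while the output length stays equal to the input length, which is the property that makes dense connections to intermediate layers straightforward.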

Fig. 3. The dilated DenseNet proposed in the paper.

3.2. Adversarial learning

For the task of health quality evaluation and abnormal structure detection in skeletal images, we leverage a quality-aware adversarial learning technique. We modify the structure of the dilated DenseNet so that it works as a discriminator and train a separate generator to produce fake samples for adversarial learning. The discriminator here is a quality model that evaluates the performance of both the generator and a classifier. Specifically, the output layer of the network is designed as a softmax classifier over K + 1 classes, where K is the number of classes in the training data. Note that K = 2 when the problem is a binary classification task. The (K + 1)th class denotes the probability that the input to the discriminator is a fake sample, that is, a sample output by the generator. This quality model is based on semi-supervised learning principles, since it can take advantage of both labeled training samples and unlabeled generated samples. We define the loss function of the quality model as follows.

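$$L = L_U + L_S = -\mathbb{E}_{x,y \sim P_{data}(x,y)}\left[\log p_{model}(y|x)\right] - \mathbb{E}_{x \sim G}\left[\log p_{model}(y = K+1|x)\right] \tag{1}$$

where

$$L_S = -\mathbb{E}_{x,y \sim P_{data}(x,y)}\left[\log p_{model}(y|x, y < K+1)\right] \tag{2}$$

$$L_U = -\left\{\mathbb{E}_{x,y \sim P_{data}(x,y)} \log\left[1 - p_{model}(y = K+1|x)\right] + \mathbb{E}_{x \sim G}\left[\log p_{model}(y = K+1|x)\right]\right\} \tag{3}$$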
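As a numerical illustration of the (K + 1)-class loss, the following minimal sketch evaluates the supervised and unsupervised terms from raw logits. It is written in pure Python and is not the training code of the paper. It assumes that expectations are replaced by batch averages, that labels are 0-indexed so real classes are 0..K-1 and the fake class is the last logit, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over K + 1 logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def supervised_loss(real_logits: Sequence[Sequence[float]],
                    labels: Sequence[int]) -> float:
    """Eq. (2): -E[log p(y | x, y < K + 1)] over labeled real samples."""
    losses = []
    for logits, label in zip(real_logits, labels):
        probs = softmax(logits)
        p_real = 1.0 - probs[-1]  # probability mass on the K real classes
        losses.append(-math.log(probs[label] / p_real))
    return sum(losses) / len(losses)


def unsupervised_loss(real_logits: Sequence[Sequence[float]],
                      fake_logits: Sequence[Sequence[float]]) -> float:
    """Eq. (3): real samples should avoid the fake class, generated ones hit it."""
    real_term = sum(math.log(1.0 - softmax(l)[-1]) for l in real_logits)
    fake_term = sum(math.log(softmax(l)[-1]) for l in fake_logits)
    return -(real_term / len(real_logits) + fake_term / len(fake_logits))


def quality_model_loss(real_logits, labels, fake_logits) -> float:
    """Eq. (1): L = L_U + L_S."""
    return (unsupervised_loss(real_logits, fake_logits)
            + supervised_loss(real_logits, labels))


if __name__ == "__main__":
    real = [[2.0, 0.5, -1.0], [0.1, 1.5, -0.5]]  # K = 2, last logit is "fake"
    fake = [[-1.0, 0.0, 2.0]]
    print(quality_model_loss(real, [0, 1], fake))
```

Because log p(y|x) = log p(y|x, y < K+1) + log[1 - p(y = K+1|x)] for a real label y, the sum of the two terms equals the direct form on the right-hand side of Eq. (1), which gives a simple consistency check for an implementation.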

In Eq. (1), L_U and L_S are the unsupervised and supervised losses, respectively. L_U is a typical optimization target in a standard GAN, and L_S is designed so that all generated samples of different classes contribute equally to the final loss function.

In terms of the generator structure, we adopt another dilated DenseNet to produce fake samples that have the same size as the input images and a similar distribution. Moreover, strided convolutions are used instead of pooling layers in the discriminator model, and four fractionally-strided convolutions are used in the generator model to map random noise to the generated image. In the network structure, batch normalization (BN) (Ioffe & Szegedy, 2015) is used on all layers except for the output layer of the generator model and the input layer of the corresponding discriminator model. By adding BN layers, we stabilize the input distributions of all hidden layers and alleviate the possible vanishing gradient problem (Bengio, Simard, & Frasconi, 1994). Besides, BN helps to prevent the generator from mapping all training samples to the same point. Also, we remove the fully-connected layers and use convolutional layers to connect the input and output layers of the generator and discriminator, respectively. Note that removing the fully-connected layers increases the stability of the model at the cost of slightly slower convergence. The generator's output layer uses a Tanh (hyperbolic tangent) activation function, and its remaining layers use ReLU (rectified linear unit) (Nair & Hinton, 2010); all layers of the discriminator use the leaky rectified linear unit (Leaky ReLU) (Xu, Wang, Chen, & Li, 2015).

We observe that during training the discriminator tends to win the adversarial game against the generator, which causes the generator's gradient to vanish (Arjovsky & Bottou, 2017). Therefore, we update the generator twice for each update of the discriminator. This training strategy ensures that the discriminator cannot reach (approximate) optimality too quickly during the training process, thereby maintaining a balance between the discriminator and the generator.

4. Experiments

In this section, we present experimental results on the MURA (musculoskeletal radiographs) dataset (Rajpurkar et al., 2017). The performance of the proposed model is compared with previous methods. MURA is a large dataset of musculoskeletal radiographs. Each study was manually labeled by a professional radiologist. The task is to determine whether the musculoskeletal structure within an X-ray study is normal or abnormal.

4.1. Data

MURA consists of 14,982 studies from 12,251 patients, with a total of 40,895 multi-view radiographs. Each study belongs to one of seven standard upper extremity radiographic study types: elbow, finger, forearm, hand, humerus, shoulder, and wrist (https://stanfordmlgroup.github.io/). Each image is labeled as 1 (abnormal) or 0 (normal) based on whether its corresponding study is positive or negative, respectively. Each view (image) is in RGB color space with a pixel range of [0, 255], and the views vary in dimensions. We plot the number of patients for each radiographic study type in Fig. 4. We also illustrate some example images in Fig. 5. Fig. 5(a) shows negative samples (normal studies), and Fig. 5(b) shows positive samples (abnormal studies).

Fig. 4. Number of patients with different study types.

Fig. 5. Some example test images. (a) and (b) are negative and positive samples, respectively. From left to right: elbow, finger, forearm, hand, humerus, shoulder, wrist.

4.2. Implementation details

The implemented model is a 169-layer DenseNet. We initialize the network weights with those of a model pre-trained on the ImageNet classification task. Before the images are fed to the network, we normalize each image using the mean and standard deviation of the images in the ImageNet dataset. The images are scaled to 224 × 224 and augmented with random rotations and lateral inversions. The model uses a modified binary cross-entropy loss function, as described in Rajpurkar et al. (2017). During training, each input mini-batch consists of all the views in a study. The model predicts both the estimated health quality score and the abnormality probability individually for each view, and the overall abnormality probability of a study is the average of the outputs over all of its views.
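The per-view preprocessing and the study-level averaging described above can be sketched as follows. This is an illustrative pure-Python fragment, not the paper's implementation. It assumes the channel statistics commonly used with ImageNet-pretrained models (mean 0.485, 0.456, 0.406 and standard deviation 0.229, 0.224, 0.225 on pixels scaled to [0, 1]) and a decision threshold of 0.5; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple

# Assumed per-channel ImageNet statistics for pixels scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Sequence[int]) -> Tuple[float, ...]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - channel_mean) / channel_std
        for value, channel_mean, channel_std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def study_abnormality(view_probabilities: Sequence[float],
                      threshold: float = 0.5) -> Tuple[float, int]:
    """Average the per-view abnormality probabilities of one study.

    Returns the study-level probability and a label
    (1 = abnormal, 0 = normal) obtained with the assumed threshold.
    """
    study_probability = mean(view_probabilities)
    return study_probability, int(study_probability > threshold)


if __name__ == "__main__":
    print(normalize_pixel((128, 128, 128)))
    # Three views of a hypothetical study with per-view model outputs.
    print(study_abnormality([0.9, 0.4, 0.8]))
```

Averaging over views means that a single ambiguous view does not decide the study label on its own, which matches the study-level labeling of the dataset.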

Table 1
The quality evaluation test results of the proposed model on MURA dataset.

Models      LCC      SROCC    RMSE
PSNR        0.8484   0.8525   16.94
SSIM        0.8952   0.9015   11.28
IW-SSIM     0.9026   0.9063   11.17
FSIM        0.9104   0.9170   10.83
RFSIM       0.9286   0.9384   8.62
VIF         0.9207   0.9230   10.25
Proposed    0.9341   0.9443   7.84


Table 2
The abnormal evaluation test results of the proposed model on MURA dataset.

Study types   Radiologist 1   Radiologist 2   Radiologist 3   DenseNet (Rajpurkar et al., 2017)   Proposed
Elbow         0.850           0.710           0.719           0.710                               0.801
Finger        0.304           0.403           0.410           0.389                               0.417
Forearm       0.796           0.802           0.798           0.737                               0.802
Hand          0.661           0.927           0.789           0.851                               0.927
Humerus       0.867           0.733           0.933           0.600                               0.733
Shoulder      0.864           0.791           0.864           0.720                               0.864
Wrist         0.791           0.931           0.931           0.931                               0.931
Overall       0.731           0.763           0.778           0.705                               0.821

4.3. Test results

We first compare the health quality evaluation performance of the proposed method with conventional methods on the MURA dataset. The test criteria include the Linear Correlation Coefficient (LCC), the Spearman Rank Order Correlation Coefficient (SROCC), and the Root Mean Square Error (RMSE). The comparison results are shown in Table 1. We can see that the proposed model is superior to previous methods on all three criteria. Our model improves the LCC and SROCC by 1.45% and 2.31%, respectively, and improves the RMSE by as much as 23.51%. These figures correspond to the gain over VIF, the last baseline listed in Table 1; measured against RFSIM, the strongest baseline, the gains are about 0.6% for LCC and SROCC and about 9% for RMSE. These results demonstrate the effectiveness of the proposed health quality evaluation model.

Next, following Rajpurkar et al. (2017), we compare the abnormal evaluation performance of the proposed model with that of three board-certified radiologists as well as the DenseNet based method in Rajpurkar et al. (2017). From Table 2, we observe that radiologist 1 achieves the best result on elbow studies, and radiologist 3 achieves the best result on humerus studies. Meanwhile, our proposed method attains the highest results on the other five study types: finger, forearm, hand, shoulder, and wrist. Specifically, the proposed model achieves a performance of 41.7% on finger studies, which is slightly higher than the best result among the three radiologists (41.0%). On the forearm, hand, and wrist studies, the proposed model has a performance of 80.2%, 92.7%, and 93.1%, respectively, which matches the best previous results, those of radiologist 2. On the shoulder studies, the proposed method achieves a performance of 86.4%, which equals the best previous result, shared by radiologists 1 and 3. On the remaining two study types, elbow and humerus, the proposed method does not have the best performance among all methods but is better than the results in Rajpurkar et al. (2017). Our proposed method also has the highest overall performance, 82.1%.
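The relative improvements quoted for Table 1 can be reproduced with a few lines of arithmetic. The sketch below is only a worked check on the published numbers: the dictionary copies the RFSIM, VIF, and Proposed rows of Table 1, and the helper names are hypothetical. For RMSE, lower is better, so the gain is computed as the relative reduction.

```python
from typing import Tuple

# (LCC, SROCC, RMSE) for the rows of Table 1 that are needed here.
TABLE_1 = {
    "RFSIM": (0.9286, 0.9384, 8.62),
    "VIF": (0.9207, 0.9230, 10.25),
    "Proposed": (0.9341, 0.9443, 7.84),
}


def relative_gain(proposed: float, baseline: float,
                  lower_is_better: bool = False) -> float:
    """Percentage improvement of `proposed` over `baseline`."""
    gain = baseline - proposed if lower_is_better else proposed - baseline
    return 100.0 * gain / baseline


def gains_over(baseline_name: str) -> Tuple[float, float, float]:
    """Relative gains of the proposed model in LCC, SROCC, and RMSE."""
    base = TABLE_1[baseline_name]
    ours = TABLE_1["Proposed"]
    return (
        relative_gain(ours[0], base[0]),
        relative_gain(ours[1], base[1]),
        relative_gain(ours[2], base[2], lower_is_better=True),
    )


if __name__ == "__main__":
    for name in ("VIF", "RFSIM"):
        lcc, srocc, rmse = gains_over(name)
        print(f"vs {name}: LCC +{lcc:.2f}%, SROCC +{srocc:.2f}%, "
              f"RMSE reduced by {rmse:.2f}%")
```

Running the check against both baselines makes explicit which row of the table a quoted percentage refers to.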
Following Rajpurkar et al. (2017), we also plot a receiver operating characteristic (ROC) curve to better compare the proposed model with the radiologists. The results are illustrated in Fig. 6. We generate the ROC curve by plotting the sensitivity of the model against its specificity at different classification boundary thresholds. From Fig. 6, we observe that the model has an area under the ROC curve (AUROC) of 0.943. The operating points of the three radiologists all lie below the curve, which indicates that the model has a higher overall performance than the three radiologists. The model achieves a specificity of 0.905 and a sensitivity of 0.834 when the classification boundary threshold is set at 0.5. We can see that the proposed model outperforms the three radiologists in the abnormal evaluation task.

Fig. 6. The ROC curve of the proposed model. The results of the three radiologists in Rajpurkar et al. (2017) are also illustrated.

5. Conclusion

Compared with traditional image-related tasks, medical tasks require higher accuracy as well as finer processing detail. In this paper, we propose a deep model based on adversarial learning for musculoskeletal health quality evaluation and abnormality detection. First, we develop a deep dilated CNN for feature learning and classifier construction. This CNN is designed to preserve detailed information during the training process. Next, we introduce a model that iteratively guides the training process of the classifier through adversarial learning. The performance of the proposed method is tested on MURA, a large public musculoskeletal radiograph dataset, and the test results are compared with previous state-of-the-art methods. Experimental results demonstrate that the proposed method achieves the best overall performance on the test dataset.

References

Arjovsky, M., & Bottou, L. (2017). Towards principled methods for training generative adversarial networks. International Conference on Learning Representations.
Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. International Conference on Machine Learning.
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. Advances in Neural Information Processing Systems, 2172–2180.
Cho, K., et al. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Donahue, J., Krähenbühl, P., & Darrell, T. (2016). Adversarial feature learning. arXiv preprint arXiv:1605.09782.
Giger, M. L., Chan, H.-P., & Boone, J. (2008). Anniversary paper: History and status of CAD and quantitative image analysis: The role of medical physics and AAPM. Medical Physics, 35(12), 5799–5820.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 2672–2680.
Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5–6), 602–610.
Graves, A., & Schmidhuber, J. (2009). Offline handwriting recognition with multidimensional recurrent neural networks. Advances in Neural Information Processing Systems.
He, K., et al. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., & Keutzer, K. (2014). DenseNet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, 448–456.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 1097–1105.
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., et al. (1990). Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Systems, 396–404.
Lodwick, G. S. (1966). Computer-aided diagnosis in radiology: A research plan. Investigative Radiology, 1(1), 72–80.
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. Eleventh Annual Conference of the International Speech Communication Association.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807–814).
Odena, A. (2016). Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583.
Odena, A., Olah, C., & Shlens, J. (2017). Conditional image synthesis with auxiliary classifier GANs. Proceedings of the 34th International Conference on Machine Learning, 70 (pp. 2642–2651). JMLR.org.
Qi, G.-J. (2017). Loss-sensitive generative adversarial networks on Lipschitz densities. arXiv preprint arXiv:1701.06264.
Rajpurkar, P., Irvin, J., Bagul, A., Ding, D., Duan, T., Mehta, H., et al. (2017). MURA: Large dataset for abnormality detection in musculoskeletal radiographs. arXiv preprint arXiv:1712.06957.
Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., et al. (2017). CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225.
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Highway networks. arXiv preprint arXiv:1505.00387.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).
Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., & Summers, R. M. (2017). ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2097–2106).
Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853.
Yu, F., Koltun, V., & Funkhouser, T. (2017a). Dilated residual networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Yu, L., Zhang, W., Wang, J., & Yu, Y. (2017b). SeqGAN: Sequence generative adversarial nets with policy gradient. Thirty-First AAAI Conference on Artificial Intelligence.