Accepted Manuscript

Deep Deformable Registration: Enhancing Accuracy by Fully Convolutional Neural Net

Sayan Ghosal, Nilanjan Ray

PII: S0167-8655(17)30170-8
DOI: 10.1016/j.patrec.2017.05.022
Reference: PATREC 6828

To appear in: Pattern Recognition Letters

Received date: 27 November 2016
Revised date: 3 April 2017
Accepted date: 22 May 2017

Please cite this article as: Sayan Ghosal, Nilanjan Ray, Deep Deformable Registration: Enhancing Accuracy by Fully Convolutional Neural Net, Pattern Recognition Letters (2017), doi: 10.1016/j.patrec.2017.05.022

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Research Highlights

• A deep network is utilized in designing a deformable registration framework for end-to-end optimization.
• Deep deformable registration minimizes a cost that is an upper bound to the sum of squared difference image registration cost.
• The proposed deep deformable registration improves diffeomorphic and log-demons registration significantly on 3D brain MRI data.
• The network architecture used in this research is a variation of a fully convolutional SKIP architecture.
• Prior to this work, convnets were used to learn features for registration; in contrast, this work uses a convnet for optimization.
Pattern Recognition Letters journal homepage: www.elsevier.com
Deep Deformable Registration: Enhancing Accuracy by Fully Convolutional Neural Net
Sayan Ghosal (a), Nilanjan Ray (b)

(a) Department of Electronics and Telecommunication, Jadavpur University, India
(b) Department of Computing Science, University of Alberta, Canada

ABSTRACT

Deformable registration is ubiquitous in medical image analysis. Many deformable registration methods minimize the sum of squared difference (SSD) as the registration cost with respect to deformable model parameters. In this work, we construct a tight upper bound of the SSD registration cost by using a fully convolutional neural network (FCNN) in the registration pipeline. The upper bound SSD (UB-SSD) enhances the original deformable model parameter space by adding a heatmap output from the FCNN. Next, we minimize this UB-SSD by adjusting both the parameters of the FCNN and the parameters of the deformable model in coordinate descent. Our coordinate descent framework is end-to-end and can work with any deformable registration method that uses SSD. We demonstrate experimentally that our method significantly enhances the accuracy of deformable registration algorithms on two publicly available 3D brain MRI data sets.

© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
Image registration or image alignment is the process of overlaying two images taken at different time instants, from different view points, or from different subjects in a common coordinate system. Image registration has remained a significant tool in medical imaging applications [1]. 3D data in medical images, where image registration is applied, generally includes Computed Tomography (CT), Cone-beam CT (CBCT), Magnetic Resonance Imaging (MRI) and Computer Aided Design (CAD) models of medical devices. Among image registration methods, deformable image registration is important in neuroscience and clinical studies. Diffeomorphic demons [2] and log-domain diffeomorphic demons [3] are popular deformable registration methods. In these optimization-based methods, the deformable transformation parameters are iteratively optimized over a scalar-valued cost, the sum of squared differences (SSD), representing the quality of registration [4]. To impose smoothness on the solution, typically a regularization term is also added to the registration cost function. These costs are non-convex in nature, hence the optimization sometimes gets trapped in local minima. There is no easy way to overcome the non-convex nature of the cost function, and hence the solutions obtained can be inferior local minima. Regularizations help to avoid poor-quality solutions. Other measures include different optimization algorithms. For example, in the Gauss-Newton method for minimizing the SSD cost, a projective geometric deformation is used [5]. This method is still sensitive to local minima. The Levenberg-Marquardt algorithm is another option, used in [6]. The combination of the Levenberg-Marquardt method and SSD is used in [7]. To further overcome poor local minima of a non-convex cost, optimization methods sometimes design a surrogate cost function, which may be easier to minimize than the original non-convex cost [8]. If the surrogate cost is an upper bound of the original cost, then minimizing the surrogate cost can often provide a better quality solution, when this upper bound is tight enough. In our proposed work, we design a tight upper bound to the registration SSD cost and call it the upper bound to the sum of squared differences (UB-SSD). Our proposed UB-SSD is a function of two variables: a heatmap h and the registration deformation field T, while the original SSD is a function of T only. This deliberate design of UB-SSD provides us with a unique opportunity to find a better minimum via coordinate descent by iteratively alternating between the following two steps until convergence: (1) fixing h and optimizing UB-SSD over T, and (2) fixing T and optimizing UB-SSD over h. It is noteworthy that the coordinate descent procedure is endowed with a convergence guarantee to a local minimum [9]. To the best of our knowledge, the use of the aforementioned
optimization trick in image registration is novel. Moreover, we employ a deep network to construct the heatmap h and adjust the parameters of the network by back-propagation to minimize the UB-SSD. We refer to our proposed method as deep deformable registration (DDR). We employ a fully convolutional neural network (FCNN) [10] in DDR. FCNN is a type of deep network that has been successfully used for semantic segmentation in [10]. However, unlike the normal learning-based use of FCNN, DDR does not employ any learning; rather, it uses FCNN to optimize the registration cost. It is a newer trend in computer vision and graphics, where deep networks are used only for optimization purposes and not for learning. One prominent example is artistic style transfer [11]. Prior to our work, a convolutional network was used for image registration in [12], where the authors trained the parameters of a 2-layer convolutional network. The network was used to seek a hierarchical representation for each image, where high-level features are inferred from the low-level features. The goal of their work was to learn feature vectors for better registration. In contrast, DDR focuses on finding a better solution during optimization, where we make use of the end-to-end back-propagation of a deep network architecture.

2. Proposed Method

In this section, we provide detailed descriptions of each component of our solution pipeline. We start with the overall registration framework. Then, we illustrate our FCNN architecture and define the upper bound of the SSD cost. We end the section by explaining the registration module.

2.1. Deep Deformable Registration Framework

Throughout this paper the moving image is represented as M(x) and the fixed (or reference) image is represented as F(x). The output of the fully convolutional neural network is represented as h(x). T(x) represents the deformation produced by a registration algorithm. The complete DDR framework is shown in Fig. 1.

Fig. 1. DDR framework showing three modules: FCNN, non-linear function σ and registration. Forward and backward passes are indicated by arrows from left-to-right and right-to-left, respectively.

We consider SSD as the registration cost between the moving and the fixed image. For simplicity, we omit any regularization term here. Hence, the SSD registration cost is as follows:

    C = (1/2) ∫ [F(x) − M(T(x))]² dx.    (1)

In DDR, the FCNN is followed by a non-linear function σ as shown in Fig. 1. The output of the FCNN is the heatmap h(x), which is modified by the non-linear function σ to hm(x):

    hm(x) = σ(h(x)).    (2)

The modified heatmap hm(x) is added pixel-wise to the fixed image F(x) as a distortion. Then, the registration module receives the input fixed image as F(x) + hm(x) and the input moving image as M(x), with which the module computes the deformation T. Thus, the registration module in our DDR framework works on a modified SSD cost as follows:

    CU = (1/2) ∫ [F(x) + hm(x) − M(T(x))]² dx.    (3)

Once again in equation (3), we omitted any regularizations on T for simplicity and compatibility with equation (1). A suitable design of σ will ensure that CU acts as a tight upper bound for C. We refer to CU as UB-SSD.

The target cost function for DDR is the UB-SSD, CU, provided in (3). To minimize CU, we propose a coordinate descent procedure shown in Algorithm 1. The algorithm initializes the deformation T by minimizing equation (1). Then the while loop computes coordinate descent by alternating between the following two steps until convergence: (a) fix the deformable parameters T(x) and optimize for the heatmap hm(x), and (b) fix hm(x) and optimize for T(x). The for loop implements (a) by holding the deformation T fixed and minimizing CU by adjusting the parameters of the FCNN in a succession of K forward and backward passes. The number of passes K is 50 for all our experiments, to obtain a substantial change in the modified heatmap hm(x) before the coordinate descent reaches its next step (b) to produce a suitable deformation T via the registration module. Thus, the DDR framework works in an iterative and end-to-end fashion.

Algorithm 1 Coordinate Descent
 1: T ← deformation from Registration by minimizing (1)
 2: while convergence is not achieved do
 3:     for k ∈ {1, ..., K} do
 4:         Forward pass through FCNN
 5:         Forward pass through σ
 6:         Backward pass through σ
 7:         Backward pass through FCNN
 8:     T ← deformation from Registration by minimizing (3)

In Algorithm 1, the backward passes produce the error signals α1 and α2 as follows:

    α1 = ∇hm CU = F(x) + hm(x) − M(T(x)),    (4)

and

    α2 = σ′(h(x)) α1.    (5)

These error signals flow through the DDR network in the backward direction as shown in Fig. 1, while back-propagation adjusts the parameters of the FCNN.
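The alternation of Algorithm 1 and the error signals (4)-(5) can be illustrated with a small NumPy toy. This is only a sketch: it optimizes the heatmap values directly (standing in for back-propagation through the FCNN weights), σ is the soft-thresholding function of Section 2.2, and `register` is a placeholder for any SSD-based registration routine returning the warped moving image.

```python
import numpy as np

def soft_threshold(z, t):
    # sigma: shrink values toward zero by t (dead zone in [-t, t])
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def soft_threshold_grad(z, t):
    # sigma'(z): 1 outside the dead zone, 0 inside
    return (np.abs(z) > t).astype(float)

def ddr_coordinate_descent(F, M, register, t=0.1, K=50, lr=0.01, n_outer=5):
    # Toy version of Algorithm 1: the array h stands in for the FCNN output.
    rng = np.random.default_rng(0)
    h = 0.5 * rng.standard_normal(F.shape)   # start outside the dead zone
    MT = register(F, M)                      # line 1: minimize Eq. (1)
    costs = []
    for _ in range(n_outer):                 # "while not converged"
        for _ in range(K):                   # step (a): T fixed, update h
            hm = soft_threshold(h, t)
            alpha1 = F + hm - MT                          # Eq. (4)
            alpha2 = soft_threshold_grad(h, t) * alpha1   # Eq. (5)
            h -= lr * alpha2                 # gradient step on UB-SSD
        hm = soft_threshold(h, t)
        MT = register(F + hm, M)             # step (b): h fixed, update T
        costs.append(0.5 * np.sum((F + hm - MT) ** 2))    # UB-SSD, Eq. (3)
    return h, MT, costs
```

Even with a trivial `register` that returns the moving image unwarped, the heatmap updates absorb part of the residual, so the UB-SSD decreases across the outer iterations.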
Thus, the DDR framework enhances the space of optimization parameters from T to a joint space of T and h (or hm). When the registration optimizer is stuck at a local minimum over T, the alternating coordinate descent finds a better minimum for h in the joint space, and the registration improves because of the upper bound property of the cost function. The decrease of UB-SSD and SSD over iterations of the coordinate descent is shown in Fig. 2 for a registration example. The tightness of the upper bound makes the two curves practically indistinguishable in Fig. 2. The next section discusses our technique for constructing a tight UB-SSD.

Fig. 2. Original cost and UB-cost vs. iterations.

2.2. Construction of Tight UB-SSD

Subtracting the SSD cost (1) from the UB-SSD cost (3), we obtain the following inequality:

    ∫ σ(h(x))[σ(h(x)) + 2(F(x) − M(T(x)))] dx ≥ 0.    (6)

We ensure condition (6) by realizing σ as a soft thresholding function with threshold t:

    σ(z) = z − t,  if z ≥ t
           0,      if −t ≤ z ≤ t
           z + t,  if z ≤ −t.    (7)

Note that σ is applied pixel-wise on the heatmap h(x). For a tight UB-SSD, condition (6) can be restated as follows:

    ε ≥ ∫ σ(h(x))[σ(h(x)) + 2(F(x) − M(T(x)))] dx ≥ 0,    (8)

where ε is a small positive number. Algorithm 2 makes sure that, with the soft thresholding function σ, condition (8) is met. Algorithm 2 is applied during the forward pass through σ to determine the value of the threshold t.

Algorithm 2 Soft Thresholding
 1: t ← 0
 2: Set stepsize to a small number
 3: while condition (8) is not met do
 4:     t ← t + stepsize

2.3. FCNN Architecture

We have used VGG-net [13] to serve as the FCNN. This network contains both convolutional and deconvolutional layers. In the VGG-net, we decapitated the final classifier layer and converted all fully connected layers to convolutional layers. This is followed by multiple deconvolutional layers to bi-linearly upsample the coarse outputs to pixel-dense outputs. Additionally, the convolutional part consists of multiple ReLU layers and max-pooling layers. The deconvolutional part is also known as the up-sampling layers. A typical skip architecture used in our module is shown in detail in Fig. 3. The long forward arrows leading to nodes with plus signs designate skip connections. The architecture also points out max-pooling, ReLU non-linear functions, convolutional filters and up-sampling or deconvolutional layers.

Fig. 3. FCNN with skip architecture. [The figure traces a 192×192 input through stacked 3×3 convolution blocks of 64, 128, 256 and 512 channels with max-pooling down to 6×6, then 7×7 and 1×1 convolutions, followed by ×2 and ×4 deconvolution (up-sampling) stages whose outputs are merged through skip connections, producing a 192×192 heatmap.]
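The threshold search of Algorithm 2 amounts to growing t from zero until the gap between UB-SSD and SSD falls inside [0, ε]. A minimal sketch, with the integral in condition (8) replaced by a sum over voxels (the array names are illustrative):

```python
import numpy as np

def soft_threshold(z, t):
    # sigma from Eq. (7)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def find_threshold(h, F, MT, eps=1e-3, step=1e-3):
    # Grow t until condition (8) holds: eps >= gap >= 0, where
    # gap = sum over x of sigma(h)[sigma(h) + 2(F - M(T))].
    # Since gap = 2*(UB-SSD - SSD), this bounds the tightness of the bound.
    t = 0.0
    t_stop = np.abs(h).max() + step  # at t >= max|h|, sigma(h) = 0 and gap = 0
    while t < t_stop:
        s = soft_threshold(h, t)
        gap = np.sum(s * (s + 2.0 * (F - MT)))
        if 0.0 <= gap <= eps:
            return t
        t += step
    return t_stop
```

The loop always terminates: once t exceeds the largest heatmap magnitude, σ(h) vanishes everywhere and the gap is exactly zero, which satisfies condition (8).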
2.4. Registration Module

In order to register the moving image M(x) with the modified reference image F(x) + hm(x), we have used the demons [2, 3] method. The registration module appears as the rightmost node in our DDR network in Fig. 1. To find the optimum transformation T, we optimize the following cost:

    E(T) = (1/2) ∫ { [F(x) + hm(x) − M(T(x))]² + λ‖∇T(x)‖² } dx,    (9)
where λ is a non-negative parameter balancing the two terms in equation (9).
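On a discrete grid, the energy (9) can be evaluated as below. This is a sketch assuming T is stored with one displacement component per spatial axis and using `np.gradient` as the finite-difference approximation of ∇T:

```python
import numpy as np

def registration_energy(F, hm, MT, T, lam=0.1):
    # Eq. (9): 0.5 * (data term + lambda * smoothness term).
    # F, hm: fixed image and modified heatmap; MT: warped moving image
    # M(T(x)); T: displacement field of shape (ndim, H, W).
    data = np.sum((F + hm - MT) ** 2)
    smooth = 0.0
    for comp in T:                        # one displacement component per axis
        for g in np.gradient(comp):       # finite-difference partial derivatives
            smooth += np.sum(g ** 2)
    return 0.5 * (data + lam * smooth)
```

For a zero displacement field the smoothness term vanishes and the energy reduces to half the SSD between F + hm and the (unwarped) moving image.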
Due to the large number of transformation parameters in the transformation field T, we use the limited-memory BFGS (LBFGS) algorithm to find the optimum transformation field. This algorithm is computationally less expensive than BFGS when the number of optimization parameters is large. While calculating the step length and direction, LBFGS stores curvature information from only the last few iterations and uses it to compute the direction and step length, instead of updating and storing the full Hessian matrix in each iteration. After finding the optimum transformation field, the error is back-propagated through the pipeline, which helps the FCNN find the distortion required to reduce the energy further.

Fig. 4. Results on IXI dataset using demons and demons+DDR. Top row: SSIM vs. subjects, Middle row: PSNR vs. subjects, Bottom row: SSD vs. subjects.

Fig. 5. Results on IXI dataset using log-demons and log-demons+DDR. Top: SSIM vs. subjects, Middle: PSNR vs. subjects, Bottom: SSD vs. subjects.
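The memory-limited idea can be sketched with the textbook two-loop recursion, which keeps only the last m curvature pairs (s, y) instead of a full Hessian. This is a generic illustration without line search, not the exact optimizer configuration used in the paper:

```python
import numpy as np

def lbfgs(grad_fn, x0, m=5, iters=100, lr=1.0):
    # Minimal L-BFGS: store the last m (s, y) pairs and apply the
    # inverse-Hessian approximation via the two-loop recursion.
    # Fixed step lr for clarity; production solvers add a Wolfe line search.
    x = x0.astype(float).copy()
    g = grad_fn(x)
    s_list, y_list = [], []
    for _ in range(iters):
        q = g.copy()
        alphas = []
        for s, y in zip(reversed(s_list), reversed(y_list)):  # newest first
            a = np.dot(s, q) / np.dot(y, s)
            q -= a * y
            alphas.append(a)
        if s_list:  # scale by an estimate of the initial inverse Hessian
            s, y = s_list[-1], y_list[-1]
            q *= np.dot(s, y) / np.dot(y, y)
        for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):  # oldest first
            b = np.dot(y, q) / np.dot(y, s)
            q += (a - b) * s
        x_new = x - lr * q               # q now approximates H^{-1} g
        g_new = grad_fn(x_new)
        s_list.append(x_new - x)
        y_list.append(g_new - g)
        if len(s_list) > m:              # keep only the last m pairs
            s_list.pop(0)
            y_list.pop(0)
        x, g = x_new, g_new
        if np.linalg.norm(g) < 1e-8:
            break
    return x
```

On a simple quadratic SSD-like cost with gradient x − target, the recursion recovers the minimizer while never storing more than m vector pairs.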
Table 1. Improvement in registration for IXI dataset: average percentages

                                   SSIM    PSNR    SSD
    DDR with diffeomorphic demons   9.2     5.0    19.0
    DDR with log-demons            11.0     5.3    21.0

3. Results

3.1. Registration Algorithms

To establish the usefulness of DDR, the following two deformable registration algorithms, each with and without DDR, are used:

1. DDR + Diffeomorphic demons
2. Diffeomorphic demons
3. DDR + Log-demons
4. Log-demons

In our setup, to register images using DDR + diffeomorphic demons, we have used the FCNN-16s network, and for registration using DDR + log-demons, we have used the FCNN-32s architecture for the FCNN. Details of these two skip architectures can be found in [10].

3.2. Registration Evaluation Metrics

For performance measures, we have used the structural similarity index (SSIM) [14], peak signal-to-noise ratio (PSNR) and the SSD error. SSIM can capture local differences between the images, whereas SSD and PSNR capture global differences.

3.3. Experiments with IXI Dataset

The IXI dataset (http://biomedic.doc.ic.ac.uk/brain-development/index.php?n=Main.Datasets) consists of 30 subjects, which are all 3D volumetric data. Among them, we have randomly chosen one as our reference volume and we have registered the others using the aforementioned four algorithms. Fig. 4 shows the SSIM (top panel), PSNR (middle panel) and SSD (bottom panel) obtained by using the diffeomorphic demons and DDR+diffeomorphic demons algorithms. The bar diagrams for 29 registrations show improvements obtained by using the proposed DDR. Similar improvements in performance metrics using DDR with log-demons registration are illustrated in Fig. 5. The results are summarized by averaging over 29 registrations for methods using DDR and not using DDR. Next, we compute percentage improvements in the metrics and show them in Table 1. For example, we note that DDR with log-demons has achieved on average a 21% reduction in SSD cost. From these results we observe that significant improvements were gained by DDR, especially in reducing the SSD cost, because the optimization has targeted SSD. However, other measures such as SSIM and PSNR have also increased significantly.

Fig. 8 shows the improvement in the visual quality of registration using DDR on an IXI data slice. The top left and the top right panels of Fig. 8 show a moving and a fixed image, respectively. The bottom left and right panels show residual images for diffeomorphic demons and diffeomorphic demons+DDR, respectively. Note that a residual image is defined as the absolute difference image |F(x) − M(T(x))|. We notice a significant reduction in the intensity of the residual image when DDR is used, implying that DDR produced smaller residual values.

Fig. 8. Top left: a moving image slice; top right: a fixed image slice; bottom left: residual image from demons; bottom right: residual using DDR+demons.

3.4. Experiments with ADNI Dataset

In these experiments, we have randomly selected 20 MR 3D volumes from the ADNI dataset (http://adni.loni.ucla.edu/). Among them, one is randomly selected to be the template, and the rest are registered with it. The top, middle and bottom panels in Figs. 6 and 7 respectively show the SSIM, PSNR and SSD values for registrations with and without DDR. Fig. 6 uses the diffeomorphic demons algorithm, while Fig. 7 uses the log-demons algorithm. We notice improvements in the performance metrics using DDR that are summarized in Table 2. For example, using DDR with diffeomorphic demons has on average improved SSIM values by 13.6%. The averages in Table 2 are computed over 19 registrations for the ADNI dataset.

Table 2. Improvement in registration for ADNI dataset: average percentages

                                   SSIM    PSNR    SSD
    DDR with diffeomorphic demons  13.6     6.2    19.9
    DDR with log-demons             4.3     3.0    12.6

Fig. 6. Results on ADNI dataset using demons and demons+DDR. Top: SSIM vs. subjects, Middle: PSNR vs. subjects, Bottom: mean SSD vs. subjects.

Fig. 7. Results on ADNI dataset using log-demons and log-demons+DDR. Top: SSIM vs. subjects, Middle: PSNR vs. subjects, Bottom: mean SSD vs. subjects.
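The three metrics of Section 3.2 can be computed as follows. Note that the SSIM here is a simplified single-window (global) variant with the standard constants; the reference implementation [14] averages local windowed statistics:

```python
import numpy as np

def ssd(a, b):
    # sum of squared differences (lower is better)
    return np.sum((a - b) ** 2)

def psnr(a, b, peak=1.0):
    # peak signal-to-noise ratio in dB (higher is better)
    mse = np.mean((a - b) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(a, b, peak=1.0):
    # single-window SSIM with the usual stabilizing constants c1, c2
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = np.mean((a - mu_a) * (b - mu_b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

For identical images the global SSIM is exactly 1, and a constant intensity offset of 0.1 on unit-range images gives a PSNR of 20 dB.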
4. Conclusions and Future Work

We have proposed a novel method for improving deformable registration using a fully convolutional neural network. While previous studies have focused on learning features, here we have utilized the FCNN to better optimize registration algorithms. On two publicly available datasets, we show that the improvements in registration metrics are significant. In the future, we intend to work with other diffeomorphic registration algorithms, such as HAMMER [15].
Acknowledgments

The authors acknowledge support from MITACS Globalink and the Department of Computing Science, University of Alberta.
References

[1] R. Liao, L. Zhang, Y. Sun, S. Miao, C. Chefd'Hotel, A review of recent advances in registration techniques applied to minimally invasive therapy, IEEE Transactions on Multimedia 15 (2013) 983–1000.
[2] T. Vercauteren, X. Pennec, A. Perchant, N. Ayache, Diffeomorphic demons: Efficient non-parametric image registration, NeuroImage 45 (2009) S61–S72.
[3] H. Lombaert, L. Grady, X. Pennec, N. Ayache, F. Cheriet, Spectral log-demons: Diffeomorphic image registration with very large deformations, International Journal of Computer Vision 107 (2014) 254–271.
[4] A. Sotiras, C. Davatzikos, N. Paragios, Deformable medical image registration: A survey, IEEE Transactions on Medical Imaging 32 (2013) 1153–1190.
[5] R. Sharma, M. Pavel, Multisensor image registration, Proceedings of the Society for Information Display XXVIII (1997) 951–954.
[6] H. Sawhney, R. Kumar, True multi-image alignment and its applications to mosaicing and lens distortion correction, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1999) 235–243.
[7] P. Thevenaz, U. Ruttimann, M. Unser, Iterative multiscale registration without landmarks, Proceedings of the IEEE International Conference on Image Processing (1995) 228–231.
[8] S. Koziel, D. E. Ciaurri, L. Leifsson, Surrogate-Based Methods, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, pp. 33–59. doi:10.1007/978-3-642-20859-1_3.
[9] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1999.
[10] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, Conference on Computer Vision and Pattern Recognition (2015).
[11] L. A. Gatys, A. S. Ecker, M. Bethge, A neural algorithm of artistic style, arXiv:1508.06576 (2015).
[12] G. Wu, M. Kim, Q. Wang, Y. Gao, S. Liao, D. Shen, Unsupervised deep feature learning for deformable registration of MR brain images, 2013, pp. 649–656. doi:10.1007/978-3-642-40763-5_80.
[13] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, ICLR (2015).
[14] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE Transactions on Image Processing 13 (2004) 600–612.
[15] D. Shen, Image registration by local histogram matching, Pattern Recognition 40 (2007) 1161–1172.