Deep learning based image super-resolution for nonlinear lens distortions


Qinglong Chang, Kwok-Wai Hung∗, Jianmin Jiang
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
∗Corresponding author. E-mail address: [email protected] (K.-W. Hung).

Article history: Received 1 March 2017; Revised 6 September 2017; Accepted 13 September 2017. Communicated by Jun Yu.

Keywords: Nonlinear lens distortion; Radial distortion; Super-resolution; Up-sampling

Abstract

Recent developments in virtual reality applications have accelerated the usage of cameras with wide-angle and telephoto/macro lenses, which produce nonlinear radial lens distortions, such as barrel distortion and pincushion distortion. However, due to many reasons, the resolution of images with nonlinear lens distortions is often limited. In this paper, we address image super-resolution (SR) for images with nonlinear lens distortions through a deep convolutional neural network with residual learning, which can significantly improve the image quality before and after camera calibration. The proposed deep learning network was trained using hundreds of simulated images and tested on real cameras with fisheye and macro lenses. Experimental results show that the proposed image SR method outperforms state-of-the-art SR methods for various degrees of radial barrel and pincushion distortions.

1. Introduction

Image super-resolution has been a popular research topic due to its wide applications [1–12], such as 2D/3D video coding [1,2], scalable video coding [3], inpainting [4], surveillance [5], remote sensing [6–8], multiview synthesis [9,10], etc. For example, due to limited network bandwidth and hardware cost, online video streams are often transmitted at 720p or 1080p resolution and must be super-resolved for display on recent 4K screens using state-of-the-art super-resolution algorithms [11,12].

Recently, owing to the popularity of virtual reality (VR) applications, panoramic systems often make use of multiple cameras with different fields-of-view (FOVs), stitching/fusing the captured multi-view images to form a panoramic picture [13]. However, due to many reasons, such as camera cost and limited network bandwidth, the resolutions of images captured by fisheye or telephoto/macro lenses with different FOVs are often too limited to provide a high-resolution panoramic picture after image stitching/fusion. To the best of the authors' knowledge, there is very limited research on directly improving the resolution of images captured by fisheye or telephoto/macro lenses [14,16], which inherently generate nonlinear lens distortions, as shown in Fig. 1.




However, the existing super-resolution methods for fisheye cameras [14–16] were designed for one particular lens [14] and required multi-frame image registration (video sequences) for super-resolution [14–16]. In contrast, there is some research on super-resolution for images captured by an omnidirectional camera, which captures the panorama with one standalone camera [17–20]. Moreover, instead of software post-processing, a hardware super-resolution system for wide-angle capture was proposed [21]. Super-resolution for wide-angle corneal images captured from human eyes was also investigated [22]. Furthermore, to improve the image quality after camera calibration [23,24], the research community has suggested more accurate camera models [25,26], a more accurate rectification process [27], a local planarity and orthogonality constraint [28], etc.

In the literature, there is still a lack of super-resolution algorithms for images captured by both fisheye and telephoto/macro lenses, which inherently generate nonlinear lens distortions. The difficulty of image super-resolution under nonlinear lens distortions lies in the complex image characteristics caused by the spatially-varying image resolution along the radius from the image center. To address this complicated problem for both fisheye and telephoto/macro lenses, it is essential to apply deep learning techniques, such as convolutional neural networks with thousands of filter parameters, which learn from abundant training datasets.

In this paper, we propose a learning-based single-frame super-resolution method for increasing the resolution of images with nonlinear radial lens distortions, as shown in Fig. 1. More specifically, we analyze the image formation model of cameras with nonlinear lenses and propose a deep convolutional neural network to learn the explicit end-to-end relationship between the original high-resolution images and the observed low-resolution images.


Fig. 1. Cameras with nonlinear lens distortions.

To simulate the nonlinear lens distortions, we adopt a common lens distortion model based on the polar transformation [29,30] to generate the training samples for network learning. Inspired by the state-of-the-art deep convolutional neural networks [31–33], we analyze the adjustable settings, such as convolutional filter sizes, channels per layer, number of network layers, activation functions, and batch normalization, in order to obtain a network suitable for single-frame super-resolution of images with nonlinear lens distortions. The state-of-the-art super-resolution methods using deep neural networks, such as SRCNN [31], FSRCNN [32], and VDSR [33], were not optimized for images with nonlinear lens distortions. Since the degree of distortion varies from lens to lens, the proposed method further embeds various degrees of nonlinear distortion in a single model, improving its ease of use across different lenses. Experiments were conducted on both simulated data and real data (cameras with fisheye and macro lenses) to verify the performance of the proposed super-resolution method before and after camera calibration. Specifically, the proposed method outperforms the state-of-the-art super-resolution methods [31,33,34] in terms of PSNR (0.6–1.03 dB) and SSIM values on average for the simulated data.

The contributions of this paper are summarized as follows:

- The proposed method is the first single-frame image super-resolution algorithm for images with nonlinear lens distortions.
- Convolutional neural networks with various network architectures are analyzed to arrive at a single model for different kinds of lens distortions.
- The resolution of images with nonlinear lens distortions is significantly improved before and after camera calibration, for both simulated data and real data.

The rest of this paper is organized as follows. Section 2 reviews the state-of-the-art deep convolutional networks for generic image super-resolution. Section 3 describes the image formation model for cameras with nonlinear lenses and gives a detailed description of the proposed deep convolutional network for images with nonlinear lens distortions. Section 4 presents experimental results on simulated and real data. Section 5 gives the conclusion and further discussion of this work.

Fig. 2. Relationship between radius r and distorted radius rd for distortion coefficients k1 = 1, k2 = [−0.4, 0.4].

2. Deep convolutional neural networks for image super-resolution

Deep neural networks have been widely adopted for many image processing applications [35–37], such as image super-resolution [31–33], human pose recovery [38,39], image ranking [40], image privacy protection [41], image recognition [42], image restoration [43], etc. In this section, we review the state-of-the-art deep convolutional neural networks for generic image super-resolution, i.e., for images without distortion effects, which aim at a fundamentally different objective from the proposed work.

2.1. SRCNN [31]

SRCNN is the first deep convolutional network to obtain successful image super-resolution results. Let us denote a convolution layer as Conv(fi, ni, ci), where fi, ni, and ci represent the filter size, the number of filters, and the number of input channels, respectively. SRCNN uses 3 layers, Conv(9,64,1), Conv(5,32,64), and Conv(5,1,32), as the network structure. For the training process, no adjustable gradient clipping is applied for fast training. For the scales, it uses a standalone model for each super-resolution scale.

2.2. FSRCNN [32]

FSRCNN is an improved and accelerated version of SRCNN targeted at real-time applications. Its network structure uses 8 layers: Conv(5,56,1), Conv(1,12,56), 4×Conv(3,12,12), Conv(1,56,12), and DeConv(9,1,56), where a de-convolution layer is denoted as DeConv(fi, ni, ci). For the training process, no adjustable gradient clipping or residual learning is applied. For the scales, it shares the first seven convolution layers across different super-resolution scales.

2.3. VDSR [33]

VDSR is a state-of-the-art image super-resolution method using a deep convolutional network. Its structure consists of 20 convolution layers: Conv(3,64,1), 18×Conv(3,64,64), and Conv(3,1,64). For the training process, residual learning and adjustable gradient clipping are utilized for fast training. For the scales, it uses a single model for multiple SR scales by embedding training samples with different scales.
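To make the Conv(fi, ni, ci) notation concrete, below is a minimal sketch of the SRCNN-style 3-layer structure. It assumes PyTorch rather than the Caffe package used later in this paper, and it operates on a single-channel (luminance) image that has already been up-sampled to the target resolution, as in the original method:

```python
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer SRCNN structure: Conv(9,64,1) -> Conv(5,32,64) -> Conv(5,1,32)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4),   # Conv(9,64,1): patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),  # Conv(5,32,64): nonlinear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),   # Conv(5,1,32): reconstruction
        )

    def forward(self, x):
        return self.body(x)
```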


Fig. 3. Simulations of barrel and pincushion distortions using polar transformation with k1 = 1, k2 = [−0.4,0.4].

Fig. 4. Super-resolution for images with radial distortions.

Fig. 5. Flowchart of the proposed super-resolution method using a convolutional neural network, which uses a single model for various super-resolution scales and distortion coefficients.

3. Proposed deep convolutional neural network for nonlinear lens distortions

In many digital cameras, nonlinear lens distortions are widely observed due to several reasons, including wide-angle capture, zooming effects, etc. Among the commonly observed distortions, radial distortions occur frequently due to the physical structure of photographic lenses [23–25]. Radial distortions can be classified into barrel distortions and pincushion distortions [23]. In this paper, we present a super-resolution method that simultaneously deals with both types of distortion using a single deep convolutional neural network.

3.1. Radial distortion models

Brown's distortion model [23] and Zhang's distortion model [24] are the common models for radial distortions. Fig. 1 shows the image formation process for cameras with radial distortions. In this paper, to simulate the nonlinear lens distortions, we apply the polar transformation [29,30], which can simulate the forward distortion process and the inverse distortion process (camera calibration) of radial distortions for common cameras, except for very wide-angle fisheye cameras with severe barrel distortions.

3.1.1. Polar transformation to simulate radial distortions

Without loss of generality, let us denote the original image without distortions as Z with Cartesian coordinates (x, y), whose polar coordinates are

$(r, \theta) = P(x, y) \quad (1)$

where the radius r and the angle θ are the polar coordinates of (x, y) in the polar domain. To simulate the lens distortion, the radius r is expanded using an nth-order polynomial function [29,30],

$r_d = k_1 r + k_2 r^2 + k_3 r^3 + \cdots \quad (2)$

where the distorted radius rd is an nth-order polynomial function of the original radius r with distortion coefficients ki. To keep the same image resolution after distortion, the distorted radius rd is normalized to

$r_d^N = r_d \, \max(r) / \max(r_d) \quad (3)$

To display the distorted image in Cartesian coordinates, the distorted polar coordinates (rdN, θ) are inverse polar transformed back to Cartesian coordinates (xd, yd),

$(x_d, y_d) = P^{-1}(r_d^N, \theta) \quad (4)$

which forms the distorted high-resolution (HR) image Zd. Fig. 2 shows the relationship between radius r and distorted radius rd for various distortion coefficients ki, which simulate the radial nonlinear lens distortions, including barrel distortions and pincushion distortions, as shown in Fig. 3.
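As an illustration of Eqs. (1)–(4), the following is a minimal NumPy/SciPy sketch of this simulation, assuming a grayscale image and a second-order polynomial (the k1 = 1, k2 ∈ [−0.4, 0.4] setting used in this paper). The function name and the inverse-mapping strategy via 1-D interpolation are our own choices, not the authors':

```python
import numpy as np
from scipy.ndimage import map_coordinates

def simulate_radial_distortion(Z, k1=1.0, k2=0.4):
    """Warp a grayscale image Z by the radial model r_d = k1*r + k2*r^2.

    The distorted radius is normalized (Eq. (3)) so the output keeps the same
    resolution; the warp is rendered by inverse mapping, inverting the
    monotonic polynomial numerically with 1-D interpolation.
    """
    h, w = Z.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    dx, dy = xx - cx, yy - cy
    r_max = np.hypot(cx, cy)
    r_out = np.hypot(dx, dy) / r_max       # output radius, Eq. (1), scaled to [0, 1]
    theta = np.arctan2(dy, dx)
    # Forward model on a dense 1-D grid, Eqs. (2)-(3)
    r = np.linspace(0.0, 1.0, 2048)
    rd = k1 * r + k2 * r ** 2
    rdN = rd * r.max() / rd.max()
    # Inverse mapping: source radius that lands on each output radius
    r_src = np.interp(r_out, rdN, r)
    # Back to Cartesian source coordinates, Eq. (4), then resample
    xs = cx + r_src * r_max * np.cos(theta)
    ys = cy + r_src * r_max * np.sin(theta)
    return map_coordinates(Z, [ys, xs], order=1, mode='nearest')
```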


3.2. Super-resolution to enhance the resolution

Given a distorted image Zd, its resolution is often limited due to many reasons, such as limited bandwidth during transmission, cost of cameras, etc. Let us denote the low-resolution version of the original distorted image as zd. To enhance its resolution, super-resolution techniques can be applied to estimate the high-resolution image Z'd, followed by the inverse polar transformation to restore the original image Z', i.e., camera calibration [23,24] in a real scenario. Fig. 4 shows the process flow of super-resolution for images with radial distortions.

As explained in the last section, the existing deep convolutional neural networks were proposed for generic images without nonlinear lens distortions. Hence, it is straightforward to propose a new deep convolutional neural network for this problem, as shown in Fig. 5. Inspired by the state-of-the-art image super-resolution methods [31–33], we propose an elegant network by adjusting the convolutional filter size, channels per layer, number of hidden layers, activation function, and batch normalization. The proposed network uses a single model for various super-resolution scales and various degrees of distortion, which is fundamentally different from the original works [31–33]. Experimental results show that the proposed network structure obtains significantly higher PSNR and SSIM values for images with nonlinear lens distortions.

Fig. 5 shows the flowchart of the proposed super-resolution method. Given a low-resolution distorted image zd, it is initially up-sampled using bicubic interpolation, following recent super-resolution approaches. Patches from the LR image zd and the original HR image Zd are denoted as xd and yd respectively; they are the inputs and labels for end-to-end network training, as shown in Fig. 5. After network training, an input LR distorted image is fed into the neural network for super-resolution reconstruction, followed by camera calibration to recover the undistorted image.

Fig. 6. PSNR (dB) of different networks for analysis during training.

3.2.1. Single model for multiple super-resolution scales and multiple distortion coefficients

To accommodate different cameras with various types of radial distortion, we propose a single model that simultaneously learns multiple super-resolution scales and distortion coefficients. Specifically, the training data were generated using super-resolution scales (scale = [2,3,4]) and distortion coefficients (k1 = 1, k2 = [−0.4, 0.4]) to account for various scenarios of radial distortion. All training data were gathered and fed into the same network for training. Fig. 5 illustrates the proposed single model for image super-resolution.

3.2.2. Objective error function

For the proposed method, the loss function of the neural network is the mean squared error (MSE),

$L_{MSE}(y_d, x_d) = \frac{1}{2} \| y_d - F(x_d) \|^2 \quad (5)$

where F(xd) is the result inferred by the deep neural network. Specifically, the lth convolution layer of the deep convolutional neural network can be represented as [32]

$F_l(x_d) = \max(0, W_l * F_{l-1}(x_d) + b_l) \quad (6)$

where Wl and bl represent the convolution filter and the bias, Fl(xd) is the network output of the lth layer, and the rectified linear unit (ReLU) is taken as the nonlinear activation function in this equation.
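A compact PyTorch sketch of Eqs. (5) and (6), keeping the 1/2 factor of Eq. (5); the helper names are ours:

```python
import torch
import torch.nn.functional as TF

def conv_relu_layer(x, weight, bias):
    # Eq. (6): F_l(x_d) = max(0, W_l * F_{l-1}(x_d) + b_l)
    return TF.relu(TF.conv2d(x, weight, bias, padding=weight.shape[-1] // 2))

def mse_loss(y_d, x_d, net):
    # Eq. (5): L_MSE(y_d, x_d) = (1/2) * ||y_d - F(x_d)||^2
    return 0.5 * torch.sum((y_d - net(x_d)) ** 2)
```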

Fig. 7. PSNR (dB) of (a) SRCNN, (b) VDSR, and the proposed network with different activation functions: (c) ReLU, (d) pReLU, (e) tanh, (f) ReLU with batch normalization (BN), during the training process.


Fig. 8. 4× Super-resolution results of Set14 “foreman” image when distortion parameters k1 = 1, k2 = 0.4.

Fig. 9. 4× Super-resolution results of Set5 “butterfly” image when distortion parameters k1 = 1, k2 = 0.4.


Fig. 10. 4× Super-resolution results of Set14 “face” image when distortion parameters k1 = 1, k2 = −0.4.

Table 1. Network structures of different image super-resolution methods.

|               | SRCNN [31]    | FSRCNN [32]                                   | VDSR [33]        | Proposed network |
|---------------|---------------|-----------------------------------------------|------------------|------------------|
| First layer   | Conv(9,64,1)  | Conv(5,56,1)                                  | Conv(3,64,1)     | Conv(5,56,1)     |
| Hidden layers | Conv(5,32,64) | Conv(1,12,56), 4×Conv(3,12,12), Conv(1,56,12) | 18×Conv(3,64,64) | 8×Conv(5,56,56)  |
| Final layer   | Conv(5,1,32)  | DeConv(9,1,56)                                | Conv(3,1,64)     | Conv(5,1,56)     |
| Complexity    | 57,184        | 12,464                                        | 664,704          | 630,000          |

3.2.3. Residual learning and adjustable gradient clipping

Residual learning and adjustable gradient clipping [33] are used for high-quality and fast training of the neural network. Instead of inferring the high-resolution patch yd directly, the output of the neural network is modified to infer the difference between the HR patch and the LR patch, i.e., yd − xd, such that the loss function is updated as

$L_{residual}(y_d, x_d) = \frac{1}{2} \| y_d - x_d - F(x_d) \|^2 \quad (7)$

where F(xd) represents the network inference. Initially, a high learning rate is used, and the back-propagated gradient is adjusted during the iterations to prevent gradient explosion.
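A sketch of one such training step, assuming PyTorch and a network that directly outputs the residual; the adjustable clipping threshold theta/lr follows the scheme of VDSR [33], where the base threshold theta and its default value are our assumptions:

```python
import torch

def train_step(net, optimizer, x_d, y_d, lr, theta=0.01):
    """One residual-learning step: the network predicts y_d - x_d (Eq. (7)),
    and gradients are clipped to [-theta/lr, theta/lr] so a high initial
    learning rate can be used without gradient explosion."""
    optimizer.zero_grad()
    residual = net(x_d)                      # F(x_d) approximates y_d - x_d
    loss = 0.5 * torch.sum((y_d - x_d - residual) ** 2)
    loss.backward()
    torch.nn.utils.clip_grad_value_(net.parameters(), theta / lr)
    optimizer.step()
    return loss.item()
```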

3.2.4. Proposed deep convolutional network structure

Table 1 shows the network structure of the proposed deep convolutional neural network, compared with the existing state-of-the-art deep-learning based image super-resolution methods SRCNN [31], FSRCNN [32], and VDSR [33]. The proposed network has 10 layers, Conv(5,56,1), 8×Conv(5,56,56), and Conv(5,1,56), which were tuned for the characteristics of images with nonlinear lens distortions. VDSR [33] shows that super-resolution performance saturates at a 41×41 receptive field; the proposed network therefore keeps the same receptive field by increasing the convolution filter size (to better accommodate the spatially-varying resolution of nonlinear lens distortions) while using only 8 hidden convolution layers. Experimental results show that the proposed network provides better performance than the existing deep neural networks. Further analysis of the network structure (convolution filter size, number of channels, activation function, and batch normalization) is given in Sections 3.2.6 and 3.2.7.
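A minimal sketch of this 10-layer structure, again assuming PyTorch; the exact placement of batch normalization (Section 3.2.7) inside the hidden layers is our assumption:

```python
import torch.nn as nn

class NonlinearLensSR(nn.Module):
    """Proposed structure: Conv(5,56,1) + 8×Conv(5,56,56) + Conv(5,1,56),
    with ReLU and batch normalization after each hidden layer.
    Ten 5x5 layers give the 41x41 receptive field noted above."""
    def __init__(self, channels=56, hidden_layers=8):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 5, padding=2), nn.ReLU(inplace=True)]
        for _ in range(hidden_layers):
            layers += [nn.Conv2d(channels, channels, 5, padding=2),
                       nn.BatchNorm2d(channels),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 5, padding=2)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Predicts the residual y_d - x_d; the SR output is x + net(x)
        return self.body(x)
```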


Fig. 11. 4× Super-resolution results of Set5 “bird” image when distortion parameters k1 = 1, k2 = −0.4.

3.2.5. Complexity of deep convolutional neural networks

In the literature, the complexity of a convolutional neural network can be calculated as the number of parameters of the network [32]. Specifically, the complexity of each convolution layer can be represented in terms of the number of parameters and the size of the high-resolution (HR) image, as follows:

$O_i = O\{ (n_{i-1} \cdot f_i^2 \cdot n_i) \cdot S_{HR} \} \quad (8)$

where fi and ni are the filter size and the channel number of the ith layer, and SHR is the size of the HR image. Calculating the complexity (in terms of network parameters) of the different neural networks in Table 1 shows that the proposed network has a lower complexity than the VDSR method [33]. Although our network does not achieve the lowest complexity in Table 1, it strikes a balance between performance and complexity, which will be further explained in the experiment section.

3.2.6. Analysis of various network architectures

In this section, we analyze various network architectures for the proposed image super-resolution for nonlinear lens distortions. Specifically, the convolutional filter sizes, the number of channels per layer, the number of layers, and the number of network parameters were analyzed. For the testing environment, dataset Set5 with super-resolution scale 2× and distortion coefficients (k1 = 1, k2 = 0.4) was used. Table 2 shows the PSNR (dB) and the complexity (in terms of network parameters) of the analyzed network structures, and Fig. 6 shows their PSNR (dB) during the training process. The experiments show that a larger convolutional filter size (up to 5×5) and more channels per layer lead to higher PSNR (dB) at higher complexity. To balance quality and complexity, the "Proposed-3" configuration is chosen for the rest of the experiments.

Fig. 12. Fisheye (barrel) and macro (pincushion) lens for a mobile phone.
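Dropping the constant image size SHR, Eq. (8) reduces to a per-layer weight count; a small sketch reproducing the totals in Table 1 (helper name ours):

```python
def conv_params(f, n, c):
    # Eq. (8) without the S_HR factor and bias terms: c * f^2 * n weights per layer
    return c * f * f * n

# Proposed network: Conv(5,56,1) + 8×Conv(5,56,56) + Conv(5,1,56)
proposed = conv_params(5, 56, 1) + 8 * conv_params(5, 56, 56) + conv_params(5, 1, 56)
# VDSR: Conv(3,64,1) + 18×Conv(3,64,64) + Conv(3,1,64)
vdsr = conv_params(3, 64, 1) + 18 * conv_params(3, 64, 64) + conv_params(3, 1, 64)
print(proposed, vdsr)  # 630000 664704, matching Table 1
```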

3.2.7. Batch normalization and activation functions

In this section, we analyze the effects of different activation functions (tanh, ReLU, and pReLU (parametric rectified linear unit) [44]) and of batch normalization [45] on the proposed network. Figs. 7(c)–(e) show that the proposed network achieves higher PSNR with ReLU and pReLU as activation functions; ReLU is chosen due to its popularity. Moreover, batch normalization significantly improves network performance by reducing the internal covariate shift [45], as shown in Fig. 7(f). As a result, ReLU and batch normalization were adopted in the proposed image super-resolution algorithm.


Fig. 13. Eight images captured by fisheye lens and checkerboard images for camera calibration.

Table 2. Network structures for analysis.

|               | VDSR [33]        | Proposed-1      | Proposed-2      | Proposed-3      | Proposed-4      | Proposed-5      | Proposed-6      |
|---------------|------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| First layer   | Conv(3,64,1)     | Conv(9,64,1)    | Conv(5,64,1)    | Conv(5,56,1)    | Conv(5,48,1)    | Conv(5,40,1)    | Conv(5,32,1)    |
| Hidden layers | 18×Conv(3,64,64) | 3×Conv(9,64,64) | 8×Conv(5,64,64) | 8×Conv(5,56,56) | 8×Conv(5,48,48) | 8×Conv(5,40,40) | 8×Conv(5,32,32) |
| Final layer   | Conv(3,1,64)     | Conv(9,1,64)    | Conv(5,1,64)    | Conv(5,1,56)    | Conv(5,1,48)    | Conv(5,1,40)    | Conv(5,1,32)    |
| PSNR (dB)     | 34.49            | 35.03           | 35.42           | 35.39           | 35.22           | 35.28           | 35.21           |
| Complexity    | 664,704          | 1,005,696       | 822,400         | 630,000         | 463,200         | 322,000         | 206,400         |

Fig. 14. Eight images captured by macro lens and checkerboard images for camera calibration.

4. Experimental results

Extensive experiments were done using simulated data and real data to verify the performance of the proposed image super-resolution method for cameras with nonlinear lens distortions. Specifically, this section describes the preparation of the training and testing data, the training process using the Caffe package [46], and the objective and subjective results. For comparison, the state-of-the-art image super-resolution methods A+ [34], SRCNN [31], and VDSR [33] were re-trained using the same training data for a fair comparison. The codes of the compared methods were obtained either from the authors' websites or from third-party implementations with default settings.

4.1. Training process for deep convolutional network

In this section, let us describe the implementation details of the training process.

As described in the previous sections, the proposed method utilizes a single model for multiple super-resolution scales and multiple distortion coefficients, in order to accommodate various nonlinear lens distortion scenarios. Moreover, the proposed method applies residual learning and adjustable gradient clipping during the training process.

Training data were obtained from a standard image dataset (291 images) [33], which was distorted using the distortion model described in Section 3.1 with various distortion coefficients, where k1 = 1 and k2 is randomly chosen within [−0.4, 0.4]. Fig. 3 shows examples of simulated distorted images with barrel and pincushion distortions. The distorted images were down-sampled by 2×, 3×, and 4× to form the training data.

Training was done using the Caffe package [46], which allows configuring the convolutional network layers with residual learning and adjustable gradient clipping. A Linux system with several Nvidia K80 GPU cards was used to accomplish the training process.


Fig. 15. 4× Super-resolution results of an image captured by the fisheye lens.

For the Caffe settings, the image patch size is 41×41 for both training and testing, the batch size is 64, the momentum parameter is 0.9, and the weight decay is $10^{-4}$. Fig. 7 shows the PSNR (dB) of SRCNN [31], VDSR [33], and the proposed network during the training process, where dataset Set5 with super-resolution scale 2× and distortion coefficients (k1 = 1, k2 = 0.4) was used to evaluate the PSNR values. Fig. 7 indicates that SRCNN, VDSR, and the proposed network converge at around 3500 epochs, 30 epochs, and 30 epochs, respectively.
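A rough PyTorch equivalent of these solver settings, reusing the train_step and NonlinearLensSR sketches above; the learning rate value is our assumption (the paper only states that a high initial rate is used), and the random tensors stand in for real patch batches:

```python
import torch

net = NonlinearLensSR()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# 41x41 luminance patches in batches of 64, as in the Caffe setup
x_d = torch.rand(64, 1, 41, 41)   # bicubic-up-sampled LR patches (inputs)
y_d = torch.rand(64, 1, 41, 41)   # original HR patches (labels)
loss = train_step(net, optimizer, x_d, y_d, lr=0.1)
```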

4.2. Experiments on simulated data

In this section, we elaborate the extensive objective and subjective evaluations on simulated data using different image super-resolution algorithms. Testing data were formed from common datasets (119 images), i.e., Set5, Set14, and B100 [34], using the distortion model described in Section 3.1 with various distortion parameters. Specifically, the testing images were distorted using six sets of distortion coefficients (k1 = 1 with k2 = 0.2, 0.3, 0.4, −0.2, −0.3, −0.4) for various degrees of distortion, as shown in Fig. 3. Referring to Fig. 4, let us evaluate the PSNR and SSIM [47] values of the images after super-resolution by comparing the image pairs (Zd, Z'd) and (Z, Z'), respectively, as follows:

$\mathrm{Metric\_1} = \mathrm{PSNR,\,SSIM}(Z_d, Z'_d) \quad (9)$

$\mathrm{Metric\_2} = \mathrm{PSNR,\,SSIM}(Z, Z') \quad (10)$

In other words, we evaluate the image quality of images with nonlinear lens distortions after super-resolution (Metric_1) and after the inverse distortion process, i.e., camera calibration in a real scenario (Metric_2). Tables 3–5 summarize the PSNR (dB) and SSIM values of datasets Set5, Set14, and B100 for 2×, 3×, and 4× super-resolution scales under Metric_1. Tables 6–8 show the corresponding values after the camera calibration process under Metric_2. Tables 3–8 show that the proposed method outperforms the state-of-the-art super-resolution methods [31,33,34] in terms of PSNR (0.6–1.03 dB gain) and SSIM values on average for the simulated data.

Figs. 8–11 show subjective comparisons of various super-resolution methods for the 4× super-resolution scale and various distortion parameters. Figs. 8 and 9 show results for barrel distortions, where the proposed method reconstructs the edges (blue arrows) with the highest sharpness and clearness.
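A sketch of this evaluation protocol, assuming scikit-image for the two metrics and images scaled to [0, 1]:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(Z_d, Z_d_sr, Z, Z_restored):
    """Metric_1 compares the distorted pair (Eq. (9)); Metric_2 compares
    the pair after camera calibration (Eq. (10))."""
    metric_1 = (peak_signal_noise_ratio(Z_d, Z_d_sr, data_range=1.0),
                structural_similarity(Z_d, Z_d_sr, data_range=1.0))
    metric_2 = (peak_signal_noise_ratio(Z, Z_restored, data_range=1.0),
                structural_similarity(Z, Z_restored, data_range=1.0))
    return metric_1, metric_2
```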


Fig. 16. 4× Super-resolution results of an image captured by the macro lens.

Table 3. PSNR (dB)/SSIM of dataset Set5 for various SR scales and distortion coefficients (k1 = 1) for Metric_1.

| k2 | Scale | Bicubic | A+ [34] | SRCNN [31] | VDSR [33] | Proposed |
|----|-------|---------|---------|------------|-----------|----------|
| 0.2 | 2× | 32.07/0.9386 | 36.18/0.9632 | 36.54/0.9628 | 36.72/0.9639 | 38.18/0.9689 |
| 0.2 | 3× | 29.09/0.8751 | 32.03/0.9173 | 32.09/0.9126 | 33.05/0.9244 | 33.92/0.9328 |
| 0.2 | 4× | 27.18/0.8140 | 29.60/0.8659 | 29.65/0.8623 | 30.49/0.8791 | 31.13/0.8922 |
| 0.3 | 2× | 31.52/0.9330 | 35.43/0.9611 | 35.79/0.9606 | 35.97/0.9620 | 37.43/0.9677 |
| 0.3 | 3× | 28.48/0.8658 | 31.29/0.9122 | 31.63/0.9086 | 32.28/0.9208 | 33.26/0.9297 |
| 0.3 | 4× | 26.68/0.8024 | 29.04/0.8607 | 29.35/0.8575 | 29.90/0.8742 | 30.62/0.8879 |
| 0.4 | 2× | 30.68/0.9263 | 33.99/0.9567 | 34.38/0.9566 | 34.49/0.9580 | 35.95/0.9650 |
| 0.4 | 3× | 27.95/0.8570 | 30.41/0.9069 | 30.63/0.9030 | 31.26/0.9160 | 32.14/0.9252 |
| 0.4 | 4× | 26.30/0.7928 | 28.35/0.8537 | 28.60/0.8502 | 29.14/0.8685 | 29.79/0.8834 |
| −0.2 | 2× | 32.15/0.9403 | 36.71/0.9639 | 36.90/0.9633 | 37.21/0.9645 | 38.49/0.9691 |
| −0.2 | 3× | 29.27/0.8783 | 32.33/0.9181 | 32.46/0.9142 | 33.50/0.9256 | 34.31/0.9337 |
| −0.2 | 4× | 27.30/0.8154 | 29.86/0.8649 | 29.71/0.8614 | 30.72/0.8780 | 31.40/0.8919 |
| −0.3 | 2× | 31.78/0.9341 | 36.38/0.9628 | 36.70/0.9623 | 36.91/0.9638 | 38.08/0.9687 |
| −0.3 | 3× | 28.66/0.8678 | 31.95/0.9154 | 32.26/0.9120 | 32.99/0.9237 | 33.90/0.9326 |
| −0.3 | 4× | 26.78/0.8047 | 29.49/0.8623 | 29.84/0.8606 | 30.39/0.8774 | 31.11/0.8905 |
| −0.4 | 2× | 30.98/0.9258 | 35.31/0.9584 | 35.59/0.9576 | 35.85/0.9597 | 37.16/0.9656 |
| −0.4 | 3× | 28.02/0.8544 | 31.16/0.9079 | 31.45/0.9051 | 32.12/0.9174 | 33.11/0.9277 |
| −0.4 | 4× | 26.19/0.7849 | 28.80/0.8503 | 29.05/0.8494 | 29.61/0.8678 | 30.28/0.8816 |
| Avg. | 2× | 31.53/0.9330 | 35.67/0.9610 | 35.98/0.9605 | 36.48/0.9665 | 37.55/0.9675 |
| Avg. | 3× | 28.58/0.8664 | 31.53/0.9130 | 31.75/0.9092 | 32.49/0.9284 | 33.44/0.9303 |
| Avg. | 4× | 26.74/0.8024 | 29.19/0.8596 | 29.37/0.8569 | 30.16/0.8866 | 30.72/0.8879 |

Other methods produce edges with more jaggedness and halo effects. Figs. 10 and 11 show results for pincushion distortions, where the proposed method reproduces image details with higher fidelity, while the other super-resolution methods reconstruct images with artifacts (blue arrows), such as staircase effects near edges and object boundaries. The subjective evaluations generally agree with the objective evaluations in Tables 3–8, which indicate that the proposed method provides higher image quality for various types of radial distortion. Moreover, evaluations were done before and after camera calibration to verify the effectiveness in real scenarios. Meanwhile, we observe that the advantages of our approach are slightly reduced after camera calibration, as measured by Metric_2.

4.3. Experiments on real data

To verify the performance of the proposed super-resolution method on real data, a fisheye lens and a macro lens (inversely installed) were used to capture real-life images with radial distortions. Fig. 12 shows the fisheye and macro lenses, which attach to a mobile phone to capture images.


Table 4. PSNR (dB)/SSIM of dataset Set14 for various SR scales and distortion coefficients (k1 = 1) for Metric_1.

| k2 | Scale | Bicubic | A+ [34] | SRCNN [31] | VDSR [33] | Proposed |
|----|-------|---------|---------|------------|-----------|----------|
| 0.2 | 2× | 30.01/0.8879 | 32.92/0.9234 | 32.84/0.9240 | 32.80/0.9241 | 33.77/0.9304 |
| 0.2 | 3× | 27.20/0.7905 | 29.30/0.8380 | 29.05/0.8341 | 29.55/0.8449 | 30.20/0.8531 |
| 0.2 | 4× | 25.58/0.7158 | 27.30/0.7675 | 27.11/0.7631 | 27.65/0.7766 | 28.11/0.7876 |
| 0.3 | 2× | 29.59/0.8811 | 32.44/0.9190 | 32.39/0.9193 | 32.31/0.9193 | 33.27/0.9262 |
| 0.3 | 3× | 26.81/0.7825 | 28.94/0.8337 | 28.78/0.8288 | 29.19/0.8399 | 29.85/0.8487 |
| 0.3 | 4× | 25.23/0.7065 | 26.99/0.7622 | 26.88/0.7575 | 27.34/0.7716 | 27.82/0.7824 |
| 0.4 | 2× | 29.02/0.8731 | 31.66/0.9130 | 31.64/0.9133 | 31.63/0.9136 | 32.58/0.9215 |
| 0.4 | 3× | 26.41/0.7734 | 28.38/0.8263 | 28.26/0.8212 | 28.71/0.8331 | 29.30/0.8428 |
| 0.4 | 4× | 24.89/0.6967 | 26.58/0.7554 | 26.49/0.7502 | 26.95/0.7652 | 27.41/0.7768 |
| −0.2 | 2× | 30.01/0.8888 | 33.04/0.9238 | 32.96/0.9239 | 32.94/0.9239 | 33.94/0.9308 |
| −0.2 | 3× | 27.28/0.7933 | 29.35/0.8388 | 29.14/0.8352 | 29.63/0.8452 | 30.22/0.8527 |
| −0.2 | 4× | 25.67/0.7189 | 27.40/0.7691 | 27.18/0.7644 | 27.77/0.7772 | 28.15/0.7874 |
| −0.3 | 2× | 29.65/0.8817 | 32.58/0.9195 | 32.53/0.9192 | 32.47/0.9190 | 33.52/0.9270 |
| −0.3 | 3× | 26.92/0.7854 | 29.07/0.8363 | 28.93/0.8315 | 29.36/0.8420 | 29.93/0.8502 |
| −0.3 | 4× | 25.32/0.7102 | 27.14/0.7666 | 27.06/0.7613 | 27.47/0.7745 | 27.90/0.7849 |
| −0.4 | 2× | 28.94/0.8680 | 31.73/0.9086 | 31.66/0.9083 | 31.59/0.9081 | 32.61/0.9169 |
| −0.4 | 3× | 26.37/0.7698 | 28.44/0.8241 | 28.30/0.8190 | 28.71/0.8301 | 29.31/0.8392 |
| −0.4 | 4× | 24.89/0.6955 | 26.64/0.7552 | 26.55/0.7505 | 26.95/0.7639 | 27.34/0.7743 |
| Avg. | 2× | 29.54/0.8801 | 32.39/0.9179 | 32.34/0.9180 | 32.29/0.9180 | 33.28/0.9255 |
| Avg. | 3× | 26.83/0.7825 | 28.91/0.8329 | 28.74/0.8283 | 29.19/0.8340 | 29.80/0.8478 |
| Avg. | 4× | 25.26/0.7073 | 27.01/0.7627 | 26.88/0.7578 | 27.36/0.7642 | 27.79/0.7822 |

Table 5. PSNR (dB)/SSIM of dataset B100 for various SR scales and distortion coefficients (k1 = 1) for Metric_1.

| k2 | Scale | Bicubic | A+ [34] | SRCNN [31] | VDSR [33] | Proposed |
|----|-------|---------|---------|------------|-----------|----------|
| 0.2 | 2× | 29.62/0.8618 | 31.24/0.8843 | 32.03/0.9051 | 32.20/0.9062 | 32.86/0.9143 |
| 0.2 | 3× | 27.10/0.7560 | 28.53/0.7958 | 28.68/0.8034 | 29.08/0.8127 | 29.45/0.8206 |
| 0.2 | 4× | 25.75/0.6812 | 27.02/0.7237 | 27.05/0.7283 | 27.42/0.7388 | 27.69/0.7470 |
| 0.3 | 2× | 29.15/0.8535 | 30.77/0.8772 | 31.57/0.8984 | 31.71/0.8996 | 32.39/0.9082 |
| 0.3 | 3× | 26.72/0.7474 | 28.17/0.7885 | 28.36/0.7964 | 28.72/0.8060 | 29.13/0.8144 |
| 0.3 | 4× | 25.43/0.6729 | 26.69/0.7166 | 26.74/0.7216 | 27.10/0.7325 | 27.41/0.7411 |
| 0.4 | 2× | 28.68/0.8447 | 30.18/0.8694 | 30.95/0.8910 | 31.06/0.8922 | 31.74/0.9015 |
| 0.4 | 3× | 26.33/0.7377 | 27.72/0.7806 | 27.96/0.7889 | 28.27/0.7991 | 28.69/0.8080 |
| 0.4 | 4× | 25.07/0.6623 | 26.31/0.7081 | 26.46/0.7135 | 26.74/0.7248 | 27.06/0.7340 |
| −0.2 | 2× | 29.65/0.8651 | 31.44/0.8865 | 32.24/0.9067 | 32.40/0.9078 | 33.05/0.9157 |
| −0.2 | 3× | 27.17/0.7603 | 28.68/0.7969 | 28.85/0.8042 | 29.26/0.8134 | 29.63/0.8213 |
| −0.2 | 4× | 25.83/0.6865 | 27.13/0.7246 | 27.12/0.7283 | 27.59/0.7385 | 27.86/0.7471 |
| −0.3 | 2× | 29.31/0.8569 | 31.04/0.8818 | 31.84/0.9023 | 31.97/0.9034 | 32.61/0.9117 |
| −0.3 | 3× | 26.88/0.7528 | 28.43/0.7948 | 28.66/0.8026 | 28.98/0.8118 | 29.36/0.8197 |
| −0.3 | 4× | 25.58/0.6797 | 26.96/0.7245 | 27.10/0.7301 | 27.37/0.7400 | 27.66/0.7480 |
| −0.4 | 2× | 28.69/0.8416 | 30.29/0.8677 | 31.01/0.8886 | 31.14/0.8898 | 31.75/0.8989 |
| −0.4 | 3× | 26.43/0.7379 | 27.92/0.7811 | 28.13/0.7895 | 28.44/0.7988 | 28.82/0.8070 |
| −0.4 | 4× | 25.22/0.6660 | 26.58/0.7134 | 26.70/0.7193 | 26.95/0.7293 | 27.22/0.7368 |
| Avg. | 2× | 29.18/0.8539 | 30.83/0.8778 | 31.61/0.8987 | 31.75/0.8998 | 32.40/0.9084 |
| Avg. | 3× | 26.77/0.7487 | 28.24/0.7896 | 28.44/0.7975 | 28.79/0.8027 | 29.18/0.8152 |
| Avg. | 4× | 25.48/0.6748 | 26.78/0.7185 | 26.86/0.7235 | 27.20/0.7290 | 27.48/0.7423 |

Table 6. PSNR (dB)/SSIM of dataset Set5 for various SR scales and distortion coefficients (k1 = 1) for Metric_2.

| k2 | Scale | Bicubic | A+ [34] | SRCNN [31] | VDSR [33] | Proposed |
|----|-------|---------|---------|------------|-----------|----------|
| 0.2 | 2× | 32.84/0.9199 | 35.30/0.9459 | 35.39/0.9456 | 35.53/0.9469 | 36.43/0.9525 |
| 0.2 | 3× | 29.80/0.8561 | 31.78/0.8975 | 31.66/0.8928 | 32.30/0.9045 | 33.06/0.9140 |
| 0.2 | 4× | 27.89/0.7973 | 29.53/0.8458 | 29.51/0.8416 | 30.03/0.8581 | 30.64/0.8720 |
| 0.3 | 2× | 32.04/0.9091 | 34.37/0.9380 | 34.43/0.9373 | 34.62/0.9392 | 35.48/0.9461 |
| 0.3 | 3× | 29.13/0.8411 | 30.94/0.8833 | 30.89/0.8788 | 31.47/0.8923 | 32.22/0.9028 |
| 0.3 | 4× | 27.35/0.7816 | 28.89/0.8308 | 28.88/0.8257 | 29.33/0.8428 | 29.90/0.8577 |
| 0.4 | 2× | 30.38/0.8841 | 32.44/0.9175 | 32.47/0.9164 | 32.59/0.9189 | 33.53/0.9296 |
| 0.4 | 3× | 27.88/0.8133 | 29.51/0.8572 | 29.37/0.8508 | 29.89/0.8663 | 30.58/0.8788 |
| 0.4 | 4× | 26.31/0.7543 | 27.70/0.8013 | 27.62/0.7948 | 28.03/0.8132 | 28.55/0.8305 |
| −0.2 | 2× | 33.23/0.9226 | 35.81/0.9480 | 35.87/0.9475 | 36.04/0.9488 | 36.84/0.9538 |
| −0.2 | 3× | 30.16/0.8611 | 32.22/0.9021 | 32.16/0.8980 | 32.82/0.9092 | 33.56/0.9177 |
| −0.2 | 4× | 28.14/0.8015 | 29.83/0.8502 | 29.81/0.8464 | 30.31/0.8625 | 31.02/0.8766 |
| −0.3 | 2× | 32.87/0.9164 | 35.39/0.9434 | 35.48/0.9427 | 35.66/0.9444 | 36.43/0.9499 |
| −0.3 | 3× | 29.80/0.8513 | 31.82/0.8947 | 31.73/0.8901 | 32.36/0.9022 | 33.10/0.9117 |
| −0.3 | 4× | 27.91/0.7921 | 29.64/0.8426 | 29.58/0.8382 | 30.09/0.8557 | 30.75/0.8704 |
| −0.4 | 2× | 32.07/0.9026 | 34.45/0.9331 | 34.47/0.9318 | 34.69/0.9345 | 35.47/0.9412 |
| −0.4 | 3× | 29.19/0.8334 | 31.06/0.8789 | 30.99/0.8736 | 31.56/0.8874 | 32.35/0.8991 |
| −0.4 | 4× | 27.30/0.7708 | 28.92/0.8214 | 28.87/0.8163 | 29.37/0.8365 | 29.96/0.8526 |
| Avg. | 2× | 32.24/0.9091 | 34.63/0.9377 | 34.68/0.9369 | 34.86/0.9388 | 35.70/0.8455 |
| Avg. | 3× | 29.33/0.8427 | 31.22/0.8856 | 31.13/0.8807 | 31.73/0.8937 | 32.48/0.9040 |
| Avg. | 4× | 27.48/0.7829 | 29.09/0.8320 | 29.04/0.8272 | 29.53/0.8448 | 30.14/0.8600 |


Table 7. PSNR (dB)/SSIM of dataset Set14 for various SR scales and distortion coefficients (k1 = 1) for Metric_2.

| k2 | Scale | Bicubic | A+ [34] | SRCNN [31] | VDSR [33] | Proposed |
|----|-------|---------|---------|------------|-----------|----------|
| 0.2 | 2× | 29.75/0.8548 | 31.66/0.8917 | 31.73/0.8937 | 31.75/0.8941 | 32.00/0.8990 |
| 0.2 | 3× | 27.25/0.7610 | 28.78/0.8058 | 28.71/0.8031 | 29.09/0.8132 | 29.28/0.8181 |
| 0.2 | 4× | 25.77/0.6918 | 27.05/0.7376 | 27.02/0.7341 | 27.35/0.7459 | 27.54/0.7520 |
| 0.3 | 2× | 29.30/0.8416 | 31.15/0.8800 | 31.19/0.8818 | 31.20/0.8826 | 31.42/0.8878 |
| 0.3 | 3× | 26.90/0.7476 | 28.43/0.7931 | 28.35/0.7899 | 28.70/0.8005 | 28.84/0.8058 |
| 0.3 | 4× | 25.48/0.6792 | 26.74/0.7244 | 26.70/0.7206 | 27.03/0.7332 | 27.17/0.7391 |
| 0.4 | 2× | 28.38/0.8153 | 30.13/0.8562 | 30.10/0.8569 | 30.11/0.8583 | 30.32/0.8644 |
| 0.4 | 3× | 26.23/0.7231 | 27.68/0.7678 | 27.55/0.7635 | 27.89/0.7753 | 27.96/0.7807 |
| 0.4 | 4× | 24.87/0.6582 | 26.18/0.7026 | 26.01/0.6973 | 26.40/0.7108 | 26.45/0.7168 |
| −0.2 | 2× | 29.75/0.8548 | 31.65/0.8916 | 31.71/0.8932 | 31.75/0.8936 | 32.04/0.8988 |
| −0.2 | 3× | 27.25/0.7614 | 28.75/0.8058 | 28.70/0.8034 | 29.06/0.8130 | 29.25/0.8179 |
| −0.2 | 4× | 25.77/0.6914 | 27.08/0.7379 | 27.02/0.7340 | 27.35/0.7459 | 27.54/0.7519 |
| −0.3 | 2× | 29.38/0.8434 | 31.21/0.8815 | 31.24/0.8829 | 31.28/0.8836 | 31.55/0.8891 |
| −0.3 | 3× | 26.98/0.7492 | 28.43/0.7943 | 28.38/0.7916 | 28.75/0.8018 | 28.93/0.8071 |
| −0.3 | 4× | 25.52/0.6792 | 26.78/0.7253 | 26.72/0.7210 | 27.04/0.7336 | 27.24/0.7401 |
| −0.4 | 2× | 28.82/0.8211 | 30.40/0.8612 | 30.43/0.8624 | 30.47/0.8635 | 30.68/0.8693 |
| −0.4 | 3× | 26.45/0.7249 | 27.82/0.7709 | 27.77/0.7678 | 28.13/0.7789 | 28.29/0.7848 |
| −0.4 | 4× | 25.10/0.6587 | 26.26/0.7031 | 26.22/0.6991 | 26.53/0.7122 | 26.72/0.7188 |
| Avg. | 2× | 29.21/0.8385 | 31.03/0.8770 | 31.07/0.8785 | 31.09/0.8793 | 31.34/0.8847 |
| Avg. | 3× | 26.84/0.7445 | 28.32/0.7896 | 28.24/0.7865 | 28.39/0.7947 | 28.76/0.8024 |
| Avg. | 4× | 25.42/0.6764 | 26.68/0.7218 | 26.61/0.7177 | 26.71/0.7272 | 27.11/0.7364 |

Table 8. PSNR (dB)/SSIM of dataset B100 for various SR scales and distortion coefficients (k1 = 1) for Metric_2.

| k2 | Scale | Bicubic | A+ [34] | SRCNN [31] | VDSR [33] | Proposed |
|----|-------|---------|---------|------------|-----------|----------|
| 0.2 | 2× | 29.08/0.8237 | 30.16/0.8504 | 30.65/0.8686 | 30.76/0.8697 | 31.23/0.8786 |
| 0.2 | 3× | 26.96/0.7237 | 27.90/0.7610 | 27.95/0.7659 | 28.24/0.7750 | 28.53/0.7829 |
| 0.2 | 4× | 25.76/0.6556 | 26.59/0.6924 | 26.58/0.6942 | 26.82/0.7040 | 27.06/0.7122 |
| 0.3 | 2× | 28.73/0.8090 | 29.73/0.8357 | 30.19/0.8540 | 30.29/0.8552 | 30.74/0.8646 |
| 0.3 | 3× | 26.72/0.7102 | 27.62/0.7468 | 27.67/0.7516 | 27.95/0.7610 | 28.24/0.7692 |
| 0.3 | 4× | 25.57/0.6448 | 26.37/0.6805 | 26.36/0.6822 | 26.60/0.6921 | 26.84/0.7007 |
| 0.4 | 2× | 28.12/0.7841 | 29.02/0.8099 | 29.41/0.8284 | 29.51/0.8300 | 29.92/0.8401 |
| 0.4 | 3× | 26.27/0.6891 | 27.13/0.7238 | 27.19/0.7287 | 27.45/0.7387 | 27.72/0.7474 |
| 0.4 | 4× | 25.21/0.6280 | 25.99/0.6614 | 25.98/0.6633 | 26.22/0.6732 | 26.45/0.6819 |
| −0.2 | 2× | 29.10/0.8266 | 30.18/0.8536 | 30.68/0.8717 | 30.79/0.8728 | 31.24/0.8816 |
| −0.2 | 3× | 26.95/0.7256 | 27.89/0.7631 | 27.95/0.7681 | 28.23/0.7772 | 28.52/0.7851 |
| −0.2 | 4× | 25.75/0.6567 | 26.56/0.6937 | 26.56/0.6955 | 26.79/0.7053 | 27.04/0.7135 |
| −0.3 | 2× | 28.76/0.8135 | 29.78/0.8409 | 30.23/0.8592 | 30.34/0.8603 | 30.77/0.8690 |
| −0.3 | 3× | 26.71/0.7128 | 27.61/0.7500 | 27.67/0.7549 | 27.94/0.7642 | 28.22/0.7723 |
| −0.3 | 4× | 25.56/0.6457 | 26.35/0.6818 | 26.34/0.6836 | 26.57/0.6934 | 26.80/0.7016 |
| −0.4 | 2× | 28.17/0.7884 | 29.07/0.8160 | 29.45/0.8343 | 29.55/0.8356 | 29.94/0.8455 |
| −0.4 | 3× | 26.29/0.6892 | 27.12/0.7249 | 27.17/0.7300 | 27.42/0.7395 | 27.68/0.7478 |
| −0.4 | 4× | 25.22/0.6258 | 25.95/0.6597 | 25.95/0.6618 | 26.16/0.6714 | 26.38/0.6796 |
| Avg. | 2× | 28.66/0.8075 | 29.66/0.8344 | 30.10/0.8527 | 30.21/0.8539 | 30.64/0.8632 |
| Avg. | 3× | 26.65/0.7084 | 27.55/0.7449 | 27.60/0.7499 | 27.77/0.7575 | 28.15/0.7675 |
| Avg. | 4× | 25.51/0.6428 | 26.30/0.6783 | 26.30/0.6801 | 26.46/0.6886 | 26.76/0.6983 |

Table 9. PSNR (dB)/SSIM of 16 images captured by fisheye and macro cameras for the 4× SR scale.

| Camera | Image | Bicubic | A+ [34] | SRCNN [31] | VDSR [33] | Proposed |
|--------|-------|---------|---------|------------|-----------|----------|
| Fisheye lens | Img1 | 29.77/0.8257 | 30.93/0.8509 | 30.84/0.8464 | 30.98/0.8529 | 31.43/0.8613 |
| Fisheye lens | Img2 | 24.71/0.7439 | 25.50/0.7897 | 25.43/0.7824 | 25.48/0.7921 | 25.77/0.8061 |
| Fisheye lens | Img3 | 25.90/0.6975 | 26.75/0.7476 | 26.71/0.7430 | 26.81/0.7524 | 27.04/0.7610 |
| Fisheye lens | Img4 | 26.39/0.7348 | 27.23/0.7716 | 27.22/0.7686 | 27.16/0.7727 | 27.47/0.7827 |
| Fisheye lens | Img5 | 25.34/0.7355 | 26.53/0.7926 | 26.43/0.7855 | 26.60/0.7967 | 27.01/0.8126 |
| Fisheye lens | Img6 | 26.26/0.6806 | 26.95/0.7226 | 26.90/0.7179 | 26.90/0.7253 | 27.12/0.7332 |
| Fisheye lens | Img7 | 28.12/0.7309 | 28.76/0.7607 | 28.85/0.7571 | 28.93/0.7677 | 29.22/0.7747 |
| Fisheye lens | Img8 | 24.26/0.6265 | 25.02/0.6807 | 24.97/0.6744 | 24.96/0.6830 | 25.23/0.6948 |
| Macro lens | Img9 | 27.11/0.8368 | 28.93/0.8888 | 28.71/0.8809 | 29.47/0.8966 | 30.40/0.9158 |
| Macro lens | Img10 | 26.91/0.7571 | 27.78/0.7905 | 27.73/0.7871 | 27.94/0.7962 | 28.14/0.8030 |
| Macro lens | Img11 | 28.10/0.8327 | 28.78/0.8567 | 28.81/0.8556 | 29.03/0.8636 | 29.50/0.8777 |
| Macro lens | Img12 | 25.33/0.8523 | 26.24/0.8728 | 26.51/0.8766 | 26.84/0.8798 | 26.91/0.8938 |
| Macro lens | Img13 | 25.34/0.7629 | 26.04/0.7938 | 26.12/0.7930 | 26.43/0.8036 | 26.75/0.8140 |
| Macro lens | Img14 | 22.95/0.7184 | 24.12/0.7819 | 23.99/0.7771 | 24.40/0.7959 | 24.82/0.8143 |
| Macro lens | Img15 | 29.09/0.8531 | 29.04/0.8609 | 29.46/0.8656 | 29.61/0.8705 | 30.03/0.8818 |
| Macro lens | Img16 | 28.86/0.8018 | 29.46/0.8223 | 29.30/0.8162 | 30.00/0.8399 | 30.88/0.8901 |
| Average | | 26.53/0.7619 | 27.38/0.7990 | 27.37/0.7955 | 27.60/0.8056 | 27.98/0.8198 |


Figs. 13 and 14 show the 16 captured images with radial distortions (barrel and pincushion distortions). The captured images were down-sampled by bicubic interpolation and up-sampled using the state-of-the-art super-resolution methods and the proposed method, followed by camera calibration using checkerboard images [24]. Table 9 shows the PSNR (dB) and SSIM of the 16 images (8 from the fisheye lens and 8 from the macro lens) for 4× super-resolution; the proposed method outperforms the state-of-the-art super-resolution methods in terms of PSNR (dB) and SSIM measurements. Figs. 15 and 16 show the original images, the original images after camera calibration [24], the down-sampled images, and the images after super-resolution followed by camera calibration. In these figures, the subjective results confirm that the proposed method provides higher image quality than the compared methods in terms of image fidelity and edge/texture sharpness. Overall, the objective and subjective evaluations verify that the proposed super-resolution method can improve the image quality of real captured images with barrel and pincushion distortions.

5. Conclusion

Due to the popularity of different kinds of cameras for capturing real-life images, radial distortions are commonly observed in cameras with fisheye and telephoto/macro lenses. Hence, it is essential to develop a super-resolution method for images captured by these types of lenses. Built upon the elegant structures of state-of-the-art deep learning based super-resolution methods, we designed a deep convolutional neural network for improving the resolution of images with common nonlinear radial lens distortions, such as barrel and pincushion distortions. The major contribution of this work is the analysis of an efficient convolutional neural network architecture using a single model, which can simultaneously learn from multiple super-resolution scales and multiple distortion coefficients. As a result, a single deep convolutional network was trained to deal with various scenarios of lens distortions and various super-resolution scales. Extensive experiments using simulated data and real data verify that the proposed super-resolution method outperforms several state-of-the-art super-resolution methods in terms of PSNR and SSIM values before and after camera calibration.

A future direction is to investigate more advanced neural networks in order to further enhance the image quality. Although the proposed method requires less computation than the state-of-the-art VDSR method, the computation can be further reduced by investigating the redundancy of the network and eliminating components that do not influence the image quality. Moreover, to implement the proposed network in a real hardware environment, the memory capacity of the deep network can be further analyzed for memory reduction. Our experiments also show that applying camera calibration before super-resolution reconstruction is not beneficial, because the calibration process deteriorates the image quality such that it becomes more difficult to restore image details by super-resolution. Furthermore, the failure cases of the proposed method are distorted images with very low resolutions, which can hardly be super-resolved due to the severe loss of image structure.

Acknowledgment

The authors would like to thank Dr. Xu Wang for providing some of the GPUs for the experiments. This work was supported in part by the Shenzhen Emerging Industries of the Strategic Basic Research Project (No. JCYJ20160226191842793) and the National Natural Science Foundation of China (Nos. 61602312, 61602313, 61620106008).

References

[1] M. Shen, P. Xue, C. Wang, Down-sampling based video coding using super-resolution technique, IEEE Trans. Circuits Syst. Video Technol. 21 (6) (2011) 755–765.
[2] X. Wu, X. Zhang, X. Wang, Low bit-rate image compression via adaptive down-sampling and constrained least squares upconversion, IEEE Trans. Image Process. 18 (3) (2009) 552–561.
[3] Z. Shi, X. Sun, F. Wu, Spatially scalable video coding for HEVC, IEEE Trans. Circuits Syst. Video Technol. 22 (12) (2012) 1813–1826.
[4] O. Le Meur, M. Ebdelli, C. Guillemot, Hierarchical super-resolution-based inpainting, IEEE Trans. Image Process. 22 (10) (2013) 3779–3790.
[5] L. Zhang, H. Zhang, H. Shen, P. Li, A super-resolution reconstruction algorithm for surveillance images, Signal Process. 90 (3) (2010) 848–859.
[6] D. Yang, Z. Li, Y. Xia, Z. Chen, Remote sensing image super-resolution: challenges and approaches, in: Proceedings of the IEEE International Conference on Digital Signal Processing (DSP), Singapore, 2015, pp. 196–200.
[7] X. Yao, J. Han, G. Cheng, X. Qian, L. Guo, Semantic annotation of high-resolution satellite images via weakly supervised learning, IEEE Trans. Geosci. Remote Sens. 54 (6) (2016) 3660–3671.
[8] G. Cheng, P. Zhou, J. Han, Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images, IEEE Trans. Geosci. Remote Sens. 54 (12) (2016) 7405–7415.
[9] K.-W. Hung, W.-C. Siu, Depth-assisted nonlocal means hole filling for novel view synthesis, in: Proceedings of the 19th IEEE International Conference on Image Processing, Orlando, FL, 2012, pp. 2737–2740.
[10] X. Yao, J. Han, D. Zhang, F. Nie, Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering, IEEE Trans. Image Process. 26 (7) (2017) 3196–3209.
[11] X. Lu, Y. Yuan, P. Yan, Image super-resolution via double sparsity regularized manifold learning, IEEE Trans. Circuits Syst. Video Technol. 23 (12) (2013) 2022–2033.
[12] X. Lu, Y. Yuan, P. Yan, Alternatively constrained dictionary learning for image super-resolution, IEEE Trans. Cybern. 44 (3) (2014) 366–377.
[13] X. Fang, B. Luo, H. Zhao, J. Tang, S. Zhai, New multi-resolution image stitching with local and global alignment, IET Comput. Vis. 4 (4) (2010) 231–246.
[14] T. Takano, S. Ono, Y. Matsushita, H. Kawasaki, K. Ikeuchi, Super resolution of fisheye images captured by on-vehicle camera for visibility support, in: Proceedings of the IEEE International Conference on Vehicular Electronics and Safety, Yokohama, Japan, 2015, pp. 120–125.
[15] G. Jin, A. Saxena, M. Budagavi, Motion estimation and compensation for fisheye warped video, in: Proceedings of the IEEE International Conference on Image Processing (ICIP), Quebec City, QC, 2015, pp. 2751–2755.
[16] M. Bätz, A. Eichenseer, A. Kaup, Multi-image super-resolution for fisheye video sequences using subpixel motion estimation based on calibrated re-projection, in: Proceedings of the 24th European Signal Processing Conference (EUSIPCO), Budapest, 2016, pp. 1872–1876.
[17] H. Nagahara, Y. Yagi, M. Yachida, Superresolution modeling using an omnidirectional image sensor, IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 33 (4) (2003) 607–615.
[18] H. Nagahara, Y. Yagi, M. Yachida, Super-resolution from an omnidirectional image sequence, in: Proceedings of the IEEE International Conference on Industrial Electronics, Control and Instrumentation, 4, Nagoya, 2000, pp. 2559–2564.
[19] Z. Fan, Z. Qi-dan, Super-resolution image reconstruction for omni-vision based on POCS, in: Proceedings of the Chinese Control and Decision Conference, Guilin, 2009, pp. 5045–5049.
[20] L. Chen, A. Basu, M. Zhang, W. Wang, Cross-selection kernel regression for super-resolution fusion of complementary panoramic images, in: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, COEX, Seoul, Korea, 2012, pp. 3356–3360.
[21] X. Shao, J. Xu, J. Wang, X. Chen, R. Gong, X. Bi, Design of a wide-field imaging optical system with super-resolution reconstruction, in: Proceedings of SPIE 9501, Satellite Data Compression, Communications, and Processing XI, 2015.
[22] C. Nitschke, A. Nakazawa, Super-resolution from corneal images, in: Proceedings of the British Machine Vision Conference (BMVC), Guildford, 2012, pp. 22.1–22.12.
[23] D.C. Brown, Decentering distortion of lenses, Photogramm. Eng. 32 (3) (1966) 444–462.
[24] Z. Zhang, A flexible new technique for camera calibration, Technical Report MSR-TR-98-71, Microsoft Research, 1998.
[25] J. Kannala, S.S. Brandt, A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses, IEEE Trans. Pattern Anal. Mach. Intell. 28 (8) (2006) 1335–1340.
[26] A. Basu, S. Licardie, Alternative models for fish-eye lenses, Pattern Recognit. Lett. 16 (4) (1995) 433–441.
[27] C. Ricolfe-Viala, A.-J. Sanchez-Salmeron, Correcting non-linear lens distortion in cameras without using a model, Opt. Laser Technol. 42 (2010) 628–639.
[28] M. Marcon, A. Sarti, S. Tubaro, Piecewise distortion correction for fisheye lenses, in: Proceedings of the IEEE International Conference on Image Processing (ICIP), Quebec City, QC, 2015, pp. 4057–4061.


[29] R. Tsai, A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses, IEEE J. Robot. Autom. 3 (4) (1987) 323–344.
[30] A.W. Fitzgibbon, Simultaneous linear estimation of multiple view geometry and lens distortion, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 1, Kauai, HI, USA, 2001, pp. I-125–I-132.
[31] C. Dong, C.C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2) (2016) 295–307.
[32] C. Dong, C.C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, in: Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, 2016.
[33] J. Kim, J.K. Lee, K.M. Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 1646–1654.
[34] R. Timofte, V. De Smet, L. Van Gool, A+: adjusted anchored neighborhood regression for fast super-resolution, in: Proceedings of the Asian Conference on Computer Vision (ACCV 2014), Singapore, 2014.
[35] J. Yu, D. Tao, R. Hong, X. Gao, Recent developments on deep big vision, Neurocomputing 187 (2016) 1–3.
[36] K. Sun, J. Zhang, C. Zhang, J. Hu, Generalized extreme learning machine autoencoder and a new deep neural network, Neurocomputing 230 (2017) 374–381.
[37] M.M. Baig, M.M. Awais, E.-S.M. El-Alfy, AdaBoost-based artificial neural network learning, Neurocomputing 248 (2017) 120–126.
[38] C. Hong, J. Yu, J. Wan, D. Tao, M. Wang, Multimodal deep autoencoder for human pose recovery, IEEE Trans. Image Process. 24 (12) (2015) 5659–5670.
[39] C. Hong, J. Yu, D. Tao, M. Wang, Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval, IEEE Trans. Ind. Electron. 62 (6) (2015) 3742–3751.
[40] J. Yu, X. Yang, F. Gao, D. Tao, Deep multimodal distance metric learning using click constraints for image ranking, IEEE Trans. Cybern. PP (99) (2016) 1–11.
[41] J. Yu, B. Zhang, Z. Kuang, D. Lin, J. Fan, iPrivacy: image privacy protection by identifying sensitive objects via deep multi-task learning, IEEE Trans. Inf. Forensics Secur. 12 (5) (2017) 1005–1016.
[42] C. Farabet, C. Couprie, L. Najman, Y. LeCun, Learning hierarchical features for scene labeling, IEEE Trans. Pattern Anal. Mach. Intell. 35 (8) (2013) 1915–1929.
[43] K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising, IEEE Trans. Image Process. 26 (7) (2017) 3142–3155.
[44] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 1026–1034.
[45] S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: Proceedings of the International Conference on Machine Learning (ICML), 2015, pp. 448–456.
[46] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: convolutional architecture for fast feature embedding, arXiv preprint arXiv:1408.5093, 2014.
[47] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.

Qinglong Chang is a post-doctoral fellow at the Shenzhen University. He received his BEng and Ph.D. degrees from the Nanjing University of Aeronautics and Astronautics in 2008 and 2014, respectively. He is the author of more than 6 conference and journal papers. His current research interests include image super-resolution, image inpainting and computer vision.

Kwok-Wai Hung is an assistant professor at the Shenzhen University. He received his BEng and Ph.D. degrees in Electronic and Information Engineering from the Hong Kong Polytechnic University in 2009 and 2014, respectively. He is the author of more than 17 conference and journal papers and is the inventor of more than 10 patent applications. His current research interests include image and video processing, image compensation, and image enhancement.

Jianmin Jiang received his Ph.D. from the University of Nottingham, UK, in 1994, after which he joined Loughborough University, UK, as a lecturer of computer science. From 1997 to 2001, he worked as a full professor (Chair) of Computing at the University of Glamorgan, Wales, UK. In 2002, he joined the University of Bradford, UK, as a Chair Professor of Digital Media and Director of the Digital Media & Systems Research Institute. He worked at the University of Surrey, UK, as a chair professor in media computing during 2010–2015, and as a distinguished professor (1000-plan) at Tianjin University, China, during 2010–2013. He is currently a Distinguished Professor and director of the Research Institute for Future Media Computing at the College of Computer Science & Software Engineering, Shenzhen University, China. He was a chartered engineer, fellow of IET (IEE), fellow of RSA, and member of the EPSRC College in the UK. He also served the European Commission as a proposal evaluator, project auditor, and NoE/IP hearing panel expert under both the EU Framework-6 and Framework-7 programmes. He was one of the contributing authors of the EU Framework-7 Work-Programme in 2013.
